Abstract: Large language models (LLMs) based on transformers have made significant strides in recent years, a success driven largely by scaling up their model size. Despite their high ...
Abstract: Mixture of experts (MoE) is a popular technique in deep learning that improves model capacity with conditionally activated parallel neural network modules (experts). However, serving MoE ...
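To make the idea of "conditionally activated parallel experts" concrete, the sketch below shows a generic MoE layer with top-k gating in plain PyTorch. All names (MoELayer, num_experts, top_k) are illustrative assumptions; this is the general technique, not the specific serving system the abstract describes.

```python
# Minimal sketch of a mixture-of-experts layer with top-k gating.
# Each token is routed to only a few experts, so most expert parameters
# stay inactive for any given token ("conditional activation").
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # A pool of parallel feed-forward experts.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                           # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts are evaluated for each token.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

# Usage: route 16 token embeddings through the layer.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

Because only top_k of the num_experts feed-forward blocks run per token, total parameter count (and hence capacity) can grow with the expert pool while per-token compute stays roughly constant, which is what makes serving MoE models a distinct systems problem.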