Abstract: Large language models (LLMs) based on transformers have made significant strides in recent years, a success driven largely by scaling up their model size. Despite their high ...
Abstract: Mixture of experts (MoE) is a popular technique in deep learning that improves model capacity with conditionally activated parallel neural network modules (experts). However, serving MoE ...
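To make the idea of "conditionally activated parallel experts" concrete, the sketch below shows a generic MoE layer with top-k gating in plain PyTorch. All names (MoELayer, num_experts, top_k) are illustrative assumptions; this is the general technique, not the specific serving system the abstract describes.

```python
# Minimal sketch of a mixture-of-experts layer with top-k gating.
# Each token is routed to only a few experts, so most expert parameters
# stay inactive for any given token ("conditional activation").
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # A pool of parallel feed-forward experts.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                           # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts are evaluated for each token.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

# Usage: route 16 token embeddings through the layer.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

Because only top_k of the num_experts feed-forward blocks run per token, total parameter count (and hence capacity) can grow with the expert pool while per-token compute stays roughly constant, which is what makes serving MoE models a distinct systems problem.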