Related reading:
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Community release | DeepSeek open-sources China's first MoE large model, with the technical report and model weights released together (社区发布 | 深度求索开源国内首个 MoE 大模型)
- Adaptive Mixtures of Local Experts
- Mixture-of-Experts (MoE) Explained in Detail (混合专家模型(MoE)详解)
- 蝈蝈: An Overview of Classic Mixture-of-Experts (MoE) Papers (Mixture-of-Experts (MoE) 经典论文一览)