LLaMA: Open and Efficient Foundation Language Models
Watch: Yannic’s Video
Note: However, recent work from Hoffmann et al. (2022) shows that, for a given compute budget, the best performance is achieved not by the largest models, but by smaller models trained on more data.
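This compute-optimal trade-off can be sketched numerically. Hoffmann et al. (2022) model training cost as roughly C ≈ 6·N·D FLOPs (N parameters, D training tokens), and their results imply a compute-optimal ratio of about 20 tokens per parameter. The sketch below is an approximation under those two assumptions, not the paper's exact fitted scaling law; the function name and the ratio constant are illustrative.

```python
def chinchilla_optimal(compute_budget_flops, tokens_per_param=20):
    """Approximate compute-optimal model/data sizes (Hoffmann et al., 2022).

    Assumes training cost C ~ 6 * N * D FLOPs and a compute-optimal
    token-to-parameter ratio D/N ~ 20 (both are rough approximations).
    """
    # From C = 6 * N * (20 * N)  =>  N = sqrt(C / 120)
    n_params = (compute_budget_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly Chinchilla's training budget (~5.76e23 FLOPs) recovers a model
# close to its reported size: ~70B parameters trained on ~1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under this approximation, doubling compute increases both model size and data size by about sqrt(2), rather than putting all extra compute into a larger model.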