Folder: Notes/Papers

15 items under this folder.

  • Feb 16, 2026 · LLaMA (transformers)
  • Feb 16, 2026 · Scaling Laws for Neural Language Models (llm, scaling-laws)
  • Feb 15, 2026 · Chinchilla (transformers, llm, scaling-laws)
  • Feb 14, 2026 · ALiBi (transformers, Attention, Positional-Encoding)
  • Feb 13, 2026 · GLU Improves Transformers (transformers)
  • Feb 13, 2026 · RoFormer (transformers, Embeddings, Positional-Encoding)
  • Feb 12, 2026 · Layer Norm in Transformers (transformers, ML)
  • Feb 11, 2026 · GPT-2 (GPT-2, ML, transformers)
  • Jan 26, 2026 · Attention is All You Need (transformers, ML, Attention)
  • Jan 23, 2026 · RNN Overview (RNN, ML)
  • Jan 19, 2026 · Layer Norm (transformers, RNN)
  • Jan 15, 2026 · Batch Norm (transformers, RNN, ML)
  • Jan 08, 2026 · Rubrics As Reward (transformers, RLFT)
  • Jan 06, 2026 · Deepseek-R1 (transformers, RLFT, GRPO)
  • Jan 06, 2026 · RLPR (transformers, RLFT)
