Backprop = chain of VJP's!

Dec 12, 2025 · 1 min read

  • substack
  • deeply-learning
  • Blog

Read on Substack

Backprop = a chain of VJPs! by Ashwin Mirskar

Peeking under the rug at the MATH behind backprop.
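
To make the title's claim concrete, here is a minimal sketch in plain Python; it is not code from the Substack post. It assumes a hypothetical toy model, `loss = sum(tanh(W x))`, and the helper names `linear`, `tanh`, `total`, and `loss_and_grads` are invented for illustration. Each forward step returns its output together with a VJP (vector-Jacobian product) closure, and the backward pass simply calls those closures in reverse order, starting from a cotangent of 1.0 on the scalar loss.

```python
import math

def linear(W, x):
    """z = W x. The VJP maps a cotangent v on z to cotangents on W and x."""
    z = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    def vjp(v):
        dW = [[vi * xj for xj in x] for vi in v]        # outer(v, x)
        dx = [sum(W[i][j] * v[i] for i in range(len(W)))
              for j in range(len(x))]                   # W^T v
        return dW, dx
    return z, vjp

def tanh(z):
    """h = tanh(z), elementwise. The Jacobian is diagonal: 1 - h^2."""
    h = [math.tanh(zi) for zi in z]
    def vjp(v):
        return [vi * (1.0 - hi * hi) for vi, hi in zip(v, h)]
    return h, vjp

def total(h):
    """loss = sum(h). The Jacobian is a row of ones."""
    def vjp(v):
        return [v for _ in h]
    return sum(h), vjp

def loss_and_grads(W, x):
    # Forward pass: keep each step's VJP closure.
    z, linear_vjp = linear(W, x)
    h, tanh_vjp = tanh(z)
    loss, total_vjp = total(h)
    # Backward pass: chain the VJPs in reverse, seeded with d(loss)/d(loss) = 1.
    v = total_vjp(1.0)
    v = tanh_vjp(v)
    dW, dx = linear_vjp(v)
    return loss, dW, dx
```

No full Jacobian matrix is ever built: each step only needs to know how to multiply a cotangent vector by its own Jacobian, which is what keeps reverse-mode differentiation cheap for scalar losses.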
