LLM Inference Explained: Prefill vs Decode and Why Latency Matters