NOTE: This content is presented exactly as it appears in InterviewReady’s AI Engineering Transition Path on GitHub. All credit goes to the original authors.
Research papers for software engineers to transition to AI Engineering
Tokenization
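
For intuition about what subword tokenizers actually do, here is a toy sketch of the byte-pair-encoding merge loop that most modern tokenizers build on. The corpus and the number of merges are made up for illustration; this is not any production tokenizer or vocabulary.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a toy corpus of {symbol-tuple: frequency}."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of the chosen pair with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: words split into characters, with counts.
corpus = {tuple("lower"): 2, tuple("lowest"): 1, tuple("newer"): 3}
for _ in range(5):                                   # five merge steps
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)                                        # words now spelled with merged subwords
```
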
Vectorization
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- IMAGEBIND: One Embedding Space To Bind Them All
- SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
- FAISS library
- Facebook Large Concept Models
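
Since the FAISS library is on this list, here is a minimal sketch of how it is typically used: build an index over embedding vectors and query nearest neighbours. The dimension and data are made up; it assumes the `faiss` and `numpy` packages are installed.

```python
import faiss
import numpy as np

d = 128                                              # embedding dimension (illustrative)
database = np.random.rand(10_000, d).astype("float32")   # stored document embeddings
queries = np.random.rand(5, d).astype("float32")         # query embeddings

index = faiss.IndexFlatL2(d)                         # exact L2 search, no training step
index.add(database)                                  # add the database vectors
distances, ids = index.search(queries, 4)            # 4 nearest neighbours per query
print(ids.shape)                                     # (5, 4): neighbour ids per query row
```
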
Infrastructure
Core Architecture
- Attention is All You Need
- FlashAttention
- Multi Query Attention
- Grouped Query Attention
- Google Titans outperform Transformers
- VideoRoPE: Rotary Position Embedding
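
A minimal NumPy sketch of the scaled dot-product attention introduced in "Attention is All You Need" (single head, no mask, no batching; shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one head, without masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_q, seq_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 64))                         # 4 query positions, d_k = 64
K = rng.normal(size=(6, 64))                         # 6 key positions
V = rng.normal(size=(6, 64))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 64)
```

Multi-Query and Grouped-Query Attention keep this same computation but share key/value projections across heads (fully or in groups) to shrink the KV cache, while FlashAttention leaves the math unchanged and reorders it for better GPU memory access.
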
Mixture of Experts
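
To illustrate the routing idea, here is a toy top-k mixture-of-experts layer in PyTorch. The expert count, top-k value, dense routing loop, and missing load-balancing loss are all simplifications for readability, not how production MoE kernels are written.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""
    def __init__(self, d_model=64, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])

    def forward(self, x):                            # x: (tokens, d_model)
        logits = self.gate(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)         # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                   # dense loop: clear, not fast
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(10, 64)).shape)                # torch.Size([10, 64])
```
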
RLHF
- Deep Reinforcement Learning with Human Feedback
- Fine-Tuning Language Models with RLHF
- Training language models with RLHF
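
The reward-model step these RLHF papers share fits in a few lines: score a preferred and a rejected response and train with a pairwise preference loss. A minimal PyTorch sketch, with a made-up linear reward head and random tensors standing in for pooled LLM hidden states:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical reward head: maps a response representation to a scalar score.
reward_head = nn.Linear(768, 1)
optimizer = torch.optim.Adam(reward_head.parameters(), lr=1e-4)

# Stand-ins for pooled hidden states of preferred / rejected responses.
chosen = torch.randn(8, 768)
rejected = torch.randn(8, 768)

r_chosen = reward_head(chosen).squeeze(-1)
r_rejected = reward_head(rejected).squeeze(-1)

# Pairwise preference loss: push the chosen score above the rejected one.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
print(float(loss))
```

The fine-tuning step then optimizes the policy against this learned reward (typically with PPO), regularized to stay close to the original model.
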
Chain of Thought
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Chain of thought
- Demystifying Long Chain-of-Thought Reasoning in LLMs
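
The core of chain-of-thought prompting is just prompt construction: show worked reasoning in the few-shot exemplars (or ask for it explicitly) so the model emits intermediate steps before its answer. A sketch, where `call_llm` is a hypothetical stand-in for whatever client you use:

```python
def build_cot_prompt(question: str) -> str:
    """Few-shot prompt whose exemplar spells out its reasoning before answering."""
    exemplar = (
        "Q: A cafeteria had 23 apples. They used 20 and bought 6 more. "
        "How many apples do they have?\n"
        "A: They started with 23, used 20, leaving 23 - 20 = 3. "
        "Buying 6 more gives 3 + 6 = 9. The answer is 9.\n\n"
    )
    return exemplar + f"Q: {question}\nA: Let's think step by step."

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's client."""
    raise NotImplementedError

prompt = build_cot_prompt("If I have 3 boxes of 12 eggs and break 5, how many are left?")
print(prompt)        # inspect the prompt; call_llm(prompt) would return the reasoning chain
```
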
Reasoning
- Transformer Reasoning Capabilities
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
- Scaling test-time compute is better than scaling model parameters
- Training Large Language Models to Reason in a Continuous Latent Space
- DeepSeek R1
- A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
- Latent Reasoning: A Recurrent Depth Approach
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
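
Several of these papers (Large Language Monkeys and the test-time-scaling work in particular) trade extra inference compute for accuracy by sampling many candidate answers and aggregating them. A minimal repeated-sampling sketch with majority voting; `sample_answer` is a hypothetical function wrapping your model at non-zero temperature:

```python
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical: one sampled completion, reduced to its final answer string."""
    raise NotImplementedError

def majority_vote(question: str, n: int = 16) -> str:
    """Repeated sampling: draw n answers and return the most common one."""
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# With a verifier (unit tests, a reward model, a checker) you would instead keep any
# sampled answer that passes, which is the regime Large Language Monkeys studies.
```
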
Optimizations
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
- ByteDance 1.58
- Transformer Square
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
- 1B model outperforms 405B model
- Speculative Decoding
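
Speculative decoding is the most code-shaped idea in this list: a small draft model proposes several tokens, and the large target model verifies them, keeping only the prefix it agrees with. A greatly simplified greedy-verification sketch; `draft_next` and `target_next` are hypothetical single-token predictors, and the real algorithm accepts or rejects probabilistically so the target distribution is preserved:

```python
def speculative_decode(prompt_tokens, draft_next, target_next, lookahead=4, max_new=32):
    """Draft a few tokens cheaply, then keep only the prefix the target model agrees with."""
    tokens = list(prompt_tokens)
    while len(tokens) < len(prompt_tokens) + max_new:
        # 1) The small model drafts a short continuation.
        draft = []
        for _ in range(lookahead):
            draft.append(draft_next(tokens + draft))
        # 2) The large model checks each drafted position (one batched pass in practice).
        accepted = []
        for i, tok in enumerate(draft):
            if target_next(tokens + draft[:i]) == tok:
                accepted.append(tok)
            else:
                break
        # 3) Keep the agreed prefix plus one token from the target model itself,
        #    so progress is made even when the draft is rejected immediately.
        tokens += accepted
        tokens.append(target_next(tokens))
    return tokens
```
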
Distillation
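
The usual distillation recipe trains a small student on the teacher's softened output distribution. A minimal PyTorch sketch of the temperature-scaled KL term; the shapes and temperature are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

student_logits = torch.randn(8, 32000, requires_grad=True)   # (batch, vocab)
teacher_logits = torch.randn(8, 32000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(float(loss))
```
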
SSMs
- RWKV: Reinventing RNNs for the Transformer Era
- Mamba
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- Distilling Transformers to SSMs
- LoLCATs: On Low-Rank Linearizing of Large Language Models
- Think Slow, Fast
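
Underlying Mamba, RWKV, and the state-space-duality paper is a linear recurrence that can stand in for attention: a hidden state is updated token by token, so decoding needs constant memory per step. A toy, non-selective NumPy sketch of the discrete recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run the discrete state-space recurrence over a 1-D input sequence."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                        # sequential form; real kernels use parallel scans
        h = A @ h + B * x_t              # h_t = A h_{t-1} + B x_t
        ys.append(C @ h)                 # y_t = C h_t
    return np.array(ys)

rng = np.random.default_rng(0)
n = 16                                   # state dimension (illustrative)
A = np.diag(rng.uniform(0.5, 0.99, n))   # stable diagonal transition matrix
B = rng.normal(size=n)
C = rng.normal(size=n)
y = ssm_scan(rng.normal(size=100), A, B, C)
print(y.shape)                           # (100,)
```

Mamba's "selective" SSM additionally makes the recurrence parameters input-dependent, and the duality paper shows how such recurrences relate to attention-style formulations.
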
Competition Models
Hype Makers
- Can AI be made to think critically?
- Evolving Deeper LLM Thinking
- LLMs Can Easily Learn to Reason from Demonstrations; Structure, Not Content, Is What Matters
Hype Breakers
Image Transformers
Video Transformers
- ViViT: A Video Vision Transformer
- Joint Embedding abstractions with self-supervised video masks
- Facebook VideoJAM AI video generation
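
ViViT's first step is mechanical enough to sketch: cut the video into spatio-temporal "tubelet" patches and flatten each one into a token before the transformer sees it. A NumPy reshaping sketch with illustrative sizes:

```python
import numpy as np

def tubelet_tokens(video, t=2, p=16):
    """Split a (frames, H, W, C) video into (num_tokens, t*p*p*C) tubelet patches."""
    frames, H, W, C = video.shape
    assert frames % t == 0 and H % p == 0 and W % p == 0
    v = video.reshape(frames // t, t, H // p, p, W // p, p, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)             # group each tubelet's dims together
    return v.reshape(-1, t * p * p * C)              # one flattened token per tubelet

video = np.random.rand(8, 224, 224, 3)               # 8 frames of 224x224 RGB
tokens = tubelet_tokens(video)
print(tokens.shape)   # (4 * 14 * 14, 2 * 16 * 16 * 3) = (784, 1536)
```

Each flattened tubelet is then linearly projected to the model dimension before position embeddings and attention, much as image patches are in a ViT.
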
Case Studies
- Automated Unit Test Improvement using Large Language Models at Meta
- Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
- OpenAI o1 System Card
- LLM-powered bug catchers
- Chain-of-Retrieval Augmented Generation
- Swiggy Search
- Swarm by OpenAI
- Netflix Foundation Models
- Model Context Protocol
- Uber QueryGPT
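
Several of these case studies (the knowledge-graph RAG and chain-of-retrieval papers, Swiggy search, Uber QueryGPT) are variations on the same retrieval-augmented loop: embed the query, fetch relevant context, and ground the generation in it. A minimal sketch; `embed`, `call_llm`, and the FAISS index are assumptions standing in for whatever stack a given team actually uses:

```python
import faiss

def embed(texts):
    """Hypothetical embedding call; replace with your embedding model."""
    raise NotImplementedError

def call_llm(prompt):
    """Hypothetical completion call; replace with your provider's client."""
    raise NotImplementedError

def answer_with_rag(question, documents, index, k=3):
    """Retrieve the k closest documents and ask the model to answer from them only."""
    query_vec = embed([question]).astype("float32")
    _, ids = index.search(query_vec, k)
    context = "\n\n".join(documents[i] for i in ids[0])
    prompt = ("Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return call_llm(prompt)

# Index construction would happen once, offline, e.g.:
# vectors = embed(documents).astype("float32")
# index = faiss.IndexFlatIP(vectors.shape[1]); index.add(vectors)
```
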
More Resources
I manage my lists here: https://interviewready.io/resources/