NOTE: This content is presented exactly as it appears in InterviewReady’s AI Engineering Transition Path on GitHub. All credit goes to the original authors.
Research papers for software engineers to transition to AI Engineering
Tokenization
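
For intuition about what subword tokenizers actually do, here is a toy sketch of the byte-pair-encoding merge loop that most modern tokenizers build on. The corpus and the number of merges are made up for illustration; this is not any production tokenizer or vocabulary.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a toy corpus of {symbol-tuple: frequency}."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of the chosen pair with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: words split into characters, with counts.
corpus = {tuple("lower"): 2, tuple("lowest"): 1, tuple("newer"): 3}
for _ in range(5):                                   # five merge steps
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)                                        # words now spelled with merged subwords
```
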
Vectorization
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- IMAGEBIND: One Embedding Space To Bind Them All
- SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
- FAISS library
- Facebook Large Concept Models
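
Since the FAISS library is on this list, here is a minimal sketch of how it is typically used: build an index over embedding vectors and query nearest neighbours. The dimension and data are made up; it assumes the `faiss` and `numpy` packages are installed.

```python
import faiss
import numpy as np

d = 128                                              # embedding dimension (illustrative)
database = np.random.rand(10_000, d).astype("float32")   # stored document embeddings
queries = np.random.rand(5, d).astype("float32")         # query embeddings

index = faiss.IndexFlatL2(d)                         # exact L2 search, no training step
index.add(database)                                  # add the database vectors
distances, ids = index.search(queries, 4)            # 4 nearest neighbours per query
print(ids.shape)                                     # (5, 4): neighbour ids per query row
```
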
Infrastructure
Core Architecture
- Attention is All You Need
- FlashAttention
- Multi Query Attention
- Grouped Query Attention
- Google Titans outperform Transformers
- VideoRoPE: Rotary Position Embedding
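
A minimal NumPy sketch of the scaled dot-product attention introduced in "Attention is All You Need" (single head, no mask, no batching; shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one head, without masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_q, seq_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 64))                         # 4 query positions, d_k = 64
K = rng.normal(size=(6, 64))                         # 6 key positions
V = rng.normal(size=(6, 64))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 64)
```

Multi-Query and Grouped-Query Attention keep this same computation but share key/value projections across heads (fully or in groups) to shrink the KV cache, while FlashAttention leaves the math unchanged and reorders it for better GPU memory access.
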
Mixture of Experts
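
To illustrate the routing idea, here is a toy top-k mixture-of-experts layer in PyTorch. The expert count, top-k value, dense routing loop, and missing load-balancing loss are all simplifications for readability, not how production MoE kernels are written.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""
    def __init__(self, d_model=64, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])

    def forward(self, x):                            # x: (tokens, d_model)
        logits = self.gate(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)         # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                   # dense loop: clear, not fast
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(10, 64)).shape)                # torch.Size([10, 64])
```
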
RLHF
- Deep Reinforcement Learning with Human Feedback
- Fine-Tuning Language Models with RLHF
- Training language models with RLHF
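
The reward-model step these RLHF papers share fits in a few lines: score a preferred and a rejected response and train with a pairwise preference loss. A minimal PyTorch sketch, with a made-up linear reward head and random tensors standing in for pooled LLM hidden states:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical reward head: maps a response representation to a scalar score.
reward_head = nn.Linear(768, 1)
optimizer = torch.optim.Adam(reward_head.parameters(), lr=1e-4)

# Stand-ins for pooled hidden states of preferred / rejected responses.
chosen = torch.randn(8, 768)
rejected = torch.randn(8, 768)

r_chosen = reward_head(chosen).squeeze(-1)
r_rejected = reward_head(rejected).squeeze(-1)

# Pairwise preference loss: push the chosen score above the rejected one.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
print(float(loss))
```

The fine-tuning step then optimizes the policy against this learned reward (typically with PPO), regularized to stay close to the original model.
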
Chain of Thought
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Chain of thought
- Demystifying Long Chain-of-Thought Reasoning in LLMs
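
The core of chain-of-thought prompting is just prompt construction: show worked reasoning in the few-shot exemplars (or ask for it explicitly) so the model emits intermediate steps before its answer. A sketch, where `call_llm` is a hypothetical stand-in for whatever client you use:

```python
def build_cot_prompt(question: str) -> str:
    """Few-shot prompt whose exemplar spells out its reasoning before answering."""
    exemplar = (
        "Q: A cafeteria had 23 apples. They used 20 and bought 6 more. "
        "How many apples do they have?\n"
        "A: They started with 23, used 20, leaving 23 - 20 = 3. "
        "Buying 6 more gives 3 + 6 = 9. The answer is 9.\n\n"
    )
    return exemplar + f"Q: {question}\nA: Let's think step by step."

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's client."""
    raise NotImplementedError

prompt = build_cot_prompt("If I have 3 boxes of 12 eggs and break 5, how many are left?")
print(prompt)        # inspect the prompt; call_llm(prompt) would return the reasoning chain
```
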
Reasoning
- Transformer Reasoning Capabilities
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
- Scaling test-time compute is better than scaling model parameters
- Training Large Language Models to Reason in a Continuous Latent Space
- DeepSeek R1
- A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
- Latent Reasoning: A Recurrent Depth Approach
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
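
Several of these papers (Large Language Monkeys and the test-time-scaling work in particular) trade extra inference compute for accuracy by sampling many candidate answers and aggregating them. A minimal repeated-sampling sketch with majority voting; `sample_answer` is a hypothetical function wrapping your model at non-zero temperature:

```python
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical: one sampled completion, reduced to its final answer string."""
    raise NotImplementedError

def majority_vote(question: str, n: int = 16) -> str:
    """Repeated sampling: draw n answers and return the most common one."""
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# With a verifier (unit tests, a reward model, a checker) you would instead keep any
# sampled answer that passes, which is the regime Large Language Monkeys studies.
```
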
Optimizations
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
- ByteDance 1.58
- Transformer Square
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
- 1B model outperforms 405B model
- Speculative Decoding
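
Speculative decoding is the most code-shaped idea in this list: a small draft model proposes several tokens, and the large target model verifies them, keeping only the prefix it agrees with. A greatly simplified greedy-verification sketch; `draft_next` and `target_next` are hypothetical single-token predictors, and the real algorithm accepts or rejects probabilistically so the target distribution is preserved:

```python
def speculative_decode(prompt_tokens, draft_next, target_next, lookahead=4, max_new=32):
    """Draft a few tokens cheaply, then keep only the prefix the target model agrees with."""
    tokens = list(prompt_tokens)
    while len(tokens) < len(prompt_tokens) + max_new:
        # 1) The small model drafts a short continuation.
        draft = []
        for _ in range(lookahead):
            draft.append(draft_next(tokens + draft))
        # 2) The large model checks each drafted position (one batched pass in practice).
        accepted = []
        for i, tok in enumerate(draft):
            if target_next(tokens + draft[:i]) == tok:
                accepted.append(tok)
            else:
                break
        # 3) Keep the agreed prefix plus one token from the target model itself,
        #    so progress is made even when the draft is rejected immediately.
        tokens += accepted
        tokens.append(target_next(tokens))
    return tokens
```
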
Distillation
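
The usual distillation recipe trains a small student on the teacher's softened output distribution. A minimal PyTorch sketch of the temperature-scaled KL term; the shapes and temperature are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

student_logits = torch.randn(8, 32000, requires_grad=True)   # (batch, vocab)
teacher_logits = torch.randn(8, 32000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(float(loss))
```
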
SSMs
- RWKV: Reinventing RNNs for the Transformer Era
- Mamba
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- Distilling Transformers to SSMs
- LoLCATs: On Low-Rank Linearizing of Large Language Models
- Think Slow, Fast
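
Underlying Mamba, RWKV, and the state-space-duality paper is a linear recurrence that can stand in for attention: a hidden state is updated token by token, so decoding needs constant memory per step. A toy, non-selective NumPy sketch of the discrete recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run the discrete state-space recurrence over a 1-D input sequence."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                        # sequential form; real kernels use parallel scans
        h = A @ h + B * x_t              # h_t = A h_{t-1} + B x_t
        ys.append(C @ h)                 # y_t = C h_t
    return np.array(ys)

rng = np.random.default_rng(0)
n = 16                                   # state dimension (illustrative)
A = np.diag(rng.uniform(0.5, 0.99, n))   # stable diagonal transition matrix
B = rng.normal(size=n)
C = rng.normal(size=n)
y = ssm_scan(rng.normal(size=100), A, B, C)
print(y.shape)                           # (100,)
```

Mamba's "selective" SSM additionally makes the recurrence parameters input-dependent, and the duality paper shows how such recurrences relate to attention-style formulations.
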
Competition Models
Hype Makers
- Can AI be made to think critically?
- Evolving Deeper LLM Thinking
- LLMs Can Easily Learn to Reason from Demonstrations; Structure, Not Content, Is What Matters
Hype Breakers
Image Transformers
Video Transformers
- ViViT: A Video Vision Transformer
- Joint Embedding abstractions with self-supervised video masks
- Facebook VideoJAM AI video generation
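
ViViT's first step is mechanical enough to sketch: cut the video into spatio-temporal "tubelet" patches and flatten each one into a token before the transformer sees it. A NumPy reshaping sketch with illustrative sizes:

```python
import numpy as np

def tubelet_tokens(video, t=2, p=16):
    """Split a (frames, H, W, C) video into (num_tokens, t*p*p*C) tubelet patches."""
    frames, H, W, C = video.shape
    assert frames % t == 0 and H % p == 0 and W % p == 0
    v = video.reshape(frames // t, t, H // p, p, W // p, p, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)             # group each tubelet's dims together
    return v.reshape(-1, t * p * p * C)              # one flattened token per tubelet

video = np.random.rand(8, 224, 224, 3)               # 8 frames of 224x224 RGB
tokens = tubelet_tokens(video)
print(tokens.shape)   # (4 * 14 * 14, 2 * 16 * 16 * 3) = (784, 1536)
```

Each flattened tubelet is then linearly projected to the model dimension before position embeddings and attention, much as image patches are in a ViT.
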
Case Studies
- Automated Unit Test Improvement using Large Language Models at Meta
- Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
- OpenAI o1 System Card
- LLM-powered bug catchers
- Chain-of-Retrieval Augmented Generation
- Swiggy Search
- Swarm by OpenAI
- Netflix Foundation Models
- Model Context Protocol
- Uber QueryGPT
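
Several of these case studies (the knowledge-graph RAG and chain-of-retrieval papers, Swiggy search, Uber QueryGPT) are variations on the same retrieval-augmented loop: embed the query, fetch relevant context, and ground the generation in it. A minimal sketch; `embed`, `call_llm`, and the FAISS index are assumptions standing in for whatever stack a given team actually uses:

```python
import faiss

def embed(texts):
    """Hypothetical embedding call; replace with your embedding model."""
    raise NotImplementedError

def call_llm(prompt):
    """Hypothetical completion call; replace with your provider's client."""
    raise NotImplementedError

def answer_with_rag(question, documents, index, k=3):
    """Retrieve the k closest documents and ask the model to answer from them only."""
    query_vec = embed([question]).astype("float32")
    _, ids = index.search(query_vec, k)
    context = "\n\n".join(documents[i] for i in ids[0])
    prompt = ("Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return call_llm(prompt)

# Index construction would happen once, offline, e.g.:
# vectors = embed(documents).astype("float32")
# index = faiss.IndexFlatIP(vectors.shape[1]); index.add(vectors)
```
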
More Resources
I manage my lists here: https://interviewready.io/resources/