Vision Transformer (ViT)

AI by hand ✍️

LLM Paper Club
3/14/2025

Hello, this is Prof. Tom Yeh.

I look forward to seeing you at this week's LLM paper club.

I will use Vision Transformer (ViT) as the baseline and extend it to Llama 1, 2, 3, 4 live.

Baseline: ViT

+ RMSNorm

+ model dimensions

+ layers

+ RoPE

+ Grouped-Query Attention

+ Sparse Attention

+ Flash Attention

+ context length
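As a small taste of the first extension on the list, here is a minimal NumPy sketch of RMSNorm, the normalization Llama uses in place of LayerNorm. The function name, shapes, and epsilon value are illustrative, not taken from any particular implementation.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square of the features.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

# Toy example: a single 2-dim activation vector with unit gains.
x = np.array([3.0, 4.0])
out = rms_norm(x, np.ones(2))
```

After normalization, the root-mean-square of the output is approximately 1, regardless of the input's scale.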

I highly recommend previewing this spreadsheet before the club meeting, so we can take a deep dive right away.

Subscribe to receive this spreadsheet now and more original AI by Hand ✍️ resources in the future.
