
Lightweight BEV Transformer – Attention & Temperature

Completed

A tiny PyTorch codebase for experimenting with transformer attention on simple BEV (bird's-eye-view) grids, and for seeing how the softmax temperature changes attention patterns and downstream predictions.

BEV transformer attention
2025

About this project

  • Generates synthetic BEV grids (32×32) with rectangles standing in for:
    • cars (5×3)
    • pedestrians (2×2)
  • Trains a small transformer over BEV patches:
    • patch embedding: 4×4 patches → an 8×8 token grid (64 tokens)
    • 2 encoder layers, 2 heads, 64-dim embeddings
  • Uses object queries + soft-argmax detection head:
    • predicts object centers (x, y in [0, 1])
    • predicts class (car vs pedestrian)
  • Exposes internal self-attention maps so you can:
    • overlay them on the BEV grid
    • compare different temperatures in the softmax
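The temperature comparison boils down to one extra divisor on the attention logits before the softmax. A minimal sketch of the idea (hypothetical function name and shapes, not the repo's actual API):

```python
import torch

def attention_with_temperature(q, k, v, temperature: float = 1.0):
    """Scaled dot-product attention with an extra temperature knob.

    q, k, v: (..., seq_len, d) tensors. Returns (output, attention weights).
    Dividing the logits by a temperature > 1 flattens the softmax (diffuse
    attention); a temperature < 1 sharpens it toward one-hot.
    """
    d = q.shape[-1]
    logits = q @ k.transpose(-2, -1) / (d ** 0.5)  # standard 1/sqrt(d) scaling
    weights = torch.softmax(logits / temperature, dim=-1)
    return weights @ v, weights
```

At high temperature the weights approach uniform; at low temperature they collapse onto the single best-matching token, which is exactly the contrast the overlay plots visualize.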

The whole model is CPU-friendly and stays under roughly 100k parameters.
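The soft-argmax head can be sketched as a softmax over per-query logits against the 64 patch tokens, followed by a weighted average of the patch-center coordinates. Names and shapes below are assumptions for illustration, not the repo's actual interface:

```python
import torch

def soft_argmax_centers(query_logits: torch.Tensor, grid_size: int = 8) -> torch.Tensor:
    """Turn per-query logits over patch tokens into (x, y) centers in [0, 1].

    query_logits: (num_queries, grid_size**2) scores of each object query
    against the flattened 8x8 patch grid (hypothetical shapes).
    """
    # Normalized center coordinate of each patch cell: (k + 0.5) / grid_size
    coords = (torch.arange(grid_size, dtype=torch.float32) + 0.5) / grid_size
    ys, xs = torch.meshgrid(coords, coords, indexing="ij")
    centers = torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=-1)  # (64, 2)

    weights = torch.softmax(query_logits, dim=-1)  # (num_queries, 64)
    return weights @ centers                        # (num_queries, 2) in [0, 1]
```

Because the output is a convex combination of fixed cell centers, it is differentiable end to end and always lands inside [0, 1], which is why this head pairs well with a plain regression loss on the centers.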

Gallery

attention_layer0.png — layer-0 self-attention maps overlaid on the BEV grid
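Overlays like the one above upsample a single token's attention row back to grid resolution. A rough sketch assuming 4×4 patches on the 32×32 grid (hypothetical helper, not the repo's plotting code):

```python
import torch

def attention_to_bev(attn_row: torch.Tensor, patch: int = 4, grid: int = 8) -> torch.Tensor:
    """Reshape one token's attention over the 64 patches into a 32x32 heatmap.

    attn_row: (grid**2,) attention weights for a single query token.
    Each 4x4 pixel block inherits its patch token's weight (nearest-neighbour
    upsampling), ready to alpha-blend over the BEV grid.
    """
    heat = attn_row.reshape(grid, grid)
    return heat.repeat_interleave(patch, dim=0).repeat_interleave(patch, dim=1)
```

The resulting 32×32 map can then be alpha-blended over the input grid with any plotting library.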