Vision Transformer Pruning via Mechanistic Interpretability
Removed 85% of attention heads and 94% of the computational graph; the retrained model achieves a ~2× CPU inference speedup with only a ~9% top-1 accuracy drop. Code →
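A minimal sketch of the head-pruning step, assuming a per-head importance score has already been computed (the `head_importance` array and `keep_fraction` parameter are hypothetical names, not the project's actual code):

```python
import numpy as np

def prune_heads(head_importance, keep_fraction=0.15):
    """Keep the top `keep_fraction` of attention heads by importance.

    head_importance: (num_layers, num_heads) array of scores.
    Returns a boolean mask of the same shape: True = keep the head.
    """
    scores = head_importance.ravel()
    n_keep = max(1, int(round(keep_fraction * scores.size)))
    threshold = np.sort(scores)[-n_keep]  # score of the n_keep-th best head
    return head_importance >= threshold

# Example: 12 layers x 12 heads with random importance scores.
rng = np.random.default_rng(0)
importance = rng.random((12, 12))
mask = prune_heads(importance, keep_fraction=0.15)
print(int(mask.sum()), "of", mask.size, "heads kept")
```

Heads with `False` in the mask are removed from the graph, and the smaller model is then retrained to recover accuracy.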
Implemented a new collective algorithm using MSCCL; measured 50% average speedups (up to 200%) on torus networks; currently running tests on the Leonardo supercomputer. Code →
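For context, a plain ring all-reduce can be simulated in a few lines; this is the textbook baseline, not the project's MSCCL torus algorithm, and all names here are illustrative:

```python
def ring_allreduce(rank_buffers):
    """Simulate all-reduce (sum) over n ranks arranged in a ring.

    rank_buffers: list of n lists, each of length n (one chunk per rank).
    After reduce-scatter + all-gather, every rank holds the element-wise
    sum of all input buffers.
    """
    n = len(rank_buffers)
    bufs = [list(b) for b in rank_buffers]
    # Reduce-scatter: at step s, rank r sends chunk (r - s) mod n to rank r+1,
    # which accumulates it. Sends are snapshotted so each step is simultaneous.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, bufs[r][(r - step) % n]) for r in range(n)]
        for r, chunk, val in sends:
            bufs[(r + 1) % n][chunk] += val
    # All-gather: circulate each fully reduced chunk around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, bufs[r][(r + 1 - step) % n]) for r in range(n)]
        for r, chunk, val in sends:
            bufs[(r + 1) % n][chunk] = val
    return bufs

result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)  # every rank ends with [12, 15, 18]
```

Topology-aware algorithms like the one in this project beat the ring by exploiting the extra links of a torus instead of a single logical ring.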
Rewrote the pipeline in NumPy, raising throughput from 0.1 FPS to 60 FPS, and improved detection accuracy to 95%. Code →
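The kind of rewrite involved, shown on a hypothetical per-pixel kernel (not the project's actual pipeline): a Python double loop replaced by a single vectorized NumPy expression that runs in C.

```python
import numpy as np

def threshold_loop(frame, t):
    # Per-pixel Python loop: the slow pattern the rewrite removed.
    out = np.zeros_like(frame)
    for i in range(frame.shape[0]):
        for j in range(frame.shape[1]):
            out[i, j] = 255 if frame[i, j] > t else 0
    return out

def threshold_vectorized(frame, t):
    # One expression over the whole array: identical result, ~100-1000x faster.
    return np.where(frame > t, 255, 0).astype(frame.dtype)

frame = np.random.default_rng(1).integers(0, 256, (480, 640), dtype=np.uint8)
assert np.array_equal(threshold_loop(frame, 128), threshold_vectorized(frame, 128))
```

Applying this transformation across every hot loop is what turns a 0.1 FPS prototype into a real-time one.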
Social platform (Full-stack)
Go + Vue.js + SQLite + Docker; supports posts, follows, and bans; OpenAPI-documented REST API. Code →