
Vision Transformer Pruning via Mechanistic Interpretability

Pruned 85% of attention heads and 94% of the computational graph; the retrained model achieves a ~2× CPU inference speedup with a ~9% drop in top-1 accuracy. Code →

NCCL Swing Collective Algorithm Implementation

Implemented the Swing collective algorithm using MSCCL; measured speedups of 50% on average (up to 200%) on torus networks; currently running tests on the Leonardo supercomputer. Code →

Social platform (Full-stack)

Go + Vue.js + SQLite + Docker; posts/follows/bans; OpenAPI-documented REST API. Code →

Forest surveillance cameras: real-time AI fire detection

Rewrote the pipeline in NumPy, raising throughput from 0.1 FPS to 60 FPS, and improved detection accuracy to 95%. Code →