Low-Rank Adaptation (LoRA) has become the default fine-tuning strategy for large language models. However, applying one uniform rank to every layer is a blunt instrument: some layers need high-rank updates, while others converge with rank-2 adapters.
Method
During the first 500 training steps, we collect per-layer gradient snapshots and perform a truncated SVD on them. A layer's effective rank, defined as the number of singular values exceeding 1% of the spectral norm (the largest singular value), determines the LoRA rank of that layer for each subsequent training phase.
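The effective-rank rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the function names `effective_rank` and `select_ranks`, the `min_rank`/`max_rank` clamp, and the choice to average a layer's snapshots before decomposing are all hypothetical, and a full SVD via `np.linalg.svd` stands in for the truncated SVD mentioned in the text.

```python
import numpy as np


def effective_rank(grad, rel_threshold=0.01):
    """Count singular values exceeding rel_threshold times the spectral norm."""
    # compute_uv=False returns only the singular values, in descending order.
    s = np.linalg.svd(np.asarray(grad, dtype=np.float64), compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0  # an all-zero gradient has no significant directions
    # s[0] is the largest singular value, i.e. the spectral norm.
    return int(np.count_nonzero(s > rel_threshold * s[0]))


def select_ranks(snapshots_by_layer, rel_threshold=0.01, min_rank=1, max_rank=64):
    """Map each layer name to a LoRA rank derived from its gradient snapshots."""
    ranks = {}
    for name, snapshots in snapshots_by_layer.items():
        # Assumption: a layer's snapshots share one shape and are averaged
        # before the decomposition; the source does not specify aggregation.
        mean_grad = np.mean(np.stack(snapshots), axis=0)
        r = effective_rank(mean_grad, rel_threshold)
        # Hypothetical clamp so every layer keeps at least a rank-1 adapter.
        ranks[name] = int(min(max(r, min_rank), max_rank))
    return ranks
```

For example, `effective_rank(np.diag([10.0, 5.0, 1.0, 0.05, 0.001]))` returns 3, because only the first three singular values exceed 1% of the largest one (0.1).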
Results
- Average rank reduced from r=16 (uniform) to r=6.1 (adaptive)
- Trainable parameters: −62% vs. uniform LoRA
- Training throughput: +28% on identical hardware
- Downstream eval: within 0.3% of the uniform baseline across 12 tasks
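The parameter figure is consistent with the rank figure: a LoRA adapter on a d_out x d_in weight trains r * (d_in + d_out) parameters, so when the adapted layers have equal shapes the parameter count scales linearly with the mean rank. The sketch below checks that arithmetic. The per-layer ranks and the 4096 dimensions are made up purely so that the mean equals 6.1; they are not measured values.

```python
def lora_params(rank, d_in, d_out):
    """Trainable parameters of one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def relative_change(adaptive_ranks, uniform_rank, d_in, d_out):
    """Relative change in trainable parameters versus a uniform-rank baseline."""
    adaptive = sum(lora_params(r, d_in, d_out) for r in adaptive_ranks)
    uniform = len(adaptive_ranks) * lora_params(uniform_rank, d_in, d_out)
    return adaptive / uniform - 1.0


# Made-up profile for ten equally sized layers; chosen only so the mean is 6.1.
hypothetical_ranks = [2, 2, 4, 4, 6, 6, 8, 8, 9, 12]
change = relative_change(hypothetical_ranks, uniform_rank=16, d_in=4096, d_out=4096)
print(f"{change:+.1%}")
```

Under these assumptions the change is 6.1/16 - 1, about -61.9%, which rounds to the reported 62% reduction. If layer shapes differ, the reduction depends on which layers receive the low ranks.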
Production Integration
This method is now the default rank-selection strategy in our fine-tuning pipeline. Rank profiles are persisted alongside model checkpoints for full reproducibility and auditability.
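One way to persist a rank profile next to a checkpoint is a small JSON sidecar file. The following is a sketch under assumptions, not the pipeline's actual format: the file name `rank_profile.json`, the field names, and the helper functions `save_rank_profile` and `load_rank_profile` are hypothetical.

```python
import json
from pathlib import Path

PROFILE_NAME = "rank_profile.json"  # hypothetical sidecar file name


def save_rank_profile(checkpoint_dir, ranks, rel_threshold=0.01, warmup_steps=500):
    """Write the per-layer ranks and the settings that produced them."""
    path = Path(checkpoint_dir) / PROFILE_NAME
    payload = {
        "warmup_steps": warmup_steps,
        "rel_threshold": rel_threshold,
        # Sorted keys keep the file stable across runs, which helps diffing.
        "ranks": dict(sorted(ranks.items())),
    }
    path.write_text(json.dumps(payload, indent=2))
    return path


def load_rank_profile(checkpoint_dir):
    """Read back the per-layer ranks stored next to a checkpoint."""
    path = Path(checkpoint_dir) / PROFILE_NAME
    return json.loads(path.read_text())["ranks"]
```

Storing the threshold and warm-up length together with the ranks lets a later audit confirm which selection settings produced a given profile.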