Run:AI Model Streamer streams LLM weights from Azure Blob to GPU memory
Run:AI Model Streamer can load model weights from Azure Blob Storage directly into GPU memory instead of copying them to local disk first. The approach targets LLM inference cold starts during autoscaling and claims model startup times up to 6× faster while reducing periods when allocated GPUs sit idle.