NVIDIA Dynamo on AKS for Autoscaling LLM Inference
NVIDIA Dynamo is presented as a way to run autoscaled large language model (LLM) inference on Azure Kubernetes Service (AKS). According to the reference, Dynamo targets Kubernetes-based deployments whose LLM inference workloads must scale up and down with demand.
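
To make "scaling dynamically" concrete, the sketch below reproduces the replica calculation that the standard Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas * current metric / target metric), with no change when the ratio is within a tolerance of 1. This is general Kubernetes behavior, not a description of how Dynamo itself decides to scale; the source does not say which mechanism Dynamo uses. The function name, the metric (queued requests per replica), and the min/max bounds are all hypothetical and chosen only for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=8, tolerance=0.1):
    """Replica count per the documented Kubernetes HPA formula.

    All parameter names and defaults are illustrative assumptions,
    not values taken from Dynamo or from the source document.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally, rounding up, then clamp to the allowed range.
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical metric: queued inference requests per replica, target 10.
    print(desired_replicas(2, current_metric=30, target_metric=10))
    print(desired_replicas(4, current_metric=2, target_metric=10))
```

With a target of 10 queued requests per replica, two replicas observing 30 each would be scaled out, while four replicas observing 2 each would be scaled in toward the minimum. On AKS, adding pods in this way may in turn require the cluster to add nodes, which is a separate layer of autoscaling.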