We’re introducing a new set of basic observability metrics for all GPU Droplets and DOKS clusters, giving you a powerful, simple way to monitor and optimize your AI workloads.
When you run large-scale training, inference, and complex data processing, cluster performance and stability are paramount. Our new observability features give you the visibility you need to use your resources effectively and quickly debug performance bottlenecks.
Get real-time, individual metrics from your NVIDIA and AMD GPUs and their network interfaces, covering critical factors like utilization, temperature, power consumption, and more. Everything is available directly within the DigitalOcean Insights UI, with zero setup required.
We’ve grouped the new metrics into five intuitive categories to provide a comprehensive view of your GPU and DOKS cluster health and performance:
Utilization: Understand how busy your GPU cores and memory are. This includes key metrics like GPU Occupancy and Memory Utilization, so you can tune your setup for peak performance while your workloads are running.
Temperature: Monitor thermal conditions to prevent overheating and ensure stable operation under heavy load.
Power: Track power consumption, which is essential for understanding GPU performance and efficiency.
Throttle: Identify if your GPU is limiting its performance due to thermal, power, or voltage constraints. This is crucial for debugging sudden performance degradations.
Interconnect: Gain insights into the network interface performance connecting your GPU resources.
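To make the five categories concrete, here is a minimal Python sketch of how you might triage a single GPU reading once you have the numbers in hand. Everything in it is an illustrative assumption: the GpuSample fields, the triage helper, and the thresholds are hypothetical names and values, not DigitalOcean metric names, API calls, or recommended limits.

```python
from dataclasses import dataclass


# Hypothetical sample shape: field names are illustrative assumptions,
# not the metric names shown in the DigitalOcean Insights UI.
@dataclass
class GpuSample:
    utilization_pct: float     # Utilization: how busy the GPU cores are
    memory_used_pct: float     # Utilization: share of GPU memory in use
    temperature_c: float       # Temperature
    power_w: float             # Power
    throttle_reasons: tuple    # Throttle: e.g. ("thermal",) or ()
    interconnect_errors: int   # Interconnect: assumed error counter


def triage(sample, temp_limit_c=85.0, low_util_pct=30.0):
    """Return plain-language findings for one sample.

    The default thresholds are placeholders chosen for the example only.
    """
    findings = []
    if sample.throttle_reasons:
        findings.append("throttled: " + ", ".join(sample.throttle_reasons))
    if sample.temperature_c >= temp_limit_c:
        findings.append("temperature at or above limit")
    if sample.utilization_pct < low_util_pct:
        findings.append("low utilization")
    if sample.interconnect_errors > 0:
        findings.append("interconnect errors")
    return findings


if __name__ == "__main__":
    sample = GpuSample(12.0, 40.0, 88.0, 300.0, ("thermal",), 0)
    for finding in triage(sample):
        print(finding)
```

Checking throttle reasons first reflects the point above: a sudden drop in performance is often easier to explain once you know the GPU is limiting itself.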
Observability shouldn’t be a hurdle. That’s why we’ve made this feature as seamless as possible:
Default on: Observability will be enabled by default the moment you create a GPU Droplet. There is no configuration or effort required on your part.
Free: These essential observability metrics are included with the AI/ML Ready images for GPU Droplets.
We’re committed to continually improving the GPU experience and plan to add more advanced, differentiated features to our observability suite in the future.
Simplified Deployment: Our intuitive platform makes it easy to provision and manage your AI infrastructure, allowing you to focus on developing your applications rather than managing complex setups.
Cost-Effectiveness: GPU Droplets start at $0.76/GPU/hour, and we offer flexible configurations (including single-GPU and eight-GPU options), helping you optimize costs for your specific use cases.
Seamless Integration: Use GPU Droplets alongside your existing DigitalOcean projects, including our Kubernetes service.
Reliability: Benefit from enterprise-grade SLAs, HIPAA-eligibility and SOC 2 compliance, and the peace of mind that comes with building on DigitalOcean’s trusted cloud infrastructure.
Start exploring your new GPU metrics in the DigitalOcean Insights UI today and take control of your cluster’s performance.


