Version: v1.8.2

Introduction

What is Volcano

Volcano is a cloud-native system for high-performance workloads. The Cloud Native Computing Foundation (CNCF) has accepted it as its first and only official container batch scheduling project. Volcano supports popular computing frameworks such as PyTorch, TensorFlow, Spark, Flink, and Ray.

Volcano also provides a range of scheduling capabilities, including heterogeneous device scheduling, network topology-aware scheduling, multi-cluster scheduling, and colocation of online and offline workloads.

Why Volcano

Job scheduling and management have become increasingly complex and critical for high-performance batch computing. Common requirements include:

Support for diverse scheduling algorithms
More efficient scheduling
Non-intrusive support for mainstream computing frameworks
Support for multi-architecture computing

Volcano is designed to cater to these requirements. In addition, Volcano inherits the design of Kubernetes APIs, allowing you to easily run applications that require high-performance computing on Kubernetes.

Features

Unified Scheduling

Support native Kubernetes workload scheduling
Provide complete support for frameworks such as PyTorch, TensorFlow, Spark, Flink, and Ray through VolcanoJob
Unified scheduling for both online microservices and offline batch jobs to improve cluster resource utilization

Rich Scheduling Policies

Gang Scheduling: Ensure all tasks of a job start simultaneously
Binpack Scheduling: Optimize resource utilization through compact task allocation
Heterogeneous Device Scheduling: Efficient GPU sharing (CUDA/MIG modes) and NPU scheduling
Proportion/Capacity Scheduling: Resource sharing/preemption/reclaim based on queue quotas
NodeGroup Scheduling: Support node group affinity scheduling
DRF Scheduling: Support fair scheduling of multi-dimensional resources
SLA Scheduling: Scheduling guarantee based on service quality
Task-topology Scheduling: Optimize performance for communication-intensive applications
NUMA Aware Scheduling: Optimize resource allocation for multi-core processors
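
The all-or-nothing idea behind gang scheduling in the list above can be sketched in a few lines. This is only an illustration, not Volcano's implementation: the function name `gang_place`, the CPU-only resource model, and the first-fit placement are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def gang_place(task_cpus: List[int], free_cpu: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Place every task of a job, or none of them (hypothetical helper)."""
    remaining = dict(free_cpu)  # work on a copy so a failed attempt changes nothing
    placement: Dict[int, str] = {}
    for index, cpu in enumerate(task_cpus):
        # First-fit: take the first node (in name order) with enough free CPU.
        node = next((n for n in sorted(remaining) if remaining[n] >= cpu), None)
        if node is None:
            return None  # one task does not fit, so the whole job keeps waiting
        remaining[node] -= cpu
        placement[index] = node
    return placement


free = {"node-a": 4, "node-b": 2}
print(gang_place([2, 2, 2], free))  # all three tasks fit, so the job can start
print(gang_place([4, 4], free))     # the second task cannot fit, so nothing is placed
```

Working on a copy of the free capacity is what makes the attempt atomic: a job that cannot be placed in full leaves the cluster state untouched, which avoids partially started jobs holding resources while they wait for their peers.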

Volcano supports custom plugins and actions to implement more scheduling algorithms.
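
As one illustration of the kind of algorithm a scheduling policy encodes, the DRF (Dominant Resource Fairness) entry in the list above can be sketched as follows. The sketch shows the general DRF idea only and does not reflect Volcano's plugin interface: the function names, the resource names, and the sample numbers are assumptions.

```python
from typing import Dict


def dominant_share(allocated: Dict[str, float], total: Dict[str, float]) -> float:
    """Largest fraction of any single cluster resource held by a job."""
    return max(allocated.get(name, 0.0) / total[name] for name in total)


def next_job(jobs: Dict[str, Dict[str, float]], total: Dict[str, float]) -> str:
    """Pick the job with the smallest dominant share (ties broken by name)."""
    return min(sorted(jobs), key=lambda job: dominant_share(jobs[job], total))


cluster = {"cpu": 9.0, "memory_gb": 18.0}
jobs = {
    "job-a": {"cpu": 2.0, "memory_gb": 8.0},  # dominant resource: memory (8/18)
    "job-b": {"cpu": 6.0, "memory_gb": 2.0},  # dominant resource: CPU (6/9)
}
print(next_job(jobs, cluster))  # job-a holds the smaller dominant share
```

Comparing jobs by their dominant share, rather than by a single resource, is what lets a fair scheduler balance workloads whose demands are skewed toward different resource dimensions.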

Queue Resource Management

Support multi-dimensional resource quota control (CPU, Memory, GPU, etc.)
Provide multi-level queue structure and resource inheritance
Support resource borrowing, reclaiming and preemption between queues
Implement multi-tenant resource isolation and priority control
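
A minimal sketch of quota-based sharing between queues, assuming each queue carries a relative weight and a single resource dimension (CPU cores). The function names, the queue names, and the weighting scheme are hypothetical and chosen only for illustration; the actual quota calculation in Volcano may differ.

```python
from typing import Dict


def deserved_shares(total: float, weights: Dict[str, int]) -> Dict[str, float]:
    """Split a cluster resource among queues in proportion to their weights."""
    weight_sum = sum(weights.values())
    return {queue: total * weight / weight_sum for queue, weight in weights.items()}


def reclaimable(allocated: Dict[str, float], deserved: Dict[str, float]) -> Dict[str, float]:
    """Amount each queue holds beyond its deserved share (borrowed capacity)."""
    return {
        queue: max(0.0, allocated.get(queue, 0.0) - deserved[queue])
        for queue in deserved
    }


weights = {"training": 3, "inference": 1}  # hypothetical queues and weights
deserved = deserved_shares(64.0, weights)  # 64 CPU cores in the cluster
borrowed = reclaimable({"training": 40.0, "inference": 24.0}, deserved)
print(deserved)  # training deserves 48 cores, inference 16
print(borrowed)  # inference has borrowed 8 cores that training could reclaim
```

In this model a queue may use idle capacity beyond its share (borrowing), and the excess is exactly what another queue can take back when its own demand rises (reclaiming).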

Multi-architecture Computing

Volcano can schedule computing resources from multiple architectures:

x86
Arm
Kunpeng
Ascend
GPU

Network Topology-aware Scheduling

Supports network topology-aware scheduling to optimize data transmission for distributed training tasks, reducing communication overhead and improving training speed.

Online and Offline Workloads Colocation

Enhances resource utilization while ensuring QoS through:

Unified scheduling
Dynamic resource overcommitment
CPU burst
Resource isolation

Multi-cluster Scheduling

Support cross-cluster job scheduling for larger-scale resource pool management.

For details: volcano-global

Descheduling

Support dynamic descheduling to optimize cluster load distribution.

For details: descheduler

Monitoring and Observability

Complete logging system
Rich monitoring metrics
Dashboard with a graphical interface

Dashboard: dashboard
Metrics: metrics

Ecosystem

Volcano integrates with high-performance computing frameworks such as PyTorch, TensorFlow, Spark, Flink, and Ray.

Future Outlook

Volcano will continue to extend its capabilities through community collaboration and technical innovation, with the aim of becoming a leader in high-performance computing and cloud-native batch scheduling.
