Kubernetes v1.36: 10 Key Enhancements for Workload-Aware Scheduling

Kubernetes continues to evolve its scheduling capabilities to handle complex AI/ML and batch workloads. In v1.35, the project introduced the foundational Workload API alongside gang scheduling and opportunistic batching. Now, Kubernetes v1.36 takes a major leap forward with a clear separation of concerns between static templates and runtime state, paving the way for more sophisticated scheduling decisions. This article explores 10 critical improvements in v1.36 that advance workload-aware scheduling, from architectural shifts to real-world integrations.

1. Architectural Revolution: Workload as Template, PodGroup as Runtime

The biggest change in v1.36 is the clean separation of API concerns. The Workload API now serves purely as a static template, defining the desired PodGroup specifications. Meanwhile, the new PodGroup API manages runtime state, including scheduling policies and pod conditions. This decoupling streamlines the scheduler’s job – it no longer needs to parse Workload objects, just PodGroup instances. Controllers (like the Job controller) stamp out PodGroup objects from Workload templates, enabling per-replica sharding of status updates for better scalability.
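To make the template/runtime split concrete, here is a minimal Python sketch of a controller stamping out runtime objects from a static template. The class names, fields (min_count, replicas, conditions) and the stamp_pod_groups helper are assumptions made for illustration; they do not mirror the actual v1alpha2 schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkloadTemplate:
    """Static template: frozen, so it cannot be mutated at runtime (hypothetical shape)."""
    name: str
    min_count: int   # pods that must be placed together
    replicas: int    # number of PodGroup instances to stamp out


@dataclass
class PodGroup:
    """Runtime object: one per replica, carrying its own status (hypothetical shape)."""
    name: str
    workload_ref: str
    min_count: int
    scheduled_pods: int = 0
    conditions: list = field(default_factory=list)


def stamp_pod_groups(template: WorkloadTemplate) -> list[PodGroup]:
    """Derive one PodGroup per replica, as a controller might."""
    return [
        PodGroup(
            name=f"{template.name}-{i}",
            workload_ref=template.name,
            min_count=template.min_count,
        )
        for i in range(template.replicas)
    ]


template = WorkloadTemplate(name="train", min_count=4, replicas=3)
groups = stamp_pod_groups(template)

# A status update touches exactly one PodGroup; the template is never written.
groups[1].scheduled_pods = 4
groups[1].conditions.append("Scheduled")
```

Because each replica owns its status, an update to one PodGroup never requires a write to the shared template or to its sibling groups.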


2. New PodGroup Scheduling Cycle in kube-scheduler

To support the new PodGroup API, the kube-scheduler introduces a dedicated scheduling cycle for PodGroups. This cycle enables atomic workload processing, meaning the scheduler can consider all pods in a group as a single unit instead of individual requests. This is critical for gang scheduling scenarios where all pods must be allocated simultaneously. The new cycle also lays the groundwork for future innovations, such as advanced placement strategies and coordinated preemption.
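The all-or-nothing idea behind a group-level cycle can be sketched in a few lines of Python. This is a toy model, not kube-scheduler code: the schedule_group function, the single CPU dimension, and the largest-first, first-fit heuristic are all assumptions chosen to keep the example short.

```python
def schedule_group(pod_cpu_requests, node_free_cpu):
    """Try to place every pod of a group; commit only if all of them fit.

    pod_cpu_requests: list of CPU requests, one per pod in the group.
    node_free_cpu: dict of node name -> free CPU (updated only on success).
    Returns a list of node names, one per pod in input order, or None.
    """
    free = dict(node_free_cpu)  # work on a copy so a failure has no side effects
    placement = [None] * len(pod_cpu_requests)
    # Place the largest pods first (a simple first-fit-decreasing heuristic).
    order = sorted(range(len(pod_cpu_requests)), key=lambda i: -pod_cpu_requests[i])
    for i in order:
        request = pod_cpu_requests[i]
        node = next((n for n in sorted(free) if free[n] >= request), None)
        if node is None:
            return None  # one pod does not fit, so nothing is placed
        free[node] -= request
        placement[i] = node
    node_free_cpu.update(free)  # commit the whole group at once
    return placement


nodes = {"node-a": 4, "node-b": 2}
placed = schedule_group([2, 2, 2], nodes)   # the whole group fits
rejected = schedule_group([2], nodes)       # no capacity left, nothing changes
```

The key property is that a failed attempt leaves node capacity untouched, so a partially placed gang never holds resources that another workload could use.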

3. Topology-Aware Scheduling – First Iterations

v1.36 debuts the first phase of topology-aware scheduling for PodGroups. This feature allows the scheduler to consider physical or logical topology constraints – such as node zones, racks, or GPU clusters – when placing pods from the same group. By spreading or co-locating pods based on workload requirements, topology-aware scheduling reduces latency for distributed training jobs and improves resource utilization. While still early, this iteration marks the beginning of more intelligent placement decisions in Kubernetes.
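As a rough illustration of co-locating a group within one topology domain, the Python sketch below picks the tightest domain that can still hold every pod. The node dictionaries, the "zone" and "rack" keys, the free_slots field, and the best-fit rule are assumptions for illustration only, not the scheduler's actual algorithm.

```python
from collections import defaultdict


def pick_domain(nodes, group_size, topology_key="zone"):
    """Return the domain with the least free capacity that still fits the
    whole group (best fit), or None if no single domain can hold it."""
    free_by_domain = defaultdict(int)
    for node in nodes:
        free_by_domain[node[topology_key]] += node["free_slots"]
    candidates = [
        (free, domain)
        for domain, free in free_by_domain.items()
        if free >= group_size
    ]
    if not candidates:
        return None
    return min(candidates)[1]


cluster = [
    {"zone": "a", "rack": "r1", "free_slots": 2},
    {"zone": "a", "rack": "r2", "free_slots": 3},
    {"zone": "b", "rack": "r3", "free_slots": 8},
]
zone_choice = pick_domain(cluster, group_size=4)                        # zone level
rack_choice = pick_domain(cluster, group_size=4, topology_key="rack")   # rack level
```

Choosing the tightest domain that fits keeps larger domains free for larger groups; a real scheduler would weigh many more signals than free slots.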

4. Workload-Aware Preemption – Smarter Decisions

Preemption in Kubernetes has traditionally been pod-centric. v1.36 introduces workload-aware preemption, where the scheduler can consider the broader impact of evicting pods belonging to a workload group. Instead of preempting individual pods, the scheduler now evaluates whether breaking a gang or batch job makes sense. This reduces the chance of thrashing and ensures that preemption decisions align with workload priorities and scheduling policies.
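One way to picture group-level preemption is a victim selector that evicts whole groups rather than scattered pods. The Python sketch below is a simplified greedy model; the group dictionaries, the pick_victim_groups name, and the ordering rule (lowest priority first, then largest group) are assumptions, not the policy kube-scheduler implements.

```python
def pick_victim_groups(running_groups, needed_slots, preemptor_priority):
    """Choose whole groups to evict so that at least needed_slots are freed.

    Only groups with lower priority than the preemptor are eligible.
    Returns a list of group names, or None if eviction cannot free enough.
    """
    candidates = sorted(
        (g for g in running_groups if g["priority"] < preemptor_priority),
        key=lambda g: (g["priority"], -g["pods"]),  # lowest priority, then largest
    )
    victims, freed = [], 0
    for group in candidates:
        if freed >= needed_slots:
            break
        victims.append(group["name"])
        freed += group["pods"]
    return victims if freed >= needed_slots else None


running = [
    {"name": "etl", "priority": 1, "pods": 2},
    {"name": "train-a", "priority": 5, "pods": 4},
    {"name": "train-b", "priority": 5, "pods": 8},
    {"name": "serve", "priority": 10, "pods": 3},
]
victims = pick_victim_groups(running, needed_slots=6, preemptor_priority=8)
```

Every group in the result is evicted in full, so no gang is left half-running, and higher-priority groups such as "serve" are never considered.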

5. Dynamic Resource Allocation with ResourceClaim Support for PodGroups

Dynamic Resource Allocation (DRA) is a powerful mechanism for managing specialized hardware like GPUs and FPGAs. In v1.36, ResourceClaim support for PodGroups allows batch workloads to request and release resources as a group. This integration ensures that all pods in a PodGroup can acquire the necessary resources atomically, avoiding partial allocations that block job completion. DRA support makes Kubernetes more suitable for high-performance computing and AI training workloads.
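The value of atomic allocation is easiest to see in a small model. The Python sketch below hands out devices to a group only when every claim can be satisfied; the allocate_for_group function, the device IDs, and the count-based claims are hypothetical and far simpler than real ResourceClaim objects.

```python
def allocate_for_group(claims, free_devices):
    """Allocate devices for every pod in a group, or for none of them.

    claims: dict of pod name -> number of devices requested.
    free_devices: set of free device IDs (mutated only on success).
    Returns dict of pod name -> list of device IDs, or None.
    """
    if sum(claims.values()) > len(free_devices):
        return None  # not enough devices for the whole group: allocate nothing
    pool = sorted(free_devices)
    allocation = {}
    for pod, count in claims.items():
        allocation[pod], pool = pool[:count], pool[count:]
    # Commit: remove every allocated device from the free set in one step.
    free_devices.difference_update(d for devs in allocation.values() for d in devs)
    return allocation


gpus = {"gpu-0", "gpu-1", "gpu-2"}
granted = allocate_for_group({"worker-0": 1, "worker-1": 2}, gpus)

scarce = {"gpu-0", "gpu-1"}
denied = allocate_for_group({"worker-0": 1, "worker-1": 2}, scarce)
```

In the second call no device is taken, so the two free GPUs remain available to a job that can actually run with them.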

6. Job Controller Integration – Real-World Readiness

To demonstrate the practicality of the new APIs, v1.36 delivers the first phase of integration between the Job controller and the new Workload/PodGroup API. The Job controller can now define Workload templates and create PodGroup objects automatically. This integration validates that real-world controllers can leverage the advanced scheduling capabilities without manual intervention. It also provides a migration path for existing Job users to adopt workload-aware scheduling.

7. Performance and Scalability Gains through Status Sharding

The decoupled API design isn’t just architecturally cleaner – it also improves performance. By moving runtime state to the PodGroup object, Kubernetes enables per-replica sharding of status updates. Large batch workloads with thousands of pods can now update their scheduling state in parallel, because each replica writes to its own PodGroup instead of contending on a single Workload object. This sharding helps the scheduler and controllers keep pace as workloads grow, a critical requirement for modern AI/ML clusters.
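Kubernetes uses optimistic concurrency based on resourceVersion, so concurrent writers to one object conflict and must retry. The toy Python simulation below counts those conflicts for a single shared object versus one object per replica. The VersionedObject class and the round-based write_all loop are assumptions made to illustrate the effect, not a model of the real API server.

```python
class VersionedObject:
    """Minimal stand-in for an API object guarded by a resource version."""

    def __init__(self):
        self.resource_version = 0
        self.status = {}

    def update(self, seen_version, key, value):
        if seen_version != self.resource_version:
            return False  # conflict: the caller must re-read and retry
        self.status[key] = value
        self.resource_version += 1
        return True


def write_all(objects, writers):
    """Apply all writes, retrying on conflict; return the number of conflicts.

    writers: list of (object_index, key, value). In each round every pending
    writer first reads its object's version, then all of them try to write.
    """
    pending, conflicts = list(writers), 0
    while pending:
        seen = [objects[i].resource_version for i, _, _ in pending]
        retry = []
        for (i, key, value), version in zip(pending, seen):
            if not objects[i].update(version, key, value):
                conflicts += 1
                retry.append((i, key, value))
        pending = retry
    return conflicts


# Four replicas reporting status into one shared object.
shared = [VersionedObject()]
shared_conflicts = write_all(shared, [(0, f"replica-{r}", "Scheduled") for r in range(4)])

# The same four updates, each going to its own per-replica object.
sharded = [VersionedObject() for _ in range(4)]
sharded_conflicts = write_all(sharded, [(r, "phase", "Scheduled") for r in range(4)])
```

In this model the shared object forces all but one writer to retry in every round, while the sharded layout completes with no retries at all.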

8. Streamlined Scheduler Logic – No More Watching Workload Objects

In v1.35, the scheduler had to watch Workload objects to understand PodGroup requirements. v1.36 eliminates that overhead. The scheduler now reads all necessary scheduling information directly from the PodGroup API. This simplification reduces cache pressure, lowers API server load, and makes the scheduler code more maintainable. It also allows third-party schedulers to integrate with Kubernetes workload scheduling without deep knowledge of the Workload template structure.

9. Transition to the v1alpha2 API Version – What Changed from v1.35

The Workload and PodGroup APIs are now served as scheduling.k8s.io/v1alpha2 (the scheduling.k8s.io API group at version v1alpha2), completely replacing the previous v1alpha1 version. In v1.35, PodGroups and their runtime state were embedded inside the Workload resource. v1.36 promotes PodGroup to a separate, first-class object. This transition means existing users must update their manifests and controllers to use the new v1alpha2 API. In practice, controllers now define Workload templates and let PodGroup objects carry the runtime state.

10. Future Enhancements: Roadmap Ahead

v1.36 is not the end – it’s a foundation. The PodGroup scheduling cycle, topology-aware scheduling, and workload-aware preemption are all in their early stages. The Kubernetes community plans to extend these capabilities with advanced placement constraints, cross-cluster scheduling, and deeper integration with other controllers (e.g., CronJob, MPIJob). The clear separation of template and runtime state also opens the door for more sophisticated scheduling frameworks. Organizations adopting v1.36 are well-positioned to benefit from these upcoming innovations.

Conclusion: Kubernetes v1.36 marks a significant step in workload-aware scheduling. By separating the Workload template from the PodGroup runtime object, introducing a dedicated PodGroup scheduling cycle, and integrating with the Job controller, the project has made batch and AI/ML scheduling more efficient, scalable, and practical. Whether you're running training jobs or distributed data pipelines, these 10 enhancements provide the building blocks for smarter, more resource-efficient Kubernetes clusters.