
Zero-Restart Scaling: Kubernetes In-Place Pod Resizing and DRA for Stateful Simulation Bridges

InstaTunnel Team
Published by our engineering team

Killing and rescheduling a pod just to grant it more GPU memory is a catastrophic disruption for stateful, real-time rendering bridges. This is a guide to how Kubernetes’ In-Place Pod Resize (stable since v1.35) and Dynamic Resource Allocation (stable since v1.34, and still expanding in v1.36) let you scale spatial-networking pipelines vertically without dropping a frame — and where the rough edges still are.

Introduction: The High-Stakes Disruption of Pod Rescheduling

Real-time 3D simulation and digital twin synchronization have become mission-critical workloads in industrial computing. Platforms like NVIDIA Omniverse are used to build high-fidelity digital replicas of factory floors, logistics hubs, and aerospace systems, and these environments are intensely stateful — they process continuous streams of spatial telemetry and dense point-cloud data.

To connect physical hardware to cloud-based simulation engines, teams commonly run localized data-routing pods — “simulation bridges” — inside cloud or edge Kubernetes clusters. These bridges hold long-lived WebSocket or gRPC connections, buffer frame sequences in memory, and often use local GPU acceleration to decimate or pre-render dense CAD/point-cloud data before forwarding it to a centralized visualization environment.

Historically, if a bridge pod ran into its resource limits, Kubernetes had only one tool: evict and reschedule. A vertical autoscaler or operator had to kill the pod, find a node with spare capacity, re-mount volumes, and boot the container back up. For a stateless web service that’s a minor blip. For a stateful simulation bridge, it’s disruptive in several concrete ways:

  • Broken tunnels — live TCP/WebSocket sessions to physical sensors drop instantly, triggering timeout cascades on edge devices.
  • Cache invalidation — in-memory frame buffers and spatial indexes are wiped, forcing a slow re-initialization.
  • Visual stutter — real-time rendering sync freezes or drops frames for human operators and automated inspection systems.

Two Kubernetes features close this gap: In-Place Pod Resize (container CPU/memory mutation without a restart) and Dynamic Resource Allocation, or DRA (attribute-based hardware device claims that don’t require pod recreation to change). Neither is brand new anymore, and getting the version history right matters if you’re planning a production rollout — so here’s where things actually stand.

Getting the Timeline Right

It’s worth being precise about when each feature actually stabilized, because these two features graduated in different releases, not together as a single package:

| Feature | Alpha | Beta | Stable (GA) |
| --- | --- | --- | --- |
| In-Place Pod Resize (container-level, KEP-1287) | v1.27 (2023) | v1.33 (Apr 2025) | v1.35 (Dec 17, 2025) |
| Dynamic Resource Allocation (core, resource.k8s.io/v1) | v1.26 | v1.32–1.33 | v1.34 (Aug/Sep 2025) |
| In-place Pod-level resize (aggregate resources, KEP-5419) | v1.35 | v1.36 (April 22, 2026) | not yet (still beta) |

So by the time Kubernetes 1.35 (“Timbernetes”) shipped, DRA had already been GA for one release cycle. What 1.35 actually delivered was in-place resize graduating to stable, building on top of an already-stable DRA foundation — the two features complementing each other, but on separate timelines. The v1.35 release also lifted a longstanding restriction: memory limit decreases, which were previously disallowed entirely, are now permitted, gated by a best-effort kubelet check against current usage.

The latest Kubernetes release as of this writing is v1.36 (“Haru”), shipped April 22, 2026, which extends both features further — covered near the end of this piece.

The Role of cgroups v2

In-place resizing depends on the Linux kernel’s cgroups v2 unified hierarchy, which lets the kubelet rewrite resource boundaries (cpu.max, memory.max) on a running process’s cgroup without sending it a termination signal. The kernel enforces the new boundary immediately; the process’s PID, open sockets, and in-memory state are untouched.
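As a rough illustration of what "rewriting the boundaries" means, the sketch below computes the strings that would end up in cpu.max and memory.max for a given limit. It assumes the common 100 ms CFS period; the function names are illustrative and not part of any Kubernetes API.

```python
CFS_PERIOD_US = 100_000  # assumption: default 100 ms CFS period


def cpu_max(millicores: int, period_us: int = CFS_PERIOD_US) -> str:
    """Return the 'quota period' string a cgroup v2 cpu.max file would hold."""
    quota_us = millicores * period_us // 1000
    return f"{quota_us} {period_us}"


def memory_max(gibibytes: int) -> str:
    """Return the byte count a cgroup v2 memory.max file would hold."""
    return str(gibibytes * 1024 ** 3)


# Resizing the bridge from 4 CPU / 16Gi to 8 CPU / 32Gi changes only these values.
print("cpu.max:   ", cpu_max(4000), "->", cpu_max(8000))
print("memory.max:", memory_max(16), "->", memory_max(32))
```

The point of the sketch is that a resize changes two small kernel-side values; nothing about the process itself has to be recreated.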

Traditional GPU scheduling, by contrast, relied on the Device Plugin model, which requests GPUs as opaque integer counts (nvidia.com/gpu: 1). There’s no way to request a fractional VRAM increase or swap a device profile without tearing the pod down — this is the gap DRA fills.

Deep-Dive: The Mechanics of In-Place Pod Resizing

The resize subresource, not a raw PATCH

One detail worth flagging up front: an in-place resize isn’t applied via a generic PATCH against the pod object. Kubernetes exposes it through a dedicated /resize subresource, and kubectl must be v1.32 or later to use it:

kubectl patch pod omniverse-local-bridge \
  --subresource resize \
  --type='json' \
  -p='[
    {"op": "replace", "path": "/spec/containers/0/resources/requests/cpu", "value": "8"},
    {"op": "replace", "path": "/spec/containers/0/resources/limits/cpu", "value": "8"},
    {"op": "replace", "path": "/spec/containers/0/resources/requests/memory", "value": "32Gi"},
    {"op": "replace", "path": "/spec/containers/0/resources/limits/memory", "value": "32Gi"}
  ]'

Routing resizes through their own subresource matters operationally too: it means you can grant a patch verb on pods/resize to an autoscaling controller without granting it write access to the rest of the pod spec — a meaningfully tighter RBAC surface than a blanket pod-update permission.

How Kubernetes tracks a resize

The lifecycle is tracked through pod status fields and conditions, not a single top-level enum:

| Field / Condition | Meaning |
| --- | --- |
| spec.containers[*].resources | Desired resources: what you asked for. |
| status.containerStatuses[*].resources | Actual/allocated resources currently applied to the running container. |
| PodResizePending (reason: Deferred) | Node temporarily lacks capacity; kubelet will retry. |
| PodResizePending (reason: Infeasible) | The request can never be satisfied on this node (e.g., it exceeds total node capacity, or the pod uses a static CPU/memory manager policy). The pod keeps running at its prior allocation. |
| PodResizeInProgress | Kubelet has accepted the resize and is actively applying it. |
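A controller watching a bridge pod might collapse these conditions into a single state. The sketch below does that on a pod object already decoded into a plain dict (for example from `kubectl get pod -o json`). The function name and the returned labels are illustrative assumptions, not Kubernetes terms.

```python
def resize_state(pod: dict) -> str:
    """Summarize a pod's resize status from its status.conditions list."""
    conditions = pod.get("status", {}).get("conditions", [])
    # Only conditions whose status is "True" are currently in effect.
    active = {c["type"]: c for c in conditions if c.get("status") == "True"}

    if "PodResizePending" in active:
        reason = active["PodResizePending"].get("reason", "")
        if reason in ("Deferred", "Infeasible"):
            return reason.lower()
        return "pending"
    if "PodResizeInProgress" in active:
        return "in_progress"
    return "settled"


example = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True"},
            {"type": "PodResizePending", "status": "True", "reason": "Deferred"},
        ]
    }
}
print(resize_state(example))
```

A "settled" result only means no resize condition is present; comparing spec.containers[*].resources with status.containerStatuses[*].resources is still the way to confirm what was actually applied.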

QoS class immutability

A pod’s QoS class (Guaranteed, Burstable, BestEffort) is computed from the relationship between requests and limits, and it cannot change as a result of a resize — this remains one of the explicit non-goals of KEP-1287. If a Guaranteed bridge pod (requests == limits) has its limit patched without an equal change to its request, the API server rejects the patch. Scale both fields together.
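One way to make "scale both fields together" hard to get wrong is to generate the JSON patch from a single value per resource. The helper below is a minimal sketch of that idea; its name is hypothetical, and it only builds the patch body, leaving submission to the /resize subresource to whatever client you already use.

```python
import json


def guaranteed_resize_patch(container_index: int, cpu: str, memory: str) -> list[dict]:
    """Build JSON-patch ops that set requests and limits to the same values."""
    base = f"/spec/containers/{container_index}/resources"
    ops = []
    for resource, value in (("cpu", cpu), ("memory", memory)):
        for field in ("requests", "limits"):
            ops.append({
                "op": "replace",
                "path": f"{base}/{field}/{resource}",
                "value": value,
            })
    return ops


# Body for the first container, matching a Guaranteed pod at 8 CPU / 32Gi.
print(json.dumps(guaranteed_resize_patch(0, "8", "32Gi"), indent=2))
```

Because each value is written to both requests and limits, the generated patch cannot move a Guaranteed pod into Burstable by accident.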

The downscaling guardrail

Before the GA release, decreasing a memory limit was blocked outright. As of v1.35, decreases are allowed, but the kubelet performs a best-effort safety check: it reads the container’s current memory usage via cgroup stats before committing a lower memory.max. If usage exceeds the proposed limit, the kubelet holds the resize in PodResizePending (reason Deferred) rather than risking an OOM-kill. This check is explicitly not guaranteed — it’s a time-of-check/time-of-use race, so aggressive downscaling on workloads with volatile memory footprints (like a simulation bridge mid-burst) still deserves caution.

When a node can’t satisfy every pending resize at once, deferred requests are retried in priority order: first by PriorityClass, then by QoS class (Guaranteed before Burstable), and finally by how long a request has been waiting.
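The ordering described above can be pictured as a three-part sort key. This is a simplified model for reasoning about which bridge gets capacity first, not the kubelet's actual implementation; the DeferredResize type and its fields are made up for the example.

```python
from dataclasses import dataclass

# Lower rank is retried first. BestEffort is omitted: it has no resources to resize.
QOS_RANK = {"Guaranteed": 0, "Burstable": 1}


@dataclass
class DeferredResize:
    pod: str
    priority: int            # PriorityClass value; higher is more important
    qos: str                 # "Guaranteed" or "Burstable"
    deferred_seconds: float  # how long the request has been waiting


def retry_order(pending: list[DeferredResize]) -> list[DeferredResize]:
    """Sort by priority (high first), then QoS class, then longest wait."""
    return sorted(
        pending,
        key=lambda r: (-r.priority, QOS_RANK[r.qos], -r.deferred_seconds),
    )


queue = [
    DeferredResize("telemetry-sidecar", 1000, "Burstable", 30.0),
    DeferredResize("bridge-a", 1000, "Guaranteed", 5.0),
    DeferredResize("bridge-b", 1000, "Guaranteed", 20.0),
    DeferredResize("safety-monitor", 2000, "Burstable", 1.0),
]
print([r.pod for r in retry_order(queue)])
```

In this model a higher PriorityClass wins even against a Guaranteed pod that has waited longer, which is worth keeping in mind when assigning priorities to bridge pods.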

Deep-Dive: Dynamic Resource Allocation for GPUs

DRA’s core API kinds all live under the stable resource.k8s.io/v1 API group (not the earlier v1alpha3 or v1beta1/v1beta2 versions served by pre-GA releases):

  • DeviceClass: a cluster-scoped definition (created by admins or device vendors) that classifies a pool of hardware using CEL (Common Expression Language) selector expressions, e.g., matching devices where device.driver == "gpu.nvidia.com".
  • ResourceSlice: a live inventory published per-node by the device driver, describing available devices, their attributes, and capacity.
  • ResourceClaim: the device analog of a PersistentVolumeClaim, a concrete request for hardware matching a DeviceClass, with a lifecycle independent of any single pod.
  • ResourceClaimTemplate: a namespaced template referenced from a pod spec, from which Kubernetes auto-generates a dedicated ResourceClaim per pod instance and tears it down when the pod terminates.

Architectural comparison

|  | Legacy Device Plugins | DRA (resource.k8s.io/v1) |
| --- | --- | --- |
| Allocation | Integer counts (nvidia.com/gpu: 1) | Attribute-based (VRAM, MIG profile, driver, topology) via CEL |
| Live reconfiguration | Requires full pod recreation | Claims can be updated on a template basis; new claims can be resolved without touching unrelated pod fields |
| Device sharing | Static, vendor-specific hacks | Native sharing (and, as of v1.36 beta, native GPU partitioning) |

Architecting a Zero-Downtime Simulation Bridge

Here’s a manifest using the stable DRA schema (note the exactly: block wrapping deviceClassName and selectors, which the stable API requires):

apiVersion: v1
kind: Pod
metadata:
  name: omniverse-local-bridge
  namespace: spatial-net
  labels:
    app: simulation-pipeline
spec:
  resourceClaims:
    - name: dynamic-gpu-allocation
      resourceClaimTemplateName: omniverse-gpu-template

  containers:
    - name: spatial-router-container
      image: cr.enterprise.internal/spatial/omniverse-bridge:v2026.2.1
      imagePullPolicy: IfNotPresent

      # Controls whether a resource change forces a container restart
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired

      # Baseline resources: Guaranteed QoS (requests == limits)
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
        # Device claims are listed under resources.claims, not at the container level
        claims:
          - name: dynamic-gpu-allocation

      ports:
        - containerPort: 8080
          name: websocket-sync
        - containerPort: 9090
          name: grpc-telemetry
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: omniverse-gpu-template
  namespace: spatial-net
spec:
  spec:
    devices:
      requests:
        - name: primary-rendering-core
          exactly:
            deviceClassName: enterprise-nvidia-gpu
            selectors:
              - cel:
                  expression: |-
                    device.attributes["gpu.nvidia.com"].profile == "1g.5gb" ||
                    device.attributes["gpu.nvidia.com"].profile == "3g.20gb"
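To make the selector's effect concrete, here is a plain-Python stand-in for the CEL expression in the template above, run against a made-up device inventory. The inventory shape is a simplification invented for this example (a real ResourceSlice has a richer schema), and real CEL evaluation differs in details such as how missing attributes are handled.

```python
# MIG profiles accepted by the template's CEL expression.
ALLOWED_PROFILES = {"1g.5gb", "3g.20gb"}


def selector_matches(device: dict) -> bool:
    """Approximate: attributes["gpu.nvidia.com"].profile is one of the allowed values."""
    profile = device.get("attributes", {}).get("gpu.nvidia.com", {}).get("profile")
    return profile in ALLOWED_PROFILES


def eligible_devices(inventory: list[dict]) -> list[dict]:
    """Keep only the devices the claim's selector would accept."""
    return [d for d in inventory if selector_matches(d)]


# Hypothetical inventory, standing in for what a driver might publish.
inventory = [
    {"name": "gpu-0-mig-0", "attributes": {"gpu.nvidia.com": {"profile": "1g.5gb"}}},
    {"name": "gpu-0-mig-1", "attributes": {"gpu.nvidia.com": {"profile": "7g.40gb"}}},
    {"name": "gpu-1-mig-0", "attributes": {"gpu.nvidia.com": {"profile": "3g.20gb"}}},
    {"name": "nic-0", "attributes": {}},
]
print([d["name"] for d in eligible_devices(inventory)])
```

The scheduler, not the pod author, picks among the eligible devices, so a claim that lists two profiles may land on either one.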

To scale this pod up, an autoscaling controller sends a JSON patch against the pod’s /resize subresource (as shown above) and, separately, arranges for a larger MIG profile through DRA. The kubelet checks the node’s unallocated CPU/memory and updates the cgroup boundaries, all while spatial-router-container’s WebSocket loop keeps reading uninterrupted. One caveat on the GPU half: a ResourceClaim’s spec is immutable once created in resource.k8s.io/v1, so remapping the device under a running pod depends on what the DRA driver itself supports, not on editing the companion claim in place.

Production Edge Cases and Guardrails

Deferred capacity. If a node is near-saturated, a resize request sits in PodResizePending (Deferred) until capacity clears — retried automatically per the priority ordering described above, but with no guaranteed time bound. For a latency-sensitive bridge, it’s worth setting a fallback threshold (e.g., a controller escalating to a manual migration if a resize has been deferred for more than a few seconds) rather than trusting an open-ended wait during a live burst.
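A fallback threshold of this kind can be a few lines in the controller. The sketch below assumes the controller treats the PodResizePending condition's lastTransitionTime as the start of the wait and uses a 5-second budget; both the budget and the function names are illustrative choices, not recommendations from the Kubernetes project.

```python
from datetime import datetime, timedelta, timezone

MAX_DEFERRED = timedelta(seconds=5)  # assumption: tune to the bridge's latency budget


def parse_k8s_time(timestamp: str) -> datetime:
    """Parse a second-resolution timestamp such as 2026-01-10T12:00:00Z."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def should_escalate(deferred_since: datetime, now: datetime,
                    max_wait: timedelta = MAX_DEFERRED) -> bool:
    """True once a resize has been deferred for longer than the budget."""
    return now - deferred_since > max_wait


started = parse_k8s_time("2026-01-10T12:00:00Z")
print(should_escalate(started, started + timedelta(seconds=3)))
print(should_escalate(started, started + timedelta(seconds=6)))
```

What "escalate" means is deployment-specific: paging an operator, starting a planned migration, or shedding load from the bridge are all reasonable responses.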

Controller/spec drift. A resize applied directly via the /resize subresource does not update the owning Deployment or StatefulSet’s pod template. If the node later fails and the controller reschedules a replacement, it will spawn a pod at the original, un-scaled resource profile. Production setups typically pair in-place resizing with a VPA in InPlaceOrRecreate mode (beta, building on KEP-1287) or a custom operator that mirrors the applied resize back onto the controller via annotation, so a rescheduled replacement starts at the right size.
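Detecting this drift is straightforward once both resource blocks are available as dicts. The sketch below compares a controller's pod-template resources with what a live pod reports. It compares quantity strings literally, so "4" and "4000m" would be reported as different; treat that as a known simplification. The function name is hypothetical.

```python
def resource_drift(template: dict, live: dict) -> dict:
    """Return {field: (template_value, live_value)} for every mismatch."""
    drift = {}
    for section in ("requests", "limits"):
        for name, live_value in live.get(section, {}).items():
            wanted = template.get(section, {}).get(name)
            if wanted != live_value:
                drift[f"{section}.{name}"] = (wanted, live_value)
    return drift


# Template still at the baseline; the live pod was resized in place.
template = {
    "requests": {"cpu": "4", "memory": "16Gi"},
    "limits": {"cpu": "4", "memory": "16Gi"},
}
live = {
    "requests": {"cpu": "8", "memory": "32Gi"},
    "limits": {"cpu": "8", "memory": "32Gi"},
}
for field, (old, new) in resource_drift(template, live).items():
    print(f"{field}: template={old} live={new}")
```

Any non-empty result means a rescheduled replacement would start smaller than the pod it replaces, which is the signal to update the template or hand the decision to the autoscaler.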

Static resource-manager policies. Resizing a Guaranteed-QoS pod is explicitly out of scope for KEP-1287 when the node runs a static CPU or memory manager policy (used to pin exclusive cores/NUMA memory to a pod) — such a resize is rejected as Infeasible. If your bridge nodes rely on static CPU pinning for jitter-sensitive workloads, plan around this rather than assuming in-place resize will apply universally; some community discussion has floated future support for this combination, but it isn’t part of the stable feature as of v1.36.

What’s New Since GA: Kubernetes v1.36 (“Haru”, April 2026)

The release after in-place resize went GA (v1.36, shipped April 22, 2026 and the current stable line as of mid-2026) pushed both features further, which matters if you’re designing a bridge architecture today rather than in December 2025:

  • Pod-level in-place resize (beta, on by default, requires cgroups v2) extends resizing from individual containers to the pod’s aggregate resource envelope, gated behind four feature gates (PodLevelResources, InPlacePodVerticalScaling, InPlacePodLevelResourcesVerticalScaling, NodeDeclaredFeatures). Useful for multi-container bridge pods (e.g., a router container plus a telemetry sidecar) where you want to scale a shared resource budget rather than each container individually.
  • Partitionable devices and Consumable Capacity in DRA moved to beta and are enabled by default — this is the native equivalent of manually requesting MIG slices via CEL selectors as in the manifest above, letting DRA understand GPU partitions directly rather than treating a MIG slice as an opaque device.
  • Device Taints and Tolerations (beta) let a DRA driver mark a degrading GPU directly in its ResourceSlice (e.g., after an ECC error), so new claims avoid it without the driver needing to yank the device from inventory entirely.
  • Resource Health Status for Pods (allocatedResourcesStatus field, beta) surfaces per-device health directly in pod status and via kubectl describe pod — useful for distinguishing “the bridge container crashed” from “the GPU it was allocated is unhealthy,” across both DRA and legacy device plugins.

None of this changes the core architecture described above, but the health-status and device-taint features in particular close a real observability gap for exactly this kind of latency-sensitive GPU bridge: you can now know a device is degrading before it forces an unplanned migration.

Conclusion

In-Place Pod Resize (GA in v1.35) and Dynamic Resource Allocation (GA in v1.34) — released one cycle apart, not simultaneously — together remove the need to destroy a pod to change its resource footprint, CPU/memory or GPU. For stateful simulation bridges holding live WebSocket/gRPC state, that’s the difference between a live pipeline that expands and contracts with sensor load, and one that drops connections every time load spikes. The mechanics matter in practice: resizes go through a dedicated /resize subresource, QoS class is immutable, downscaling is checked but not guaranteed safe, and static CPU/memory pinning remains an explicit gap. Building around those constraints — rather than around the idealized version of the feature — is what makes a zero-restart architecture actually zero-restart in production.


Changelog

Corrections to the original draft:

1. DRA’s GA release was v1.34, not v1.35. The draft implied DRA “graduated to full enterprise stability alongside v1.35.” DRA’s core (resource.k8s.io/v1) went GA in Kubernetes 1.34 (Aug/Sep 2025); v1.35 (Dec 2025) is where In-Place Pod Resize (KEP-1287) reached GA. Added a version-history table to make this explicit.
2. Resize API mechanics corrected. The original JSON-patch example patched the pod object directly. Kubernetes requires resizes to go through the dedicated /resize subresource (kubectl patch --subresource resize, requiring kubectl v1.32+). Updated the example and added the RBAC implication.
3. Pod status model corrected. The draft’s table implied Allocated, Resources, PodResizePending, and Infeasible were parallel top-level fields. In reality: desired/actual resources live in spec/status.containerStatuses, and Deferred/Infeasible are reasons on the PodResizePending condition, with PodResizeInProgress as a separate condition. Rebuilt the table to reflect this.
4. DRA manifest API version corrected. The original YAML used resource.k8s.io/v1alpha3, a pre-GA API version. Updated to the stable resource.k8s.io/v1 schema, including the exactly: wrapper the current API requires around deviceClassName and selectors.
5. Added the memory-limit-decrease change and deferred-resize retry ordering (priority class → QoS → wait time), both new in the v1.35 GA release and not mentioned in the draft.
6. Added nuance to the static-manager-policy limitation: confirmed as an explicit KEP-1287 non-goal (resize marked Infeasible) rather than a blanket incompatibility, with a note that this remains an open area.
7. Added a new “What’s New in v1.36” section covering pod-level resize (beta), partitionable DRA devices, device taints/tolerations, and DRA resource health status, since v1.36 (“Haru,” April 2026) is the current stable release as of this writing, six months on from when the original draft was framed as describing the “latest” state.
8. Removed leftover SEO/AI-draft artifacts (run-on title, no paragraph breaks in the intro) and reformatted throughout as clean Markdown.

Primary sources consulted:

  • Kubernetes 1.35: In-Place Pod Resize Graduates to Stable (official K8s blog)
  • Resize CPU and Memory Resources assigned to Containers (official docs)
  • Resize CPU and Memory Resources assigned to Pods (official docs, pod-level resize)
  • In-Place Update of Pod Resources, KEP-1287 (enhancement proposal)
  • Kubernetes v1.34: DRA has graduated to GA (official K8s blog)
  • Dynamic Resource Allocation (official docs)
  • Allocate Devices to Workloads with DRA (official docs, stable API examples)
  • Kubernetes v1.36: ハル (Haru) (official K8s blog)
  • Kubernetes v1.36 Release: New Features, Stable APIs & Breaking Changes (PerfectScale)
  • Kubernetes releases (version/support matrix)
