InPlacePodVerticalScaling: Scale Pods Without Restarting
In many Kubernetes setups, adjusting the CPU or memory assigned to a pod has historically meant one thing: restarting it. That’s because updating a pod’s resource requests or limits typically requires the pod to be recreated by its controller. While this works fine for replicated workloads behind a service, it's far less ideal for singleton pods, batch jobs, or long-lived processes where restarts are expensive or disruptive. In some cases, restarting isn’t even an option: think data pipelines in the middle of a transformation, or analytical queries that take hours to complete.
Kubernetes v1.27 introduced InPlacePodVerticalScaling as an alpha feature, which was later promoted to beta in v1.33. This guide walks through how to enable and use this feature, what changes under the hood, and where it makes a real difference. You’ll see how to update CPU allocations on the fly without disruption and how to increase memory when needed, even if it requires a container restart. For developers and platform engineers managing resource-sensitive workloads, this feature opens up a new dimension of flexibility in Kubernetes.
What Is InPlacePodVerticalScaling?
Traditionally, changing a pod’s resource requests or limits meant terminating the pod and allowing a controller, like a Deployment, to bring up a new one with updated specifications. This works fine when your workload runs multiple replicas behind a Service, as traffic simply shifts to other instances during the update. But when you're dealing with singleton pods, batch jobs, or stateful workloads where scale-out isn't an option or restart delays break processing windows, this approach falls short.
InPlacePodVerticalScaling changes that. Introduced as an alpha feature in Kubernetes v1.27 and promoted to beta in v1.33, it enables you to change the CPU and memory configuration of a running pod without terminating it. Kubernetes does this by applying changes directly to the container's cgroup allocation.
It also supports resizing of restartable init containers (commonly used for sidecars), further expanding its usefulness in service mesh and observability tooling scenarios.
This has huge implications for operational flexibility. You can dynamically adjust workloads to respond to changing resource demands, whether it's a batch job that unexpectedly spikes in memory usage or a Java application that needs more CPU only during initialization.
To use this feature on clusters where it is not already active, the feature gate must be explicitly enabled. (It is alpha and off by default in v1.27 through v1.32, and beta and on by default from v1.33.)
Enabling the Feature Gate
To use in-place scaling, ensure the following:
- Your Kubernetes cluster must be version 1.27 or later.
- The feature gate InPlacePodVerticalScaling must be enabled on both the control plane and all nodes.
On the API server, add this argument:
--feature-gates=InPlacePodVerticalScaling=true
And in the kubelet configuration file on each node:
featureGates:
  InPlacePodVerticalScaling: true
Restart the kubelet and API server after applying these changes.
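One optional way to confirm the gate is active is to grep the API server's metrics for the kubernetes_feature_enabled metric (exposed since v1.26); a value of 1 means the feature is on:
kubectl get --raw /metrics | grep 'kubernetes_feature_enabled{name="InPlacePodVerticalScaling"'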
Creating a Pod That Supports In-Place Resizing
Here’s a pod spec that allows CPU resizing without restarting the container and configures memory changes to trigger a restart:
apiVersion: v1
kind: Pod
metadata:
  name: inplace-demo
spec:
  containers:
  - name: app
    image: nginx
    command: ["sleep", "3600"]
    resources:
      limits:
        cpu: "500m"
        memory: "512Mi"
      requests:
        cpu: "500m"
        memory: "512Mi"
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: RestartContainer
This configuration means we can scale CPU allocations without downtime, but any memory change will restart the container.
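Save the spec as inplace-demo.yaml (the filename here is arbitrary) and apply it, then confirm the resize policies were accepted:
kubectl apply -f inplace-demo.yaml
kubectl get pod inplace-demo -o jsonpath='{.spec.containers[0].resizePolicy}'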
Scaling CPU Without a Restart
If you need more CPU, you can patch the pod's spec like this:
kubectl patch pod inplace-demo --subresource=resize --patch '
{
  "spec": {
    "containers": [
      {
        "name": "app",
        "resources": {
          "requests": {"cpu": "800m"},
          "limits": {"cpu": "800m"}
        }
      }
    ]
  }
}'
The change is applied immediately. You can confirm it was successful by checking that the container did not restart:
kubectl get pod inplace-demo -o jsonpath='{.status.containerStatuses[0].restartCount}'
And verify that the new values are reflected:
kubectl get pod inplace-demo -o jsonpath='{.status.containerStatuses[0].resources}'
If both values show cpu: 800m, your CPU was successfully increased without interruption.
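You can also watch the change land at the cgroup level. This check assumes the node uses cgroup v2, where an 800m limit appears as a quota of 80000 out of a 100000-microsecond period:
kubectl exec inplace-demo -- cat /sys/fs/cgroup/cpu.max
# Expected output on cgroup v2: 80000 100000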
Increasing Memory While a Pod Is Running
Let’s say your workload is processing a large dataset and hits a memory ceiling. You can give it more memory by patching the pod as follows:
kubectl patch pod inplace-demo --subresource=resize --patch '
{
  "spec": {
    "containers": [
      {
        "name": "app",
        "resources": {
          "requests": {"memory": "2Gi"},
          "limits": {"memory": "2Gi"}
        }
      }
    ]
  }
}'
Because our resize policy for memory is set to RestartContainer, this change will restart the container. You can confirm the restart by checking the restart count:
kubectl get pod inplace-demo -o jsonpath='{.status.containerStatuses[0].restartCount}'
This restart is expected and necessary if your application doesn't support dynamic memory allocation. If your workload can handle memory increases at runtime (e.g., JVM with dynamic heap sizing), you can set the memory resize policy to NotRequired.
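For such workloads, the container's resize policy would look like this instead, with the rest of the pod spec unchanged:
resizePolicy:
- resourceName: cpu
  restartPolicy: NotRequired
- resourceName: memory
  restartPolicy: NotRequired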
What Happens Under the Hood?
Once a resize is requested, the Kubelet coordinates with the container runtime to apply the new settings. The pod’s .status section gets updated to reflect both what’s been requested and what’s actually enforced.
Key fields to be aware of:
- spec.containers[].resources: the desired configuration
- status.containerStatuses[].resources: the current configuration on the running container
- status.containerStatuses[].allocatedResources: the allocation confirmed by the node
- status.resize: indicates progress (InProgress, Deferred, Infeasible, etc.); note that as of v1.33 this field is deprecated in favor of the PodResizePending and PodResizeInProgress pod conditions
This separation allows Kubernetes to reflect real-world drift between what was asked and what was applied, especially helpful in cases where resizing is deferred due to lack of node resources.
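A quick way to compare desired versus applied state after a resize (using the inplace-demo pod from earlier):
kubectl get pod inplace-demo -o jsonpath='{.spec.containers[0].resources}'
kubectl get pod inplace-demo -o jsonpath='{.status.containerStatuses[0].allocatedResources}'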
Handling Resource Limits and Failures
If you attempt to resize a pod with resource values beyond what the node can provide, Kubernetes won’t apply the change. Instead, it marks the request as Infeasible or Deferred, depending on whether the request can be satisfied later.
Here’s how to test an oversized CPU request (1000 full cores, far more than any typical node offers):
kubectl patch pod inplace-demo --subresource=resize --patch '
{
  "spec": {
    "containers": [
      {
        "name": "app",
        "resources": {
          "requests": {"cpu": "1000"},
          "limits": {"cpu": "1000"}
        }
      }
    ]
  }
}'
Then inspect the status:
kubectl get pod inplace-demo -o json | jq '.status.conditions[] | select(.type=="PodResizePending")'
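For an infeasible request, the reported condition will have roughly this shape (the exact message text varies by Kubernetes version; the message value below is a placeholder):
{
  "type": "PodResizePending",
  "status": "True",
  "reason": "Infeasible",
  "message": "<why the node cannot satisfy the request>"
}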
When This Is Useful
This feature is most helpful when you need to adjust resources dynamically without sacrificing uptime. Some examples:
- A JVM-based application that spikes in CPU during boot but settles after initialization.
- CI pipelines that build or test code on-demand and may need temporary resource boosts.
- Stateful services that should not be restarted mid-session.
- Data processing jobs that consume more memory as they load and analyze large datasets.
- Sidecar-based pods (e.g., for logging or service mesh proxies) that need runtime adjustments without container restarts.
Limitations
As of Kubernetes v1.33:
- Only CPU and memory can be resized.
- QoS class cannot change after pod creation.
- Reducing memory limits requires a restart.
- Non-restartable init containers and ephemeral containers are not supported.
- containerd 1.6.9 or higher is required.
Final Thoughts
InPlacePodVerticalScaling offers fine-grained control over pod resource allocation. It fills a critical gap for workloads that require elasticity but cannot afford downtime. If you're running Kubernetes v1.33+, it’s stable enough to experiment with in development or staging environments, and potentially in production, depending on your tolerance for restart behavior.
With each release, Kubernetes is expanding the scope of this feature, from regular containers to sidecars, making it increasingly production-ready for complex, multi-container workloads.
Try It Out
You can test this locally using Minikube:
minikube start --feature-gates=InPlacePodVerticalScaling=true
Or use a real Kubernetes cluster (v1.27+), ensuring the feature gate is enabled on all nodes.
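Putting the pieces together, a minimal end-to-end check might look like this (assuming the pod spec from earlier is saved as inplace-demo.yaml):
minikube start --feature-gates=InPlacePodVerticalScaling=true
kubectl apply -f inplace-demo.yaml
kubectl wait --for=condition=Ready pod/inplace-demo
kubectl patch pod inplace-demo --subresource=resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'
kubectl get pod inplace-demo -o jsonpath='{.status.containerStatuses[0].resources}'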
FAQs
What is InPlacePodVerticalScaling in Kubernetes?
InPlacePodVerticalScaling is a Kubernetes feature (beta as of v1.33) that allows you to increase CPU or memory resources of a running pod without recreating it. CPU changes can happen without a restart, while memory updates may require one based on the configured resize policy.
How do I enable InPlacePodVerticalScaling?
Enable the InPlacePodVerticalScaling feature gate on both the API server and kubelets by passing --feature-gates=InPlacePodVerticalScaling=true. This requires Kubernetes v1.27 or later and a supported container runtime (e.g., containerd ≥ 1.6.9).
What types of workloads benefit most from this feature?
InPlacePodVerticalScaling is ideal for:
- Singleton or stateful pods that can't be restarted easily
- Data pipelines and analytics jobs in progress
- CI jobs needing temporary resource boosts
- JVM-based apps with dynamic CPU needs
- Sidecar-heavy workloads where restarts are disruptive
What are the limitations of InPlacePodVerticalScaling in v1.33?
- Only CPU and memory are supported
- QoS class can't change after pod creation
- Memory reduction always requires a restart
- Does not support init or ephemeral containers
- Requires containerd v1.6.9+
How do I know if a resource update succeeded or failed?
Use kubectl get pod <pod-name> -o yaml to inspect:
- .status.resize for operation state (e.g., InProgress, Deferred, Infeasible)
- .status.containerStatuses[].allocatedResources to verify what was actually applied
If a request exceeds node capacity, Kubernetes marks the resize as Infeasible or defers it.