Restore Kubernetes Objects from etcd Without Downtime
Did you know you can recover deleted Kubernetes resources from etcd snapshots without downtime or a cluster rollback? Most people don't realize it, and it's surprisingly simple.
Over the years of maintaining Kubernetes infrastructure, I've encountered several situations where something small caused noticeable disruption: a deleted ConfigMap, a misapplied manifest, or a CI job wiping out a critical resource. One such case occurred in our QA environment, where a teammate accidentally deleted a ConfigMap needed by a test application. The app began failing silently, and our CI pipelines were blocked mid-run. A full etcd restore was technically an option, but restoring the entire cluster state just to recover one object didn't make sense.
That incident led me to adopt a more focused recovery strategy, one that addresses exactly what’s broken without touching anything else. This post outlines that approach: extracting and restoring individual Kubernetes resources from an etcd snapshot. If you're responsible for cluster stability and need minimal-impact recovery, this method is worth integrating into your operational toolkit.
etcd is the central datastore for every object in a Kubernetes cluster. A snapshot restore doesn't just replace a single resource; it rewinds the entire cluster to an earlier state. That rollback can have unintended consequences: orphaned pods, outdated secrets, or corrupted controller caches.
In high-availability environments, the goal is often to fix a single broken piece without disturbing the rest of the system. This is where surgical recovery comes in. Instead of resetting the entire state, we target exactly what was lost, recover it, and leave everything else untouched.
This guide explains how to extract and restore specific resources, such as ConfigMaps, Secrets, and Deployments, from an etcd snapshot. You'll learn how to:
Mount a snapshot locally and run a throwaway etcd instance
Navigate etcd's internal structure to locate the exact resource
Decode the binary etcd values into clean YAML
Reintroduce only the affected resource back into the live cluster
This workflow avoids the need for a full cluster rollback and reduces both downtime and risk.
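To make those steps concrete, here's a minimal sketch of the whole flow, assuming a snapshot at /backups/etcd-snapshot.db and a lost ConfigMap named app-config in the qa namespace. All paths, names, and ports are placeholders, and flags may differ slightly across etcd versions.

# 1. Materialize the snapshot into a scratch data directory.
etcdctl snapshot restore /backups/etcd-snapshot.db --data-dir /tmp/etcd-restore

# 2. Start a throwaway etcd against that directory on a non-default client port.
etcd --data-dir /tmp/etcd-restore --listen-client-urls http://127.0.0.1:2479 --advertise-client-urls http://127.0.0.1:2479 &

# 3. Locate the key for the lost object.
etcdctl --endpoints=http://127.0.0.1:2479 get /registry/configmaps/qa/ --prefix --keys-only

# 4. Dump the raw value and decode it to YAML with auger.
etcdctl --endpoints=http://127.0.0.1:2479 get /registry/configmaps/qa/app-config --print-value-only | auger decode > app-config.yaml

# 5. Reapply only that object to the live cluster.
kubectl apply -f app-config.yaml

The key layout follows /registry/<resource>/<namespace>/<name>, so the same pattern works for Secrets, Deployments, and most other namespaced objects.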
If your cluster uses encryption at rest (for example, with the KMS or aescbc providers), the values in the snapshot are stored as ciphertext; etcd itself never decrypts them. You'll need the cluster's EncryptionConfiguration and keys, or access to the KMS plugin, to decode the extracted data.
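A quick way to check whether a stored value is encrypted is to look at its prefix. A small sketch against the throwaway instance from above, with a hypothetical Secret key path:

# Peek at the first bytes of the stored value (the key path is a made-up example).
etcdctl --endpoints=http://127.0.0.1:2479 get /registry/secrets/qa/app-credentials --print-value-only | head -c 40

# Unencrypted objects start with the raw "k8s" protobuf magic and decode directly with auger;
# encrypted ones start with a provider prefix such as k8s:enc:aescbc:v1:<key-name>: or k8s:enc:kms:v1:<plugin>:
# and must be decrypted with the cluster's encryption keys before auger can read them.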
Precision recovery from etcd snapshots is a critical skill that’s often overlooked in production operations. Having the ability to restore exactly what’s broken, not more, not less, saves time, reduces blast radius, and builds confidence in incident response procedures.
I’ve used this method in real-world scenarios, and it’s proven to be one of the most effective tools for stabilizing clusters without introducing new risk. Whether you’re supporting a high-availability production cluster or simply want a safer recovery strategy, this is one workflow you should have on hand.
FAQs
Why shouldn't I restore the full etcd snapshot to recover a single resource?
A full etcd restore reverts the entire cluster state, which can lead to unintended side effects like outdated secrets, orphaned pods, or controller cache inconsistencies. It's better to surgically extract and reapply only the missing resource to avoid unnecessary disruption.
What tools are required to restore a specific Kubernetes object from an etcd snapshot?
You need:
etcdctl (v3.4 or higher)
A recent etcd snapshot
auger (to decode etcd entries)
kubectl (to reapply the resource)
How do I locate a Kubernetes object inside an etcd snapshot?
You must:
Launch a temporary local etcd instance from the snapshot
Use etcdctl to list keys (e.g., /registry/configmaps/<namespace>/<name>)
Extract the binary value and decode it using auger to produce valid YAML
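For example, against the temporary instance you can walk the key space by prefix until you find the object (endpoint, namespace, and resource are placeholders):

# List keys by prefix to see how objects are laid out; on a real cluster this can be long.
etcdctl --endpoints=http://127.0.0.1:2479 get /registry --prefix --keys-only | head -50

# Narrow down by resource type and namespace.
etcdctl --endpoints=http://127.0.0.1:2479 get /registry/configmaps/qa --prefix --keys-only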
Can I recover a resource to a different namespace or environment?
Yes. You can modify the decoded YAML to change the namespace field before applying it. This is useful for restoring production data into staging or test environments.
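One way to do this, assuming mikefarah's yq v4 is installed and reusing the app-config.yaml produced by auger in the earlier sketch (both are placeholders):

# Point the decoded object at a different namespace.
yq -i '.metadata.namespace = "staging"' app-config.yaml

# Dropping instance-specific metadata usually avoids API-server rejections when the object is recreated.
yq -i 'del(.metadata.uid) | del(.metadata.resourceVersion) | del(.metadata.creationTimestamp) | del(.metadata.managedFields)' app-config.yaml

kubectl apply -f app-config.yaml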
What precautions should I take before and after performing this recovery?
Before: Back up the current cluster state
During: Run kubectl apply --dry-run=client to validate the resource
After: Clean up the temporary etcd instance to avoid running stray services locally
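The same checklist as commands, reusing the placeholder names from the earlier sketches:

# Before: save what currently exists in the target namespace, in case the reapply needs to be undone.
kubectl get configmaps,secrets,deployments -n qa -o yaml > pre-restore-backup.yaml

# During: validate the decoded manifest without touching the cluster.
kubectl apply --dry-run=client -f app-config.yaml

# After: stop the throwaway etcd and remove its scratch data directory.
pkill -f '/tmp/etcd-restore' || true
rm -rf /tmp/etcd-restore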