# Operational Playbook
This playbook covers the full lifecycle of chant-produced Kubernetes manifests — from build through production debugging. The same content is available to AI agents via the /chant-k8s skill.
## Build & validate

| Step | Command | What it catches |
|---|---|---|
| Lint source | `chant lint src/` | Hardcoded namespaces (WK8001) |
| Build manifests | `chant build src/ --output manifests.yaml` | Post-synth: secrets in env (WK8005), `latest` tags (WK8006), API keys (WK8041), missing probes (WK8301), no resource limits (WK8201), privileged containers (WK8202), and more |
| Server dry-run | `kubectl apply -f manifests.yaml --dry-run=server` | K8s API validation: schema errors, admission webhooks |
Run lint on every edit. Run build + dry-run before every apply.
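The three steps above chain naturally into a single pre-deploy gate. A minimal bash sketch, assuming `chant` and `kubectl` are on `PATH`; the `src/` and `manifests.yaml` paths mirror the table, and the fail-fast behavior is an assumption about how you want errors handled:

```sh
#!/usr/bin/env bash
# Pre-deploy gate: lint, build, and server-side dry-run before any apply.
validate_manifests() {
  local src="$1" out="$2"

  chant lint "$src" || return 1                          # WK8001 etc.
  chant build "$src" --output "$out" || return 1         # post-synth checks
  kubectl apply -f "$out" --dry-run=server || return 1   # schema + webhooks
}

# Usage: validate_manifests src/ manifests.yaml && kubectl apply -f manifests.yaml
```

Wiring this into CI means a manifest that fails lint or the server dry-run never reaches `kubectl apply`.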
## Deploy to Kubernetes

```sh
# Build
chant build src/ --output manifests.yaml

# Diff before applying
kubectl diff -f manifests.yaml

# Dry run (validates with admission webhooks)
kubectl apply -f manifests.yaml --dry-run=server

# Apply
kubectl apply -f manifests.yaml
```

## Rollout & rollback
```sh
# Watch rollout progress
kubectl rollout status deployment/my-app --timeout=300s

# Check rollout history
kubectl rollout history deployment/my-app

# Undo last rollout
kubectl rollout undo deployment/my-app

# Roll back to a specific revision
kubectl rollout undo deployment/my-app --to-revision=2
```

## Debugging strategies
### Pod status and events

```sh
# Overview
kubectl get pods -l app.kubernetes.io/name=my-app
kubectl get events --sort-by=.lastTimestamp -n <namespace>

# Deep dive into a specific pod
kubectl describe pod <pod-name>

# Logs (current and previous crash)
kubectl logs <pod-name>
kubectl logs <pod-name> --previous
kubectl logs <pod-name> -c <container-name>   # specific container
kubectl logs deployment/my-app --all-containers

# Debug containers (K8s 1.25+)
kubectl debug <pod-name> -it --image=busybox --target=<container>

# Port-forwarding for local testing
kubectl port-forward svc/my-app 8080:80
kubectl port-forward pod/<pod-name> 8080:8080
```

### Resource inspection
Section titled “Resource inspection”# Get all resources in namespacekubectl get all -n <namespace>
# YAML output for debuggingkubectl get deployment/my-app -o yaml
# Check resource usagekubectl top pods -l app.kubernetes.io/name=my-appkubectl top nodesCommon error patterns
Section titled “Common error patterns”| Status | Meaning | Diagnostic command | Typical fix |
|---|---|---|---|
| `Pending` | Not scheduled | `kubectl describe pod` → Events | Check resource requests, node selectors, taints, PVC binding |
| `CrashLoopBackOff` | App crashing on start | `kubectl logs <pod-name> --previous` | Fix app startup, check probe config, increase `initialDelaySeconds` |
| `ImagePullBackOff` | Image not found | `kubectl describe pod` → Events | Verify image name/tag, check `imagePullSecrets`, registry auth |
| `OOMKilled` | Out of memory | `kubectl describe pod` → Last State | Increase memory limit, profile app memory usage |
| `Evicted` | Node disk/memory pressure | `kubectl describe node` | Increase limits, add node capacity, check for log/tmp bloat |
| `CreateContainerError` | Container config issue | `kubectl describe pod` → Events | Check volume mounts, ConfigMap/Secret refs, security context |
| `Init:CrashLoopBackOff` | Init container failing | `kubectl logs <pod-name> -c <init-container>` | Fix init container command, check dependencies |
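Several of the fixes above (memory limits for `OOMKilled`, probe tuning for `CrashLoopBackOff`, missing probes flagged as WK8301 and limits as WK8201) land in the same part of the pod spec. A sketch of that fragment; the image, port, paths, and numbers are illustrative, not recommendations:

```yaml
# Container spec fragment: resource limits and probes.
containers:
  - name: my-app
    image: registry.example.com/my-app:1.4.2   # pinned tag, never :latest
    ports:
      - containerPort: 8080
    resources:
      requests:            # what the scheduler reserves
        cpu: 100m
        memory: 128Mi
      limits:              # OOMKilled fires when memory exceeds this
        memory: 256Mi
    readinessProbe:        # gates Service endpoints
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:         # restarts the container on failure
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
```

Raising `initialDelaySeconds` on the liveness probe is often the quickest fix when a slow-starting app is repeatedly killed by its own probe.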
## Deployment strategies

- RollingUpdate (default): Gradually replaces pods. Set `maxSurge` and `maxUnavailable` to control rollout pace.
- Recreate: All pods are terminated before new ones are created. Use for stateful apps that cannot run multiple versions at once.
- Canary: Deploy a second Deployment with 1 replica and the same selector labels. Route a traffic percentage via Ingress annotations or a service mesh.
- Blue/Green: Run two full Deployments (blue and green) and switch the Service selector between them.
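For the RollingUpdate case, the pacing knobs live under `spec.strategy`. A sketch with illustrative values:

```yaml
# Deployment fragment: rolling update pacing.
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the rollout
      maxUnavailable: 0    # never drop below the desired replica count
```

Note that `maxUnavailable: 0` requires `maxSurge` to be at least 1, since the rollout can only make progress by creating a surge pod first.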
## Production safety

### Pre-apply validation

```sh
# Always diff before applying
kubectl diff -f manifests.yaml

# Server-side dry run (validates with admission webhooks)
kubectl apply -f manifests.yaml --dry-run=server

# Client-side dry run (fast, but no webhook validation)
kubectl apply -f manifests.yaml --dry-run=client
```

Use server-side dry-run before production applies: it catches schema errors and exercises admission webhooks. Client-side dry-run is faster but only validates locally.
## Troubleshooting reference

| Symptom | Likely cause | Resolution |
|---|---|---|
| Pod stuck in `Pending` | Insufficient CPU/memory on nodes | Scale up the cluster or reduce resource requests |
| Pod stuck in `Pending` | PVC not bound | Check that the StorageClass exists and a PV is available |
| Pod stuck in `Pending` | Node selector/affinity mismatch | Verify node labels match selectors |
| Pod stuck in `ContainerCreating` | ConfigMap/Secret not found | Ensure referenced ConfigMaps/Secrets exist |
| Pod stuck in `ContainerCreating` | Volume mount failure | Check PVC status, CSI driver health |
| Service returns 503 | No ready endpoints | Check pod readiness probes, selector match |
| Service returns 503 | Wrong port configuration | Verify `targetPort` matches `containerPort` |
| Ingress returns 404 | Backend service not found | Check Ingress rules, service name/port |
| Ingress returns 404 | Wrong path matching | Check `pathType` (`Prefix` vs `Exact`) |
| HPA not scaling | Metrics server not installed | Install metrics-server |
| HPA not scaling | Resource requests not set | Add CPU/memory requests to containers |
| CronJob not running | Invalid cron expression | Validate cron syntax (5-field format) |
| NetworkPolicy blocking traffic | Default deny applied | Add explicit allow rules for required traffic |
| RBAC permission denied | Missing Role/RoleBinding | Check ServiceAccount bindings and verb permissions |
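For the default-deny NetworkPolicy case, an explicit allow rule looks like the following sketch. The labels, policy name, and port are illustrative:

```yaml
# Allow ingress to my-app on 8080 from pods labeled as the frontend.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-my-app
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: frontend
      ports:
        - protocol: TCP
          port: 8080
```

An empty `podSelector: {}` in a deny policy matches every pod in the namespace, so each workload that should receive traffic needs its own allow rule like this one.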