Operations

LVM Safety Checklist

A practical pre-flight and post-change checklist for production storage operations.


Pre-Change Identity Validation

Confirm target devices by serial/model, not only by /dev name. Capture host inventory and expected topology before command execution.
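A minimal identity check might look like the following sketch. The check_serial helper and the inventory serial are hypothetical; lsblk's SERIAL and MODEL columns are standard output fields.

```shell
#!/bin/sh
# Sketch: confirm device identity before any destructive step.
check_serial() {
  # compare the serial recorded in inventory against the one read on the host
  [ "$1" = "$2" ]
}

if command -v lsblk >/dev/null 2>&1; then
  # identity columns for whole disks; compare against your inventory record
  lsblk -d -o NAME,SERIAL,MODEL,SIZE
fi
```

Reading the serial from the device rather than trusting the /dev name protects against enumeration order changing between boots.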

State Capture Before Execution

Archive the output of pvs, vgs, lvs -a -o +devices, and lsblk -f, together with mount state and relevant configuration files, to support rollback and post-incident analysis.
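A baseline-capture script could be sketched as follows. CHANGE_ID and the directory naming scheme are assumptions for illustration; the captured commands are the ones listed above.

```shell
#!/bin/sh
# Sketch: archive storage state into a timestamped directory before a change.
capture_dir() {
  # archive path built from a change id and a UTC timestamp
  printf 'lvm-state-%s-%s' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

DIR=$(capture_dir "${CHANGE_ID:-adhoc}")
mkdir -p "$DIR"

if command -v pvs >/dev/null 2>&1; then
  pvs                > "$DIR/pvs.txt"
  vgs                > "$DIR/vgs.txt"
  lvs -a -o +devices > "$DIR/lvs.txt"
fi
lsblk -f > "$DIR/lsblk.txt"  2>/dev/null || true
mount    > "$DIR/mounts.txt" 2>/dev/null || true
cp /etc/fstab "$DIR/" 2>/dev/null || true
```

Attaching the resulting directory to the change ticket gives reviewers and responders the same pre-change evidence.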

Decision Support: Stage-Gate The Change

Pre-change gate: proceed only after identity checks, baseline capture, backup recency confirmation, and peer review are complete.

In-change gate: move step-by-step with explicit checkpoints and abort criteria; do not batch risky commands blindly.

Post-change gate: close change only after storage state, mounts, service behavior, and monitoring all validate as expected.
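The gates above can be encoded as data so the abort decision is explicit rather than implicit. This is a sketch; the prerequisite names are hypothetical and would map to your own checks.

```shell
#!/bin/sh
# Sketch: a stage gate that fails loudly on the first unmet prerequisite.
gate() {
  # each argument is "name=0|1"
  for item in "$@"; do
    name=${item%%=*}; ok=${item##*=}
    if [ "$ok" != "1" ]; then
      echo "GATE FAILED: $name" >&2
      return 1
    fi
  done
  echo "gate passed"
}

# pre-change gate from the checklist above
gate identity_checked=1 baseline_captured=1 backup_recent=1 peer_reviewed=1
```

The same function can drive the in-change and post-change gates with different prerequisite lists.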

Execution Discipline

Run generated plans in dry-run mode first. Use peer review for destructive operations. Prefer staged rollout where possible, and document explicit abort criteria before starting.
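One way to enforce dry-run-first discipline is a small wrapper that prints each command and only executes it when a flag is cleared; LVM commands also accept their own --test option for a no-op rehearsal. The vg0/data target and size below are placeholders.

```shell
#!/bin/sh
# Sketch: print every command; execute only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

DRY_RUN=1
run lvextend -L +10G /dev/vg0/data   # printed during review, not executed
```

Reviewers read the printed plan; clearing DRY_RUN during the approved window runs the identical commands, so the rehearsal and the execution cannot drift apart.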

Post-Change Verification

Validate LVM state, filesystem health, mount behavior, and service-level behavior under expected load. Confirm monitoring and alerts reflect the new topology.
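A post-change check might be sketched as below. The vg0 group, the /srv/data mount point, and the verify_fs helper are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: verify storage state after the change, before closing the ticket.
verify_fs() {
  # the filesystem at the given path answers statfs queries
  df -P "$1" >/dev/null 2>&1
}

if command -v lvs >/dev/null 2>&1; then
  lvs -o lv_name,lv_size,lv_attr vg0   # LV size and attributes as planned?
  vgs -o vg_name,vg_free vg0           # remaining free space as expected?
fi
verify_fs /srv/data || echo "WARN: /srv/data not reachable" >&2
```

Service-level and monitoring checks are environment-specific and sit alongside these storage-level checks, not instead of them.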

For foundational context, see How LVM Works; for parser-based documentation help, see Import Existing LVM Layout.

Example: Production Expansion Change Window

Before extending a critical data LV, the team captures full state output, runs a dry script review, confirms backup recency, and assigns explicit rollback triggers. After change, they verify mounts, IO latency, and alerting before closing the ticket.

Additional Change-Window Scenarios

Thin-pool emergency expansion: operators pause nonessential writes, capture state, extend capacity, and verify both data and metadata headroom before resuming normal traffic.
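The headroom verification in that scenario can be sketched with lvs's data_percent and metadata_percent fields; vg0/pool0 and the 80% threshold are placeholders, and the headroom_ok helper is hypothetical.

```shell
#!/bin/sh
# Sketch: check thin-pool data AND metadata usage after an expansion.
headroom_ok() {
  # usage and threshold are integer percentages
  [ "$1" -le "$2" ]
}

if command -v lvs >/dev/null 2>&1; then
  # both columns must stay below your alert threshold before resuming traffic
  lvs -o lv_name,data_percent,metadata_percent vg0/pool0
fi
```

Checking metadata usage matters because a thin pool whose metadata fills can fail even when data space remains.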

Filesystem migration maintenance: team validates mount targets, service dependencies, and rollback criteria at each stage before cutover completion.

RAID-layer plus LV change: admins validate array health first, then perform LV operations, and verify both layers before declaring success.
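The array-health-first step can be sketched as follows, assuming an md-backed setup; /dev/md0 and the array_clean helper are placeholders, and mdadm typically requires root.

```shell
#!/bin/sh
# Sketch: confirm the RAID layer is healthy before touching the LV layer.
array_clean() {
  # accept only states that mean the array is fully healthy
  case "$1" in
    clean|active) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v mdadm >/dev/null 2>&1; then
  mdadm --detail /dev/md0       # inspect State and Failed Devices
fi
cat /proc/mdstat 2>/dev/null || true
```

For LVM-native RAID LVs, lvs exposes comparable health fields (for example raid_sync_action and raid_mismatch_count) that serve the same pre-check role.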

Common Checklist Failures

Skipping baseline capture: without pre-change evidence, post-incident diagnosis and rollback confidence degrade quickly.

Treating dry-run as full validation: dry-run improves review quality but does not validate live device identity or service behavior.

Closing tickets too early: change closure should wait until post-change storage and application checks both pass.

FAQ

Is a dry run enough to guarantee safety? No. A dry run helps review command order but does not replace validation, backups, and peer review.

Should I run changes during business hours? Follow your change policy and maintenance windows; storage changes should align with risk and rollback capability.

What should I verify during the change window? Verify command sequence, intermediate state, abort criteria, and communication checkpoints before each subsequent step.

What confirms post-change success? Confirm LVM state, filesystem/mount behavior, service health, and monitoring outputs all match expected outcomes.

What is the most common checklist miss? Routine changes often skip baseline capture and peer review, which weakens incident response when results deviate.

How should I build scripts for review? Generate them on the homepage builder and attach state-capture output to the change request.