Terraform State Management on AWS: Imports, State Moves, and Emergency Repairs
Quick summary: Terraform state is the source of truth for your infrastructure. When it breaks, your entire IaC strategy breaks with it. This guide covers state imports, moves, emergency repairs, and the backend best practices that prevent state disasters on AWS.
Key Takeaways
- Terraform state — not your code, not AWS — is the source of truth for your infrastructure
- terraform import, terraform state mv, and terraform state rm let you repair and refactor state without destroying resources
- S3 versioning, encryption, and DynamoDB locking prevent most state disasters before they start
Your Terraform state file is the most important artifact in your infrastructure codebase. It’s not stored in git. It’s not backed by snapshots. It’s sitting in an S3 bucket, and if something goes wrong with it, you lose the ability to manage your infrastructure safely.
Most teams don’t realize how fragile Terraform state is until something breaks. That’s when a broken state file becomes an emergency: “How do I import this database? Can I move this resource without destroying it? What do I do if the state lock is stuck?”
This guide walks through the state management operations you’ll need on AWS—and the patterns that keep state disasters from happening in the first place.
Why Terraform State Is the Source of Truth (And What Breaks When It’s Wrong)
Terraform state is a JSON file that maps your resource declarations to actual AWS resources. When you run terraform apply, Terraform reads your code, compares it to the state file, and calculates what needs to change. The state file is the source of truth—not your code, not AWS, but the state file itself.
This is by design. It solves the “who owns this resource?” problem. If Terraform stopped tracking a resource, the next plan would assume the resource needs to be created. And if someone changed a property on your RDS database outside Terraform, only a comparison against state reveals that drift.
State becomes dangerous when:
- State and code diverge — Someone manually changed a resource in the AWS console, but the state file doesn’t reflect it
- State is corrupted — The S3 bucket failed, or the state file became unreadable
- State lock is stuck — Someone killed a terraform apply mid-run, and the lock remains, blocking all future operations
- You need to consolidate state — You have resources across different state files and need to unify them
- You need to refactor infrastructure — You renamed a resource or moved it to a different module, but Terraform would destroy it if you apply
When state breaks, you have three options:
- Repair it using state surgery
- Import missing resources that Terraform doesn’t know about
- Destroy and rebuild, which is expensive when the resource is a database with data
terraform import: Bringing Existing AWS Resources Under Management
The most common state operation: you have an AWS resource that exists, but Terraform doesn’t know about it. Maybe it was created manually. Maybe it was created by another tool. Maybe it was created by a previous Terraform setup that you’ve lost track of.
terraform import brings existing resources into your state file without destroying them.
When to Use Import
- Onboarding existing infrastructure — You have an S3 bucket created manually; now you want Terraform to manage it
- Recovering from lost state — Your state file was deleted, but the AWS resources still exist
- Consolidating multiple tools — You created something with the AWS console, and now your team wants to manage everything with Terraform
- Adopting Terraform incrementally — You’re migrating team-by-team, and some resources exist outside Terraform
The Import Workflow
- Write the resource block in your Terraform code (without values):
resource "aws_s3_bucket" "example" {
  # Leave empty for now: import populates state, not this block
}
- Get the resource ID from AWS. For an S3 bucket, it’s the bucket name. For an RDS database, it’s the DB instance identifier. For an EC2 instance, it’s the instance ID.
- Run the import command:
terraform import aws_s3_bucket.example my-bucket-name
Terraform fetches the resource configuration from AWS and writes it to the state file. Your code block remains a skeleton — you now fill it in based on what Terraform imported.
- Run terraform plan to verify the imported resource matches your code. If properties don’t match, update your code.
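On Terraform 1.5 or newer, you can also drive imports from configuration instead of the CLI. An import block declares the mapping in code, so the import itself goes through plan review; a minimal sketch, where the bucket name and resource name are placeholders:

```hcl
# Declarative import, Terraform 1.5+ (sketch: names are placeholders)
import {
  to = aws_s3_bucket.example
  id = "my-bucket-name"
}

resource "aws_s3_bucket" "example" {
  bucket = "my-bucket-name"
}
```

Run terraform plan to preview the import and terraform apply to record it in state. You can also ask Terraform to draft the resource block for you from the real AWS object with terraform plan -generate-config-out=generated.tf, which takes much of the tedium out of the reverse-engineering step described below.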
AWS-Specific Import Gotchas
S3 buckets import by bucket name, not ARN:
# Correct
terraform import aws_s3_bucket.example my-bucket
# Wrong
terraform import aws_s3_bucket.example arn:aws:s3:::my-bucket
RDS databases use the DB instance identifier, not the endpoint:
terraform import aws_db_instance.example mydb-instance
Security groups use the security group ID:
terraform import aws_security_group.example sg-0123456789abcdef0
EC2 instances use the instance ID:
terraform import aws_instance.example i-0123456789abcdef0
Always check the Terraform AWS provider docs for the correct ID format. If you use the wrong ID, the import usually fails early, which is better than silently importing the wrong resource.
The Import Problem: Importing Creates Drift
Here’s the catch: terraform import brings the current state of the resource into Terraform. But your code doesn’t match it yet. Until you update your code to match the imported state, you have drift.
Example: You import an S3 bucket that has:
- Versioning enabled
- A bucket policy
- A CORS configuration
- Server-side encryption
Terraform imported all of this to the state file. But your code doesn’t declare any of it. When you run terraform plan, Terraform sees a mismatch and tries to remove all of those settings. If you apply, you lose them.
The safe import workflow:
- Import the resource
- Read the AWS console and reverse-engineer the full configuration in code
- Run
terraform planand verify zero changes - Commit to git
This is tedious for complex resources. Many teams use tools like terraformer or terracognita to auto-generate code from existing resources. But the generated code is a starting point—you still need to review it.
terraform state mv: Refactoring Without Destroying Resources
When your infrastructure grows, you refactor. You might move a resource to a different module. You might rename a resource. You might split a monolithic state file into multiple state files.
Without state surgery, refactoring is dangerous. Rename aws_instance.app to aws_instance.web_server, and Terraform sees a new resource declaration and an old one that’s disappearing. It would destroy the instance and create a new one.
terraform state mv renames a resource in the state file without changing AWS.
Common Refactoring Scenarios
Scenario 1: Rename a resource
# Before
resource "aws_s3_bucket" "main" {
bucket = "my-bucket"
}
# After
resource "aws_s3_bucket" "uploads" {
bucket = "my-bucket"
}
Move it in state:
terraform state mv aws_s3_bucket.main aws_s3_bucket.uploads
Scenario 2: Move a resource to a different module
# Move from the root module to a child module
terraform state mv aws_db_instance.db module.database.aws_db_instance.db
Scenario 3: Consolidate multiple state files
If you have separate Terraform configurations for different parts of your infrastructure (one for networking, one for databases, one for applications), you might want to consolidate them.
Pull state from source configuration into state of target configuration:
# In the target configuration
terraform state pull > target-state.json
# In the source configuration
terraform state pull > source-state.json
# Manually merge source-state into target-state (carefully)
# Upload back to target
terraform state push target-state.json
This is dangerous and rarely needed. In most cases, it’s better to keep separate state files for different concerns (networking, databases, applications) with clear ownership.
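For renames and module moves, Terraform 1.1+ also offers a declarative alternative to terraform state mv: a moved block checked into code, which every collaborator’s plan applies automatically. A sketch for the rename in Scenario 1:

```hcl
# Declarative state move, Terraform 1.1+
moved {
  from = aws_s3_bucket.main
  to   = aws_s3_bucket.uploads
}
```

Unlike a one-off CLI state mv, the moved block is code-reviewed and replayed safely on every workspace that still holds the old address, which matters when several environments share the same configuration.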
Safety First: Always Plan Before Moving
Before moving state, always run terraform plan and ensure you understand what Terraform will do:
terraform state mv aws_instance.web aws_instance.web_server
# Now plan before you commit
terraform plan
If terraform plan shows destroy + create instead of zero changes, something went wrong. Revert the state move immediately:
terraform state mv aws_instance.web_server aws_instance.web
terraform state rm: Orphaning Resources Without Deletion
Sometimes you need to remove a resource from Terraform management without destroying it in AWS.
Real scenario: You provisioned an RDS database with Terraform, but the database team now manages it directly. You want to stop Terraform from managing it, but you absolutely cannot destroy the database.
terraform state rm removes the resource from state without touching AWS.
# First delete the resource block from your code, then:
terraform state rm aws_db_instance.legacy
terraform plan # Verify zero changes
Now Terraform doesn’t know about that database anymore. The database still exists. If someone else manages it (or if it’s manually managed), that’s fine. Terraform won’t try to destroy it on the next apply.
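If you’re on Terraform 1.7 or newer, the same orphaning can be expressed declaratively with a removed block, so the change goes through code review instead of an ad-hoc CLI command:

```hcl
# Declarative removal from state without destroying the object, Terraform 1.7+
removed {
  from = aws_db_instance.legacy

  lifecycle {
    destroy = false
  }
}
```

On the next apply, Terraform drops the database from state but leaves it running in AWS; destroy = false is what distinguishes this from an ordinary deletion of the resource block.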
The Danger: State Rm Is Permanent
Once you remove a resource from state, Terraform forgets about it entirely. If you later want to bring it back under Terraform management, you need to use terraform import. If someone accidentally runs terraform state rm without understanding the consequences, you could lose track of critical infrastructure.
Prevent this: State modifications should be code-reviewed, logged, and only performed by senior infrastructure engineers who understand the implications.
Emergency State Repair: Common Failure Scenarios on AWS
State breaks in production. Here are the patterns that actually happen.
Scenario 1: State Lock Is Stuck
You run terraform apply, it fails halfway through, and the lock record remains in DynamoDB. No one can run Terraform until the lock is cleared.
Clear the lock:
terraform force-unlock <LOCK_ID>
The lock ID is in the error message when you try to run Terraform.
Warning: Only force-unlock if you’re absolutely certain no other Terraform process is running. If you force-unlock while another process is modifying state, you’ll have corruption.
Scenario 2: State File Is Unreadable
Your S3 bucket has versioning enabled, and someone accidentally overwrote the state object with a bad file. Terraform can’t read it.
Recover from S3 version history:
# List versions
aws s3api list-object-versions --bucket my-terraform-state --prefix terraform.tfstate
# Restore a previous version to a local file
aws s3api get-object --bucket my-terraform-state --key terraform.tfstate --version-id <VERSION_ID> terraform.tfstate.backup
# Push the restored file back to the backend, then validate
terraform state push terraform.tfstate.backup
terraform state list
This is why S3 versioning is mandatory for production state backends.
Scenario 3: You Accidentally Deleted the State File
Your CI/CD pipeline has permission to delete objects in the S3 bucket, and a runaway process deleted terraform.tfstate.
You have a few options:
- Recover from S3 versioning (best case) — S3 keeps old versions; restore the most recent one
- Recover from backups — You’ve configured Terraform to back up state before operations; restore from backup
- Rebuild state from scratch — Use terraform import to import all resources back into a new state file (tedious but safe)
Scenario 4: State File Corruption
The state file is readable, but it’s malformed JSON. Terraform can’t parse it.
# Back up the current state (if pull fails on corrupt state, copy the raw object with aws s3 cp instead)
terraform state pull > terraform.tfstate.corrupted
# Try to fix it manually (be very careful)
# The state file is JSON; use jq to validate:
jq . terraform.tfstate.corrupted > /dev/null
# If jq fails, JSON is invalid. You'll need to restore from backup.
State Backend Best Practices on AWS
Preventing state disasters is easier than recovering from them.
1. Always Use Remote State
Never store state locally on a developer’s machine. Use S3 + DynamoDB backend:
terraform {
backend "s3" {
bucket = "terraform-state-prod"
key = "prod/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
2. Enable S3 Versioning
With versioning, you can recover from accidental deletions or corruptions:
resource "aws_s3_bucket_versioning" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}
3. Enable Server-Side Encryption
Terraform state contains sensitive data (database passwords, API keys, private keys). Always encrypt at rest:
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
Use KMS encryption if you need key rotation or cross-account access.
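A sketch of the KMS variant, assuming a customer-managed key already declared elsewhere in the configuration as aws_kms_key.terraform_state (a hypothetical name):

```hcl
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state_kms" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}
```

With a customer-managed key, cross-account readers need both the bucket policy and the KMS key policy to grant access, which is exactly what makes this the right choice for shared state.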
4. Enable DynamoDB State Locking
Without locking, two developers can run terraform apply simultaneously; both read the same state and overwrite each other’s writes, a classic race condition. A DynamoDB lock table prevents this (newer Terraform releases, 1.10+, can instead lock natively in S3 via the backend’s use_lockfile setting):
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
5. Block Public Access
Terraform state is private. Block all public access:
resource "aws_s3_bucket_public_access_block" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
6. Implement State Backup
Before any Terraform operation, back up state. In your CI/CD pipeline, pull a copy before touching anything:
terraform state pull > terraform.tfstate.$(date +%s).backup
Store these backups in a separate S3 bucket with a longer retention policy.
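That retention policy can itself be managed in Terraform; a sketch assuming a separate aws_s3_bucket.state_backups bucket (hypothetical name) and a 90-day retention window:

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "state_backups" {
  bucket = aws_s3_bucket.state_backups.id

  rule {
    id     = "expire-old-state-backups"
    status = "Enabled"

    # Empty filter applies the rule to every object in the bucket
    filter {}

    expiration {
      days = 90
    }
  }
}
```

Ninety days is an arbitrary starting point; pick a window long enough that a corruption discovered weeks later still has a clean backup behind it.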
When to Use terraform state replace-provider
If a provider’s source address changes (for example, state written by very old Terraform records the AWS provider as registry.terraform.io/-/aws rather than registry.terraform.io/hashicorp/aws), you need to tell Terraform to rewrite those references.
terraform state replace-provider \
'registry.terraform.io/-/aws' \
'registry.terraform.io/hashicorp/aws'
This updates all resources in state to point to the new provider without destroying them. It’s rarely needed, but it exists for provider migration scenarios.
Conclusion: State Management Is Infrastructure Foundation
Terraform state is critical infrastructure. Treat it with the same rigor as your databases and production servers.
- Automate backups. Never rely on manual backups.
- Use remote state with locking. Never manage state locally.
- Plan before you move or modify state. State surgery is powerful and dangerous.
- Document state ownership. Who is responsible for recovering state disasters?
- Test recovery procedures. Have you actually recovered from a state backup? Do it in dev first.
Teams that don’t invest in state management practices usually learn this lesson the hard way: at 2 AM, when a state lock is stuck and production can’t be deployed.
If your team is managing complex AWS infrastructure and struggling with state operations, there’s no shame in getting professional help. At FactualMinds, we help infrastructure teams build safe Terraform workflows that prevent state disasters in the first place. Whether you’re adopting Terraform for the first time or migrating from other IaC tools, we ensure your state strategy is built on solid foundations.
