Apr 15, 2026 // aws

Terraform State Management on AWS: S3, DynamoDB, and Getting It Right

Remote state in S3 with DynamoDB locking is the standard, but the details — encryption, versioning, state file organization, and access control — determine whether it works reliably at team scale.

Local Terraform state works fine for a single engineer running one project. The moment a second person runs terraform apply, or you have more than one environment, local state creates real risk: two simultaneous applies can corrupt the state file, and without locking, Terraform will happily let it happen.

Remote state in S3 with DynamoDB locking is the correct pattern for teams. But the details of how you set it up — encryption, versioning, state file organization, access control — determine whether it works reliably or creates its own class of problems.


The S3 backend configuration

The basic backend configuration:

terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "prod/app/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}

This stores state at s3://my-company-terraform-state/prod/app/terraform.tfstate and uses the terraform-state-lock DynamoDB table for locking.

The encrypt = true setting enables server-side encryption on the state file. This encrypts with the S3-managed key (SSE-S3) by default. For stronger key management, specify a KMS key:

backend "s3" {
  bucket         = "my-company-terraform-state"
  key            = "prod/app/terraform.tfstate"
  region         = "us-east-1"
  encrypt        = true
  kms_key_id     = "arn:aws:kms:us-east-1:123456789:key/your-key-id"
  dynamodb_table = "terraform-state-lock"
}

With a CMK (customer-managed key), you control key rotation and can audit decrypt operations via CloudTrail. Useful for compliance environments where state file access needs to be audited.
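If you go the CMK route, the key itself is typically created alongside the bucket in the bootstrap configuration. A minimal sketch with automatic rotation enabled (resource and alias names are illustrative):

```hcl
# Customer-managed key for state encryption, with yearly automatic rotation
resource "aws_kms_key" "terraform_state" {
  description         = "Terraform state encryption"
  enable_key_rotation = true
}

# A stable alias so backend configs don't need the raw key ID
resource "aws_kms_alias" "terraform_state" {
  name          = "alias/terraform-state"
  target_key_id = aws_kms_key.terraform_state.key_id
}
```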


Creating the S3 bucket correctly

The state bucket needs specific properties:

resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-company-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Versioning is non-negotiable. When terraform apply goes wrong — and it will, eventually — you need to roll back to a previous state. Without versioning, a corrupted state file is unrecoverable without manually reconstructing it from the actual AWS resource state (tedious and error-prone).

Public access block. State files contain resource IDs, ARNs, and sometimes sensitive values (if outputs include secrets). Never make the state bucket public.
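A common additional hardening step, beyond the public access block, is a bucket policy that rejects any request made over plain HTTP. A sketch in the same style as the bucket resources above:

```hcl
# Deny all S3 actions on the state bucket unless the request uses TLS
resource "aws_s3_bucket_policy" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyInsecureTransport"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        aws_s3_bucket.terraform_state.arn,
        "${aws_s3_bucket.terraform_state.arn}/*",
      ]
      Condition = {
        Bool = { "aws:SecureTransport" = "false" }
      }
    }]
  })
}
```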


DynamoDB locking table

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Use PAY_PER_REQUEST billing. Lock operations are infrequent — one acquire and one release per terraform apply. Provisioned capacity is wasteful and adds cost for essentially zero activity.

One lock table per region, not per workspace. A single DynamoDB table handles locking for all your state files. The LockID is the S3 path to the state file, so each state file gets its own lock entry without needing separate tables.
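You can see this mapping directly by listing the table while an apply is in flight; a sketch (assumes AWS CLI credentials, output abridged and illustrative):

```shell
# List active lock entries while a terraform apply is running
aws dynamodb scan --table-name terraform-state-lock \
  --projection-expression "LockID"
# A held lock's LockID is the bucket/key path of the state file, e.g.:
#   "LockID": "my-company-terraform-state/prod/app/terraform.tfstate"
```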


State file organization

The key parameter in the backend configuration determines where in the S3 bucket the state file is stored. How you organize keys determines how the setup scales as environments and components multiply.

By environment and component (recommended):

my-company-terraform-state/
  prod/
    vpc/terraform.tfstate
    eks/terraform.tfstate
    rds/terraform.tfstate
    app/terraform.tfstate
  staging/
    vpc/terraform.tfstate
    eks/terraform.tfstate
    ...
  dev/
    vpc/terraform.tfstate
    ...

Each component (vpc, eks, rds) is a separate Terraform root module with its own state file. This limits blast radius: running terraform apply on app doesn’t touch the VPC state.
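Components reference each other through the terraform_remote_state data source rather than sharing a state file. A sketch of the app module reading the VPC module's outputs (assumes the vpc root module actually declares a vpc_id output):

```hcl
# Read-only view of another root module's state
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "my-company-terraform-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Consume the vpc module's output without touching its state
resource "aws_security_group" "app" {
  vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
}
```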

Using S3 key prefixes with Terraform workspaces:

Terraform workspaces automatically prefix the state key: with workspace staging, the key app/terraform.tfstate becomes env:/staging/app/terraform.tfstate. Workspaces work for simple environment separation but have limits: all workspaces share the same backend configuration, which creates problems when environments need different AWS accounts or regions.

For multi-account setups: Each AWS account has its own state bucket in that account. A prod account bucket, a staging account bucket, a dev account bucket. State files never cross account boundaries, and IAM controls who can read/write state per environment.
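One way to keep a single configuration working across per-account buckets is partial backend configuration: leave the account-specific values out of the backend block and supply them at init time. A sketch (file names and bucket names are illustrative):

```shell
# prod.s3.tfbackend — one small file per environment:
#   bucket         = "my-company-prod-terraform-state"
#   region         = "us-east-1"
#   dynamodb_table = "terraform-state-lock"

# Select the environment's backend when initializing
terraform init -backend-config=prod.s3.tfbackend
```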


IAM access control for state

State files contain sensitive information. Access should be restricted to the identities that need it.

Minimum permissions for a Terraform operator:

{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject"
  ],
  "Resource": "arn:aws:s3:::my-company-terraform-state/prod/*"
},
{
  "Effect": "Allow",
  "Action": "s3:ListBucket",
  "Resource": "arn:aws:s3:::my-company-terraform-state",
  "Condition": {
    "StringLike": {
      "s3:prefix": ["prod/*"]
    }
  }
},
{
  "Effect": "Allow",
  "Action": [
    "dynamodb:GetItem",
    "dynamodb:PutItem",
    "dynamodb:DeleteItem"
  ],
  "Resource": "arn:aws:dynamodb:us-east-1:123456789:table/terraform-state-lock"
}

Scope by prefix. A developer working on prod/app doesn’t need access to prod/eks or staging/*. Use S3 key prefix conditions to limit access per-component or per-environment.

Separate read-only access for plan vs. apply. terraform plan reads state but doesn’t write it. To let developers run plan without apply permission, scope the plan role’s S3 access to GetObject only (note that plan still takes a lock by default, so either keep the DynamoDB permissions or run plan with -lock=false).

CI/CD pipeline roles. Your pipeline (GitHub Actions, GitLab CI, Jenkins) needs an IAM role with access to state. Use OIDC federation for GitHub Actions to get temporary credentials without long-lived access keys. Never store AWS access keys in CI secrets if OIDC is available.
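For GitHub Actions specifically, the OIDC flow looks roughly like this (role ARN and job names are illustrative; the role’s trust policy must allow GitHub’s OIDC provider, token.actions.githubusercontent.com):

```yaml
# Sketch of a workflow job that assumes an IAM role via OIDC —
# no long-lived access keys stored in repository secrets.
permissions:
  id-token: write   # required for OIDC token exchange
  contents: read

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci
          aws-region: us-east-1
      - run: terraform init && terraform plan
```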


The bootstrap problem

You need an S3 bucket and DynamoDB table to store Terraform state — but if you use Terraform to create them, where does that Terraform’s state live?

Two valid approaches:

Option 1: Chicken-and-egg with a bootstrap script. Create the state bucket and DynamoDB table once with a minimal Terraform configuration using local state (or manually via the AWS CLI), then migrate to remote state. The bootstrap config is small and the local state file is committed to version control (acceptable for a one-time bootstrap resource that rarely changes).

# Create state bucket and lock table manually, then import or bootstrap
aws s3api create-bucket --bucket my-company-terraform-state --region us-east-1
aws s3api put-bucket-versioning --bucket my-company-terraform-state \
  --versioning-configuration Status=Enabled
aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
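Once the bucket and table exist, add the backend "s3" block to the configuration and migrate the local state into it:

```shell
# Terraform detects the backend change and offers to copy
# the existing local state into S3
terraform init -migrate-state
```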

Option 2: Terragrunt or dedicated bootstrap module. Terragrunt has built-in support for automatically creating the remote state bucket and lock table before the first terragrunt apply. Less manual, more opinionated.


Handling state drift and corruption

State drift happens when infrastructure changes outside Terraform — manually in the console, via another tool, or by a process Terraform doesn’t know about. Terraform’s state no longer matches reality.

Fix it with terraform apply -refresh-only (the current replacement for the deprecated terraform refresh): it reads the actual AWS resource state and updates the state file to match, without changing any infrastructure. Then run terraform plan to see what Terraform would do to reconcile.

State file corruption is rare but happens when:

  • Two simultaneous applies without locking (DynamoDB locking prevents this)
  • A crash mid-apply leaves state partially updated
  • Manual editing of the state file goes wrong

Recovery: use S3 versioning to restore the previous state version. In the S3 console, find the state file, view versions, and restore the most recent good version. Then run terraform plan to assess the drift between restored state and actual resources.
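The same restore can be scripted with the AWS CLI (the version ID below is a placeholder):

```shell
# Find the last known-good version of the state object
aws s3api list-object-versions \
  --bucket my-company-terraform-state \
  --prefix prod/app/terraform.tfstate

# Restore it by copying that version over the current object
aws s3api copy-object \
  --bucket my-company-terraform-state \
  --copy-source "my-company-terraform-state/prod/app/terraform.tfstate?versionId=GOOD_VERSION_ID" \
  --key prod/app/terraform.tfstate
```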

Never manually edit the state file for routine operations. Use:

  • terraform state mv — rename a resource in state (when refactoring)
  • terraform state rm — remove a resource from state (when you want Terraform to forget about a resource without destroying it)
  • terraform import — add an existing resource to state (when taking over a manually-created resource)
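Typical invocations (resource addresses and names are illustrative):

```shell
# Rename a resource address after refactoring it into a module
terraform state mv aws_s3_bucket.assets module.storage.aws_s3_bucket.assets

# Forget a resource without destroying the real infrastructure
terraform state rm aws_iam_user.legacy

# Adopt an existing, manually-created bucket into state
terraform import aws_s3_bucket.assets my-existing-bucket
```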

Sensitive values in state

A critical issue that often surprises teams: Terraform state contains all resource attributes, including sensitive ones. If you create an RDS instance, the master password appears in the state file. If you use a random_password resource, the generated value is in the state file.

Mitigations:

  • Don’t put secrets in Terraform outputs. They end up in state and may appear in CI logs.
  • Use Secrets Manager / Parameter Store. Have Terraform create the secret container, set the value outside Terraform, and reference the secret ARN in dependent resources. One caveat: if Terraform itself writes the value (e.g. via aws_secretsmanager_secret_version), that value still lands in state — keep the write out-of-band.
  • Encrypt state at rest. The KMS-encrypted S3 backend means the state file is unreadable without KMS decrypt permission. Combined with strict S3 access, this limits who can extract sensitive values.
  • Audit state file access. S3 access logging + CloudTrail S3 data events captures every GetObject call on the state bucket.
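A sketch of the Secrets Manager pattern: Terraform owns the secret container and exposes its ARN, while the value is written outside Terraform (e.g. with aws secretsmanager put-secret-value) so it never enters state. Names are illustrative:

```hcl
# Terraform manages the container, not the secret value
resource "aws_secretsmanager_secret" "db_master" {
  name = "prod/app/db-master"
}

# Dependent resources and modules reference the ARN, never the value
output "db_master_secret_arn" {
  value = aws_secretsmanager_secret.db_master.arn
}
```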

Terragrunt: when to add it

Terragrunt adds a layer on top of Terraform that solves some specific pain points for multi-environment setups:

  • DRY backend configuration: define the S3 bucket and DynamoDB table once, inherit across all modules
  • Automatic dependency ordering: run-all apply applies modules in dependency order
  • Bootstrap: automatically creates the state backend before first apply

Terragrunt is worth adopting when you have 10+ root modules across 3+ environments and the DRY problem is causing real pain. For smaller setups, the overhead of learning Terragrunt isn’t worth it.
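For reference, the DRY backend configuration lives in a root terragrunt.hcl that child modules include; roughly (bucket and table names as above):

```hcl
# Root terragrunt.hcl — Terragrunt generates each module's backend
# block and creates the bucket/table on first run if missing
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket         = "my-company-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
```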


What breaks at team scale

The patterns that work fine for one engineer and fall apart at ten:

Shared state files for unrelated resources. A monolithic state file that contains VPC, ECS cluster, RDS, and application resources means a terraform plan on any change requires locking the entire file. Decompose into focused modules.

No environment isolation. All environments in the same state file (using workspaces) means a mistake in one workspace can affect another. Separate state files per environment, ideally in separate AWS accounts.

Access keys in CI. Long-lived access keys in GitHub secrets can be rotated but are vulnerable to exposure. OIDC federation for temporary credentials is more secure and doesn’t require rotation.

Missing lifecycle rules on the state bucket. Old state file versions accumulate forever in S3. Add a lifecycle rule to expire noncurrent versions after 90 days (or whatever your incident-recovery window is); keep at least 90 days so mistakes detected late are still recoverable.
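A lifecycle rule matching that recommendation, in the same style as the bucket resources above:

```hcl
# Expire old state file versions 90 days after they become noncurrent;
# the empty filter applies the rule to every object in the bucket
resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    id     = "expire-old-state-versions"
    status = "Enabled"
    filter {}
    noncurrent_version_expiration {
      noncurrent_days = 90
    }
  }
}
```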


Getting this right

State management seems like plumbing, but getting it wrong at team scale creates operational incidents. If you’re setting up Terraform for a growing engineering team or migrating from local state, I can help.


Nick Allevato is an AWS Certified Solutions Architect Professional with 20 years of infrastructure experience. He runs Cold Smoke Consulting, an independent AWS consulting practice.

