Apr 13, 2026 // aws

AWS Lambda Cold Starts: What They Are and How to Fix Them

Cold starts add 200ms to 10 seconds of latency to Lambda invocations. Here's what causes them, which ones actually matter, and the practical fixes for each runtime.

Cold starts are the most complained-about Lambda characteristic and the most misunderstood. Teams add Provisioned Concurrency to functions that don’t need it, or avoid Lambda entirely for latency-sensitive workloads, based on cold start fears that don’t match their actual traffic pattern.

Here’s the factual breakdown — what causes cold starts, when they actually matter, and what to do about them.


What a cold start is

When Lambda invokes a function, it needs an execution environment: a container with your runtime, your code, and your dependencies loaded into memory. When no warm environment is available, Lambda creates a new one. That initialization process is the cold start.

A cold start has two components:

Platform init: Lambda allocates the execution environment, downloads your deployment package or container image, and starts the runtime. This is largely out of your control. Durations vary by runtime: ~100-300ms for Node.js and Python, ~500ms-2s for Java with JVM startup, ~1-5s for Java with Spring Boot, ~300ms-1s for .NET.

Function init: Your initialization code outside the handler function runs. Database connections, SDK clients, configuration loading — anything you initialize at the top level of your Lambda file runs on every cold start, not on every invocation.

Total cold start duration: platform init + function init. A Python function with a 10ms handler that opens a new database connection on every cold start might have a 200ms cold start and a 12ms warm invocation.


When cold starts actually matter

Cold starts only affect the first invocation on a new execution environment. Subsequent invocations on the same environment are warm and have no initialization overhead.

They matter for:

  • Customer-facing API endpoints with inconsistent traffic patterns — if your API gets a burst of traffic after a quiet period, the first wave of requests hits cold starts
  • Latency-sensitive operations — payment authorization, real-time bidding, anything with a hard sub-100ms requirement
  • Functions invoked from synchronous sources (API Gateway, ALB, AppSync) where the caller waits for the response

They usually don’t matter for:

  • Background processing — SQS consumers, S3 event processors, scheduled jobs don’t have end-user latency requirements
  • High-traffic functions — functions invoked hundreds of times per second maintain warm environments continuously; cold starts are a negligible fraction of total invocations
  • Batch jobs — cold starts are amortized over the job duration

Before optimizing for cold starts, check your CloudWatch metrics. Look at InitDuration in Lambda logs to see actual cold start frequency and duration. If cold starts represent less than 1% of your invocations and the duration is under 500ms, it’s not worth engineering time.
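That check can be scripted. A minimal sketch that computes cold start frequency and duration from exported Lambda REPORT log lines (the REPORT format is stable, and Init Duration only appears on cold starts; the helper name here is my own):

```python
import re

def cold_start_stats(report_lines):
    """Summarize cold start frequency and duration from Lambda REPORT lines."""
    inits = []
    total = 0
    for line in report_lines:
        if "REPORT" not in line:
            continue  # skip START/END and application log lines
        total += 1
        # "Init Duration" is only emitted when the invocation was a cold start
        m = re.search(r"Init Duration: ([0-9.]+) ms", line)
        if m:
            inits.append(float(m.group(1)))
    if total == 0:
        return {"invocations": 0, "cold_starts": 0}
    return {
        "invocations": total,
        "cold_starts": len(inits),
        "cold_start_pct": round(100 * len(inits) / total, 2),
        "avg_init_ms": round(sum(inits) / len(inits), 1) if inits else 0.0,
        "max_init_ms": max(inits) if inits else 0.0,
    }
```

The same breakdown can be done directly in Logs Insights by parsing Init Duration out of @message, without exporting anything.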


Runtime choice: the biggest lever

The runtime is the largest determinant of cold start duration for platform init.

Fastest (generally under 200ms platform init):

  • Python 3.x
  • Node.js 20/22
  • Go (compiled binary, no runtime overhead)
  • Rust (via custom runtime)

Moderate (200-800ms):

  • .NET 8 (significantly improved from earlier versions)
  • Ruby

Slowest (1-5s+ without mitigation):

  • Java (JVM startup + class loading)
  • Kotlin (JVM)
  • Scala (JVM)

For Java specifically: Spring Boot is the worst offender. A Spring Boot Lambda with auto-configuration can take 5-10 seconds to cold start. Alternatives:

  • Micronaut or Quarkus with native compilation: reduce to 100-300ms
  • AWS Lambda SnapStart (Java 11+): snapshots the initialized execution environment and restores from it, reducing cold starts to ~500ms regardless of framework
  • GraalVM native image: compiles Java to a native binary, eliminating JVM startup

If you’re building new Lambda functions and latency matters, choose Python or Node.js unless there’s a strong reason for another runtime.


Deployment package size

Lambda downloads your deployment package before executing. Larger packages = longer platform init.

Limits and guidelines:

  • Direct upload: max 50MB (zip)
  • S3 deployment: max 250MB (unzipped)
  • Container images: max 10GB (but image pull adds significant cold start time for large images)

For container image-based Lambdas: Image size matters a lot. A 2GB Docker image can add 10-30 seconds to cold start for the first pull. Subsequent pulls use the cached layers. Reduce image size:

  • Multi-stage builds (strip build tools from the final image)
  • Lambda-specific base images (public.ecr.aws/lambda/python:3.12) — already optimized
  • Avoid bundling test dependencies or documentation

For zip-based Lambdas: Only include what’s needed. For Python, prune dev and test dependencies from the package, and use pip install --no-deps where a package’s transitive dependencies aren’t actually used at runtime. Layers are convenient for sharing infrequently-changing dependencies across functions, but note that they count toward the 250MB unzipped limit.


Function init code: what runs on cold start

The code outside your handler function runs on every cold start and is amortized across all invocations on that execution environment.

import boto3
import psycopg2

# This runs on cold start
ssm = boto3.client('ssm')
db_password = ssm.get_parameter(Name='/db/password', WithDecryption=True)['Parameter']['Value']
conn = psycopg2.connect(host='db.example.com', password=db_password)

def handler(event, context):
    # This runs on every invocation
    cursor = conn.cursor()
    cursor.execute("SELECT ...")
    return cursor.fetchall()

This is the correct pattern — open the database connection once at cold start, reuse it across invocations. The alternative (opening a connection in the handler) creates a new connection on every invocation, which is slower and exhausts database connection limits.

What to initialize at module level:

  • SDK clients (boto3, requests sessions)
  • Database connections (with reconnection logic for stale connections)
  • Configuration values from SSM or Secrets Manager
  • ML models loaded from S3

What not to do at module level:

  • Operations that can fail with no retry path — if module-level init throws, the invocation fails and Lambda discards the environment, so every subsequent invocation pays a failed init until the dependency recovers
  • Very large operations that significantly increase cold start time for code paths that rarely use them
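When init can fail — say, a database that might be down at cold start — a common alternative to eager module-level init is lazy initialization in the handler path with a liveness check. A sketch, with connect and is_alive as placeholder callables standing in for your real driver (e.g. psycopg2.connect and a SELECT 1 ping):

```python
# Module-level cache: survives across invocations on a warm environment,
# but nothing here can crash the init phase.
_conn = None

def get_conn(connect, is_alive):
    """Return a cached connection, (re)connecting lazily.

    connect and is_alive are caller-supplied placeholders so the
    pattern stays independent of any one database driver.
    """
    global _conn
    if _conn is None or not is_alive(_conn):
        # Paid on the first invocation of a cold environment, or after
        # the cached connection has gone stale -- not on every invocation.
        _conn = connect()
    return _conn
```

The tradeoff versus module-level init: the first invocation absorbs the connection cost inside the handler (so it counts toward billed duration), but a transient failure no longer bricks the environment.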

Memory allocation and cold start

Lambda allocates CPU proportionally to memory. More memory = more CPU = faster initialization.

A function with 128MB of memory has 1/8th the CPU of a function with 1024MB. For CPU-bound initialization (importing heavy Python libraries, JVM startup), increasing memory reduces cold start duration.

This isn’t always intuitive. A Python function with 128MB that takes 800ms to cold start might cold start in 200ms at 1024MB, at a higher per-invocation cost but with better latency. Use AWS Lambda Power Tuning (an open-source Step Functions state machine) to find the optimal memory/cost/latency tradeoff.
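Power Tuning is deployed as a Step Functions state machine, and you benchmark a function by starting an execution with a small input document. A sketch of building that input (field names as documented by the open-source project — verify against the version you deploy; the ARN in the usage note is a placeholder):

```python
def power_tuning_input(function_arn, payload=None,
                       power_values=(128, 256, 512, 1024, 1769),
                       invocations_per_power=10):
    """Build the execution input for the aws-lambda-power-tuning
    state machine: which function, which memory settings to test,
    and how many invocations per setting."""
    return {
        "lambdaARN": function_arn,
        "powerValues": list(power_values),   # memory settings to benchmark
        "num": invocations_per_power,        # invocations per memory setting
        "payload": payload or {},            # event sent on each invocation
        "parallelInvocation": False,         # serialize to avoid throttling
    }
```

You would then start it with something like boto3.client("stepfunctions").start_execution(stateMachineArn=..., input=json.dumps(power_tuning_input("arn:aws:lambda:us-east-1:123456789012:function:demo"))) and read the cost/latency curve from the execution output.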


Provisioned Concurrency: when to use it

Provisioned Concurrency pre-initializes Lambda execution environments, keeping them warm permanently. Cold starts are eliminated for the provisioned count.

Cost: You pay for Provisioned Concurrency even when the function isn’t invoked — an hourly rate per GB of provisioned capacity, billed whether or not requests arrive.

Rough math: 10 provisioned environments × 512MB × 720 hours ≈ 13M GB-seconds, which is roughly $54/month at the us-east-1 x86 rate (~$0.0000042 per GB-second), before any invocation costs.

When Provisioned Concurrency is justified:

  • Customer-facing APIs with strict p99 latency requirements (SLA <500ms including Lambda)
  • Functions with Java/Spring Boot that have 5-10s cold starts — the latency impact is severe enough to justify the cost
  • Functions invoked from human-interactive workflows where a multi-second delay is unacceptable

When it’s not justified:

  • Background processing (SQS, S3, EventBridge triggers)
  • Functions with Python/Node.js and sub-500ms cold starts
  • Low-traffic functions invoked a few times per day

Auto Scaling Provisioned Concurrency: Rather than a fixed count, use Application Auto Scaling with a target tracking policy on ProvisionedConcurrencyUtilization. Target tracking reacts to measured utilization, so it scales down during quiet periods to avoid paying for idle capacity; for predictable peaks (business hours, scheduled launches), add scheduled scaling so capacity is in place before traffic arrives.
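A sketch of the wiring with boto3’s application-autoscaling client (the alias name is hypothetical, and the client is passed in so the function stays testable; Provisioned Concurrency attaches to a published version or alias, never $LATEST):

```python
def configure_pc_autoscaling(client, resource_id, min_cap, max_cap, target=0.7):
    """Register a Lambda alias's provisioned concurrency with Application
    Auto Scaling and attach a target tracking policy.

    client is a boto3 application-autoscaling client (or a stub in tests).
    resource_id looks like "function:checkout-api:live".
    """
    client.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        MinCapacity=min_cap,
        MaxCapacity=max_cap,
    )
    client.put_scaling_policy(
        ServiceNamespace="lambda",
        ResourceId=resource_id,
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        PolicyName="pc-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Keep utilization of warm environments near the target so
            # scaling out starts before provisioned capacity is exhausted.
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    )
```

Called as configure_pc_autoscaling(boto3.client("application-autoscaling"), "function:checkout-api:live", 2, 20), this keeps between 2 and 20 environments provisioned, scaled to ~70% utilization.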


Lambda SnapStart (Java 11+)

SnapStart is the most impactful cold start improvement available for Java on Lambda. When enabled:

  1. Lambda initializes the execution environment (including your @PostConstruct / Spring context startup)
  2. Takes a snapshot of the initialized memory state
  3. For subsequent cold starts, restores from snapshot rather than re-initializing

Cold start time drops from 5-10s (Spring Boot) to ~500-800ms (snapshot restore). The function code sees the same initialized state on every cold start.

Caveats:

  • Available for the Java 11 and later managed runtimes (and more recently extended to Python 3.12+ and .NET 8)
  • Snapshot is taken when you publish a function version — if your init code generates unique IDs (UUIDs, random seeds), those will be the same across all cold starts restored from the same snapshot. Use CRaC lifecycle hooks to re-randomize these after restore.
  • Not compatible with provisioned concurrency, Amazon EFS, or ephemeral storage larger than 512MB

For Java Lambda workloads, SnapStart is the highest-ROI optimization available.


VPC and cold starts

Lambda functions inside a VPC historically required creating a new ENI in your VPC for each execution environment, which added 10+ seconds to cold starts. This was a major complaint from the community.

AWS fixed this with shared Hyperplane ENIs, announced in 2019 and rolled out through 2020: the ENI is created once when the function’s VPC configuration is set, not at invoke time. VPC Lambda cold starts now add negligible latency (typically under 200ms). If you’re still avoiding VPC Lambda for cold start reasons, that concern is outdated.

That said: only put Lambda in a VPC when it needs VPC resources (RDS, ElastiCache, internal services). Lambda outside a VPC doesn’t need NAT Gateway for internet access and avoids the $0.045/GB NAT data transfer charge.


Practical checklist

For a Lambda function with cold start complaints:

  1. Check actual cold start frequency and duration — CloudWatch Logs Insights query on @type = "REPORT" and initDuration > 0
  2. Check runtime — switch from Java to Python/Node.js if the runtime is the bottleneck
  3. Check deployment package size — anything over 10MB for a Python function is suspicious
  4. Check memory allocation — increase memory and benchmark with Lambda Power Tuning
  5. Move initialization outside the handler — SDK clients, DB connections, config
  6. Enable SnapStart if using Java 11 or later
  7. Add Provisioned Concurrency only if p99 latency is a hard requirement and the cost is justified by the SLA

When cold starts are a symptom of the wrong architecture

If you’re spending significant engineering effort fighting Lambda cold starts for a latency-sensitive workload, the architecture might be wrong for the use case. Lambda works well for:

  • Event-driven, asynchronous processing
  • Low-to-medium traffic APIs with burst tolerance
  • Background jobs and scheduled tasks

For sub-50ms latency requirements at consistent high traffic, a containerized service (ECS/EKS) with pre-warmed instances is often a better fit than Lambda. The operational cost is higher, but you’re not paying the Lambda cold start tax.


Getting help

Lambda performance optimization sits at the intersection of runtime behavior, memory tuning, and architecture decisions. If you’re working through cold start issues or trying to decide whether Lambda is the right compute tier for a workload, let’s talk.


Nick Allevato is an AWS Certified Solutions Architect Professional with 20 years of infrastructure experience. He runs Cold Smoke Consulting, an independent AWS consulting practice.

