Lambda is easy to get running. Getting it running well in production — predictable latency, controlled costs, observable failures — is a different problem.
Most Lambda functions in production were written by someone who learned from a tutorial and then shipped. The tutorial covered invocation and basic IAM. It didn’t cover cold starts, dead-letter queues, memory optimization, or structured logging. Those gaps accumulate until something fails in a way that’s hard to debug.
Here’s the practical guide.
Cold starts: understand them before you fight them
A cold start happens when Lambda needs to initialize a new execution environment for your function — download the code package, start the runtime, run initialization code outside the handler. For Node.js and Python, this is typically 100–500ms. For Java and .NET, it can be 1–5 seconds.
Cold starts happen when:
- There’s no warm execution environment available (first invocation, after a period of inactivity)
- Lambda scales out to handle a traffic spike
What actually helps:
Provisioned Concurrency keeps a configurable number of execution environments pre-initialized and warm. You pay for the reserved capacity even when it’s idle, but you guarantee no cold starts for those requests. Use it for latency-sensitive APIs.
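Provisioned Concurrency is configured per version or alias. A CLI sketch, where the function name, alias, and count are placeholders you'd size against observed peak concurrency:

```shell
# Keep 25 execution environments warm for the "live" alias of my-api.
# Names and the count are illustrative, not recommendations.
aws lambda put-provisioned-concurrency-config \
  --function-name my-api \
  --qualifier live \
  --provisioned-concurrent-executions 25
```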
Reduce package size. Larger deployment packages take longer to initialize. Audit your dependencies — are you bundling an entire SDK when you only use 3 functions from it? Lambda layers can also reduce per-function package size by sharing common dependencies.
Move initialization outside the handler. SDK clients, database connections, and configuration loading should be at module level (outside the def handler / exports.handler), not inside the handler. Lambda reuses the execution environment across invocations — initialization code outside the handler runs once per environment, not once per invocation.
```python
# Good — client initialized once per execution environment
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')

def handler(event, context):
    return table.get_item(Key={'id': event['id']})
```

```python
# Bad — client re-initialized on every invocation
import boto3

def handler(event, context):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('my-table')
    return table.get_item(Key={'id': event['id']})
```
Memory and timeout: tune both
Lambda memory setting (128MB–10,240MB) controls CPU allocation proportionally — more memory means more CPU. This means increasing memory often reduces execution duration, and the cost tradeoff isn’t always obvious.
Lambda cost = duration × memory. A function running at 1,024MB that completes in 100ms costs the same as a function running at 512MB that completes in 200ms. If adding memory reduces duration proportionally, cost is neutral. If it reduces duration more than proportionally, you actually save money by using more memory.
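That arithmetic is easy to check directly. A small sketch; the per-GB-second rate below is illustrative, not a quote, so check current Lambda pricing for your region:

```python
# Lambda compute cost is proportional to GB-seconds:
# (memory in GB) x (duration in seconds). Rate is illustrative only.
PRICE_PER_GB_SECOND = 0.0000166667

def compute_cost(memory_mb: int, duration_ms: int) -> float:
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND

cost_fast = compute_cost(1024, 100)  # more memory, finishes in 100ms
cost_slow = compute_cost(512, 200)   # half the memory, takes 200ms
# Same GB-seconds, same cost. If the extra memory cuts duration by MORE
# than half, the bigger function is actually cheaper.
```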
Use AWS Lambda Power Tuning to find the optimal memory setting for your function. It runs your function at multiple memory levels and shows you the cost/performance curve. It consistently finds that the default 128MB is suboptimal for most real workloads.
Timeout: Set it to the realistic maximum runtime for your function, plus a reasonable buffer — not the maximum allowed (15 minutes). A function that should complete in 3 seconds with a 15-minute timeout will sit consuming resources for 15 minutes if it hangs. A 10-second timeout fails fast and surfaces the problem.
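Inside the function, you can also guard batch-style work against the hard timeout using the context object's get_remaining_time_in_millis(), which is a real method on the Lambda context. The FakeContext class and the 500ms buffer here are purely for local illustration:

```python
# Stop picking up new work when close to Lambda's hard timeout, so the
# function returns partial progress instead of being killed mid-operation.
TIMEOUT_BUFFER_MS = 500  # illustrative buffer; tune per workload

def should_continue(context) -> bool:
    return context.get_remaining_time_in_millis() > TIMEOUT_BUFFER_MS

def handler(event, context):
    processed = []
    for item in event.get('items', []):
        if not should_continue(context):
            break  # fail fast rather than hang until the timeout
        processed.append(item)
    return {'processed': processed}

# Local stand-in only; in Lambda, the runtime supplies the context object.
class FakeContext:
    def __init__(self, remaining_ms):
        self._ms = remaining_ms
    def get_remaining_time_in_millis(self):
        return self._ms
```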
Error handling and dead-letter queues
Lambda has two invocation models with different error behavior:
Synchronous invocations (API Gateway, ALB) — the caller gets the error response immediately. Handle errors in your code and return appropriate HTTP status codes.
Asynchronous invocations (S3 events, SNS, EventBridge) — Lambda retries failed invocations twice by default, then discards the event. Without a Dead Letter Queue, failed events disappear silently.
Configure a Dead Letter Queue (SQS or SNS) for every Lambda function triggered asynchronously. When retries are exhausted, the event goes to the DLQ where you can inspect it, alert on it, and reprocess it.
In CloudFormation (Terraform's aws_lambda_function resource has an equivalent dead_letter_config block):

```yaml
DeadLetterConfig:
  TargetArn: !GetAtt MyDLQ.Arn
```
Also set Maximum Retry Attempts and Maximum Event Age to control retry behavior. The defaults (2 retries, 6-hour event age) are often too permissive for time-sensitive workloads.
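In CloudFormation, both knobs live on AWS::Lambda::EventInvokeConfig. The values below are illustrative, not recommendations:

```yaml
MyFunctionAsyncConfig:
  Type: AWS::Lambda::EventInvokeConfig
  Properties:
    FunctionName: !Ref MyFunction
    Qualifier: $LATEST
    MaximumRetryAttempts: 1        # default is 2
    MaximumEventAgeInSeconds: 300  # default is 21,600 (6 hours)
```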
Observability: structured logging and X-Ray
Structured logging makes Lambda logs queryable in CloudWatch Logs Insights. Instead of print("Processing order 123"), emit JSON:
```python
import json

print(json.dumps({
    'level': 'INFO',
    'message': 'Processing order',
    'order_id': '123',
    'customer_id': 'cust-456'
}))
```
CloudWatch Logs Insights can then query by order_id or customer_id across all invocations.
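For example, a Logs Insights query over those JSON fields might look like this (field names match the snippet above; Insights auto-discovers fields from JSON log lines):

```
fields @timestamp, message, customer_id
| filter order_id = "123"
| sort @timestamp desc
| limit 50
```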
AWS X-Ray provides distributed tracing — you can see the full request path from API Gateway through Lambda to DynamoDB, with timing for each segment. Enable it on the function (Active tracing) and instrument your SDK calls with the X-Ray SDK.
Both are cheap relative to the debugging time they save. Enable them by default on all production functions.
IAM execution roles: least privilege
Every Lambda function should have its own IAM execution role scoped to exactly what that function needs. Common mistakes:
- One role for all Lambdas — if any function is compromised or buggy, it has permissions across all services your Lambdas use
- s3:* on all buckets — a function that only reads one bucket ends up with write access to all of them
- Hardcoded credentials — use the execution role instead; never put AWS credentials in environment variables or code
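A minimal execution-role policy sketch for a function that only reads one bucket; the bucket name and action are placeholders for whatever your function actually touches:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-input-bucket/*"
    }
  ]
}
```

In practice you'd also attach the AWSLambdaBasicExecutionRole managed policy (or equivalent permissions) so the function can write to CloudWatch Logs.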
The IAM Access Analyzer (mentioned in the IAM post) works well for Lambda roles — it shows you which permissions were actually used and helps you scope down.
Cost: the Lambda bill surprise
Lambda pricing is per GB-second of compute plus a per-request charge. Costs are usually low until they're not.
The most common Lambda cost surprise: a Lambda with a 5-minute timeout processing messages from an SQS queue with a bug that causes every message to fail. The function runs for 5 minutes on each attempt, retries, and the queue depth grows. Costs compound quickly.
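The compounding is easy to quantify. A back-of-envelope sketch; the message count, retry count, memory, and price are all assumptions:

```python
# Back-of-envelope for a poison-pill queue: every message fails, runs the
# full timeout, and is retried. All numbers here are illustrative.
PRICE_PER_GB_SECOND = 0.0000166667  # example rate; check current pricing

messages = 10_000
attempts_per_message = 3   # e.g. an SQS maxReceiveCount of 3
timeout_seconds = 300      # the 5-minute timeout from the scenario
memory_gb = 1.0

gb_seconds = messages * attempts_per_message * timeout_seconds * memory_gb
cost = gb_seconds * PRICE_PER_GB_SECOND
# 9,000,000 GB-seconds, roughly $150 at this rate, for one batch of bad
# messages. The queue keeps growing while the bug is live.
```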
Monitor:
- ConcurrentExecutions — unexpected spikes indicate runaway invocations
- Duration — unexpected increases indicate regressions or new bottlenecks
- Errors + Throttles — error spikes that feed retry loops
Set CloudWatch alarms on all four. They’re cheap to create and save significant money when something goes wrong.
When to bring in help
Serverless architectures look simple from the outside and accumulate complexity quickly in production. If you have Lambda functions running business-critical workloads without proper observability, DLQ configuration, or tuned memory settings, you’re carrying operational risk that hasn’t surfaced yet.
I do serverless architecture reviews as part of broader AWS engagements, and I can help you find the gaps before they become incidents.
Contact me or email nick@coldsmokeconsulting.com.
Nick Allevato is an AWS Certified Solutions Architect Professional with 20 years of infrastructure experience. He runs Cold Smoke Consulting, an independent AWS consulting practice.