
How to Automate AWS ECS Graceful Shutdowns Using AI Prompts

By Naveen Teja Palle · 6 min read

Your ECS deployment is running. You trigger a new release. ECS starts draining the old task and spinning up the new one. But somewhere in that 30-second window, three users get 502 Bad Gateway errors in the middle of checkout flows.

This is the graceful shutdown problem, and it affects virtually every AWS ECS service that handles long-running requests — HTTP endpoints, WebSocket connections, queue consumers, and background jobs alike.

The good news: AWS ECS exposes clean shutdown lifecycle hooks via SIGTERM signals, and the patterns for handling them are well-established. The prompts below generate the exact Node.js, Python, and ECS CDK configuration needed to make zero-downtime deployments truly zero-downtime.

Understanding ECS Task Lifecycle

When ECS needs to stop a task (during a deployment, scale-in, or Spot interruption), it follows a specific sequence:

  1. ECS deregisters the task from the load balancer — traffic stops routing to this task.
  2. ECS sends SIGTERM to PID 1 in the container — your application's signal to start shutting down cleanly.
  3. 30 seconds pass (the default stopTimeout) — during which your app must finish in-flight requests and exit.
  4. If the container is still running after 30s, ECS sends SIGKILL — forceful termination regardless of state.

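The critical insight: many Node.js and Python applications do not handle SIGTERM gracefully out of the box. Without a handler, the process either exits at once and drops any in-flight HTTP requests, or, when it runs as PID 1 in the container (where default signal actions do not apply), ignores the signal and keeps running until SIGKILL arrives. Your application needs to explicitly catch SIGTERM, stop accepting new requests, finish existing ones, and exit cleanly.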
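As a minimal sketch of the "catch SIGTERM" step, the snippet below uses only Python's standard library. The names shutting_down and install_signal_handlers are illustrative, not part of any framework; the idea is just to record the stop request in a flag that the rest of the application can check.

```python
import signal
import threading

# Set once SIGTERM (or SIGINT, for local runs) arrives. Request handlers and
# worker loops can poll it to decide whether to take on new work.
shutting_down = threading.Event()


def _handle_stop_signal(signum, frame):
    """Record the stop request; keep the handler tiny and non-blocking."""
    shutting_down.set()


def install_signal_handlers():
    """Register handlers for SIGTERM (sent by ECS) and SIGINT (Ctrl+C locally).

    Must be called from the main thread, which is where Python delivers signals.
    """
    signal.signal(signal.SIGTERM, _handle_stop_signal)
    signal.signal(signal.SIGINT, _handle_stop_signal)
```

Call install_signal_handlers() once at startup, before the server begins accepting traffic. Registering a handler also matters when the process runs as PID 1, since a signal with no handler would otherwise be ignored there.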

⚠️ The Load Balancer Timing Gap

There's a known race condition: ECS sends SIGTERM at roughly the same time it deregisters the task from the ALB, but the ALB may still be routing traffic for 1–2 seconds as connection draining propagates. Best practice: add a 5-second initial delay after receiving SIGTERM before actually closing the HTTP server.
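The delay-then-close order can be sketched with Python's built-in HTTP server. This is an illustration under assumptions, not production code: DrainingHTTPServer, HelloHandler, and drain_and_stop are hypothetical names, and the 5-second default mirrors the delay suggested above.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DrainingHTTPServer(ThreadingHTTPServer):
    # Non-daemon handler threads are tracked by the server, so server_close()
    # waits for in-flight requests instead of abandoning them.
    daemon_threads = False


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def drain_and_stop(server, drain_delay_s=5.0, sleep=time.sleep):
    """Run the shutdown steps in order and return their names."""
    steps = []
    sleep(drain_delay_s)       # keep serving while the load balancer catches up
    steps.append("delay")
    server.shutdown()          # stop the accept loop; serve_forever() returns
    steps.append("stop_accepting")
    server.server_close()      # close the socket and join in-flight handlers
    steps.append("drained")
    return steps
```

In this sketch, serve_forever() is expected to run in a background thread while the main thread waits for the shutdown flag and then calls drain_and_stop(server). Calling shutdown() from the same thread that runs serve_forever() would deadlock.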

Prompt 1: Node.js Express — SIGTERM Handler

This prompt generates a production-grade Express.js server with a complete graceful shutdown implementation:

"Act as a Senior Node.js Backend Engineer specializing in AWS ECS deployments. Write a production-grade Express.js server with complete graceful shutdown handling for ECS SIGTERM signals. Requirements: (1) On SIGTERM, add a 5-second initial delay before starting shutdown (to account for ALB connection draining lag). (2) Stop accepting new incoming connections using server.close(). (3) Close all open idle connections immediately using a keep-alive connection tracking Map. (4) Set a 30-second hard timeout fallback that calls process.exit(1) if connections haven't drained by then. (5) Gracefully shut down a database connection pool (e.g., pg pool) and any Redis client. (6) Log each shutdown step with structured JSON logging including timestamps. (7) Also handle SIGINT for local development. Write this as TypeScript with full type safety."

Prompt 2: Python FastAPI — SIGTERM with Lifespan

FastAPI's modern lifespan API (replacing the deprecated on_event handlers) is the correct pattern for managing shutdown in Python ECS services:

"Act as a Senior Python Engineer. Write a FastAPI application using the modern lifespan context manager pattern that gracefully handles ECS SIGTERM shutdown signals. Requirements: (1) Use @asynccontextmanager for the lifespan function (not deprecated on_event). (2) On startup: initialize a PostgreSQL async connection pool (asyncpg) and a Redis client. (3) On SIGTERM: set a global asyncio.Event() flag that causes health check endpoints to return 503 immediately (for ALB deregistration). (4) Add a 5-second sleep after setting the flag before closing connection pools. (5) On shutdown: cleanly close the asyncpg pool and Redis client with proper await calls. (6) Include a /health endpoint that returns 200 normally and 503 during shutdown. (7) Use uvicorn.run() with workers=1 for ECS Fargate single-task mode."

Prompt 3: AWS CDK — ECS Task Definition with Correct stopTimeout

The application code is only half the equation. Your ECS Task Definition must also be configured with the correct stopTimeout and ALB deregistration delay settings:

"Act as an AWS CDK Expert (TypeScript). Write a complete CDK construct class named 'GracefulShutdownService' that creates an ECS Fargate service with proper graceful shutdown configuration. Requirements: (1) Set stopTimeout to Duration.seconds(60) on the container definition in the task definition to give the application enough time to drain. (2) Configure the ALB target group with deregistrationDelay: Duration.seconds(30). (3) Set the service healthCheckGracePeriod to Duration.seconds(10). (4) Configure the service with minimumHealthyPercent: 100 and maximumPercent: 200 for zero-downtime deployments. (5) Configure the ALB target group health check with a path of '/health'. (6) Export the service URL as a CloudFormation output. Include all imports."

Prompt 4: ECS Fargate Startup Time Optimization

Graceful shutdown is one side of the coin. If your replacement task takes 90 seconds to start, you'll have a capacity gap even with perfect shutdown handling. This prompt generates startup optimization strategies:

"Act as an AWS Fargate Performance Engineer. List and implement the top 5 techniques to reduce ECS Fargate task startup time from a typical 45–60 seconds to under 15 seconds. For each technique, provide: (1) A concrete explanation of why it reduces startup time. (2) The specific CDK TypeScript code or Dockerfile change required to implement it. Techniques to cover: ECR pull-through cache configuration, container image layer optimization (multi-stage Docker builds), task definition CPU/memory right-sizing for faster cold starts, ECS capacity providers with managed scaling, and pre-warming strategies using scheduled scaling before peak traffic windows."

Pro Tips: Real-World ECS Graceful Shutdown

💡 Use Health Check to Drive ALB Deregistration

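A reliable way to get the ALB to stop sending traffic before your app shuts down is to make your /health endpoint return 503 as soon as the task receives SIGTERM. The ALB marks a target unhealthy only after the configured number of consecutive failed checks (the unhealthy threshold, which is at least 2), so detection takes roughly the check interval (typically 5–30s) multiplied by that threshold. Keep serving requests for at least that long, and use a short interval if you rely on this signal.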
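To make the timing concrete, here is a small sketch of the health endpoint's status logic and the detection arithmetic. Both function names are hypothetical, and the estimate assumes the ALB needs a run of consecutive failed checks (the unhealthy threshold, 2 by default) before it marks the target unhealthy.

```python
def health_status_code(shutting_down: bool) -> int:
    """HTTP status for /health: 200 while serving, 503 once SIGTERM has arrived."""
    return 503 if shutting_down else 200


def detection_window_seconds(interval_s: int, unhealthy_threshold: int = 2) -> int:
    """Approximate upper bound on how long the ALB takes to mark the target unhealthy.

    Ignores the per-check timeout, so treat the result as an estimate.
    """
    return interval_s * unhealthy_threshold


print(detection_window_seconds(30))  # 60
print(detection_window_seconds(5))   # 10
```

With a 30-second interval the window can exceed the default 30-second stopTimeout, which is one reason to shorten the interval or raise stopTimeout when using this approach.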

⚠️ SQS Consumers Need Special Handling

If your ECS task consumes from SQS, stopping it mid-message-processing will cause the message to become visible again after the visibility timeout expires. Use the ChangeMessageVisibility API to extend visibility timeout on messages currently being processed, and ensure your consumer loop checks the SIGTERM flag before polling for new messages.
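A consumer loop along these lines might look like the sketch below. It assumes a client object that exposes the same receive_message, change_message_visibility, and delete_message calls as the boto3 SQS client; the consume function, its parameters, and the 60-second extension are illustrative choices.

```python
import threading


def consume(sqs, queue_url, handle, shutting_down, wait_s=5, extend_to_s=60):
    """Poll until shutdown is requested, then finish the batch already received."""
    processed = 0
    while not shutting_down.is_set():  # check the flag before each poll
        resp = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=wait_s
        )
        for msg in resp.get("Messages", []):
            if shutting_down.is_set():
                # Shutdown arrived mid-batch: extend visibility so this message
                # does not reappear on the queue while we finish it.
                sqs.change_message_visibility(
                    QueueUrl=queue_url,
                    ReceiptHandle=msg["ReceiptHandle"],
                    VisibilityTimeout=extend_to_s,
                )
            handle(msg["Body"])
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
            )
            processed += 1
    return processed
```

The short long-poll wait (wait_s) is deliberate: a poll that is already in flight when SIGTERM arrives still has to return before the loop can notice the flag, and that time comes out of the stopTimeout budget.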

Frequently Asked Questions

Q: What is the default ECS task stop timeout and can I change it?

A: The default stopTimeout in ECS is 30 seconds. On Fargate you can raise it to a maximum of 120 seconds. On the EC2 launch type, a container without its own stopTimeout falls back to the container agent's ECS_CONTAINER_STOP_TIMEOUT setting, which also defaults to 30 seconds. Set the value per container through the stopTimeout field in CDK or CloudFormation, and size it to your application's actual shutdown time plus a 10–20 second buffer.
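The sizing rule can be written as simple arithmetic. The helper below is a hypothetical sketch: the breakdown into drain delay, longest request, and cleanup time is an assumption about how you measure "actual shutdown time", and the 15-second default buffer is just the midpoint of the 10–20 second range.

```python
FARGATE_MAX_STOP_TIMEOUT_S = 120  # upper limit for stopTimeout on Fargate


def recommended_stop_timeout(drain_delay_s, longest_request_s, cleanup_s, buffer_s=15):
    """Sum the shutdown phases plus a buffer and check the Fargate limit."""
    needed = drain_delay_s + longest_request_s + cleanup_s + buffer_s
    if needed > FARGATE_MAX_STOP_TIMEOUT_S:
        raise ValueError(
            f"shutdown needs {needed}s, more than the "
            f"{FARGATE_MAX_STOP_TIMEOUT_S}s Fargate allows"
        )
    return needed


# 5s ALB delay + 30s longest request + 5s pool cleanup + 15s buffer
print(recommended_stop_timeout(5, 30, 5))  # 55
```

If the function raises, the shutdown path itself needs to get shorter, for example by capping request duration, because the timeout cannot be raised any further on Fargate.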

Q: Does ECS Fargate Spot affect graceful shutdown behavior?

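A: Yes. When Fargate Spot reclaims capacity, ECS gives a two-minute warning, delivered as a task state change event in Amazon EventBridge and as a SIGTERM to the running task. The container's stopTimeout still applies, though: with the 30-second default, SIGKILL arrives after 30 seconds, so set stopTimeout to 120 if you want the whole window. Inside the container, the SIGTERM looks the same as one from a normal deployment. To tell the two apart, inspect the EventBridge event or the stopped task's stop code, which identify a Spot interruption.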

Q: How do I test that my graceful shutdown is actually working?

A: Use this procedure: (1) Deploy your service. (2) Generate constant HTTP traffic using a tool like k6 or hey. (3) Trigger a redeployment. (4) Monitor your load balancer's HTTP 5xx metrics during the deployment. If graceful shutdown is working correctly, you should see zero 5xx errors during the deployment window. Also check the stopped tasks (in the ECS console or in task state change events) for containers with exit code 137, which is 128 + 9 and indicates that the process was ended by SIGKILL.
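Exit codes above 128 follow the shell convention of 128 plus the signal number, which is where 137 comes from. A small helper (the name describe_exit_code is illustrative) can decode them when you review stopped tasks:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short human-readable description."""
    if code == 0:
        return "clean exit"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            return f"exit code {code}"  # not a known signal number
        return f"terminated by {name}"
    return f"application error (exit code {code})"


print(describe_exit_code(137))  # terminated by SIGKILL
print(describe_exit_code(143))  # terminated by SIGTERM
```

Keep in mind that 137 only says SIGKILL was delivered. A container that ran out of memory is also ended with SIGKILL, so check the task's stopped reason before blaming the shutdown path.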

Naveen Teja Palle

Cloud & DevOps Engineer · AWS Specialist

DevOps engineer with hands-on production experience managing ECS Fargate clusters, CI/CD pipelines, and zero-downtime deployment strategies on AWS. Writes practical guides for engineers who want to ship without dropping requests.
