Rodrigo Ascenção

DevOps Engineer

How to Scale Instances on an ECS Cluster

Have you ever been in a situation where your workloads were consuming too much of your EC2 resources, and you realized it was time to scale vertically? I’ve been there, and now I want to teach you the step-by-step process behind it.

When you run workloads on Amazon ECS with the EC2 launch type, the instances inside the cluster are typically managed by a Capacity Provider backed by an Auto Scaling Group (ASG).

Because of that, you should never modify the EC2 instances directly. Any infrastructure change must happen through the Launch Template and Auto Scaling Group configuration, so the environment remains reproducible and managed properly.


1 - Update the Launch Template

The first step is updating the Launch Template used by the ECS Auto Scaling Group.

This is where you define the EC2 instance configuration, including the instance type and the AMI.

For vertical scaling, the most common change is updating the instance size, for example:

t3.medium -> t3.large

Create a new Launch Template version with the updated configuration instead of editing existing instances manually.

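This ensures the change is versioned and reproducible, and that the previous version stays available if you need to roll back.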
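As a rough sketch of this step, the snippet below builds the arguments for the EC2 CreateLaunchTemplateVersion call. The template ID and source version are hypothetical placeholders, and the actual boto3 call is left as a comment so the example runs without AWS credentials.

```python
def build_new_version_request(template_id, source_version, instance_type):
    """Build keyword arguments for EC2's CreateLaunchTemplateVersion call."""
    return {
        "LaunchTemplateId": template_id,
        # The new version inherits its settings from this source version.
        "SourceVersion": source_version,
        "VersionDescription": f"Resize to {instance_type}",
        # Only the fields being overridden need to be listed here.
        "LaunchTemplateData": {"InstanceType": instance_type},
    }


request = build_new_version_request(
    template_id="lt-0123456789abcdef0",  # hypothetical template ID
    source_version="1",                  # hypothetical current version
    instance_type="t3.large",
)
print(request)

# With boto3 installed and AWS credentials configured, the call would be:
#   import boto3
#   boto3.client("ec2").create_launch_template_version(**request)
```

Basing the new version on an existing one means only the instance type has to be spelled out; everything else is carried over.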


2 - Update the Auto Scaling Group

After creating the new Launch Template version, update the ECS Auto Scaling Group to use it.

Navigate to:

EC2 -> Auto Scaling Groups -> Your ECS ASG

Then update:

Launch Template Version -> Latest / New Version

At this point, nothing changes in the running environment yet.
The ASG is simply configured to use the new instance configuration for future EC2 launches; existing instances keep running on the old one.
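The same change can be made through the API instead of the console. This is a minimal sketch that builds the arguments for the Auto Scaling UpdateAutoScalingGroup call; the group name and template ID are hypothetical, and "$Latest" is assumed to be the version you want the group to track.

```python
def build_asg_update_request(asg_name, template_id, version="$Latest"):
    """Build keyword arguments for UpdateAutoScalingGroup."""
    return {
        "AutoScalingGroupName": asg_name,
        "LaunchTemplate": {
            "LaunchTemplateId": template_id,
            # "$Latest" follows the newest version; a fixed number pins one.
            "Version": version,
        },
    }


update_request = build_asg_update_request(
    asg_name="my-ecs-asg",               # hypothetical ASG name
    template_id="lt-0123456789abcdef0",  # hypothetical template ID
)
print(update_request)

# With boto3 installed and AWS credentials configured, the call would be:
#   import boto3
#   boto3.client("autoscaling").update_auto_scaling_group(**update_request)
```

Pinning an explicit version number instead of "$Latest" is a reasonable alternative if you want every rollout to be a deliberate change to the group.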


3 - Roll Out the New Instances

This is the actual rollout step, where the Auto Scaling Group gradually replaces the old instances in the ECS cluster with new ones.

Inside the Auto Scaling Group:

Instance Refresh -> Start Instance Refresh

Recommended configuration:

Minimum Healthy Percentage = 100%

This is extremely important for production environments.

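With Min Healthy = 100%, the ASG replaces instances gradually and aims to keep the current healthy capacity during the rollout. To make sure a replacement is launched before an old instance is terminated, you may also need to set the Maximum Healthy Percentage above 100%.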
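For reference, here is a hedged sketch of the same configuration expressed as arguments for the Auto Scaling StartInstanceRefresh call. The group name, the 110% maximum, and the 300-second warmup are illustrative assumptions, not values from this article, and the range checks reflect my understanding of the API limits.

```python
def build_instance_refresh_request(asg_name, min_healthy=100,
                                   max_healthy=110, warmup_seconds=300):
    """Build keyword arguments for StartInstanceRefresh (rolling strategy)."""
    if not 0 <= min_healthy <= 100:
        raise ValueError("min_healthy must be between 0 and 100")
    if not 100 <= max_healthy <= 200:
        raise ValueError("max_healthy must be between 100 and 200")
    if max_healthy - min_healthy > 100:
        raise ValueError("max_healthy - min_healthy must not exceed 100")
    return {
        "AutoScalingGroupName": asg_name,
        "Strategy": "Rolling",
        "Preferences": {
            "MinHealthyPercentage": min_healthy,
            # A value above 100 lets replacements launch before old
            # instances are terminated.
            "MaxHealthyPercentage": max_healthy,
            # Seconds to wait before a new instance counts as ready.
            "InstanceWarmup": warmup_seconds,
        },
    }


refresh_request = build_instance_refresh_request("my-ecs-asg")  # hypothetical name
print(refresh_request)

# With boto3 installed and AWS credentials configured, the call would be:
#   import boto3
#   boto3.client("autoscaling").start_instance_refresh(**refresh_request)
```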

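Example: in a group of 3 instances configured to launch before terminating, the ASG brings one replacement into service, waits for it to become healthy, and only then terminates one old instance.

This process repeats until all old instances are replaced.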
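The replacement loop can be illustrated with a toy model. This is a simplification written for this article, assuming launch-before-terminate behavior and a fixed batch size; it is not the actual algorithm the Auto Scaling service runs.

```python
def simulate_launch_before_terminate(desired, batch=1):
    """Toy model: launch `batch` new instances, then terminate as many old ones.

    Returns a list of (old_count, new_count) snapshots after every action.
    """
    old, new = desired, 0
    history = [(old, new)]
    while old > 0:
        step = min(batch, old)
        new += step                 # replacements come into service first
        history.append((old, new))
        old -= step                 # only then are old instances terminated
        history.append((old, new))
    return history


history = simulate_launch_before_terminate(desired=3)
for old, new in history:
    print(f"old={old} new={new} in_service={old + new}")
```

In this model the number of in-service instances never drops below the desired count of 3, which is the property the healthy-percentage settings are meant to protect.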


How ECS Handles the Migration

As the new EC2 instances join the cluster:

  1. ECS registers the new container instances
  2. ECS scheduler starts placing tasks on them
  3. Old instances enter DRAINING state
  4. Running tasks are gracefully rescheduled
  5. Old instances are terminated only after workloads are safely migrated

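This approach lets workloads move to the new instances without interruption. Note that it relies on managed instance draining being enabled on the capacity provider (or on an equivalent termination lifecycle hook); without it, an old instance may be terminated before its tasks have drained.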
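To watch the migration happen, you could count container instances by status. The sketch below assumes a client object that exposes `list_container_instances` with the same shape as boto3's ECS client; a small fake client (hypothetical, for illustration only) stands in so the example runs without AWS access. It reads only the first page of results.

```python
def count_container_instances(ecs_client, cluster):
    """Return how many container instances are ACTIVE vs DRAINING."""
    counts = {}
    for status in ("ACTIVE", "DRAINING"):
        response = ecs_client.list_container_instances(
            cluster=cluster, status=status
        )
        counts[status] = len(response["containerInstanceArns"])
    return counts


class FakeEcsClient:
    """Stand-in for boto3.client("ecs") so the sketch runs offline."""

    def __init__(self, arns_by_status):
        self._arns_by_status = arns_by_status

    def list_container_instances(self, cluster, status):
        return {"containerInstanceArns": self._arns_by_status.get(status, [])}


fake = FakeEcsClient({
    "ACTIVE": ["arn:new-1", "arn:new-2"],  # placeholder ARNs
    "DRAINING": ["arn:old-1"],
})
counts = count_container_instances(fake, "my-cluster")  # hypothetical cluster
print(counts)
```

During a healthy rollout you would expect the DRAINING count to rise briefly and then fall back to zero as old instances are terminated.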


Important Notes

Never Resize Instances Manually

Even if you stop/start or resize an EC2 instance manually, the Auto Scaling Group may eventually replace it, and the replacement will be launched from the Launch Template, so the manual change is lost.

The ASG is the source of truth.

Always update the Launch Template and the Auto Scaling Group, never the individual instance itself.


Recommended Production Practices

Use Incremental Scaling

Avoid jumping from a very small to a very large instance size in a single step.

Example:

t3.medium -> t3.large -> t3.xlarge

This makes rollback and troubleshooting easier.


Monitor ECS Capacity During Rollout

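Keep an eye on the cluster's registered capacity, the running task counts of your services, and the progress of the instance refresh, especially during production rollouts.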
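One way to track rollout progress is to summarize the response of the Auto Scaling DescribeInstanceRefreshes call. The sketch below works on a response dictionary of that shape; the sample data and the refresh ID are made up for illustration.

```python
def summarize_refresh(response, refresh_id):
    """Summarize one instance refresh from a DescribeInstanceRefreshes response."""
    for refresh in response.get("InstanceRefreshes", []):
        if refresh.get("InstanceRefreshId") == refresh_id:
            status = refresh.get("Status", "Unknown")
            percent = refresh.get("PercentageComplete", 0)
            remaining = refresh.get("InstancesToUpdate", 0)
            return f"{status}: {percent}% complete, {remaining} instance(s) left"
    return "refresh not found"


# Made-up sample response; a real one would come from:
#   boto3.client("autoscaling").describe_instance_refreshes(
#       AutoScalingGroupName="my-ecs-asg", InstanceRefreshIds=["example-id"])
sample_response = {
    "InstanceRefreshes": [
        {
            "InstanceRefreshId": "example-id",
            "Status": "InProgress",
            "PercentageComplete": 50,
            "InstancesToUpdate": 2,
        }
    ]
}
summary = summarize_refresh(sample_response, "example-id")
print(summary)
```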


Validate Instance Registration

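Before draining old instances, confirm the new ones are registered with the ECS cluster, show an ACTIVE status with a connected agent, and are able to run tasks.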
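A registration check along these lines could be scripted against the ECS DescribeContainerInstances response. This is a minimal sketch on a made-up response dictionary; it flags any container instance that is not ACTIVE or whose agent is disconnected, and leaves it to you to pass in only the new instances.

```python
def find_unready_instances(response):
    """Return EC2 instance IDs that are not ACTIVE or lack a connected agent."""
    unready = []
    for instance in response.get("containerInstances", []):
        is_active = instance.get("status") == "ACTIVE"
        agent_ok = instance.get("agentConnected", False)
        if not (is_active and agent_ok):
            unready.append(instance.get("ec2InstanceId", "unknown"))
    return unready


# Made-up sample response; a real one would come from:
#   boto3.client("ecs").describe_container_instances(
#       cluster="my-cluster", containerInstances=[...])
sample = {
    "containerInstances": [
        {"ec2InstanceId": "i-new-1", "status": "ACTIVE", "agentConnected": True},
        {"ec2InstanceId": "i-new-2", "status": "ACTIVE", "agentConnected": False},
        {"ec2InstanceId": "i-new-3", "status": "REGISTERING", "agentConnected": True},
    ]
}
unready = find_unready_instances(sample)
print(unready)
```

An empty list is the signal that it is reasonable to proceed with draining the old instances.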


Final Thoughts

Scaling ECS EC2 instances properly is more about managing infrastructure lifecycle than just changing instance sizes.

The safest and cleanest workflow is always:

Launch Template -> Auto Scaling Group -> Instance Refresh

This keeps the environment predictable, reproducible, and production-safe while allowing ECS to gracefully migrate workloads between instances.