ACM Private CA Certificate Rotation Strategy: Manual vs Automated

ACM Private CA Certificate Rotation Strategy: Manual vs Automated#

Certificate management is one of those infrastructure tasks that feels simple until something goes wrong—usually at three in the morning. Unlike AWS Certificate Manager’s public certificates, which AWS handles automatically, ACM Private CA puts you in the driver’s seat. You issue the certificates, you manage their lifecycle, and you’re responsible for rotating them before they expire. The good news is that with proper planning and automation, certificate rotation becomes predictable and reliable. The challenge is understanding when to automate versus when a manual process makes sense, and how to implement either approach safely.

In this article, we’ll walk through the complete picture of certificate rotation for ACM Private CA: from understanding the lifecycle and expiration mechanics, to building automated renewal pipelines, handling applications that require restarts, and monitoring for problems before they become outages.

Understanding ACM Private CA and Its Lifecycle#

Before diving into rotation strategies, it’s worth understanding what makes ACM Private CA different from its public sibling. When you use AWS Certificate Manager for public certificates, AWS automatically handles renewal ninety days before expiration. You don’t think about it, don’t touch it, don’t even know it happened. It’s a managed service in the truest sense.

ACM Private CA, by contrast, is a certificate authority under your control. You use it to issue certificates for internal applications, services, and infrastructure. Because these certificates are private to your organization, AWS can’t automatically renew them—they don’t know when or where you want to deploy the new certificate, or whether your application can handle the rotation. You need to actively manage the renewal process yourself.

The typical lifecycle of a private CA certificate looks like this: you request a certificate from your private CA, receive the signed certificate and chain, deploy it to your application or load balancer, and watch the expiration date on your calendar. Days before expiration, you repeat the process—request a new certificate, distribute it, and ensure the application starts using it. Miss that window, and your certificate expires while still in use, breaking TLS connections and potentially disrupting service.

The expiration duration you choose when creating a certificate affects when you need to rotate it. Common choices are one year, three years, or even longer. A one-year certificate means you rotate annually; a three-year certificate gives you more breathing room but requires longer-term planning. Most organizations rotate on an annual or semi-annual schedule, which provides a good balance between operational overhead and security practice.

Planning Your Rotation Timeline#

The first step in any certificate rotation strategy is deciding when you’ll rotate. This requires some backward planning. If your certificate expires on December 31st and you want to rotate it at least seven days before expiration, you need to complete the new certificate request, validation, issuance, and deployment by December 24th at the latest.

For manual rotations, I recommend picking a date at least thirty days before expiration. Thirty days gives you enough time to handle unexpected issues—a failed deployment, a missed notification, a team member on vacation—without rushing. For automated rotations, you can shorten this window to seven or ten days, since there’s less room for human error.

Document your rotation schedule explicitly. Create a calendar entry, add it to your runbooks, and preferably automate a reminder in your alerting system. Here’s what I typically see work well: set up a CloudWatch alarm that triggers when a certificate has fewer than forty-five days until expiration. This gives you a clear, measurable signal that it’s time to act, and it’s early enough to be safe.

The rotation date should account for your deployment process. If deploying to production requires change advisory board approval, or if you have a release window on a specific day, plan your rotation to fit within those constraints. There’s nothing worse than having a shiny new certificate ready to deploy but being blocked by process.

Monitoring Certificate Expiry with CloudWatch#

Before you build a rotation process, you need visibility into what’s expiring and when. CloudWatch is your friend here, but ACM Private CA doesn’t automatically emit metrics about certificate expiration. You need to create that visibility yourself.

The most practical approach is to periodically query your private CA for issued certificates and extract their expiration dates. You can do this with a Lambda function running on a schedule—say, daily or weekly—that calls the ACM API to describe your issued certificates, extracts expiration dates, and pushes custom metrics to CloudWatch.

Here’s what that Lambda function might look like:

import boto3
import json
from datetime import datetime

acm_client = boto3.client('acm')
cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    # List all certificates in your private CA
    paginator = acm_client.get_paginator('list_certificates')
    
    for page in paginator.paginate(
        Filters=[
            {
                'Name': 'keyUsage',
                'Values': ['DIGITAL_SIGNATURE', 'KEY_ENCIPHERMENT']
            }
        ]
    ):
        for cert_arn in page.get('CertificateArns', []):
            # Describe each certificate to get expiration
            response = acm_client.describe_certificate(CertificateArn=cert_arn)
            cert = response['Certificate']
            
            # Calculate days until expiration
            expiration = cert['NotAfter']
            now = datetime.now(expiration.tzinfo)
            days_until_expiry = (expiration - now).days
            
            # Push custom metric to CloudWatch
            cloudwatch.put_metric_data(
                Namespace='PrivateCA',
                MetricData=[
                    {
                        'MetricName': 'DaysUntilExpiration',
                        'Value': days_until_expiry,
                        'Unit': 'Count',
                        'Dimensions': [
                            {
                                'Name': 'CertificateArn',
                                'Value': cert_arn
                            },
                            {
                                'Name': 'DomainName',
                                'Value': cert.get('DomainName', 'Unknown')
                            }
                        ]
                    }
                ]
            )
    
    return {
        'statusCode': 200,
        'body': 'Metrics published successfully'
    }

This function can be triggered by EventBridge on a daily schedule. Once the metrics are in CloudWatch, you can create alarms for any certificate with fewer than forty-five days until expiration, or create a dashboard that gives you a bird’s-eye view of your certificate inventory and upcoming expirations.

You can also publish these metrics to SNS for email notifications, or send them to your existing monitoring system via an integration. The key is making expiration dates visible and actionable before they become emergencies.

The Case for Manual Rotation#

Manual certificate rotation makes sense in several scenarios, and it’s worth understanding when to choose this approach despite the appeal of automation.

First, consider organizational scale. If you’re managing fewer than ten private certificates, manual rotation is perfectly reasonable. The overhead is minimal—request a certificate quarterly or annually, validate it’s correct, deploy it. It’s straightforward enough that you don’t gain much from automation complexity.

Second, think about risk tolerance. Some applications or environments are sensitive enough that you want human eyes on the certificate before it goes live. Maybe you operate in a highly regulated environment, or maybe you’re protecting critical infrastructure. In these cases, a manual process with clear approval steps provides desirable governance.

Third, rotation frequency matters. If you’re rotating every three years due to long certificate validity periods, automation might be overkill. You probably won’t forget a manual rotation that happens so infrequently.

For a manual process, create a runbook that someone on your team can follow reliably. Here’s what a solid runbook looks like:

Certificate Rotation Runbook

Check the current certificate. Use the AWS Management Console or CLI to verify the certificate ARN, domain name, and exact expiration date. Document this in a ticket or notes.
Request a new certificate from your private CA. Use the AWS Console or CLI, specifying the same domain name(s), validity period, and any special extensions. Request a two-year validity if the original was one year—this staggers future rotations.
Wait for issuance. ACM Private CA typically issues certificates within minutes, but verify the certificate appears in your console and download both the certificate and certificate chain.
Validate the certificate. Check the domain name, serial number, and key usage extensions. Run openssl x509 to inspect it locally if needed. Verify it chains correctly to your private CA root certificate.
Deploy to a test environment first. If you have a non-production environment, deploy the new certificate there and verify that TLS handshakes succeed. Test application functionality briefly.
Schedule production deployment during a maintenance window or low-traffic period. Notify your team that a certificate rotation is happening.
Deploy to production. The exact steps depend on your infrastructure. For an Application Load Balancer, it’s updating the listener’s certificate. For a web server, it’s replacing the certificate file and restarting the service (if necessary). Document the exact steps in your runbook.
Verify the new certificate is active. Use openssl s_client or a tool like testssl.sh to confirm the certificate is being served. Check application logs for any TLS errors.
Archive or retire the old certificate. Note the expiration date of the old certificate, monitor for any stragglers still using it, and remove it from ACM after confirming all workloads have migrated.
Update your calendar and alert for the next rotation. Set a reminder for about forty-five days before the new certificate expires.

The advantage of this process is its simplicity and built-in safety checks. The disadvantage is that it’s repetitive, error-prone at scale, and requires someone to actually remember to do it.

Automating Certificate Rotation with Lambda and CodePipeline#

When you’re managing dozens of certificates, or when your certificates rotate more frequently, automation becomes compelling. AWS Lambda and CodePipeline are natural tools for orchestrating certificate rotation workflows.

A Lambda-based approach works like this: on a schedule (daily or weekly), a Lambda function checks each certificate for impending expiration. When it finds one due for renewal, it automatically requests a new certificate from your private CA, validates it’s been issued successfully, and triggers downstream actions to deploy it.

Here’s a foundational Lambda function for automated certificate request:

import boto3
import json
from datetime import datetime, timedelta

acm_client = boto3.client('acm')
ssm_client = boto3.client('ssm')

def lambda_handler(event, context):
    # Configuration - store this in Parameter Store for easier management
    ca_arn = ssm_client.get_parameter(Name='/private-ca/arn')['Parameter']['Value']
    rotation_threshold_days = 45
    
    # List all certificates in the private CA
    paginator = acm_client.get_paginator('list_certificates')
    
    certificates_to_rotate = []
    
    for page in paginator.paginate():
        for cert_arn in page.get('CertificateArns', []):
            response = acm_client.describe_certificate(CertificateArn=cert_arn)
            cert = response['Certificate']
            
            # Skip if not from private CA
            if not cert.get('CertificateAuthorityArn'):
                continue
            
            # Calculate days until expiration
            expiration = cert['NotAfter']
            now = datetime.now(expiration.tzinfo)
            days_until_expiry = (expiration - now).days
            
            # If within threshold, prepare for rotation
            if days_until_expiry <= rotation_threshold_days:
                certificates_to_rotate.append({
                    'arn': cert_arn,
                    'domain': cert.get('DomainName'),
                    'sans': cert.get('SubjectAlternativeNames', []),
                    'days_remaining': days_until_expiry,
                    'expires': expiration.isoformat()
                })
    
    # If there are certificates to rotate, request new ones
    if certificates_to_rotate:
        for cert_info in certificates_to_rotate:
            try:
                # Request new certificate with 1-year validity
                new_cert = acm_client.request_certificate(
                    DomainName=cert_info['domain'],
                    SubjectAlternativeNames=cert_info['sans'],
                    CertificateAuthorityArn=ca_arn,
                    ValidityPeriod={
                        'Value': 1,
                        'Type': 'YEARS'
                    }
                )
                
                # Store the new certificate ARN in Parameter Store for later retrieval
                ssm_client.put_parameter(
                    Name=f"/certificates/pending-rotation/{cert_info['domain']}",
                    Value=new_cert['CertificateArn'],
                    Type='String',
                    Overwrite=True
                )
                
                # Publish event for downstream processing
                print(f"Requested new certificate for {cert_info['domain']}")
                
            except Exception as e:
                print(f"Error requesting certificate for {cert_info['domain']}: {str(e)}")
    
    return {
        'statusCode': 200,
        'rotated_count': len(certificates_to_rotate),
        'certificates': certificates_to_rotate
    }

This function runs on a schedule (EventBridge rule set to run daily), identifies certificates within the rotation window, and requests new ones. The new certificate ARN is stored in Parameter Store for the next stage of the pipeline to consume.

The challenge with full automation is handling the downstream deployment. You can’t just request a new certificate and hope it gets deployed—you need to orchestrate the deployment based on where each certificate is used. This is where AWS CodePipeline becomes valuable.

A CodePipeline-based rotation workflow might look like this:

Source Stage: A Lambda function or EventBridge rule triggers the pipeline when certificates are pending rotation. The pipeline pulls the certificate ARN and metadata from Parameter Store.
Validation Stage: Another Lambda function retrieves the issued certificate, verifies it’s valid, checks that it chains correctly to your CA, and ensures the domain name and extensions match what you expected.
Approval Stage: Optionally, add a manual approval step before production deployment. This is especially valuable if you want to keep some human oversight despite automation.
Deploy Stage: Depending on your infrastructure, this could invoke multiple Lambda functions to deploy the certificate to different targets—Application Load Balancers, EC2 instances, API Gateway, etc.
Verification Stage: A final Lambda function performs smoke tests—checking that TLS handshakes succeed, that your application is still healthy, and that no errors appear in logs.

Here’s what a deployment Lambda for ALB certificates might look like:

import boto3
import json

elbv2_client = boto3.client('elbv2')
acm_client = boto3.client('acm')
ssm_client = boto3.client('ssm')

def lambda_handler(event, context):
    # Get the new certificate ARN from CodePipeline artifacts or Parameter Store
    domain = event.get('domain')
    new_cert_arn = ssm_client.get_parameter(
        Name=f"/certificates/pending-rotation/{domain}"
    )['Parameter']['Value']
    
    # Get the target load balancer information
    target_albs = ssm_client.get_parameter(
        Name=f"/certificates/albs/{domain}"
    )['Parameter']['Value']
    
    # For each ALB, update the HTTPS listener with the new certificate
    alb_arns = json.loads(target_albs)
    
    for alb_arn in alb_arns:
        try:
            # Get ALB details
            response = elbv2_client.describe_load_balancers(
                LoadBalancerArns=[alb_arn]
            )
            alb = response['LoadBalancers'][0]
            
            # Find HTTPS listeners
            listeners = elbv2_client.describe_listeners(
                LoadBalancerArn=alb_arn
            )
            
            for listener in listeners['Listeners']:
                if listener['Protocol'] == 'HTTPS':
                    # Update listener with new certificate
                    elbv2_client.modify_listener(
                        ListenerArn=listener['ListenerArn'],
                        DefaultActions=listener['DefaultActions'],
                        Certificates=[{'CertificateArn': new_cert_arn}]
                    )
                    print(f"Updated listener on {alb_arn} with certificate {new_cert_arn}")
            
            # Mark rotation as complete
            ssm_client.put_parameter(
                Name=f"/certificates/last-rotated/{domain}",
                Value=datetime.now().isoformat(),
                Type='String',
                Overwrite=True
            )
            
        except Exception as e:
            print(f"Error updating ALB {alb_arn}: {str(e)}")
            raise
    
    return {
        'statusCode': 200,
        'message': f'Successfully rotated certificate for {domain}',
        'new_certificate_arn': new_cert_arn
    }

This approach automates the entire rotation process once configured. The key is storing your infrastructure metadata (which ALBs use which certificates, which EC2 instances, which API Gateway deployments) in Parameter Store or DynamoDB so that Lambda can find the right targets and update them.

Integrating with Configuration Management Tools#

If you’re already using configuration management tools like Ansible, Chef, or Puppet, you can integrate certificate rotation into your existing infrastructure-as-code workflow.

For Ansible-based infrastructure, you might create a playbook that pulls certificates from ACM, verifies they’re issued, and deploys them:

---
- name: Rotate Private CA Certificates
  hosts: webservers
  vars:
    ca_arn: "arn:aws:acm-pca:us-east-1:ACCOUNT:certificate-authority/CAuuid"
    rotation_threshold_days: 45

  tasks:
    - name: Query ACM for certificates needing rotation
      amazon.aws.aws_acm_facts:
        ca_arn: "{{ ca_arn }}"
      register: certificate_facts

    - name: Check expiration dates
      debug:
        msg: "Certificate {{ item.DomainName }} expires in {{ (item.NotAfter - now).days }} days"
      loop: "{{ certificate_facts.certificates }}"
      when: (item.NotAfter - now).days <= rotation_threshold_days

    - name: Request new certificate
      amazon.aws.acm:
        name: "{{ inventory_hostname }}"
        domain_name: "{{ ansible_fqdn }}"
        ca_arn: "{{ ca_arn }}"
        validity: 1y
      register: new_cert

    - name: Wait for certificate issuance
      amazon.aws.acm_facts:
        arn: "{{ new_cert.arn }}"
      retries: 30
      delay: 10
      until: certificate_facts.certificate.status == 'ISSUED'

    - name: Retrieve certificate and chain
      amazon.aws.aws_s3:
        bucket: "certificate-store"
        object: "{{ new_cert.arn | basename }}.crt"
        dest: "/etc/ssl/certs/{{ inventory_hostname }}.crt"

    - name: Restart web server
      systemd:
        name: nginx
        state: restarted

This playbook can be triggered via Ansible Tower or AWX on a schedule, or manually when you’re ready to rotate. It handles certificate retrieval, stores them on the server, and restarts the web service.

The advantage of this approach is that it fits naturally into existing infrastructure workflows and can be version-controlled alongside your other configuration. The downside is that it requires Ansible (or whatever tool you’re using) to have direct access to your infrastructure, and certificate distribution happens outside of AWS services.

Handling Applications That Require Restarts#

Many traditional applications—Apache, Nginx, Java applications with embedded keystores—can’t dynamically reload certificates without restarting. This poses a challenge for zero-downtime rotation because the moment you replace the certificate file, running processes are still using the old one until they restart.

The solution is to plan your rotation during a maintenance window, or to use a blue-green deployment strategy where you gradually shift traffic from old instances to new ones, giving you time to rotate certificates without downtime.

Here’s a practical blue-green pattern for applications that require restart:

Create a new Auto Scaling Group (the “green” environment) with the same configuration as your current one (the “blue” environment).
Attach new EC2 instances from the green ASG to a target group, but don’t route traffic to them yet.
Trigger a Lambda function that requests a new certificate and deploys it only to the green instances. These instances start with the new certificate already in place.
Wait for health checks to pass on the green instances.
Gradually shift traffic from blue to green over a period of minutes (using weighted target group routing or ALB listener rules).
Once all traffic is on green, terminate the blue ASG.
Rename green to blue for the next cycle.

The beauty of this approach is that blue instances keep using the old certificate until they’re terminated, while green instances use the new one from the start. No restart, no certificate files being swapped out from under running processes, and users experience no interruption.

Automating this with Lambda and CodePipeline is feasible but complex. It requires managing ASG lifecycle, health checks, traffic shifting, and rollback logic. For most organizations, running this blue-green rotation quarterly or semi-annually as a planned maintenance event is more practical than trying to fully automate it.

Here’s a simplified Lambda that orchestrates the blue-green rotation:

import boto3
import time

asg_client = boto3.client('autoscaling')
elbv2_client = boto3.client('elbv2')
lambda_client = boto3.client('lambda')

def lambda_handler(event, context):
    blue_asg_name = event['blue_asg_name']
    green_asg_name = event['green_asg_name']
    target_group_arn = event['target_group_arn']
    new_cert_arn = event['new_cert_arn']
    
    # Step 1: Launch green instances
    print("Launching green instances...")
    asg_client.set_desired_capacity(
        AutoScalingGroupName=green_asg_name,
        DesiredCapacity=3
    )
    
    # Wait for instances to be in service
    time.sleep(60)
    
    # Step 2: Deploy certificate to green instances
    print("Deploying certificate to green instances...")
    response = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[green_asg_name]
    )
    green_instances = [i['InstanceId'] for i in response['AutoScalingGroups'][0]['Instances']]
    
    # Invoke Lambda on each instance (via SSM or direct invocation)
    for instance_id in green_instances:
        # In practice, you'd use SSM Session Manager or EC2 Instance Connect
        # to execute deployment commands on the instance
        print(f"Deploying certificate to {instance_id}")
    
    # Step 3: Wait for health checks to pass
    print("Waiting for health checks...")
    time.sleep(30)
    
    # Step 4: Shift traffic gradually
    print("Shifting traffic to green...")
    # Update target group to include green instances
    target_group_response = elbv2_client.describe_target_groups(
        TargetGroupArns=[target_group_arn]
    )
    target_group = target_group_response['TargetGroups'][0]
    
    # Get current targets (blue instances)
    current_targets = elbv2_client.describe_target_health(
        TargetGroupArn=target_group_arn
    )
    
    # Register green instances with reduced weight (for weighted routing)
    for instance_id in green_instances:
        elbv2_client.register_targets(
            TargetGroupArn=target_group_arn,
            Targets=[{'Id': instance_id, 'Port': 443}]
        )
    
    # Gradually increase weight on green, decrease on blue
    for weight in range(0, 101, 10):
        print(f"Shifting to {weight}% green...")
        # Update listener rules with weighted routing
        time.sleep(30)
    
    # Step 5: Deregister blue instances and scale down
    print("Scaling down blue instances...")
    asg_client.set_desired_capacity(
        AutoScalingGroupName=blue_asg_name,
        DesiredCapacity=0
    )
    
    return {
        'statusCode': 200,
        'message': 'Blue-green rotation complete'
    }

This function orchestrates the entire blue-green process, but in practice you’d want to add error handling, health check validation, and automatic rollback if green instances fail to come up or fail health checks.

Operational Considerations and Best Practices#

Regardless of whether you choose manual or automated rotation, a few operational practices will keep you out of trouble.

Certificate Naming and Organization: Use consistent naming conventions that include the domain, purpose, and validity period. For example, api-prod-2024-2025 clearly indicates this is for the API’s production environment and covers the 2024-2025 validity period. Store metadata about each certificate—what it’s used for, which systems depend on it, when it was issued—in Parameter Store, DynamoDB, or your change management system. This metadata becomes invaluable when troubleshooting or planning rotations.

Certificate Validation: Before deploying a new certificate to production, always validate it locally. Run openssl x509 -in certificate.crt -text -noout to inspect the certificate details. Verify the domain name matches, the key usage extensions are correct, and the certificate chains to your private CA root. A simple validation mistake—deploying a certificate for the wrong domain, for instance—can cause service disruptions.

Staging and Testing: Always test certificate rotation in a non-production environment first. If you’re automating rotation, deploy to a dev or staging environment, verify that applications work correctly, and then promote to production. This catches configuration errors, permission issues, or application-specific quirks before they affect users.

Monitoring and Alerting: Beyond the CloudWatch metrics discussed earlier, monitor your applications for certificate-related errors after each rotation. Look for TLS handshake failures, hostname verification errors, or certificate chain validation issues in your application logs. Set up alerts for these patterns so you catch problems quickly.

Documentation and Runbooks: Even if you’re automating rotation, maintain clear documentation of your process. Document where certificates are stored, which systems use them, the rotation schedule, and the steps to manually intervene if automation fails. This documentation is essential for on-call engineers who need to respond to certificate issues.

Certificate Pinning and HPKP: Be careful if your applications or clients use certificate pinning or HTTP Public Key Pinning. These mechanisms “pin” to a specific certificate or public key, which breaks immediately when you rotate to a new certificate. If you use pinning, ensure your rotation process accounts for it—you might need to deploy new certificates weeks before rotating the old ones, or use pinning strategies that pin to a CA certificate rather than end-entity certificates.

Backup and Disaster Recovery: Store copies of your issued certificates and private keys securely. If your private CA is unavailable or compromised, you’ll need access to existing certificates to keep services running while you provision replacements. Use AWS Secrets Manager or a secure backup system to store certificate backups.

Choosing Between Manual and Automated Rotation#

So when should you automate, and when should you stick with manual processes?

Choose manual rotation if you’re managing fewer than five certificates, if you rotate infrequently (annually or less), if you’re in a high-security environment where human review is mandatory, or if your organization is just starting to use ACM Private CA and you want to understand the process deeply before automating.

Choose automated rotation if you’re managing more than ten certificates, if you rotate frequently (quarterly or more), if your application infrastructure is already highly automated and deployed via CodePipeline or similar, or if human intervention at rotation time creates a bottleneck.

In practice, many organizations use a hybrid approach: they automate the certificate request and issuance steps (which are straightforward and have minimal risk), but add a manual approval gate before deployment to production (which provides governance and a chance to catch issues). This gives you the operational benefits of automation without sacrificing safety.

The key insight is that automation isn’t an all-or-nothing decision. You can automate parts of the process—monitoring, requests, validation—while leaving deployment manual. You can automate deployment to non-critical environments while keeping production manual. Start simple, measure your pain points, and automate the parts that cause the most friction.

Conclusion#

Certificate rotation is not exciting infrastructure work, but it’s essential. ACM Private CA gives you control over your internal certificate infrastructure, which is powerful and necessary for many organizations, but that control comes with responsibility. The approach you choose—whether manual, semi-automated, or fully automated—should reflect your organization’s scale, risk tolerance, and operational maturity.

The key takeaways: monitor expiration dates proactively with CloudWatch, plan rotations at least forty-five days before expiry, validate new certificates before deploying, and use either automated Lambda-based workflows or clear manual runbooks depending on your scale. Whether you’re implementing a simple quarterly manual process or a sophisticated pipeline that rotates certificates across dozens of systems, the foundational principle is the same—give yourself time, test in non-production first, and keep monitoring after rotation to catch issues immediately.

As your infrastructure evolves and your certificate inventory grows, revisiting your rotation strategy to add automation becomes increasingly valuable. Start with the simplest approach that works for your current situation, and upgrade as your needs demand.