
1. Introduction

Ever locked yourself out of your own S3 bucket? That’s like asking a golfer if he’s ever landed in a bunker. We’ve all been there.

Scenario:

A sudden power outage knocks out your internet. When service resumes, your ISP has assigned you a new IP address. Suddenly, the S3 bucket you so carefully protected with that fancy bucket policy that restricts access by IP… is protecting itself from you. Nice work.

And here’s the kicker: you can’t change the policy because…you can’t access the bucket! Time to panic? Read on…

This post will cover:

  • Why this happens
  • How to recover
  • How to prevent it next time with a “safe room” approach to bucket policies

2. The Problem: Locking Yourself Out

S3 bucket policies are powerful and absolute. A common security pattern is to restrict access to a trusted IP range, often your home or office IP. That’s fine, but what happens when those IPs change without prior notice?

That’s the power outage scenario in a nutshell.

Suddenly (and without warning), I couldn’t access my own bucket. Worse, there was no easy way back in because the bucket policy itself was blocking my attempts to update it. Whether you go to the console or drop to a command line, you’re still hitting that same brick wall—your IP isn’t in the allow list.
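For context, the kind of bucket policy that bites you here typically looks something like the sketch below. This is illustrative only; the bucket name and IP are placeholders, not the actual policy from my account.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyEverythingExceptTrustedIp",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": "198.51.100.10/32"
        }
      }
    }
  ]
}

Because the explicit Deny covers s3:*, which includes s3:PutBucketPolicy, the same statement that protects your data also blocks your attempts to fix the policy once your IP changes.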

At that point, you have two options, neither of which you want to rely on in a pinch:

  1. Use the AWS root account to override the policy.
  2. Open a support ticket with AWS and wait.

The root account is a last resort (as it should be), and AWS support can take time you don’t have.


3. The Safe Room Approach

Once you regain access to the bucket, it’s time to build a policy that includes an emergency backdoor from a trusted environment. We’ll call that the “safe room”. Your safe room is your AWS VPC.

While your home IP might change with the weather, your VPC is rock solid. If you allow access from within your VPC, you always have a way to manage your bucket policy.

Even if you rarely touch an EC2 instance, having that backdoor in your pocket can be the difference between a quick fix and a day-long support ticket.


4. The Recovery & Prevention Script

A script to implement our safe room approach must at least:

  • Allow S3 bucket listing from your home IP and your VPC.
  • Grant bucket policy update permissions from your VPC.
  • Block all other access.

Options & Nice-To-Haves

  • Automatically detect the VPC ID (from the instance metadata).
    • …because you don’t want to fumble for it in an emergency
  • Accept your home IP as input.
    • …because it’s likely changed and you need to specify it
  • Support AWS CLI profiles.
    • …because you should test this stuff in a sandbox
  • Include a dry-run mode to preview the policy.
    • …because policies are dangerous to test live

This script helps you recover from lockouts and prevents future ones by ensuring your VPC is always a reliable access point.
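To make those requirements concrete, here is a rough sketch of the kind of statement the generated policy revolves around. It is illustrative only: the bucket name, IP, and VPC ID are placeholders, it assumes your VPC traffic reaches S3 through a gateway VPC endpoint (so the aws:SourceVpc condition applies), and the real script produces a more granular policy (separate statements for listing and for policy updates).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnlessHomeIpOrVpc",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "NotIpAddress": { "aws:SourceIp": "203.0.113.25/32" },
        "StringNotEquals": { "aws:SourceVpc": "vpc-0123456789abcdef0" }
      }
    }
  ]
}

Both conditions must be true for the Deny to take effect, so a request from either your home IP or your VPC gets through. That second path is the safe room.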


5. Using the Script

Our script is light on dependencies, but you will need curl and the AWS CLI installed on your EC2 instance.

A typical use of the command requires only your new IP address and the bucket name. The aws CLI will try credentials from the environment, your ~/.aws config, or an instance profile - so you only need -p if you want to specify a different profile. Here’s the minimum you’d need to run the command if you are executing the script in your VPC:

./s3-bucket-unlock.sh -i <your-home-ip> -b <bucket-name>

Options:

  • -i Your current public IP address (e.g., your home IP).
  • -b The S3 bucket name.
  • -v (Optional) VPC ID; auto-detected if not provided.
  • -p (Optional) AWS CLI profile (defaults to $AWS_PROFILE or default).
  • -n Dry run (show policy, do not apply).

Example with dry run:

./s3-bucket-unlock.sh -i 203.0.113.25 -b my-bucket -n

The dry run option lets you preview the generated policy before making any changes—a good habit when working with S3 policies.
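About the -v option: when it is omitted, the VPC ID can be discovered from the EC2 instance metadata service (IMDSv2). A minimal sketch of how that lookup might work, which may differ from the actual script:

detect_vpc_id() {
    # Get an IMDSv2 session token, then look up the VPC ID of the first network interface
    local token mac
    token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
    mac=$(curl -s -H "X-aws-ec2-metadata-token: $token" \
        "http://169.254.169.254/latest/meta-data/network/interfaces/macs/" | head -n 1 | tr -d '/')
    curl -s -H "X-aws-ec2-metadata-token: $token" \
        "http://169.254.169.254/latest/meta-data/network/interfaces/macs/${mac}/vpc-id"
}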


6. Lessons Learned

Someone once said that we learn more from our failures than from our successes. At this rate I should be on the AWS support team soon…lol. Well, I probably need a lot more mistakes under my belt before they hand me a badge. In any event, ahem, we learned something from our power outage. Stuff happens - best be prepared. Here’s what this experience reinforced:

  • IP-based policies are brittle.
    • Your home IP will change. Assume it.
  • We should combine IP AND VPC-based controls.
    • VPC access is more stable and gives you a predictable backdoor. VPC access is often overlooked when setting up non-production projects.
  • Automation saves future you under pressure.
    • This script is simple, but it turns a frustrating lockout into a 60-second fix.
  • Root accounts are a last resort, but make sure you have your password ready!
    • Avoid the need to escalate by designing resilient access patterns upfront.

Sometimes it’s not a mistake - it’s a failure to realize how fragile access is. My home IP was fine…until it wasn’t.


7. Final Thoughts

Our script will help us apply a quick fix. The process of writing it was a reminder that security balances restrictions with practical escape hatches.

Next time you set an IP-based bucket policy, ask yourself:

  • What happens when my IP changes?
  • Can I still get in without root or AWS support?

Disclaimer

Thanks to ChatGPT for being an invaluable backseat driver on this journey. Real AWS battle scars + AI assistance = better results.

Hosting a Secure Static Website with S3 and CloudFront: Part IIb

Introduction

In Part IIa, we detailed the challenges we faced when automating the deployment of a secure static website using S3, CloudFront, and WAF. Service interdependencies, eventual consistency, error handling, and AWS API complexity all presented hurdles. This post details the actual implementation journey.

We didn’t start with a fully fleshed-out solution that just worked. We had to “lather, rinse and repeat”. In the end, we built a resilient automation script robust enough to deploy secure, private websites across any organization.

The first takeaway: the importance of logging and visibility. While logging wasn’t the first thing we actually tackled, it was what eventually turned a mediocre automation script into something worth publishing.


1. Laying the Foundation: Output, Errors, and Visibility

1.1. run_command()

While automating the creation of this infrastructure, the output of one command often feeds the next, and each step can fail. We need to capture the output as input to later steps and capture errors to help debug the process. Automation without visibility is like trying to discern the elephant by looking at the shadows on the cave wall. Without a robust solution for capturing output and errors, we experienced:

  • Silent failures
  • Duplicated output
  • Uncertainty about what actually executed

When AWS CLI calls failed, we found ourselves staring at the terminal trying to reconstruct what went wrong. Debugging was guesswork.

The solution was our first major building block: run_command().

    echo "Running: $*" >&2
    echo "Running: $*" >>"$LOG_FILE"

    # Create a temp file to capture stdout
    local stdout_tmp
    stdout_tmp=$(mktemp)

    # Detect if we're capturing output (not running directly in a terminal)
    if [[ -t 1 ]]; then
        # Not capturing → Show stdout live
        "$@" > >(tee "$stdout_tmp" | tee -a "$LOG_FILE") 2> >(tee -a "$LOG_FILE" >&2)
    else
        # Capturing → Don't show stdout live; just log it and capture it
        "$@" >"$stdout_tmp" 2> >(tee -a "$LOG_FILE" >&2)
    fi

    local exit_code=${PIPESTATUS[0]}

    # Append stdout to log file
    cat "$stdout_tmp" >>"$LOG_FILE"

    # Capture stdout content into a variable
    local output
    output=$(<"$stdout_tmp")
    rm -f "$stdout_tmp"

    if [ $exit_code -ne 0 ]; then
        echo "ERROR: Command failed: $*" >&2
        echo "ERROR: Command failed: $*" >>"$LOG_FILE"
        echo "Check logs for details: $LOG_FILE" >&2
        echo "Check logs for details: $LOG_FILE" >>"$LOG_FILE"
        echo "TIP: Since this script is idempotent, you can re-run it safely to retry." >&2
        echo "TIP: Since this script is idempotent, you can re-run it safely to retry." >>"$LOG_FILE"
        exit 1
    fi

    # Output stdout to the caller without adding a newline
    if [[ ! -t 1 ]]; then
        printf "%s" "$output"
    fi
}

This not-so-simple wrapper gave us:

  • Captured stdout and stderr for every command
  • Real-time terminal output and persistent logs
  • Clear failures when things broke

run_command() became the workhorse for capturing our needed inputs to other processes and our eyes into failures.
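As a usage sketch (the variable names here are illustrative rather than taken from the full script), capturing a value for a later step looks like this:

# Capture the first distribution ID for later steps; stderr and a copy of
# stdout are appended to $LOG_FILE by run_command
DISTRIBUTION_ID=$(run_command $AWS cloudfront list-distributions \
    --query "DistributionList.Items[0].Id" --output text --profile $AWS_PROFILE)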

1.2. Lessons from the Evolution

We didn’t arrive at run_command() fully formed. We learned it the hard way:

  • Our first iterations printed output twice
  • Capturing both streams without swallowing stdout took fine-tuning
  • We discovered that without proper telemetry, we were flying blind

2. Automating the Key AWS Resources

2.1. S3 Bucket Creation

The point of this whole exercise is to host content, and for that, we need an S3 bucket. This seemed like a simple first task - until we realized it wasn’t. This is where we first collided with a concept that would shape the entire script: idempotency.

S3 bucket names are globally unique. If you try to create one that exists, you fail. Worse, AWS error messages can be cryptic:

  • “BucketAlreadyExists”
  • “BucketAlreadyOwnedByYou”

Our naive first attempt just created the bucket. Our second attempt checked for it first:

create_s3_bucket() {
    if run_command $AWS s3api head-bucket --bucket "$BUCKET_NAME" --profile $AWS_PROFILE 2>/dev/null; then
        echo "Bucket $BUCKET_NAME already exists."
        return
    fi

    run_command $AWS s3api create-bucket \
        --bucket "$BUCKET_NAME" \
        --create-bucket-configuration LocationConstraint=$AWS_REGION \
        --profile $AWS_PROFILE
}

Making the script “re-runnable” was essential, unless of course we could guarantee we did everything right and things worked the first time. When has that ever happened? Of course, we then wrapped the creation of the bucket in run_command(), because every AWS call still had the potential to fail spectacularly.

And so, we learned: If you can’t guarantee perfection, you need idempotency.

2.2. CloudFront Distribution with Origin Access Control

Configuring a CloudFront distribution using the AWS Console offers a streamlined setup with sensible defaults. But we needed precise control over CloudFront behaviors, cache policies, and security settings - details the console abstracts away. Automation via the AWS CLI gave us that control - but there’s no free lunch. Prepare yourself to handcraft deeply nested JSON payloads, get jiggy with jq, and manage the dependencies between S3, CloudFront, ACM, and WAF. This is the path we would need to take to build a resilient, idempotent deployment script - and crucially, to securely serve private S3 content using Origin Access Control (OAC).

Why do we need OAC?

Since our S3 bucket is private, we need CloudFront to securely retrieve content on behalf of users without exposing the bucket to the world.

Why not OAI?

AWS has deprecated Origin Access Identity in favor of Origin Access Control (OAC), offering tighter security and more flexible permissions.

Why do we need jq?

In later steps we create a WAF Web ACL to firewall our CloudFront distribution. In order to associate the WAF Web ACL with our distribution we need to invoke the update-distribution API which requires a fully fleshed out JSON payload updated with the Web ACL id.

GOTCHA: Attaching a WAF WebACL to an existing CloudFront distribution requires that you use the update-distribution API, not associate-web-acl as one might expect.
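One more prerequisite before we look at the template: it references an $OAC_ID, so the OAC itself has to exist first. A sketch of how it might be created and captured (the name is illustrative; the full script may differ):

create_oac() {
    # Create the Origin Access Control and return its ID
    run_command $AWS cloudfront create-origin-access-control \
        --origin-access-control-config \
        "Name=oac-$BUCKET_NAME,SigningProtocol=sigv4,SigningBehavior=always,OriginAccessControlOriginType=s3" \
        --query 'OriginAccessControl.Id' --output text
}

OAC_ID=$(create_oac)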

Here’s the template for our distribution configuration (some of the Bash variables used will be evident when you examine the completed script):

{
  "CallerReference": "$CALLER_REFERENCE",
   $ALIASES
  "Origins": {
    "Quantity": 1,
    "Items": [
      {
        "Id": "S3-$BUCKET_NAME",
        "DomainName": "$BUCKET_NAME.s3.amazonaws.com",
        "OriginAccessControlId": "$OAC_ID",
        "S3OriginConfig": {
          "OriginAccessIdentity": ""
        }
      }
    ]
  },
  "DefaultRootObject": "$ROOT_OBJECT",
  "DefaultCacheBehavior": {
    "TargetOriginId": "S3-$BUCKET_NAME",
    "ViewerProtocolPolicy": "redirect-to-https",
    "AllowedMethods": {
      "Quantity": 2,
      "Items": ["GET", "HEAD"]
    },
    "ForwardedValues": {
      "QueryString": false,
      "Cookies": {
        "Forward": "none"
      }
    },
    "MinTTL": 0,
    "DefaultTTL": $DEFAULT_TTL,
    "MaxTTL": $MAX_TTL
  },
  "PriceClass": "PriceClass_100",
  "Comment": "CloudFront Distribution for $ALT_DOMAIN",
  "Enabled": true,
  "HttpVersion": "http2",
  "IsIPV6Enabled": true,
  "Logging": {
    "Enabled": false,
    "IncludeCookies": false,
    "Bucket": "",
    "Prefix": ""
  },
  $VIEWER_CERTIFICATE
}

The create_cloudfront_distribution() function is then used to create the distribution.

create_cloudfront_distribution() {
    # Snippet for brevity; see full script
    run_command $AWS cloudfront create-distribution --distribution-config file://$CONFIG_JSON
}
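The snippet assumes the template above has already been rendered into $CONFIG_JSON with the Bash variables substituted. One way to do that, an assumption on my part rather than necessarily what the published script does, is envsubst (the template file name here is hypothetical):

# Render the distribution config template into the JSON file passed to the CLI
export CALLER_REFERENCE ALIASES BUCKET_NAME OAC_ID ROOT_OBJECT \
       DEFAULT_TTL MAX_TTL ALT_DOMAIN VIEWER_CERTIFICATE
envsubst < cloudfront-distribution.json.tpl > "$CONFIG_JSON"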

Key lessons:

  • use update-distribution, not associate-web-acl, when attaching a WAF Web ACL to a CloudFront distribution
  • leverage jq to modify the existing configuration to add the WAF Web ACL id
  • manually configuring CloudFront provides more granularity than the console, but requires some attention to the details

2.3. WAF IPSet + NAT Gateway Lookup

Cool. We have a CloudFront distribution! But it’s wide open to the world. We needed to restrict access to our internal VPC traffic - without exposing the site publicly. AWS WAF provides this firewall capability using Web ACLs. Here’s what we need to do:

  1. Look up our VPC’s NAT Gateway IP (the IP CloudFront would see from our internal traffic).
  2. Create a WAF IPSet containing that IP (our allow list).
  3. Build a Web ACL rule using the IPSet.
  4. Attach the Web ACL to the CloudFront distribution.

Keep in mind that CloudFront is designed to serve content to the public internet. When clients in our VPC access the distribution, their traffic needs to exit through a NAT gateway with a public IP. We’ll use the AWS CLI to query the NAT gateway’s public IP and use that when we create our allow list of IPs (step 1).

find_nat_ip() {
    run_command $AWS ec2 describe-nat-gateways --filter "Name=tag:Environment,Values=$TAG_VALUE" --query "NatGateways[0].NatGatewayAddresses[0].PublicIp" --output text --profile $AWS_PROFILE
}

We take this IP and build our first WAF component: an IPSet. This becomes the foundation for the Web ACL we’ll attach to CloudFront.

The firewall we create will be composed of an allow list of IP addresses (step 2)…

create_ipset() {
    run_command $AWS wafv2 create-ip-set \
        --name "$IPSET_NAME" \
        --scope CLOUDFRONT \
        --region us-east-1 \
        --addresses "$NAT_IP/32" \
        --ip-address-version IPV4 \
        --description "Allow NAT Gateway IP"
}

…that form the rules for our WAF Web ACL (step 3).

create_web_acl() {
    run_command $AWS wafv2 create-web-acl \
        --name "$WEB_ACL_NAME" \
        --scope CLOUDFRONT \
        --region us-east-1 \
        --default-action Block={} \
        --rules '[{"Name":"AllowNAT","Priority":0,"Action":{"Allow":{}},"Statement":{"IPSetReferenceStatement":{"ARN":"'$IPSET_ARN'"}},"VisibilityConfig":{"SampledRequestsEnabled":true,"CloudWatchMetricsEnabled":true,"MetricName":"AllowNAT"}}]' \
        --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName="$WEB_ACL_NAME"
}

This is where our earlier jq surgery becomes critical - attaching the Web ACL requires updating the entire CloudFront distribution configuration. And that’s how we finally attach that Web ACL to our CloudFront distribution (step 4).

DISTRIBUTION_CONFIG=$(run_command $AWS cloudfront get-distribution-config --id $DISTRIBUTION_ID)
<h1 id="usejqtoinjectwebaclidintoconfigjson">Use jq to inject WebACLId into config JSON</h1>

UPDATED_CONFIG=$(echo "$DISTRIBUTION_CONFIG" | jq --arg ACL_ARN "$WEB_ACL_ARN" '.DistributionConfig | .WebACLId=$ACL_ARN')
<h1 id="passupdatedconfigbackintoupdate-distribution">Pass updated config back into update-distribution</h1>

echo "$UPDATED_CONFIG" > updated-config.json
run_command $AWS cloudfront update-distribution --id $DISTRIBUTION_ID --if-match "$ETAG" --distribution-config file://updated-config.json
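One detail the snippet glosses over: update-distribution needs the ETag returned by get-distribution-config for its --if-match argument. Assuming the response was captured as JSON, as above, it is one more jq call:

ETAG=$(echo "$DISTRIBUTION_CONFIG" | jq -r '.ETag')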

At this point, our CloudFront distribution is no longer wide open. It is protected by our WAF Web ACL, restricting access to only traffic coming from our internal VPC NAT gateway.

For many internal-only sites, this simple NAT IP allow list is enough. WAF can handle more complex needs like geo-blocking, rate limiting, or request inspection - but those weren’t necessary for us. Good design isn’t about adding everything; it’s about removing everything that isn’t needed. A simple allow list was also the most secure.

2.4. S3 Bucket Policy Update

When we set up our bucket, we blocked public access - an S3-wide security setting that prevents any public access to the bucket’s contents. However, this also prevents CloudFront (even with OAC) from accessing S3 objects unless we explicitly allow it. Without this policy update, requests from CloudFront would fail with Access Denied errors.

At this point, we need to allow CloudFront to access our S3 bucket. The update_bucket_policy() function will apply the policy shown below.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::$BUCKET_NAME/*",
      "Condition": {
        "StringEquals": {
          "AWS:SourceArn": "arn:aws:cloudfront::$AWS_ACCOUNT:distribution/$DISTRIBUTION_ID"
        }
      }
    }
  ]
}

Modern OAC best practice is to use the AWS:SourceArn condition to ensure only requests from your specific CloudFront distribution are allowed.

It’s more secure because it ties bucket access directly to a single distribution ARN, preventing other CloudFront distributions (or bad actors) from accessing your bucket.

"Condition": {
    "StringEquals": { "AWS:SourceArn": "arn:aws:cloudfront::$AWS_ACCOUNT:distribution/$DISTRIBUTION_ID" }
}

With this policy in place, we’ve completed the final link in the security chain. Our S3 bucket remains private but can now securely serve content through CloudFront - protected by OAC and WAF.
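For completeness, here is a sketch of what update_bucket_policy() might boil down to, assuming the policy above has been rendered with the real bucket, account, and distribution values into bucket-policy.json (a hypothetical file name; the actual function may build the document differently):

update_bucket_policy() {
    # Apply the rendered OAC policy to the private bucket
    run_command $AWS s3api put-bucket-policy \
        --bucket "$BUCKET_NAME" \
        --policy file://bucket-policy.json \
        --profile $AWS_PROFILE
}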


3. Putting It All Together

We are now ready to wrap a bow around these steps in an idempotent Bash script.

  1. Create an S3 Bucket (or Verify It Exists)
    • This is where we first embraced idempotency. If the bucket is already there, we move on.
  2. Create a CloudFront Distribution with OAC
    • The foundation for serving content securely, requiring deep JSON config work and the eventual jq patch.
  3. Restrict Access with WAF
    • Discover the NAT Gateway's IP – The public IP representing our VPC.
    • Create a WAF IPSet (Allow List) – Build the allow list with our NAT IP.
    • Create a WAF Web ACL – Bundle the allow list into a rule.
    • Attach the Web ACL to CloudFront – Using jq and update-distribution.
  4. Grant CloudFront Access to S3
    • Update the bucket policy to allow OAC-originated requests from our distribution.

Each segment of our script is safe to rerun. Each is wrapped in run_command(), capturing results for later steps and ensuring errors are logged. We now have a script we can commit and re-use with confidence whenever we need a secure static site. Together, these steps form a robust, idempotent deployment pipeline for a secure S3 + CloudFront website - every time.

You can find the full script here.


4. Running the Script

A hallmark of a production-ready script is an ‘-h’ option. Oh wait - your script has no help or usage? I’m supposed to RTFC? It ain’t done, skippy, until it’s done.

Scripts should include the ability to pass options that make it a flexible utility. We may have started out writing a “one-off” but recognizing opportunities to generalize the solution turned this into another reliable tool in our toolbox.

Be careful though - not every one-off needs to be a Swiss Army knife. Just because aspirin is good for a headache doesn’t mean you should take the whole bottle.

Our script now supports the necessary options to create a secure, static website with a custom domain and certificate. We even added the ability to include additional IP addresses for your allow list in addition to the VPC’s public IP.

Now, deploying a private S3-backed CloudFront site is as easy as:

Example:

./s3-static-site.sh -b my-site -t dev -d example.com -c arn:aws:acm:us-east-1:cert-id

Inputs:

  • -b - the bucket name
  • -t - the tag I used to identify my VPC NAT gateway
  • -c - the certificate ARN I created for my domain
  • -d - the domain name for my distribution

This single command now deploys an entire private website - reliably and repeatably. It only takes a little longer to do it right!


5. Key Takeaways from this Exercise

The process of working with ChatGPT to construct a production-ready script that creates static websites took many hours. In the end, several lessons were reinforced and some gotchas discovered. Writing this blog itself was a collaborative effort that dissected both the technology and the process used to implement it. Overall, it was a productive, fun and rewarding experience. For those not familiar with ChatGPT or who are afraid to give it a try, I encourage you to explore this amazing tool.

Here are some of the things I took away from this adventure with ChatGPT.

  • ChatGPT is a great accelerator for this type of work - but not perfect. Ask questions. Do not copy & paste without understanding what it is you are copying and pasting!
  • If you have some background and general knowledge of a subject, ChatGPT can help you become even more knowledgeable, as long as you ask lots of follow-up questions and pay close attention to the answers.

With regard to the technology, some lessons were reinforced, some new knowledge was gained:

  • Logging (as always) is an important feature when multiple steps can fail
  • Idempotency guards make sure you can iterate when things go wrong
  • Discovering the NAT IP and subsequently adding a WAF firewall rule was needed because of the way CloudFront works
  • Use the update-distribution API call not associate-web-acl when adding WAF ACLs to your distribution!

Thanks to ChatGPT for being an ever-present back seat driver on this journey. Real AWS battle scars + AI assistance = better results.

Wrap Up

In Part III we wrap it all up as we learn more about how CloudFront and WAF actually protect your website.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.

Hosting a Secure Static Website with S3 and CloudFront: Part IIa

Overcoming Challenges in AWS Automation: Lessons from Deploying a Secure S3 + CloudFront Static Website

Introduction

After designing a secure static website on AWS using S3, CloudFront, and WAF as discussed in Part I of this series, we turned our focus to automating the deployment process. While AWS offers powerful APIs and tools, we quickly encountered several challenges that required careful consideration and problem-solving. This post explores the primary difficulties we faced and the lessons we learned while automating the provisioning of this infrastructure.

1. Service Interdependencies

A key challenge when automating AWS resources is managing service dependencies. Our goal was to deploy a secure S3 website fronted by CloudFront, secured with HTTPS (via ACM), and restricted using WAF. Each of these services relies on others, and the deployment sequence is critical:

  • CloudFront requires an ACM certificate
    • before a distribution with HTTPS can be created.
  • S3 needs an Origin Access Control (OAC)
    • configured before restricting bucket access to CloudFront.
  • WAF must be created and associated with CloudFront
    • after the distribution is set up.

Missteps in the sequence can result in failed or partial deployments, which can leave your cloud environment in an incomplete state, requiring tedious manual cleanup.

2. Eventual Consistency

AWS infrastructure often exhibits eventual consistency, meaning that newly created resources might not be immediately available. We specifically encountered this when working with ACM and CloudFront:

  • ACM Certificate Validation:
    • After creating a certificate, DNS validation is required. Even after publishing the DNS records, it can take minutes (or longer) before the certificate is validated and usable.
  • CloudFront Distribution Deployment:
    • When creating a CloudFront distribution, changes propagate globally, which can take several minutes. Attempting to associate a WAF policy or update other settings during this window can fail.

Handling these delays requires building polling mechanisms into your automation or using backoff strategies to avoid hitting API limits.
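The AWS CLI ships with waiters that cover both of these cases, so the polling does not have to be hand-rolled. A sketch, with the certificate ARN and distribution ID as placeholders:

# Block until the ACM certificate is validated (CloudFront certificates live in us-east-1)
aws acm wait certificate-validated --certificate-arn "$CERT_ARN" --region us-east-1

# Block until the distribution finishes propagating before attaching WAF or making further updates
aws cloudfront wait distribution-deployed --id "$DISTRIBUTION_ID"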

3. Error Handling and Idempotency

Reliable automation is not simply about executing commands; it requires designing for resilience and repeatability:

  • Idempotency:
    • Your automation must handle repeated executions gracefully. Running the deployment script multiple times should not create duplicate resources or cause conflicts.
  • Error Recovery:
    • AWS API calls occasionally fail due to rate limits, transient errors, or network issues. Implementing automatic retries with exponential backoff helps reduce manual intervention.

Additionally, logging the execution of deployment commands proved to be an unexpected challenge. We developed a run_command function that captured both stdout and stderr while logging the output to a file. However, getting this function to behave correctly without duplicating output or interfering with the capture of return values required several iterations and refinements. Reliable logging during automation is critical for debugging failures and ensuring transparency when running infrastructure-as-code scripts.
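A minimal sketch of the retry-with-backoff idea, independent of the run_command wrapper (the attempt count and delays are arbitrary defaults):

retry_with_backoff() {
    local attempt=1 max_attempts=5 delay=2
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "ERROR: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# Example: retry_with_backoff aws s3api head-bucket --bucket "$BUCKET_NAME"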

4. AWS API Complexity

While the AWS CLI and SDKs are robust, they are often verbose and require a deep understanding of each service:

  • CloudFront Distribution Configuration:
    • Defining a distribution involves deeply nested JSON structures. Even minor errors in JSON formatting can cause deployment failures.
  • S3 Bucket Policies:
    • Writing secure and functional S3 policies to work with OAC can be cumbersome. Policy errors can lead to access issues or unintended public exposure.
  • ACM Integration:
    • Automating DNS validation of ACM certificates requires orchestrating multiple AWS services (e.g., Route 53) and carefully timing validation checks. We did not actually implement an automated process for this resource. Instead, we considered this a one-time operation better handled manually via the console.

Lessons Learned

Throughout this process, we found that successful AWS automation hinges on the following principles:

  • Plan the dependency graph upfront:
    • Visualize the required services and their dependencies before writing any automation.
  • Integrate polling and backoff mechanisms:
    • Design your scripts to account for delays and transient failures.
  • Prioritize idempotency:
    • Your infrastructure-as-code (IaC) should be safe to run repeatedly without adverse effects.
  • Test in a sandbox environment:
    • Test your automation in an isolated AWS account to catch issues before deploying to production.
  • Implement robust logging:
    • Ensure that all automation steps log their output consistently and reliably to facilitate debugging and auditing.

Conclusion

Automating AWS deployments unlocks efficiency and scalability, but it demands precision and robust error handling. Our experience deploying a secure S3 + CloudFront website highlighted common challenges that any AWS practitioner is likely to face. By anticipating these issues and applying resilient practices, teams can build reliable automation pipelines that simplify cloud infrastructure management.

Next up, Part IIb where we build our script for creating our static site.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.

Hosting a Secure Static Website with S3 and CloudFront: Part I

Introduction

While much attention is given to dynamic websites, there are still many uses for the good ol’ static website. Whether for hosting documentation, internal portals, or lightweight applications, static sites remain relevant. In my case, I wanted to host an internal CPAN repository for storing and serving Perl modules. AWS provides all of the necessary components for this task, but choosing the right approach and configuring it securely and automatically can be a challenge.

Whenever you make an architectural decision, various approaches are possible. It’s a best practice to document that decision in an Architectural Decision Record (ADR). This type of documentation justifies your design choice, spelling out precisely how each approach either meets or fails to meet functional or non-functional requirements. In the first part of this blog series we’ll discuss the alternatives and why we ended up choosing our CloudFront-based approach. This is our ADR.

Requirements

  1. HTTPS website for hosting a CPAN repository - will be used internally, but we would like secure transport
  2. Controlled access - can only be accessed from within a private subnet in our VPC
  3. Scalable - should be able to handle increasing storage without reprovisioning
  4. Low-cost - ideally less than $10/month
  5. Low-maintenance - no patching or maintenance of application or configurations
  6. Highly available - should be available 24x7; content should be backed up

Alternative Approaches

Now that we’ve defined our functional and non-functional requirements let’s look at some approaches we might take in order to create a secure, scalable, low-cost, low-maintenance static website for hosting our CPAN repository.

Use an S3 Website-Enabled Bucket

This solution at first glance seems like the quickest shot on goal. While S3 does offer a static website hosting feature, it doesn’t support HTTPS by default, which is a major security concern and does not match our requirements. Additionally, website-enabled S3 buckets do not support private access controls - they are inherently public if enabled. Had we been able to accept an insecure HTTP site and public access, this approach would have been the easiest to implement. If we wanted to accept public access but required secure transport, we could have used CloudFront in front of the website-enabled bucket, either using CloudFront’s default certificate or creating our own custom domain with its own certificate.

Since our goal is to create a private static site, we can however use CloudFront as a secure, caching layer in front of S3. This allows us to enforce HTTPS, control access using Origin Access Control (OAC), and integrate WAF to restrict access to our VPC. More on this approach later…

Pros:

  • Quick & Easy Setup Enables static website hosting with minimal configuration.
  • No Additional Services Needed Can serve files directly from S3 without CloudFront.
  • Lower Cost No CloudFront request or data transfer fees when accessed directly.

Cons:

  • No HTTPS Support: Does not natively support HTTPS, which is a security concern.
  • Public by Default: Cannot enforce private access controls; once enabled, it’s accessible to the public.
  • No Fine-Grained Security: Lacks built-in protection mechanisms like AWS WAF or OAC.
  • Not VPC-Restricted: Cannot natively block access from the public internet while still allowing internal users.

Analysis:

While using an S3 website-enabled bucket is the easiest way to host static content, it fails to meet security and privacy requirements due to public access and lack of HTTPS support.

Deploying a Dedicated Web Server

Perhaps the obvious approach to hosting a private static site is to deploy a dedicated Apache or Nginx web server on an EC2 instance. This method involves setting up a lightweight Linux instance, configuring the web server, and implementing a secure upload mechanism to deploy new content.

Pros:

  • Full Control: You can customize the web server configuration, including caching, security settings, and logging.
  • Private Access: When used with a VPC, the web server can be accessed only by internal resources.
  • Supports Dynamic Features: Unlike S3, a traditional web server allows for features such as authentication, redirects, and scripting.
  • Simpler Upload Mechanism: Files can be easily uploaded using SCP, rsync, or an automated CI/CD pipeline.

Cons:

  • Higher Maintenance: Requires ongoing security patching, monitoring, and potential instance scaling.
  • Single Point of Failure: Unless deployed in an autoscaling group, a single EC2 instance introduces availability risks.
  • Limited Scalability: Scaling is manual unless configured with an ALB (Application Load Balancer) and autoscaling.

Analysis:

Using a dedicated web server is a viable alternative when additional flexibility is needed, but it comes with added maintenance and cost considerations. Given our requirements for a low-maintenance, cost-effective, and scalable solution, this may not be the best approach.

Using a Proxy Server with a VPC Endpoint

A common approach I have used to securely serve static content from an S3 bucket is to use an internal proxy server (such as Nginx or Apache) running on an EC2 instance within a private VPC. In fact, this is the approach I have used to create my own private yum repository, so I know it would work effectively for my CPAN repository. The proxy server retrieves content from an S3 bucket via a VPC endpoint, ensuring that traffic never leaves AWS’s internal network. This approach requires managing an EC2 instance, handling security updates, and scaling considerations. Let’s look at the cost of an EC2 based solution.

The following cost estimates are based on AWS pricing for us-east-1:

EC2 Cost Calculation (t4g.nano instance)

  • Instance type: t4g.nano (cheapest ARM-based instance)
  • Hourly cost: $0.0052/hour
  • Monthly usage: 730 hours (assuming 24/7 uptime)
  • Monthly cost: 0.0052 x 730 = $3.80/month

Pros:

  • Predictable costs: No per-request or per-GB transfer fees beyond the instance cost.
  • Avoids external traffic costs: All traffic remains within the VPC when using a private endpoint.
  • Full control over the web server: Can customize caching, security, and logging as needed.

Cons:

  • Higher maintenance
    • Requires OS updates, security patches, and monitoring.
  • Scaling is manual
    • Requires autoscaling configurations or manual intervention as traffic grows.
  • Potential single point of failure
    • Needs HA (High Availability) setup for reliability.

Analysis:

If predictable costs and full server control are priorities, EC2 may be preferable. However, this solution requires maintenance and may not scale with heavy traffic. Moreover, to create an HA solution would require additional AWS resources.

CloudFront + S3 + WAF

As alluded to before, CloudFront + S3 might fit the bill. To create a secure, scalable, and cost-effective private static website, we chose to use Amazon S3 with CloudFront (sprinkling in a little AWS WAF for good measure). This architecture allows us to store our static assets in an S3 bucket while CloudFront acts as a caching and security layer in front of it. Unlike enabling public S3 static website hosting, this approach provides HTTPS support, better scalability, and fine-grained access control.

CloudFront integrates with Origin Access Control (OAC), ensuring that the S3 bucket only allows access from CloudFront and not directly from the internet. This eliminates the risk of unintended public exposure while still allowing authorized users to access content. Additionally, AWS WAF (Web Application Firewall) allows us to restrict access to only specific IP ranges or VPCs, adding another layer of security.

Let’s look at costs:

  • Data Transfer Out: first 10TB is $0.085 per GB; at 25GB/month of traffic, 25 x 0.085 = $2.13
  • HTTP Requests: $0.0000002 per request; at 250,000 requests/month, 250,000 x 0.0000002 = $0.05
  • Total CloudFront cost: $2.13 (data transfer) + $0.05 (requests) = $2.18/month

Pros:

  • Scales effortlessly
    • AWS handles scaling automatically based on demand.
  • Lower maintenance
    • No need to manage servers or perform security updates.
  • Includes built-in caching & security
    • CloudFront integrates WAF and Origin Access Control (OAC).

Cons:

  • Traffic-based pricing
    • Costs scale with data transfer and request volume.
  • External traffic incurs costs
    • Data transfer fees apply for internet-accessible sites.
  • Less customization
    • Cannot modify web server settings beyond what CloudFront offers.
  • May require cache invalidations for frequently updated assets

Analysis:

And the winner is…CloudFront + S3!

Using just a website-enabled S3 bucket fails to meet the basic requirements, so let’s eliminate that solution right off the bat. If predictable costs and full server control are priorities, using an EC2 instance either as a proxy or a full-blown web server may be preferable. However, for a low-maintenance, auto-scaling solution, CloudFront + S3 is the superior choice. EC2 is slightly more expensive but avoids CloudFront’s external traffic costs. Overall, our winning approach is ideal because it scales automatically, reduces operational overhead, and provides strong security mechanisms without requiring a dedicated EC2 instance to serve content.

CloudFront+S3+WAF

  • CloudFront scales better - cost remains low per GB served, whereas EC2 may require scaling for higher traffic.
  • CloudFront includes built-in caching & security, while EC2 requires maintenance and patching.

Bash Scripting vs Terraform

Now that we have our agreed upon approach (the “what”) and documented our “architectural decision”, it’s time to discuss the “how”. How should we go about constructing our project? Many engineers would default to Terraform for this type of automation, but we had specific reasons for thinking this through and looking at a different approach. We’d like:

  • Full control over execution order (we decide exactly when & how things run).
  • Faster iteration (no need to manage Terraform state files).
  • No external dependencies - just AWS CLI.
  • Simple solution for a one-off project.

Why Not Terraform?

While Terraform is a popular tool for infrastructure automation, it introduces several challenges for this specific project. Here’s why we opted for a Bash script over Terraform:

  • State Management Complexity

    Terraform relies on state files to track infrastructure resources, which introduces complexity when running and re-running deployments. State corruption or mismanagement can cause inconsistencies, making it harder to ensure a seamless idempotent deployment.

  • Slower Iteration and Debugging

    Making changes in Terraform requires updating state, planning, and applying configurations. In contrast, Bash scripts execute AWS CLI commands immediately, allowing for rapid testing and debugging without the need for state synchronization.

  • Limited Control Over Execution Order

    Terraform follows a declarative approach, meaning it determines execution order based on dependencies. This can be problematic when AWS services have eventual consistency issues, requiring retries or specific sequencing that Terraform does not handle well natively.

  • Overhead for a Simple, Self-Contained Deployment

    For a relatively straightforward deployment like a private static website, Terraform introduces unnecessary complexity. A lightweight Bash script using AWS CLI is more portable, requires fewer dependencies, and avoids managing an external Terraform state backend.

  • Handling AWS API Throttling

    AWS imposes API rate limits, and handling these properly requires implementing retry logic. While Terraform has some built-in retries, it is not as flexible as a custom retry mechanism in a Bash script, which can incorporate exponential backoff or manual intervention if needed.

  • Less Direct Logging and Error Handling

    Terraform’s logs require additional parsing and interpretation, whereas a Bash script can log every AWS CLI command execution in a simple and structured format. This makes troubleshooting easier, especially when dealing with intermittent AWS errors.

When Terraform Might Be a Better Choice

Although Bash was the right choice for this project, Terraform is still useful for more complex infrastructure where:

  • Multiple AWS resources must be coordinated across different environments.
  • Long-term infrastructure management is needed with a team-based workflow.
  • Integrating with existing Terraform deployments ensures consistency.

For our case, where the goal was quick, idempotent, and self-contained automation, Bash scripting provided a simpler and more effective approach. This approach gave us the best of both worlds - automation without complexity, while still ensuring idempotency and security.


Next Steps

  • In Part IIa of the series we’ll discuss the challenges we faced with AWS automation.
  • Part IIb we’ll discuss in detail the script we built.
  • Finally, Part III will wrap things up with a better explanation of why this all works.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.

Hosting a Secure Static Website with S3 and CloudFront: Part III

This is the last in our three part series where we discuss the creation of a private, secure, static website using Amazon S3 and CloudFront.

Introduction

Amazon S3 and CloudFront are powerful tools for hosting static websites, but configuring them securely can be surprisingly confusing, even for experienced AWS users. After implementing this setup for my own use, I discovered a few nuances that others often stumble over, particularly around CloudFront access and traffic routing from VPC environments. This post aims to clarify these points and highlight a potential gap in AWS’s offering.

The Secure S3 + CloudFront Website Setup

The typical secure setup for hosting a static website using S3 and CloudFront looks like this:

  1. S3 Bucket: Store your website assets. Crucially, this bucket should not be publicly accessible.
  2. CloudFront Distribution: Distribute your website content, with HTTPS enabled and custom domain support via ACM.
  3. Origin Access Control (OAC): Grant CloudFront permission to read from your private S3 bucket.
  4. S3 Bucket Policy: Configure it to allow access only from the CloudFront distribution (via OAC).

This setup ensures that even if someone discovers your S3 bucket URL, they won’t be able to retrieve content directly. All access is routed securely through CloudFront.

The VPC Epiphany: Why Is My Internal Traffic Going Through NAT?

For many AWS users, especially those running workloads inside a VPC, the first head-scratcher comes when internal clients access the CloudFront-hosted website. You might notice that this traffic requires a NAT gateway, and you’re left wondering:

  • “Isn’t this all on AWS’s network? Why is it treated as public?”
  • “Can I route CloudFront traffic through a private path in my VPC?”

Here’s the key realization:

CloudFront is a public-facing service. Even when your CloudFront distribution is serving content from a private S3 bucket, your VPC clients are accessing CloudFront through its public endpoints.

  • CloudFront -> S3: This is private and stays within the AWS network.
  • VPC -> CloudFront: This is treated as public internet traffic, even though it often stays on AWS’s backbone.

This distinction is not immediately obvious, and it can be surprising to see internal traffic going through a NAT gateway and showing up with a public IP.

CloudFront+S3+WAF

Why This Feels Like a Product Gap

For my use case, I wasn’t interested in CloudFront’s global caching or latency improvements; I simply wanted a secure, private website hosted on S3, with a custom domain and HTTPS. AWS currently lacks a streamlined solution for this. A product offering like “S3 Secure Website Hosting” could fill this gap by combining:

  • Private S3 bucket
  • Custom domain + HTTPS
  • Access control (VPC, IP, IAM, or WAF)
  • No CloudFront unless explicitly needed

Securing Access to Internal Clients

To restrict access to your CloudFront-hosted site, you can use AWS WAF with an IPSet containing your NAT gateway’s public IP address. This allows only internal VPC clients (routing through the NAT) to access the website while blocking everyone else.

Conclusion

The S3 + CloudFront setup is robust and secure - once you understand the routing and public/private distinction. However, AWS could better serve users needing simple, secure internal websites by acknowledging this use case and providing a more streamlined solution.

Until then, understanding these nuances allows you to confidently deploy secure S3-backed websites without surprises.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.


Rob’s Rule of Three: How Leaders Make Decisions When Everything’s on Fire

We’ve all been there. Someone bursts into your office (or sends you a message on Teams) with their hair on fire. “Everything is broken! The system is down! Customers are complaining!”

Your adrenaline spikes. Your brain starts racing. Over the years, I’ve learned that the best leaders don’t rush to react. They don’t speed up, they slow down, assess, and respond with clarity. Ever hear of the OODA loop? Did you know that late commitment is an important element of agility?

My experience handling many a supposed Chernobyl meltdown has led me to what I call Rob’s Rule of Three. It’s my personal framework that I’ve used for years to successfully cut through the noise and make decisions under stress. You too can be a successful executive IF you can just…

…keep your head when all about you are losing theirs and blaming it on you…

Thank you Mr. Kipling.

To remind myself, my team, and my organization of the importance of these rules, I wrote them on my whiteboard. When someone brought me a “problem” they were reminded of them.

Rob’s Rule of Three

  1. Is it really a problem?
  2. Is it my problem to solve?
  3. If yes, do I need to solve it right now?

Let’s break this down.

1. Is It Really a Problem?

Panic distorts reality. Panic creates a distorted projection of the future. Our imaginings of possible negative outcomes never really match what actually happens, do they?

Many issues that feel urgent in the moment turn out to be noise or minor hiccups. Before reacting, help the person and the organization explain:

  • What is actually happening?
  • What is the impact? Who is affected?
  • What are the consequences if we do nothing?

Many times, the fire is extinguished under the blanket of scrutiny.

2. Is It My Problem to Solve?

As a leader, you can’t and shouldn’t solve everything. Sometimes your role is to delegate, coach, or empower others to handle it. Yes, you probably climbed the ladder to your current perch by personally handling many a crisis. People look at you as a hero. But that’s history. You’re not in the hero business anymore. You are a leader of heroes. Resist the urge to be the hero. It’s a trap. Ask yourself:

  • Who owns this system or process?
  • Who is closest to the problem?
  • Is this something no one else is equipped to handle?

If it’s not your problem, hand it off and move on. And sometimes, when there is no one else capable of handling the issue, you have identified an organizational hole, or an opportunity to mentor your replacement!

3. If Yes, Do I Need to Solve It Right Now?

Not everything is a five-alarm fire. Some problems can wait. The universe changes every second, Padawan. Feel the force.

  • Will delaying action cause harm?
  • Can we mitigate the issue for now in some way and plan a proper fix later?

Most issues can be scheduled into normal workstreams. Reminding people of the process reinforces calm. Reserve your immediate energy for true emergencies.

Scott Adams, creator of Dilbert, once joked about Wally’s Rule - that after three days, most requests are irrelevant. While it’s a comic exaggeration, there’s truth in it: Some problems evaporate if you simply wait.

When you have a cold, if you get lots of rest, drink plenty of fluids, and have some chicken soup, it will go away in 5-7 days. If you do nothing, it might take a week.

Why This Matters

When panic strikes, leaders are judged by their ability to stay calm and make sound decisions. The Rule of Three cuts through the noise, reduces reactionary decisions, and reinforces trust within your team.

I have often reminded my team and our internal customers of the team’s track record when there is panic in the air:

“When was the last time we didn’t solve a customer problem? When did we ever leave a system broken? Stop. Think. We’re still here. The business is still running. You are the A-Team! Together we’ll solve this one too!”

Calm is faster. Calm is smarter. Slow is fast, less is more.

Take an Emergency Lightly

This is my personal addendum. “Take an emergency lightly” doesn’t mean ignoring it. It means approaching it with the confidence that you and your team will handle it. Because “that’s sort of what we do”.

So, next time someone runs in with their hair on fire, stop, drop, but don’t roll. Remember the Rule of Three. And take it lightly, while applying a little self-affirmation.

Stuart Smalley

  • “I’m good enough.”
  • “I’m smart enough.”
  • “…and doggone it people like me!”

Isn’t Perl Dead?

…let’s just move on shall we ;-)

Essential Knowledge for Perl Consultants

So you want to be the guy, the one that swoops into the shop that has been saddled with the legacy Perl application because you’ve been doing Perl since the last century? You know that shop: they have a Perl application and a bunch of developers that only do Python, and they’ve suddenly become allergic to learning something new (to them). From my own experience, here are some of the technologies you’ll encounter and should be familiar with to be the guy.

  • [x] mod_perl
  • [x] FastCGI
  • [x] Moose
  • [x] HTML::Template
  • [x] Mason
  • [ ] Template::Toolkit

I checked off the things I’ve encountered in my last three jobs.

Of course, the newer Perl based frameworks are good to know as well:

  • Mojolicious
  • Catalyst

Some “Nice to Knows”

  • Apache
  • docker
  • cpanm
  • carton
  • make
  • bash

…and of these, I think the most common thing you’ll encounter on sites that run Perl applications is mod_perl.

Thar’s gold in them thar hills!

Well, maybe not gold, but certainly higher rates and salaries for experienced Perl developers. You’re a unicorn! Strut your stuff. Don’t back down and go cheap. Every day someone leaves the ranks of Perl development only to become one of the herd, leaving you to graze alone.

Over the last three years I’ve earned over a half-million dollars in salary and consulting fees. Some of you are probably earning more. Some less. But here’s the bottom line: your skills are becoming scarcer and scarcer. And here’s the kicker…these apps aren’t going away. Companies are loath to touch some of their cash cows or invest in any kind of “rewrite”. And here’s why…

  • They don’t know what the application even does!
  • They don’t have any bandwidth for rewriting applications that “work”.
  • They love technical debt, or have never even heard of it.

And here’s what they want you to do for a big pile of their cash:

  • fix a small bug that may take you a day to find, but only a minute to fix
  • upgrade their version of perl
  • upgrade the platform the app runs on because of security vulnerabilities
  • containerize their application to run in the cloud
  • add a feature that will take you a week to find out how to implement and a day to actually implement

The Going Rate?

According to the “interweb” the average salary for an experienced Perl developer is around $50/hour or about $100K or so. I’m suspicious of those numbers to be honest. Your mileage may vary but here’s what I’ve been able to get in my last few jobs:

  • $180K/year + bonus
  • $160K/year + a hearty handshake
  • $100/hour

…and I’m not a great negotiator. I do have over 20 years of experience with Perl and over 40 years of experience in IT. I’m not shy about promoting the value of that experience either. I did turn down a job for $155K/year that would have required some technical leadership, a position I think should have been more like $185k/year to lead a team of Perl developers across multiple time zones.

Your best prospects are…your current customers!

Even if you decide to leave a job or are done with an assignment, don’t burn bridges. Be willing to help them with a transition. Be polite, and ask for a recommendation if appropriate. If they’re not planning on rehiring, they may be willing to contract with you for spot assignments.

Some Miss Manners Advice

  • Be nice…always
  • Suggest improvements but don’t be upset if they like the crappy app just the way it is
  • Write good documentation! Help someone else pick up your work (it could be you a year from now).
  • Be a mentor but not a know-it-all, you don’t know-it-all, and nobody likes a know-it-all even if you do
  • Don’t be stubborn and fight with the resident guru unless his bad decision is about to take the company off the cliff (and even then don’t fight with him, take it to the boss)
  • Ask questions and take a keen interest in their domain, you never know when a similar job might present itself

AppRunner - Update

In my last blog I introduced AppRunner, a relatively new service from AWS that helps application developers concentrate on their applications instead of infrastructure. Similar to Elastic Beanstalk, AppRunner is an accelerator that gets your web application deployed in the most efficient way possible.

One of my concerns has been whether Amazon is actually committed to enhancing and extending this promising service. Searching the 2023 re:Invent announcements, I was disappointed not to find any news about new features for AppRunner. However, it was encouraging to see that they did include a typical promotional seminar on AppRunner.

The video is definitely worth watching, but the “case for AppRunner” is a bit tedious. They seem to be trying to equate AppRunner with modernization and reduction of technical debt. If hiding the complexities of deploying a scalable web application to the cloud (specifically AWS) is modernization then ok? I guess?

But let’s be honest here. It’s magic. You give them a container, they give you a scalable web application. I’m not sure that’s modernization or reducing technical debt. It sounds more like going “all in”. Which, by the way is totally cool with me. For my money, if you are going to leverage the cloud, then you damn well ought to leverage it. Don’t be shy. Take advantage of these services that reduce friction for developers and help you create value for your customers.

Sidebar re:Invent

Since I referenced a re:Invent webinar, I should mention that I’ve attended re:Invent 5 times. It was an amazing and fun experience. However, the last time I attended it was such a cluster f*ck that I decided it just wasn’t worth the effort, and I haven’t been back since. Their content is online now (thank you AWS!), so I can pick and choose what to attend online instead of trying to figure out how to manage my limited time against the demand for specific seminars. If you do go, plan on standing in line (a lot).

The straw that broke this camel’s back was picking up some kind of virus at the Venetian on day 1. Oh, the humanity! They make you walk through the casino to get to where you need to go. I have no doubt that I picked up some nasty bug somewhere between a craps table and a blackjack table. This was way before COVID, but I wouldn’t even dream of going there without an N95 mask.

Unfortunately, I spent the first few days in bed, missing most of the conference. I literally almost died. To this day I’m not sure how I got on a plane on Friday and made it home. After I nearly hit my head on the porcelain throne as I returned everything I had eaten in Las Vegas to their water recycling plant, I passed out. When I woke up on the polished Venetian bathroom floor, I decided that as cool as the swag was, and as great as it was to come home with more T-shirts than I would ever need, it just wasn’t worth the energy required to attend re:Invent. Speaking of cool…if you do happen to pass out in a Venetian bathroom, the marble floors are soothingly cool and you will get a good night’s rest.

Do not underestimate the amount of energy you need to attend re:Invent! Prepare yourself. To really experience re:Invent you need to wake at 6am, join the herd of people that parade to breakfast, plan your attack and move like a ninja through the venues. Seriously, start your day with their amazing breakfast.

I am partial to the Venetian, so that’s where I tried to stay by booking early. The Venetian can accommodate 15,000 people for breakfast, and they do an amazing job. Gluten-free? Yup. Veggie? Yup. You will not go hungry.

re:Invent now hosts over 50,000 attendees. The first year I went to re:Invent there were fewer than 10,000 in attendance. Honestly, it has become a complete mess. Buses take attendees between venues, but don’t count on getting to your seminar on time. And if you are late, tough luck. Your spot will be given to the stand-bys.

Enough about re:Invent…but if you do go, get yourself invited to some vendor event - they are awesome! And don’t forget re:Play!

Back to AppRunner Updates

In my last blog I mentioned a technical issue I had with AppRunner. Well, it turns out I’m not crazy. Their documentation was wrong (and lacking), and here’s the explanation I got from AWS support.

Hello Rob,

Thank you for your continued patience while working on this case with
me. I am reaching out to you with an update on the issue of
associating custom domain with the AppRunner service using AWS CLI. To
recap, I understand that you wanted to use AWS CLI to link custom
domain with AppRunner service, so that you could use www subdomain
with the custom domain. For that, as mentioned in the AppRunner
documentation at [1] we tried using the associate-custom-domain AWS
CLI command [2] and we noticed that the command was returning only the
status of the link and the CertificateValidationRecord objects were
not returned as a part of the output.

For this, I reached out the internal team since as per the
documentation, the CertificateValidationRecord objects should have
been returned. Upon working with the internal team, we realized that
we need to run describe-custom-domains AWS CLI command [3] together
with the associate-custom-domain AWS CLI command to get the
CertificateValidationRecord objects and then we need to add these
records manually to the custom domain in Route53 as a CNAME record
with the record name and value obtained from the
describe-custom-domains AWS CLI command. We have to perform the manual
actions even if the Route53 and AppRunner is in the same account when
working with AWS CLI. I am also providing the step by step details
below:

  1. Run the associate-custom-domain AWS CLI command:
"aws apprunner associate-custom-domain --service-arn <AppRunner-Service-ARN> --domain-name <Custom-Domain> --enable-www-subdomain"

This will return the output as follows:

Output:

{
  "DNSTarget": "xxxxxxxxxx.us-east-1.awsapprunner.com",
  "ServiceArn": "AppRunner-Service-ARN",
  "CustomDomain": {
    "DomainName": "Custom-Domain",
    "EnableWWWSubdomain": true,
    "Status": "creating"
  }
}

  2. Now, run the describe-custom-domains AWS CLI command a few seconds after running the associate-custom-domain AWS CLI command:
"aws apprunner describe-custom-domains --service-arn <AppRunner-Service-ARN>"

This will return an output with CertificateValidationRecords objects as follows:

Output:

{
  "DNSTarget": "xxxxxxxxxx.us-east-1.awsapprunner.com",
  "ServiceArn": "AppRunner-Service-ARN",
  "CustomDomains": [
    {
      "DomainName": "Custom-Domain",
      "EnableWWWSubdomain": true,
      "CertificateValidationRecords": [
        {
          "Name": "_5bf3e29fca6c29d29fc2b6e023bcaee3.apprunner-sample.com.",
          "Type": "CNAME",
          "Value": "_3563838161b023d78b951b036072e510.mhbtsbpdnt.acm-validations.aws.",
          "Status": "PENDING_VALIDATION"
        },
        {
          "Name": "_7f20ef08b12fbdddb670d0c7fb3c8076.www.apprunner-sample.com.",
          "Type": "CNAME",
          "Value": "_e1b6f670fac2f42ce30d160c2e3d92ea.mhbtsbpdnt.acm-validations.aws.",
          "Status": "PENDING_VALIDATION"
        },
        {
          "Name": "_14fc6b4f0d6b6a5524e7c3147eaec89d.2a57j78h5fsbzb7ey72hbx9c01pbxcf.apprunner-sample.com.",
          "Type": "CNAME",
          "Value": "_baecf356e1894de83dfca1b51cd8999f.mhbtsbpdnt.acm-validations.aws.",
          "Status": "PENDING_VALIDATION"
        }
      ],
      "Status": "pending_certificate_dns_validation"
    }
  ]
}

  3. In Route 53, you need to manually add the records with the record type as CNAME with the corresponding names and values.
*I realize that the additional step of using describe-custom-domains
AWS CLI command is not mentioned in the documentation and for that I
have updated the internal team and the documentation team to get this
information added to the documentation.* Also, I tested the above steps
and I can confirm that the above workflow is validated and the domain
was associated successfully using AWS CLI.

Now, coming to the actual query of associating custom domain in a
different account from the AppRunner service, the internal team has
confirmed that currently the console custom domain functionality only
works if the Route 53 domain and App Runner service are in the same
account. The same is mentioned in Step 5 of the AppRunner
documentation at [1].

However, in case of AWS CLI, you need to perform the same steps as
above and you need to manually add the CertificateValidationRecords to
the account owning the Route 53 domain(s). You can view the
certificate validation record via the CLI using the
describe-custom-domain command as mentioned above.

So, I’m happy to report that the issue was resolved, which gives me more confidence that AppRunner has a future.
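
For reference, the manual Route 53 step in that workflow is a single change-resource-record-sets call per validation record. A rough sketch (the hosted zone ID is a placeholder, and the name/value pair comes straight from the describe-custom-domains output above):

aws route53 change-resource-record-sets \
  --hosted-zone-id <Hosted-Zone-ID> \
  --change-batch '{
    "Changes": [
      {
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "_5bf3e29fca6c29d29fc2b6e023bcaee3.apprunner-sample.com.",
          "Type": "CNAME",
          "TTL": 300,
          "ResourceRecords": [
            { "Value": "_3563838161b023d78b951b036072e510.mhbtsbpdnt.acm-validations.aws." }
          ]
        }
      }
    ]
  }'

Repeat for each CertificateValidationRecord (including the www record), against whichever account owns the hosted zone.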

Next Steps

For my application, since AppRunner still does not support EFS or mounting external file systems, I will need to identify how I am using my EFS session directory and remove that dependency.

Looking at my application, I can see a path using S3. Using S3 as a session store will not be particularly difficult. S3 will not have the performance characteristics of EFS, but I’m not sure that matters. Deleting session objects becomes a bit more complex since we can’t just delete a “directory”.
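
A “directory” in S3 is really just a key prefix, so cleaning up a session means deleting everything under that prefix. A minimal sketch, assuming a hypothetical bucket and per-session key layout:

# write a session object under a per-session prefix (bucket and key layout are hypothetical)
aws s3 cp session.json s3://my-session-bucket/sessions/<session-id>/session.json

# "deleting the directory" = recursively deleting every object under the prefix
aws s3 rm s3://my-session-bucket/sessions/<session-id>/ --recursive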

APIs?

Another intriguing use for AppRunner is implementing services, either RESTful APIs or seldom-invoked back-end services.

APIs are definitely one of the target uses for this service, as discussed in the re:Invent video. Triggering a task is also a use case I want to explore. Currently, I use a CloudWatch event to trigger a Lambda that invokes a Fargate task for doing things like a nightly backup. That dance seems like it could be replaced (somehow) by using AppRunner…hmmm…need to noodle this some more…
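
For context, the current dance is just a scheduled EventBridge (CloudWatch Events) rule that fires the Lambda, which in turn starts the Fargate task. A hedged sketch with placeholder names, region and account ID:

# nightly schedule (rule and function names are placeholders)
aws events put-rule \
  --name nightly-backup \
  --schedule-expression "cron(0 3 * * ? *)"

# allow EventBridge to invoke the Lambda that kicks off the Fargate task
aws lambda add-permission \
  --function-name run-backup-task \
  --statement-id nightly-backup \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:us-east-1:123456789012:rule/nightly-backup

# wire the rule to the Lambda
aws events put-targets \
  --rule nightly-backup \
  --targets 'Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:run-backup-task'

Whether AppRunner can absorb that whole chain is exactly the part I still need to noodle on.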

Conclusion

So far, I luv me some AppRunner.

AppRunner - The Good, Bad and the Ugly

Background

In May of 2021, AWS released AppRunner to the public.

AWS App Runner is an AWS service that provides a fast, simple, and cost-effective way to deploy from source code or a container image directly to a scalable and secure web application in the AWS Cloud. You don’t need to learn new technologies, decide which compute service to use, or know how to provision and configure AWS resources.

App Runner connects directly to your code or image repository. It provides an automatic integration and delivery pipeline with fully managed operations, high performance, scalability, and security.

AppRunner Architecture

What makes AppRunner so compelling are these important features:

  • Creates the entire infrastructure required for hosting a scalable web application
  • Connects directly to your code or image repository AND can automatically redeploy the application when assets are updated
  • Can scale up or down based on traffic
  • Very cost effective - you only pay for resources consumed when your application is in use
  • Associates your custom domain with an AppRunner endpoint
  • It’s much less expensive than provisioning all of the resources necessary to host your application yourself!
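
To give a sense of how little you have to specify, here’s roughly what standing up a service from a container image looks like with the CLI (a sketch; the service name, ECR image URI and role ARN are placeholders):

aws apprunner create-service \
  --service-name my-web-app \
  --source-configuration '{
    "ImageRepository": {
      "ImageIdentifier": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-web-app:latest",
      "ImageRepositoryType": "ECR",
      "ImageConfiguration": { "Port": "8080" }
    },
    "AutoDeploymentsEnabled": true,
    "AuthenticationConfiguration": {
      "AccessRoleArn": "arn:aws:iam::123456789012:role/AppRunnerECRAccessRole"
    }
  }'

With AutoDeploymentsEnabled set, pushing a new image to the repository redeploys the application.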

My Use Case

Back in 2012, I started a SaaS application (Treasurer’s Briefcase) for providing recordkeeping services for small non-profit organizations like PTOs, PTAs and Booster clubs. Back then, I cobbled together the infrastructure using the console, then started to explore CloudFormation and eventually re-architected everything using Terraform.

The application is essentially based on a LAMP stack - well, sort of, since I use a different templating web framework rather than PHP. The stack consists of an EC2 that hosts the Apache server, an EC2 that hosts some backend services, an ALB, a MySQL RDS instance and a VPC. There are a few other AWS services used, like S3, SQS and EFS, but essentially the stack is relatively simple. Even so, provisioning all of that infrastructure using Terraform alone and creating development, test, and production environments was a bit daunting but a great learning experience.

Starting with the original infrastructure, I reverse engineered it using terraforming and then expanded it using terraform.

The point being, it wasn’t necessarily easy to get it all right the first time. Keeping up with Terraform was also a challenge as it evolved over the years. Moreover, maintaining infrastructure was just another task that provided no incremental value to the application. Time spent on that task took away from creating new features and enhancements that could provide more value to customers.

Enter AppRunner…with the promise of taking all of that work and chucking it out the window. Imagine creating a Docker container with your application and handing it to AWS and saying “host this for me, make it scalable, create and maintain an SSL certificate for me, create a CI/CD pipeline to redeploy the application when I make changes and make it cheap.” I’m in.

Not So Fast Skippy

AppRunner has evolved over the years and has become much more mature. However, it still has some warts and pimples that might make you think twice about using it. Back in 2021 it was an interesting new service, an obvious evolutionary step from Fargate Tasks, which provide some of the same features as AppRunner. Applications that used Fargate Tasks as the basis for running their containerized web applications still had to provision a VPC and load balancers, and manage scaling on their own. AppRunner bundles all of those capabilities and creates a compelling argument for moving Fargate-based apps to AppRunner.

Prior to October 2022, AppRunner did not support the ability to access resources from within a VPC. That made it impossible, for example, to use a non-publicly accessible RDS instance. With that addition in October of 2022, it became possible to have a web application that could access an RDS instance in your VPC.
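
That access works through a “VPC connector” that you create and then attach to the service’s egress configuration. Something like this (a sketch; the subnet, security group and ARN values are placeholders):

# create a connector into the VPC that holds the RDS instance
aws apprunner create-vpc-connector \
  --vpc-connector-name my-vpc-connector \
  --subnets subnet-0aaa1111 subnet-0bbb2222 \
  --security-groups sg-0ccc3333

# point the service's outbound traffic at the connector
aws apprunner update-service \
  --service-arn <AppRunner-Service-ARN> \
  --network-configuration '{
    "EgressConfiguration": {
      "EgressType": "VPC",
      "VpcConnectorArn": "arn:aws:apprunner:us-east-1:123456789012:vpcconnector/my-vpc-connector/1/example"
    }
  }'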

The fall of 2023 has seen several changes that make AppRunner even more compelling:

  • November 8, 2023 - App Runner adds support for the AWS Europe (Paris), AWS Europe (London), and AWS Asia Pacific (Mumbai) Regions
  • November 2, 2023 - App Runner adds dual-stack support for incoming traffic through public endpoints
  • October 4, 2023 - App Runner automates Route 53 domain configuration for your App Runner service web applications
  • September 26, 2023 - App Runner adds support for the deployment and maintenance of monorepo source-code based services
  • September 22, 2023 - App Runner enhances auto scaling configuration management features

Some of the limitations of AppRunner currently include:

  • Inability to mount file systems (like EFS)
  • While you can associate a custom domain, it cannot provision the www subdomain from the console (in fact, I have been unable to get it to work properly from the CLI either, although the documentation indicates it should work)
  • Still unavailable in some regions
  • Cannot use security groups to limit access

The first limitation is a bit of a show-stopper for more than a few web applications that might rely on mounted file systems to access assets or provide a stateful storage environment. For my application, I use EFS to create session directories for logged-in users. Using EFS, I can be assured that each EC2 in my web farm accesses the user’s session regardless of which EC2 serves the request. Without EFS, I will be forced to re-think how to create a stateful storage environment for my web app. I could use S3 as storage (and probably should), but EFS provided a “quick-shot-on-goal” at the time.

The second limitation was just frustrating, as associating a custom domain sort of kinda works. If I associate a domain managed by AWS (in the same account as my AppRunner application), then I was able to get the TLD to resolve and work as expected. AppRunner was able to associate my application to the domain AND provide an SSL certificate. It will redirect any http request to https. Unfortunately, I could not associate the www sub-domain using the CLI as documented. In fact, I could not even get the CLI to work without trying to enable the www sub-domain. Working with AWS support confirmed my experience, and I still have a ticket pending with support on this issue. I’m confident that it will be resolved soon(?), so it should not limit my ability to use this service in the future.

Conclusion

AppRunner is an exciting new service that will make application development and deployment seamless, allowing developers to focus on the application, not the infrastructure.

You can find the AppRunner roadmap and current issues here.

Credit: I want to believe...

I’m So Done With Agile

Every development project ultimately has a goal of providing some kind of value to the organization that has decided to initiate a software development project.

The bottom line of any software development project is the bottom line. Does the cost of the project AND the maintenance of the project create a profit?

I know what you are thinking. Not all software applications are designed to produce profit. Untrue. Even applications we call “internal” create value or contribute to the creation of value.

What is Failure?

Let’s talk about and characterize failure first, because it’s much easier to define (as anyone who has had the misfortune of working with a product development team that cannot define “done” knows). And I’ve been told that most software development projects fail.

  1. The project is canceled.

    This is the “first order broke” condition of projects. It took too long, it went over budget and looked to continue to be a money pit (someone understood the fallacy of sunk costs), the environment changed, making the application moot, or a new CEO decided to replace all internal applications with some SaaS, PaaS, or his own pet project.

  2. The application was launched and did not meet the goals of the project.

    This can mean a lot of things: the project does not solve enough of the business problems to justify the continued cost of maintenance. Or perhaps the application did not generate enough revenue to justify its existence because of poor market acceptance. People just hate using it.

  3. The project is in use, people use it, but the ROI is too far in the future or perhaps indeterminate.

    The project becomes a drag on the organization. No one wants to pull the plug because they have no alternative (or believe they don’t). There’s no appetite to rewrite, refactor or reimagine the application. It becomes a huge boat anchor that a handful of engineers keep running by kicking it in the ass whenever it stalls.

What is Success?

  1. The project launches on time and under budget.

    Keep in mind that this is (mostly) a necessary but insufficient condition for success. Yes, there are some successful projects that are over budget or late, but it’s sort of like starting Monopoly owing everyone money. You need to catch up, and catch up fast.

  2. The application completely solves the business problem.

    Again, a necessary but insufficient condition for success. If the application is difficult to maintain and requires constant attention that costs more than it saves or produces, it’s not a success.

  3. The application just works

    …and is a critical component in a complex workflow - without it nothing else would work - so its cost to develop and maintain is easily justified by the nature of its job. It successfully completes its mission every single day.

This Was About Agile Right?

Oh yeah, Agile. I read articles about Agile and people’s experience with it all the time. I suspect most opinions are based on a few data points, mostly from one person’s negative (or, rarely, positive) experience with Agile. My opinions (and that’s all they are…YMMV) are based on working with some fairly large clients that I am not at liberty to divulge. One FANG, one Fortune 50 company, one major manufacturer of phones and multiple companies with more than 5000 employees. I’m not opining based on one ride on the merry-go-round. I’m the kind of person that always believes that I just don’t get it, and I need to learn more, read more and accept more to overcome my ignorance and lack of experience. It’s a viewpoint that has allowed me to grow in my career and learn a lot of very useful things that have conspired to make me, if not wealthy, at least not concerned about money.

I am now having a lot of fun going back to my roots as a software developer. While I have been on the management side of projects employing the Agile process, I am now in the belly of the beast. It smells bad, feels wrong and kills productivity. But, again, YMMV.

Why Does Agile Suck?

  1. Product Owners - All “product owners” are not created equal. They have varying degrees of understanding of their own domain. Some even believe developers have ESP. To be fair, some expect developers (and rightly so) to “ask questions”. The problem is, what happens when the developer does not understand the domain? What questions should they ask? They are clueless.

    Product owners should assume nothing (in my opinion) and determine the level of domain expertise developers have. It is their responsibility to make that assessment - if they don’t, they must be explicit with requirements; otherwise you’ll almost certainly end up with a project or feature that does not meet your needs.

  2. Scrum Masters - are generally useless. The Vanna Whites of software development. I’d almost say Vanna had a harder job. Turning letters that have been lit up for you puts a lot of pressure on you to turn the “right” letter.
  3. Ceremonies - Most developers hate meetings. There is of course the rare “meeting moth” that has decided it is easier to pontificate, hash things out, and “parking lot” ideas than actually code. But let’s talk about the programmers we all know and love. Leave them alone with a problem and some good requirements, decent tools and they’ll produce something for you. Developers generally loathe people. To hold meetings apparently you need to round some people up and torture them for 30 to 45 minutes every day. Oh, so let’s make “stand-ups”! We’ll cap them at 15 minutes and 20 people will get a fraction of a minute to recap what they did yesterday. Only there’s that one guy. Yeah, you know who you are. You think we care about your busy day and the fact that you worked through the night solving problems you created? Nope. Please, please, continue babbling so the time runs out on this farce!
  4. Pointing - Sadly, there’s a poor Italian mathematician turning over in his grave as we speak. His claim to fame has been usurped by idiots who use numbers without understanding what they represent or the context for their use. That’s right - points mean diddlysquat. I’ve seen them used for billing by the scammy outsourcing companies that point their own stories and then bill accordingly. Or there is the pointing that uses them to determine how much “work” someone can do in 2 weeks. Ha! Gotcha! I’m pointing everything as 13! Is that Fibonacci?
  5. Retrospectives - Fuck that shit. Let’s just do the Festivus pole, air our list of grievances and then proceed to the feats of strength. It would be more useful and of course highly entertaining. Hoochie Mama!
  6. Sprints - We don’t know where we’re going but let’s sprint there! Isn’t that the irony of Agile? We don’t know wtf the requirements are until we start coding (according to some people’s definition of Agile) and yet we want to get it done quickly. It’s like watching my 3-year-old son on his Big Wheel motoring down the street, careening off curbs and smashing this absolute wonder of plastic technology into my neighbor’s bushes. No one is seriously hurt, it’s a bit amusing, but we really haven’t gotten anywhere.

So, here’s the bottom line. Any idea worth something greater than 0 that also has a wee bit of marketing behind it quickly becomes an opportunity for gypsies, tramps and thieves to exploit the ignorant masses. Take Christianity, for example. Need I say more? Agile has become the Christianity of corporate America. No one dares mention that it doesn’t solve our problems or make us feel any better. Fuck Agile, the ceremonies, the training, the roles, the practice…it is the most unproductive environment one can devise for developing software. Look it up…Bill Gates wrote an entire BASIC interpreter and shoved it into 4K of ROM. He then worked on a 32K version that was essentially a complete OS. He didn’t need Agile to do that.

So, let’s be clear. Agile is social engineering. An attempt to organize human beings in order to create something that none of them could do alone (or so it goes). Somehow I don’t think Agile works. Some will say, yeah, well, not every project should use Agile. Yes, that’s true, but the sad fact is that corporate America is not nuanced. They are binary. They want single solutions to complex problems and do not want to hear…it depends. And so they consume the entire bottle of aspirin.

There will be a day when people look back at the unproductivity, waste and utter insanity that is “Agile”. They will marvel at the way a single idea, possibly good for some things, was transformed into a dogma that haunted software development for a decade.

I’m hopeful, however, that really smart companies know that instituting things like Agile is the bellwether of their demise. They will avoid trying to fit round pegs into square holes. They will embrace the idea that you can plan things properly, and that plans can change, without embracing a chaotic, highly disorganized process that masquerades as a structured protocol.

You have been warned. When some consultant you hire to justify the outsourcing of your development team says that they can replace your current processes with an Agile team from Elbonia and a scrum master from Bumblefuck…be afraid…be very afraid. There is no free lunch.

One final thought…why is software development so hard? And why do we struggle so to create applications?

It’s not a hard question, actually. The goal of software development is to codify a solution to a problem. But first…and here is the reveal…you have to define the problem. That is, in and of itself, the most difficult thing in the development process. Missed requirements are, in my experience, the biggest reason for “re-work”. Note I did not say “bugs” or “defects”. Most maintenance on systems is because of missed requirements, not because programmers make mistakes. Oh, for sure, they do. But really? Think. Look back at your tickets and do a root cause analysis.

There are other reasons software development is hard. First, people do not communicate well. They do not communicate precisely and they do not communicate accurately. Next, the tools to express the solutions to our problems are complex and incomplete. Better ingredients make better pizzas. Papa Johns!

Okay, I have to wrap this up…Agile sucks. I hate Agile. I want to mute myself when I’m in stand-ups just so I can say “Oh, I was on mute.” every day and torture everyone who thinks this ceremony is useful.

Oh, I’m having issues with my internet so I may have to drop soon…open the pod bay doors, Hal?