Hosting a Secure Static Website with S3 and CloudFront: Part IIb

Introduction

In Part IIa, we detailed the challenges we faced when automating the deployment of a secure static website using S3, CloudFront, and WAF. Service interdependencies, eventual consistency, error handling, and AWS API complexity all presented hurdles. This post details the actual implementation journey.

We didn’t start with a fully fleshed-out solution that just worked. We had to “lather, rinse and repeat”. In the end, we built a resilient automation script robust enough to deploy secure, private websites across any organization.

The first takeaway: the importance of logging and visibility. While logging wasn’t the first thing we actually tackled, it was what eventually turned a mediocre automation script into something worth publishing.


1. Laying the Foundation: Output, Errors, and Visibility

1.1. run_command()

While automating the creation of this infrastructure, we need to feed the output of one command into the next step of the pipeline, and each step can of course fail. We need to capture output for use in later steps and capture errors to help debug the process. Automation without visibility is like trying to discern the elephant by looking at the shadows on the cave wall. Without a robust solution for capturing output and errors, we experienced:

  • Silent failures
  • Duplicated output
  • Uncertainty about what actually executed

When AWS CLI calls failed, we found ourselves staring at the terminal trying to reconstruct what went wrong. Debugging was guesswork.

The solution was our first major building block: run_command().

    echo "Running: $*" >&2
    echo "Running: $*" >>"$LOG_FILE"

    # Create a temp file to capture stdout
    local stdout_tmp
    stdout_tmp=$(mktemp)

    # Detect whether stdout is a terminal (if not, the caller is capturing our output)
    if [[ -t 1 ]]; then
        # Not capturing → Show stdout live
        "$@" > >(tee "$stdout_tmp" | tee -a "$LOG_FILE") 2> >(tee -a "$LOG_FILE" >&2)
    else
        # Capturing → Don't show stdout live; just log it and capture it
        "$@" >"$stdout_tmp" 2> >(tee -a "$LOG_FILE" >&2)
    fi

    local exit_code=${PIPESTATUS[0]}

    # Append stdout to log file
    cat "$stdout_tmp" >>"$LOG_FILE"

    # Capture stdout content into a variable
    local output
    output=$(<"$stdout_tmp")
    rm -f "$stdout_tmp"

    if [ $exit_code -ne 0 ]; then
        echo "ERROR: Command failed: $*" >&2
        echo "ERROR: Command failed: $*" >>"$LOG_FILE"
        echo "Check logs for details: $LOG_FILE" >&2
        echo "Check logs for details: $LOG_FILE" >>"$LOG_FILE"
        echo "TIP: Since this script is idempotent, you can re-run it safely to retry." >&2
        echo "TIP: Since this script is idempotent, you can re-run it safely to retry." >>"$LOG_FILE"
        exit 1
    fi

    # Output stdout to the caller without adding a newline
    if [[ ! -t 1 ]]; then
        printf "%s" "$output"
    fi
}

This not-so-simple wrapper gave us:

  • Captured stdout and stderr for every command
  • Real-time terminal output and persistent logs
  • Clear failures when things broke

run_command() became the workhorse for capturing our needed inputs to other processes and our eyes into failures.
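
Here’s a quick illustration of run_command() in action (a hypothetical call; the variable name and query are ours, not lifted from the full script):

# Capture the distribution's domain name for use in a later step
CF_DOMAIN=$(run_command $AWS cloudfront get-distribution \
    --id "$DISTRIBUTION_ID" \
    --query "Distribution.DomainName" \
    --output text \
    --profile $AWS_PROFILE)

echo "CloudFront domain is: $CF_DOMAIN"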

1.2. Lessons from the Evolution

We didn’t arrive at run_command() fully formed. We learned it the hard way:

  • Our first iterations printed output twice
  • Capturing both streams without swallowing stdout took fine-tuning
  • We discovered that without proper telemetry, we were flying blind

2. Automating the Key AWS Resources

2.1. S3 Bucket Creation

The point of this whole exercise is to host content, and for that, we need an S3 bucket. This seemed like a simple first task - until we realized it wasn’t. This is where we first collided with a concept that would shape the entire script: idempotency.

S3 bucket names are globally unique. If you try to create one that exists, you fail. Worse, AWS error messages can be cryptic:

  • “BucketAlreadyExists”
  • “BucketAlreadyOwnedByYou”

Our naive first attempt just created the bucket. Our second attempt checked for it first:

create_s3_bucket() {
    # Check for the bucket directly (not via run_command, which exits on failure);
    # a missing bucket is an expected, non-fatal outcome here
    if $AWS s3api head-bucket --bucket "$BUCKET_NAME" --profile $AWS_PROFILE 2>/dev/null; then
        echo "Bucket $BUCKET_NAME already exists."
        return
    fi

    run_command $AWS s3api create-bucket \
        --bucket "$BUCKET_NAME" \
        --create-bucket-configuration LocationConstraint=$AWS_REGION \
        --profile $AWS_PROFILE
}

Making the script “re-runnable” was essential, unless of course we could guarantee we did everything right and things worked the first time. When has that ever happened? We then wrapped the creation of the bucket in run_command(), because every AWS call still had the potential to fail spectacularly.

And so, we learned: If you can’t guarantee perfection, you need idempotency.

2.2. CloudFront Distribution with Origin Access Control

Configuring a CloudFront distribution using the AWS Console offers a streamlined setup with sensible defaults. But we needed precise control over CloudFront behaviors, cache policies, and security settings - details the console abstracts away. Automation via the AWS CLI gave us that control - but there’s no free lunch. Prepare yourself to handcraft deeply nested JSON payloads, get jiggy with jq, and manage the dependencies between S3, CloudFront, ACM, and WAF. This is the path we would need to take to build a resilient, idempotent deployment script - and crucially, to securely serve private S3 content using Origin Access Control (OAC).

Why do we need OAC?

Since our S3 bucket is private, we need CloudFront to securely retrieve content on behalf of users without exposing the bucket to the world.

Why not OAI?

AWS has deprecated Origin Access Identity in favor of Origin Access Control (OAC), offering tighter security and more flexible permissions.
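
The distribution config we build below references an $OAC_ID, which comes from a one-time create-origin-access-control call. A minimal sketch of how that might look (the function and OAC name here are illustrative, not taken from the full script):

create_oac() {
    # Create the Origin Access Control and capture its Id for the distribution config
    OAC_ID=$(run_command $AWS cloudfront create-origin-access-control \
        --origin-access-control-config \
        Name="${BUCKET_NAME}-oac",SigningProtocol=sigv4,SigningBehavior=always,OriginAccessControlType=s3 \
        --query "OriginAccessControl.Id" \
        --output text \
        --profile $AWS_PROFILE)
}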

Why do we need jq?

In later steps we create a WAF Web ACL to firewall our CloudFront distribution. In order to associate the WAF Web ACL with our distribution we need to invoke the update-distribution API which requires a fully fleshed out JSON payload updated with the Web ACL id.

GOTCHA: Attaching a WAF Web ACL to an existing CloudFront distribution requires that you use the update-distribution API, not associate-web-acl as one might expect.

Here’s the template for our distribution configuration (some of the Bash variables used will be evident when you examine the completed script):

{
  "CallerReference": "$CALLER_REFERENCE",
   $ALIASES
  "Origins": {
    "Quantity": 1,
    "Items": [
      {
        "Id": "S3-$BUCKET_NAME",
        "DomainName": "$BUCKET_NAME.s3.amazonaws.com",
        "OriginAccessControlId": "$OAC_ID",
        "S3OriginConfig": {
          "OriginAccessIdentity": ""
        }
      }
    ]
  },
  "DefaultRootObject": "$ROOT_OBJECT",
  "DefaultCacheBehavior": {
    "TargetOriginId": "S3-$BUCKET_NAME",
    "ViewerProtocolPolicy": "redirect-to-https",
    "AllowedMethods": {
      "Quantity": 2,
      "Items": ["GET", "HEAD"]
    },
    "ForwardedValues": {
      "QueryString": false,
      "Cookies": {
        "Forward": "none"
      }
    },
    "MinTTL": 0,
    "DefaultTTL": $DEFAULT_TTL,
    "MaxTTL": $MAX_TTL
  },
  "PriceClass": "PriceClass_100",
  "Comment": "CloudFront Distribution for $ALT_DOMAIN",
  "Enabled": true,
  "HttpVersion": "http2",
  "IsIPV6Enabled": true,
  "Logging": {
    "Enabled": false,
    "IncludeCookies": false,
    "Bucket": "",
    "Prefix": ""
  },
  $VIEWER_CERTIFICATE
}
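
How those variables get substituted is a matter of taste; one approach (a sketch, assuming the template above is saved as distribution-config.tpl and that envsubst is available) looks like this:

# Render the template into the JSON payload that create-distribution will consume
export CALLER_REFERENCE ALIASES BUCKET_NAME OAC_ID ROOT_OBJECT DEFAULT_TTL MAX_TTL ALT_DOMAIN VIEWER_CERTIFICATE
CONFIG_JSON="distribution-config.json"
envsubst < distribution-config.tpl > "$CONFIG_JSON"

# Sanity-check the rendered payload before handing it to AWS
jq empty "$CONFIG_JSON"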

The create_cloudfront_distribution() function is then used to create the distribution.

create_cloudfront_distribution() {
    # Snippet for brevity; see full script
    run_command $AWS cloudfront create-distribution --distribution-config file://$CONFIG_JSON
}

Key lessons:

  • use update-distribution, not associate-web-acl, when attaching a Web ACL to an existing CloudFront distribution
  • leverage jq to modify the existing configuration to add the WAF Web ACL id
  • manually configuring CloudFront provides more granularity than the console, but requires some attention to the details

2.3. WAF IPSet + NAT Gateway Lookup

Cool. We have a CloudFront distribution! But it’s wide open to the world. We needed to restrict access to our internal VPC traffic - without exposing the site publicly. AWS WAF provides this firewall capability using Web ACLs. Here’s what we need to do:

  1. Look up our VPC’s NAT Gateway IP (the IP CloudFront would see from our internal traffic).
  2. Create a WAF IPSet containing that IP (our allow list).
  3. Build a Web ACL rule using the IPSet.
  4. Attach the Web ACL to the CloudFront distribution.

Keep in mind that CloudFront is designed to serve content to the public internet. When clients in our VPC access the distribution, their traffic needs to exit through a NAT gateway with a public IP. We’ll use the AWS CLI to query the NAT gateway’s public IP and use that when we create our allow list of IPs (step 1).

find_nat_ip() {
    run_command $AWS ec2 describe-nat-gateways --filter "Name=tag:Environment,Values=$TAG_VALUE" --query "NatGateways[0].NatGatewayAddresses[0].PublicIp" --output text --profile $AWS_PROFILE
}

We take this IP and build our first WAF component: an IPSet. This becomes the foundation for the Web ACL we’ll attach to CloudFront.

The firewall we create will be composed of an allow list of IP addresses (step 2)…

create_ipset() {
    run_command $AWS wafv2 create-ip-set \
        --name "$IPSET_NAME" \
        --scope CLOUDFRONT \
        --region us-east-1 \
        --addresses "$NAT_IP/32" \
        --ip-address-version IPV4 \
        --description "Allow NAT Gateway IP"
}

…that form the rules for our WAF Web ACL (step 3).

create_web_acl() {
    run_command $AWS wafv2 create-web-acl \
        --name "$WEB_ACL_NAME" \
        --scope CLOUDFRONT \
        --region us-east-1 \
        --default-action Block={} \
        --rules '[{"Name":"AllowNAT","Priority":0,"Action":{"Allow":{}},"Statement":{"IPSetReferenceStatement":{"ARN":"'$IPSET_ARN'"}},"VisibilityConfig":{"SampledRequestsEnabled":true,"CloudWatchMetricsEnabled":true,"MetricName":"AllowNAT"}}]' \
        --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName="$WEB_ACL_NAME"
}
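
Note that create_web_acl() assumes $IPSET_ARN is already set. One way to capture it (a sketch using list-ip-sets, which also works on re-runs; the query expression is ours):

# Look up the ARN of the IPSet by name (safe to call on every run)
IPSET_ARN=$(run_command $AWS wafv2 list-ip-sets \
    --scope CLOUDFRONT \
    --region us-east-1 \
    --query "IPSets[?Name=='$IPSET_NAME'].ARN" \
    --output text)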

This is where our earlier jq surgery becomes critical - attaching the Web ACL requires updating the entire CloudFront distribution configuration. And that’s how we finally attach that Web ACL to our CloudFront distribution (step 4).

DISTRIBUTION_CONFIG=$(run_command $AWS cloudfront get-distribution-config --id $DISTRIBUTION_ID)

# Capture the ETag required by update-distribution
ETAG=$(echo "$DISTRIBUTION_CONFIG" | jq -r '.ETag')

# Use jq to inject WebACLId into config JSON
UPDATED_CONFIG=$(echo "$DISTRIBUTION_CONFIG" | jq --arg ACL_ARN "$WEB_ACL_ARN" '.DistributionConfig | .WebACLId=$ACL_ARN')

# Pass updated config back into update-distribution
echo "$UPDATED_CONFIG" > updated-config.json
run_command $AWS cloudfront update-distribution --id $DISTRIBUTION_ID --if-match "$ETAG" --distribution-config file://updated-config.json

At this point, our CloudFront distribution is no longer wide open. It is protected by our WAF Web ACL, restricting access to only traffic coming from our internal VPC NAT gateway.

For many internal-only sites, this simple NAT IP allow list is enough. WAF can handle more complex needs like geo-blocking, rate limiting, or request inspection - but those weren’t necessary for us. Good design isn’t about adding everything; it’s about removing everything that isn’t needed. A simple allow list was also the most secure.

2.4. S3 Bucket Policy Update

When we set up our bucket, we blocked public access - an S3-wide security setting that prevents any public access to the bucket’s contents. However, this also prevents CloudFront (even with OAC) from accessing S3 objects unless we explicitly allow it. Without this policy update, requests from CloudFront would fail with Access Denied errors.
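
For reference, blocking public access is a single call made during bucket setup; a minimal sketch of the call typically used (the full script handles this as part of creating the bucket):

# Block all public access at the bucket level
run_command $AWS s3api put-public-access-block \
    --bucket "$BUCKET_NAME" \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true \
    --profile $AWS_PROFILE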

At this point, we need to allow CloudFront to access our S3 bucket. The update_bucket_policy() function will apply the policy shown below.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::$BUCKET_NAME/*",
      "Condition": {
        "StringEquals": {
          "AWS:SourceArn": "arn:aws:cloudfront::$AWS_ACCOUNT:distribution/$DISTRIBUTION_ID"
        }
      }
    }
  ]
}
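
A minimal sketch of update_bucket_policy(), assuming the policy above is stored as a template named bucket-policy.tpl and rendered with envsubst (the real function lives in the full script):

update_bucket_policy() {
    # Render the policy template with the account, bucket, and distribution values
    export BUCKET_NAME AWS_ACCOUNT DISTRIBUTION_ID
    envsubst < bucket-policy.tpl > bucket-policy.json

    # Apply the rendered policy to the bucket
    run_command $AWS s3api put-bucket-policy \
        --bucket "$BUCKET_NAME" \
        --policy file://bucket-policy.json \
        --profile $AWS_PROFILE
}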

Modern OAC best practice is to use the AWS:SourceArn condition to ensure only requests from your specific CloudFront distribution are allowed.

It’s more secure because it ties bucket access directly to a single distribution ARN, preventing other CloudFront distributions (or bad actors) from accessing your bucket.

"Condition": {
    "StringEquals": { "AWS:SourceArn": "arn:aws:cloudfront::$AWS_ACCOUNT:distribution/$DISTRIBUTION_ID" }
}

With this policy in place, we’ve completed the final link in the security chain. Our S3 bucket remains private but can now securely serve content through CloudFront - protected by OAC and WAF.


3. Putting It All Together

We are now ready to wrap a bow around these steps in an idempotent Bash script.

  1. Create an S3 Bucket (or Verify It Exists)
    • This is where we first embraced idempotency. If the bucket is already there, we move on.
  2. Create a CloudFront Distribution with OAC
    • The foundation for serving content securely, requiring deep JSON config work and the eventual jq patch.
  3. Restrict Access with WAF
    • Discover the NAT Gateway’s IP – the public IP representing our VPC.
    • Create a WAF IPSet (Allow List) – build the allow list with our NAT IP.
    • Create a WAF Web ACL – bundle the allow list into a rule.
    • Attach the Web ACL to CloudFront – using jq and update-distribution.
  4. Grant CloudFront Access to S3
    • Update the bucket policy to allow OAC-originating requests from our distribution.

Each segment of our script is safe to rerun. Each is wrapped in run_command(), capturing results for later steps and ensuring errors are logged. We now have a script we can commit and re-use with confidence whenever we need a secure static site. Together, these steps form a robust, idempotent deployment pipeline for a secure S3 + CloudFront website - every time.

You can find the full script here.


4. Running the Script

A hallmark of a production-ready script is an ‘-h’ option. Oh wait - your script has no help or usage? I’m supposed to RTFC? It ain’t done skippy until it’s done.

Scripts should include the ability to pass options that make it a flexible utility. We may have started out writing a “one-off” but recognizing opportunities to generalize the solution turned this into another reliable tool in our toolbox.

Be careful though - not every one-off needs to be a Swiss Army knife. Just because aspirin is good for a headache doesn’t mean you should take the whole bottle.

Our script now supports the necessary options to create a secure, static website with a custom domain and certificate. We even added the ability to include additional IP addresses for your allow list in addition to the VPC’s public IP.
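
A sketch of how the option parsing might look (the flag names match the example below; CERT_ARN is a hypothetical variable name, and the full script adds proper help text, validation, and the extra allow-list IPs):

usage() {
    echo "usage: $0 -b bucket-name -t nat-gateway-tag -d domain -c certificate-arn [-h]" >&2
    exit 1
}

while getopts "b:t:d:c:h" opt; do
    case $opt in
        b) BUCKET_NAME=$OPTARG ;;
        t) TAG_VALUE=$OPTARG ;;
        d) ALT_DOMAIN=$OPTARG ;;
        c) CERT_ARN=$OPTARG ;;   # hypothetical variable name
        *) usage ;;              # covers -h and anything unrecognized
    esac
done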

Now, deploying a private S3-backed CloudFront site is as easy as:

Example:

./s3-static-site.sh -b my-site -t dev -d example.com -c arn:aws:acm:us-east-1:cert-id

Inputs:

  • -b - the bucket name
  • -t - the tag I used to identify my VPC NAT gateway
  • -c - the certificate ARN I created for my domain
  • -d - the domain name for my distribution

This single command now deploys an entire private website - reliably and repeatably. It only takes a little longer to do it right!


5. Key Takeaways from this Exercise

The process of working with ChatGPT to construct a production-ready script that creates static websites took many hours. In the end, several lessons were reinforced and some gotchas discovered. Writing this blog itself was a collaborative effort that dissected both the technology and the process used to implement it. Overall, it was a productive, fun and rewarding experience. For those not familiar with ChatGPT or who are afraid to give it a try, I encourage you to explore this amazing tool.

Here are some of the things I took away from this adventure with ChatGPT.

  • ChatGPT is a great accelerator for this type of work - but not perfect. Ask questions. Do not copy & paste without understanding what it is you are copying and pasting!
  • If you have some background and general knowledge of a subject, ChatGPT can help you become even more knowledgeable, as long as you ask lots of follow-up questions and pay close attention to the answers.

With regard to the technology, some lessons were reinforced, some new knowledge was gained:

  • Logging (as always) is an important feature when multiple steps can fail
  • Idempotency guards make sure you can safely re-run and iterate when things go wrong
  • Discovering the NAT IP and subsequently adding a WAF firewall rule was needed because of the way CloudFront works
  • Use the update-distribution API call not associate-web-acl when adding WAF ACLs to your distribution!

Thanks to ChatGPT for being an ever-present back seat driver on this journey. Real AWS battle scars + AI assistance = better results.

Wrap Up

In Part III we wrap it all up as we learn more about how CloudFront and WAF actually protect your website.

Disclaimer

This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.

If you like this content, please leave a comment or consider following me. Thanks.


Next post: How to Unlock Your S3 Bucket After a Policy Fail

Previous post: Hosting a Secure Static Website with S3 and CloudFront: Part IIa