From perldoc perlmod
…
An "END" code block is executed as late as possible, that is, after perl
has finished running the program and just before the interpreter is
being exited, even if it is exiting as a result of a die() function.
(But not if it's morphing into another program via "exec", or being
blown out of the water by a signal--you have to trap that yourself (if
you can).) You may have multiple "END" blocks within a file--they will
execute in reverse order of definition; that is: last in, first out
(LIFO). "END" blocks are not executed when you run perl with the "-c"
switch, or if compilation fails....
Perl’s END
blocks are useful inside your script for doing things
like cleaning up after itself, closing files or disconnecting from
databases. In many cases you use an END
block to guarantee certain
behaviors like a commit or rollback of a transaction. You’ll
typically see END
blocks in scripts, but occasionally you might find
one in a Perl module.
Over the last four years I’ve done a lot of maintenance work on legacy Perl applications. I’ve learned more about Perl in these four years than I learned in the previous 20. Digging into bad code is the best way to learn how to write good code. It’s sometimes hard to decide if code is good or bad but to paraphrase a supreme court justice, I can’t always define bad code, but I know it when I see it.
One of the gems I’ve stumbled upon was a module that provided needed
functionality for many scripts that included its own END
block.
Putting an END
block in a Perl module is an anti-pattern and just bad
mojo. A module should never contain an END
block. Here are
some alternatives:
- a cleanup() method
- a destructor (DESTROY) in an object-oriented module

Modules should provide functionality - not take control of your script’s shutdown behavior.
So why would an author put an END block in a module in the first place? The first and most obvious answer is that they were unaware of how DESTROY blocks can be employed. If you know something about the author and you’re convinced they know better, then why else?
I theorize that the author was trying to create a module that would encapsulate functionality that he would use in EVERY script he wrote for the application. While the faux pas might be forgiven I’m not ready to put my wagging finger back in its holster.
If you want to write a wrapper for all your scripts and you’ve already settled on using a Perl module to do so, then please for all that is good and holy do it right. Here are some potential guidelines for that wrapper:
- a new() method that instantiates your wrapper
- an init() method that encapsulates the common startup operations, with options to control whether some are executed or not
- a run() method that executes the functionality for the script
- a finalize() method for executing cleanup procedures

All of these methods could be overridden if you just use a plain ‘ol Perl
module. For those that prefer composition over inheritance, use a
role with something like Role::Tiny
to provide those universally
required methods. Using Role::Tiny
provides better flexibility
by allowing you to use those methods before or after your
modifications to their behavior.
The particular script I was working on included a module (whose name I
shall not speak) that included such an END
block. My script
should have exited cleanly and quietly. Instead, it produced
mysterious messages during shutdown. Worse yet I feared some
undocumented behaviors and black magic might have been conjured up
during that process! After a bit of debugging, I found the culprit:
- the module had an END block baked into it
- that END block printed debug messages to STDERR while doing other cleanup operations

My initial attempt to suppress the module’s END
block:
END {
use POSIX;
POSIX::_exit(0);
}
This works as long as my script exits normally. But if my
script dies due to an error, the rogue END
block still
executes. Not exactly the behavior I want.
Here’s what I want to happen:
- Prevent the module’s END block from executing.
- Ensure my script always exits with a meaningful status code.
A better method for scripts that need an END
block to claw back
control:
use English qw(-no_match_vars);
use POSIX ();
my $retval = eval {
return main();
};
END {
if ($EVAL_ERROR) {
warn "$EVAL_ERROR"; # Preserve exact error message
}
POSIX::_exit( $retval // 1 );
}
- Bypasses the module’s rogue END blocks – since POSIX::_exit() terminates the process immediately.
- If main() throws an exception, we log it without modifying the message.
- If main() forgets to return a status, we default to 1, ensuring no silent failures.

Of course, you should know what behavior you are bypassing if you
decide to wrestle control back from some misbehaving module. In my
case, I knew that the behaviors being executed in the END
block
could safely be ignored. Even if they couldn’t be ignored, I can still
provide those behaviors in my own cleanup procedures.
Isn’t this what future me or the poor wretch tasked with a dumpster dive into a legacy application would want? Explicitly seeing the whole shebang without hours of scratching your head looking for mysterious messages that emanate from the depths is priceless. It’s gold, Jerry! Gold!
Next up…a better wrapper.
In our previous posts, we explored how Apache 2.4 changed its
handling of directory requests when DirectorySlash Off
is set,
breaking the implicit /dir → /dir/
redirect behavior that worked in
Apache 2.2. We concluded that while an external redirect is the only
reliable fix, this change in behavior led us to an even bigger
question:
Is this a bug or an intentional design change in Apache?
After digging deeper, we’ve uncovered something critically important that is not well-documented:
Apache 2.4 does not re-evaluate DirectoryIndex after an internal rewrite.

This post explores why this happens, whether it’s a feature or a bug, and why Apache’s documentation should explicitly clarify this behavior.
Let’s revisit the problem: we tried using an internal rewrite to append a trailing slash for directory requests:
RewriteEngine On
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteRule ^(.*)$ /$1/ [L]
Expected Behavior:
- Apache should internally rewrite /setup to /setup/.
- Since DirectoryIndex index.roc is set, Apache should serve index.roc.
Actual Behavior:
- Apache internally rewrites /setup to /setup/, but then immediately fails with a 403 Forbidden.
- The error log states:
AH01276: Cannot serve directory /var/www/vhosts/treasurersbriefcase/htdocs/setup/: No matching DirectoryIndex (none) found
- Apache is treating /setup/ as an empty directory instead of recognizing index.roc.
Unlike what many admins assume, Apache does not start over after an internal rewrite.
- The original request arrives for a directory without a trailing slash (/setup).
- The request is rewritten internally to /setup/ by mod_rewrite.
- On a fresh request, if a default file (index.html, index.php, etc.) exists, DirectoryIndex resolves it.
- After an internal rewrite, however, Apache 2.4 never re-evaluates DirectoryIndex.
- Apache treats /setup/ as an empty directory with no default file and denies access with 403 Forbidden.

First, let’s discuss why this worked in Apache 2.2.
The key reason internal rewrites worked in Apache 2.2 is that Apache restarted the request processing cycle after a rewrite. This meant that:
- After the rewrite, DirectoryIndex was evaluated again, so Apache could still serve index.html, index.php, or any configured default file.
- Because DirectorySlash was handled earlier in the request cycle, Apache 2.2 still applied directory handling rules properly, even after an internal rewrite.

In Apache 2.4, this behavior changed. Instead of restarting the request cycle, Apache continues processing the request from where it left off. This means that after an internal rewrite, DirectoryIndex is never reprocessed, leading to the 403 Forbidden errors we encountered. This fundamental change explains why no internal solution works the way it did in Apache 2.2.
This behavior is not called out in the mod_rewrite or DirectoryIndex docs.

Since Apache won’t reprocess DirectoryIndex, the only way to guarantee correct behavior is to force a new request via an external redirect:
RewriteEngine On
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]
This forces Apache to start a completely new request cycle,
ensuring that DirectoryIndex
is evaluated properly.
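To confirm the external redirect behaves as intended, a quick check with curl should show the 301 followed by a successful response for the directory. This is a sketch, assuming the local-dev:8080 setup and the index.roc default document used in these examples:

# Follow the redirect and confirm the directory now serves its index
curl -IL http://local-dev:8080/setup

# Expected (abridged):
#   HTTP/1.1 301 Moved Permanently
#   Location: http://local-dev:8080/setup/
#   HTTP/1.1 200 OK    (index.roc served for /setup/ via DirectoryIndex)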
We believe this behavior should be explicitly documented in: - mod_rewrite documentation (stating that rewrites do not restart request processing). - DirectoryIndex documentation (noting that it will not be re-evaluated after an internal rewrite).
This would prevent confusion and help developers troubleshoot these issues more efficiently.
In our previous blog post, we detailed a clever external redirect
solution to address the Apache 2.4 regression that broke automatic
directory handling when DirectorySlash Off
was set. We teased the
question: Can we achieve the same behavior with an internal
redirect? Spoiler alert - Nope.
In this follow-up post, we’ll explore why internal rewrites fall short, our failed attempts to make them work, and why the external redirect remains the best and only solution.
To recap the behavior change:
- In Apache 2.2, when a user requested /dir without a trailing slash, Apache would automatically redirect them to /dir/.
- Apache would then serve index.html (or any file specified by DirectoryIndex).
- In Apache 2.4, when DirectorySlash Off is set, Apache stops auto-redirecting directories.
- Instead of treating /dir as /dir/, Apache tries to serve /dir as a file, which leads to 403 Forbidden errors.
- DirectoryIndex no longer inherits globally from the document root - each directory must be explicitly configured to serve an index file.
- And, as we discovered, Apache 2.4 does not re-evaluate DirectoryIndex after an internal rewrite.

Since Apache wasn’t redirecting directories automatically anymore, we thought we could internally rewrite requests like this:
RewriteEngine On
# If the request is missing a trailing slash and is a directory, rewrite it internally
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteRule ^(.*)$ /$1/ [L]
Why This Doesn’t Work:
- Apache 2.4 does not re-evaluate DirectoryIndex after the rewrite.
- If DirectoryIndex is not explicitly defined for each directory, Apache still refuses to serve the file and throws a 403 Forbidden.
- Apache treats /dir as a raw file request instead of checking for an index page.

To make it work, we tried to manually configure every directory:
<Directory "/var/www/vhosts/treasurersbriefcase/htdocs/setup/">
Require all granted
DirectoryIndex index.roc
</Directory>
Why this still fails:
- Apache 2.4 does not reprocess DirectoryIndex after an internal rewrite. Even though DirectoryIndex index.roc is explicitly set, Apache never reaches this directive after rewriting /setup to /setup/.
- Apache continues to treat /setup/ as an empty directory, leading to a 403 Forbidden error.

This means that even if we were willing to configure every directory manually, it still wouldn’t work as expected.
We briefly considered whether FallbackResource
could help by
redirecting requests for directories to their respective index files:
<Directory "/var/www/vhosts/treasurersbriefcase/htdocs/setup/">
DirectoryIndex index.roc
Options -Indexes
Require all granted
FallbackResource /setup/index.roc
</Directory>
- FallbackResource is designed to handle 404 Not Found errors, not 403 Forbidden errors.
- Because Apache recognizes /setup/ as a valid directory but refuses to serve it, FallbackResource is never triggered.
- It does nothing about the root cause: Apache 2.4 not reprocessing DirectoryIndex after an internal rewrite.

This was a red herring in our troubleshooting; FallbackResource was never a viable solution.
Another way to handle this issue would be to explicitly rewrite
directory requests to their corresponding index file, bypassing
Apache’s DirectoryIndex
handling entirely:
RewriteEngine On
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteRule ^(.*)$ /$1/index.roc [L]
- This completely bypasses DirectoryIndex handling in Apache 2.4.
- It works even with DirectorySlash Off, since the request is rewritten directly to a file.
- If directories use different index files (index.html, index.php, etc.), additional rules would be required.

While this approach technically works, it reinforces our main conclusion:
- Apache 2.4 no longer restarts the request cycle after a rewrite, so we need to account for it manually.
- The external redirect remains the only scalable solution.
Instead of fighting Apache’s new behavior, we can work with it using an external redirect:
RewriteEngine On
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]
- The client is redirected from /dir to /dir/ before Apache even tries to serve it.
- The new request starts a fresh processing cycle, restoring DirectoryIndex behavior globally, just like Apache 2.2.

So, can we solve this with an internal redirect?
No:
- Apache 2.4 does not re-evaluate DirectoryIndex after an internal rewrite.
- Even explicit per-directory DirectoryIndex settings fail if Apache has already decided the request is invalid.
- FallbackResource never applied, since Apache rejected the request with 403 Forbidden, not 404 Not Found.

Does our external redirect solution still hold up?
Yes!! And in fact, it’s not just the best solution - it’s the only reliable one. It restores the automatic trailing-slash handling even when DirectorySlash Off is set.

If you haven’t read our original post yet, check it out here for the full explanation of our external redirect fix.
In part III of this series we’ll beat this dead horse and potentially explain why this was a problem in the first place…
I’m currently re-architecting my non-profit accounting SaaS (Treasurer’s Briefcase) to use Docker containers for a portable development environment and eventually for running under Kubernetes. The current architecture designed in 2012 uses an EC2 based environment for both development and run-time execution.
As part of that effort I’m migrating from Apache 2.2 to 2.4. In the
new development environment I’m using my Chromebook to access the
containerized application running on the EC2 dev server. I described
that setup in my previous
blog. In
that setup I’m accessing the application on port 8080 as http://local-dev:8080
.
If, as in the setup described in my blog, you are running Apache on a
non-standard port (e.g., :8080
) - perhaps in Docker, EC2, or
via an SSH tunnel - you may have noticed an annoying issue after
migrating from Apache 2.2 to Apache 2.4…
Previously, when requesting a directory without a trailing slash
(e.g., /setup
), Apache automatically redirected to /setup/
while preserving the port. However, in Apache 2.4, the redirect is
done externally AND drops the port, breaking relative URLs and
form submissions.
For example let’s suppose you have a form under the /setup
directory that
has as its action
“next-step.html”. The expected behavior on that page
would be to post to the page /setup/next-step.html
. But what really
happens is different. You can’t even get to the form in the first
place with the URL http://local-dev:8080/setup
!
- Apache 2.2 (expected): http://yourserver:8080/setup => http://yourserver:8080/setup/
- Apache 2.4 (actual): http://yourserver:8080/setup => http://yourserver/setup/ (port 8080 is missing!)

This causes problems for some pages in web applications running behind Docker, SSH tunnels, and EC2 environments, where port forwarding is typically used.
If you’re experiencing this problem, you can confirm it by running:
curl -IL http://yourserver:8080/setup
You’ll likely see:
HTTP/1.1 301 Moved Permanently
Location: http://yourserver/setup/
Apache dropped 8080
from the redirect, causing requests to break.
Several workarounds exist, but they don’t work in our example.
- DirectorySlash Off: prevents redirects but causes 403 Forbidden errors when accessing directories.
- FallbackResource: works, but misroutes unrelated requests.

Instead, we need a solution that dynamically preserves the port when necessary.
To restore Apache 2.2 behavior, we can use a rewrite rule that only preserves the port if it was in the original request.
<VirtualHost *:8080>
ServerName yourserver
DocumentRoot /var/www/html
<Directory /var/www/html>
Options -Indexes +FollowSymLinks
DirectoryIndex index.html index.php
Require all granted
# Keep normal Apache directory behavior
DirectorySlash On
</Directory>
# Fix Apache 2.4 Directory Redirects: Preserve Non-Standard Ports
RewriteEngine On
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} -d
RewriteCond %{SERVER_PORT} !^80$
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]
UseCanonicalName Off
</VirtualHost>
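With the rule in place, the same curl test should now show the port preserved in the Location header. A quick sanity check, assuming the yourserver:8080 example host used above:

curl -IL http://yourserver:8080/setup

# Expected now:
#   HTTP/1.1 301 Moved Permanently
#   Location: http://yourserver:8080/setup/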
How it works:
- Redirects directory requests externally (/setup => /setup/).
- Applies only when the port is non-standard (!=80, !=443).
- Redirects to %{HTTP_HOST}, which still includes :8080, making it flexible for any non-standard port.

In my setup, the SSH tunnel forwards 8080 locally to 80 on the jump box. Requests to /setup were redirected externally without the port, causing failures. The fix uses mod_rewrite and a RewriteRule that ensures that port 8080 is preserved.

Apache 2.4’s port-dropping behavior is an unexpected regression from 2.2, but we can fix it with a simple rewrite rule that restores the expected behavior without breaking anything.
If you’re running Docker, EC2, or SSH tunnels, this is a fix that will prevent you from jumping through hoops by altering the application or changing your networking setup.
Hmmmm…maybe we can use internal redirects instead of an external redirect??? Stay tuned.
If you’re doing some development on a remote server that you access via a bastion host (e.g., an EC2 instance), and you want a seamless way to work from a Chromebook, you can set up an SSH tunnel through the bastion host to access your development server.
This guide outlines how to configure a secure and efficient development workflow using Docker, SSH tunnels, and a Chromebook.
Your Chromebook is a great development environment, but truth be told, the cloud is better. Why? Because you can leverage a bucket load of functionality, resources and infrastructure that is powerful yet inexpensive. Did I mention backups? My Chromebook running a version of debian rocks, but in general I use it as a conduit to the cloud.
So here’s the best of both worlds. I can use a kick-butt terminal
(terminator
) on my Chromie and use its networking mojo to access my
web servers running in the cloud.
In this setup:
- Your Chromebook opens a local SSH tunnel on port 8080.
- A bastion host (jump box) is the only machine reachable over SSH from outside.
- The EC2 development server runs the web server on port 80 and is only reachable from the bastion host.

Here’s what it looks like in ASCII art…
+--------------+ +--------------+ +--------------+
| Chromebook | | Bastion Host | | EC2 |
| | 22 | | 22 | |
| Local SSH |----| Jump Box |----| Development |
| Tunnel:8080 | 80 | (Accessible) | 80 | Server |
+--------------+ +--------------+ +--------------+
To create an SSH tunnel through the bastion host:
ssh -N -L 8080:EC2_PRIVATE_IP:80 user@bastion-host
- -N: Do not execute remote commands, just forward ports.
- -L 8080:EC2_PRIVATE_IP:80: Forwards local port 8080 to port 80 on the development server (EC2 instance).
- user@bastion-host: SSH into the bastion host as user.
.Once connected, any request to localhost:8080
on the Chromebook will
be forwarded to port 80 on the EC2 instance.
To maintain the tunnel connection automatically, add an entry to your SSH config file (~/.ssh/config):

Host bastion
HostName bastion-host
User your-user
IdentityFile ~/.ssh/id_rsa
LocalForward 8080 EC2_PRIVATE_IP:80
ServerAliveInterval 60
ServerAliveCountMax 3
Then start the tunnel in the background:

ssh -fN bastion

and verify that it is working:

curl -I http://localhost:8080
You should see a response from your EC2 instance’s web server.
If your EC2 instance runs a Dockerized web application, expose port 80 from the container:
docker run -d -p 80:80 my-web-app
Now, accessing http://localhost:8080
on your Chromebook browser will
open the web app running inside the Docker container on EC2.
This setup allows you to securely access a remote development environment from a Chromebook, leveraging SSH tunneling through a bastion host.
A single local port (localhost:8080) now maps to a remote web app.

Now you can develop on remote servers with a Chromebook, as if they were local!
I’m excited to announce the release of OrePAN2::S3, a new Perl distribution designed to streamline the creation and management of private CPAN (DarkPAN) repositories using Amazon S3 and CloudFront. This tool simplifies the deployment of your own Perl module repository, ensuring efficient distribution and scaling capabilities.
This effort is the terminal event (I hope) in my adventure in Perl packaging that led me down the rabbit hole of CloudFront distributions?
My adventure in packaging started many years ago when it seemed like a good idea to use RPMs. No really, it was!
The distributions embraced Perl and life was good…until of course it wasn’t. Slowly but surely, those modules we wanted weren’t available in any of the repos. Well, here we are in 2025 and you can’t even grovel enough to get AWS to include Perl in its images? Sheesh, really? I have to yum install perl-core? Seems like someone has specifically put Perl on the shit list.

I’ve been swimming upstream for years using cpanspec and learning how to create my own yum repos with all of the stuff Amazon just didn’t want to package. I’ve finally given up. CPAN forever!

cpanm is awesome! Okay, yeah, but I’m still not all in with carton. Maybe some day?
Key Features of OrePAN2::S3
Seamless Integration with AWS: OrePAN2::S3
leverages Amazon S3
for storage and CloudFront for content delivery, providing a highly
available, scalable solution for hosting your private CPAN repository.
User-Friendly Setup: The distribution offers straightforward scripts and configurations, enabling you to set up your DarkPAN with minimal effort…I hope. If you find some points of friction, log an issue.
Flexible Deployment Options: Whether you prefer a simple
S3-backed website or a full-fledged CloudFront distribution,
OrePAN2::S3
accommodates both setups to suit your specific
needs. If you really just want a website-enabled S3 bucket to serve
as your DarkPAN we got your back. But be careful…
Getting Started:
To begin using OrePAN2::S3
, ensure you have the following
prerequisites:
AWS Account: An active Amazon Web Services account.
S3 Bucket: A designated S3 bucket to store your CPAN modules.
CloudFront Distribution (optional): For enhanced content delivery and caching.
Detailed instructions for setting up your S3 bucket and CloudFront distribution are available in the project’s repository.
Why Choose OrePAN2::S3?
Managing a private CPAN repository can be complex, but with
OrePAN2::S3
, the process becomes efficient and scalable. By harnessing
the power of AWS services, this distribution ensures your Perl modules
are readily accessible and securely stored.
Oh, and let’s give credit where credit is due. This is all based on
OrePAN2
.
For more information and to access the repository, visit: https://github.com/rlauer6/OrePAN2-S3
We look forward to your
feedback and
contributions to make OrePAN2::S3
even more robust and
user-friendly.
Ever locked yourself out of your own S3 bucket? That’s like asking a golfer if he’s ever landed in a bunker. We’ve all been there.
Scenario:
A sudden power outage knocks out your internet. When service resumes, your ISP has assigned you a new IP address. Suddenly, the S3 bucket you so carefully protected with that fancy bucket policy that restricts access by IP… is protecting itself from you. Nice work.
And here’s the kicker, you can’t change the policy because…you can’t access the bucket! Time to panic? Read on…
This post will cover:
- how an IP-based bucket policy locked me out of my own bucket
- how to build a policy with a VPC “safe room” so you always have a way back in
- a small script that applies the fix quickly when your IP changes
S3 bucket policies are powerful and absolute. A common security pattern is to restrict access to a trusted IP range, often your home or office IP. That’s fine, but what happens when those IPs change without prior notice?
That’s the power outage scenario in a nutshell.
Suddenly (and without warning), I couldn’t access my own bucket. Worse, there was no easy way back in because the bucket policy itself was blocking my attempts to update it. Whether you go to the console or drop to a command line, you’re still hitting that same brick wall—your IP isn’t in the allow list.
At that point, you have two options, neither of which you want to rely on in a pinch:
- log in as the root user and fix the policy
- open a ticket with AWS support

The root account is a last resort (as it should be), and AWS support can take time you don’t have.
Once you regain access to the bucket, it’s time to build a policy that includes an emergency backdoor from a trusted environment. We’ll call that the “safe room”. Your safe room is your AWS VPC.
While your home IP might change with the weather, your VPC is rock solid. If you allow access from within your VPC, you always have a way to manage your bucket policy.
Even if you rarely touch an EC2 instance, having that backdoor in your pocket can be the difference between a quick fix and a day-long support ticket.
A script to implement our safe room approach must at least:
- allow access from within the VPC (our safe room), so you can always manage the policy
- add your current public IP back to the bucket’s allow list
- apply the updated policy to the bucket
This script helps you recover from lockouts and prevents future ones by ensuring your VPC is always a reliable access point.
Our script is light on dependencies, but you will need to have curl and the aws CLI installed on your EC2.
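As an aside, curl is also what makes the -v auto-detection possible: when the script runs on an EC2 instance, it can ask the instance metadata service which VPC it is in. The snippet below is a rough sketch of how such a lookup can work - not necessarily how the script itself does it - and assumes IMDSv1-style access is allowed (with IMDSv2 you would first request a session token):

# Sketch: discover the VPC ID of the current EC2 instance via instance metadata
MAC=$(curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/ | head -1)
VPC_ID=$(curl -s "http://169.254.169.254/latest/meta-data/network/interfaces/macs/${MAC}vpc-id")
echo "Running inside VPC: $VPC_ID"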
A typical use of the command requires only your new IP address and the
bucket name. The aws CLI will try credentials from the environment,
your ~/.aws config
, or an instance profile - so you only need -p
if you
want to specify a different profile. Here’s the minimum you’d need to run the
command if you are executing the script in your VPC:
./s3-bucket-unlock.sh -i <your-home-ip> -b <bucket-name>
Options:
- -i  Your current public IP address (e.g., your home IP).
- -b  The S3 bucket name.
- -v  (Optional) VPC ID; auto-detected if not provided.
- -p  (Optional) AWS CLI profile (defaults to $AWS_PROFILE or default).
- -n  Dry run (show policy, do not apply).
./s3-bucket-unlock.sh -i 203.0.113.25 -b my-bucket -n
The dry run option lets you preview the generated policy before making any changes—a good habit when working with S3 policies.
Someone once said that we learn more from our failures than from our successes. At this rate I should be on the AWS support team soon…lol. Well, I probably need a lot more mistakes under my belt before they hand me a badge. In any event, ahem, we learned something from our power outage. Stuff happens - best be prepared. Here’s what this experience reinforced:
Sometimes it’s not a mistake - it’s a failure to realize how fragile access is. My home IP was fine…until it wasn’t.
Our script will help us apply a quick fix. The process of writing it was a reminder that security balances restrictions with practical escape hatches.
Next time you set an IP-based bucket policy, ask yourself: will I still be able to get in when this IP changes?
Thanks to ChatGPT for being an invaluable backseat driver on this journey. Real AWS battle scars + AI assistance = better results.
In Part IIa, we detailed the challenges we faced when automating the deployment of a secure static website using S3, CloudFront, and WAF. Service interdependencies, eventual consistency, error handling, and AWS API complexity all presented hurdles. This post details the actual implementation journey.
We didn’t start with a fully fleshed-out solution that just worked. We had to “lather, rinse and repeat”. In the end, we built a resilient automation script robust enough to deploy secure, private websites across any organization.
The first take away - the importance of logging and visibility. While logging wasn’t the first thing we actually tackled, it was what eventually turned a mediocre automation script into something worth publishing.
run_command()
While automating the process of creating this infrastructure, we need to feed the output of one or more commands into the pipeline. The output of one command feeds another. But each step of course can fail. We need to both capture the output for input to later steps and capture errors to help debug the process. Automation without visibility is like trying to discern the elephant by looking at the shadows on the cave wall. Without a robust solution for capturing output and errors we experienced:
When AWS CLI calls failed, we found ourselves staring at the terminal trying to reconstruct what went wrong. Debugging was guesswork.
The solution was our first major building block: run_command()
.
echo "Running: $*" >&2
echo "Running: $*" >>"$LOG_FILE"
# Create a temp file to capture stdout
local stdout_tmp
stdout_tmp=$(mktemp)
# Detect if we're capturing output (not running directly in a terminal)
if [[ -t 1 ]]; then
# Not capturing → Show stdout live
"$@" > >(tee "$stdout_tmp" | tee -a "$LOG_FILE") 2> >(tee -a "$LOG_FILE" >&2)
else
# Capturing → Don't show stdout live; just log it and capture it
"$@" >"$stdout_tmp" 2> >(tee -a "$LOG_FILE" >&2)
fi
local exit_code=${PIPESTATUS[0]}
# Append stdout to log file
cat "$stdout_tmp" >>"$LOG_FILE"
# Capture stdout content into a variable
local output
output=$(<"$stdout_tmp")
rm -f "$stdout_tmp"
if [ $exit_code -ne 0 ]; then
echo "ERROR: Command failed: $*" >&2
echo "ERROR: Command failed: $*" >>"$LOG_FILE"
echo "Check logs for details: $LOG_FILE" >&2
echo "Check logs for details: $LOG_FILE" >>"$LOG_FILE"
echo "TIP: Since this script is idempotent, you can re-run it safely to retry." >&2
echo "TIP: Since this script is idempotent, you can re-run it safely to retry." >>"$LOG_FILE"
exit 1
fi
# Output stdout to the caller without adding a newline
if [[ ! -t 1 ]]; then
printf "%s" "$output"
fi
}
This not-so-simple wrapper gave us:
- full logging of stdout and stderr for every command
- the command’s output captured and returned for use in later steps
- clear error messages (and a log file to check) when something failed

run_command() became the workhorse for capturing our needed inputs to other processes and our eyes into failures.
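In practice a call looks something like this - a hypothetical usage sketch (the command and variable names are illustrative, not lifted from the final script):

# Capture a command's output for a later step while everything is still logged
BUCKET_REGION=$(run_command aws s3api get-bucket-location \
  --bucket "$BUCKET_NAME" --query 'LocationConstraint' --output text)
echo "Bucket region: $BUCKET_REGION"

Because stdout is being captured here, run_command() logs the output instead of echoing it to the terminal, and the caller gets a clean value to reuse.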
We didn’t arrive at run_command() fully formed. We learned it the hard way: capturing stdout without duplicating output or breaking the value returned to the caller took fine-tuning.

The point of this whole exercise is to host content, and for that, we need an S3 bucket. This seemed like a simple first task - until we realized it wasn’t. This is where we first collided with a concept that would shape the entire script: idempotency.
S3 bucket names are globally unique. If you try to create one that exists, you fail. Worse, AWS error messages can be cryptic.
Our naive first attempt just created the bucket. Our second attempt checked for it first:
create_s3_bucket() {
if run_command $AWS s3api head-bucket --bucket "$BUCKET_NAME" --profile $AWS_PROFILE 2>/dev/null; then
echo "Bucket $BUCKET_NAME already exists."
return
fi
run_command $AWS s3api create-bucket \
--bucket "$BUCKET_NAME" \
--create-bucket-configuration LocationConstraint=$AWS_REGION \
--profile $AWS_PROFILE
}
Making the script “re-runable” was essential unless of course we could
guarantee we did everything right and things worked the first time.
When has that ever happened? Of course, we then wrapped the creation of the bucket in run_command() because every AWS call still had the potential to fail spectacularly.
And so, we learned: If you can’t guarantee perfection, you need idempotency.
Configuring a CloudFront distribution using the AWS Console offers a
streamlined setup with sensible defaults. But we needed precise
control over CloudFront behaviors, cache policies, and security
settings - details the console abstracts away. Automation via the AWS
CLI gave us that control - but there’s no free lunch. Prepare yourself
to handcraft deeply nested JSON payloads, get jiggy with jq
, and
manage the dependencies between S3, CloudFront, ACM, and WAF. This is
the path we would need to take to build a resilient, idempotent
deployment script - and crucially, to securely serve private S3
content using Origin Access Control (OAC).
Why do we need OAC?
Since our S3 bucket is private, we need CloudFront to securely retrieve content on behalf of users without exposing the bucket to the world.
Why not OAI?
AWS has deprecated Origin Access Identity in favor of Origin Access Control (OAC), offering tighter security and more flexible permissions.
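For reference, creating the OAC itself (whose ID ends up as $OAC_ID in the distribution template below) can be done with a call along these lines - a sketch rather than the exact invocation from our script:

# Sketch: create an Origin Access Control for the S3 origin
aws cloudfront create-origin-access-control \
  --origin-access-control-config \
  Name="oac-$BUCKET_NAME",OriginAccessControlOriginType=s3,SigningBehavior=always,SigningProtocol=sigv4
# The response contains OriginAccessControl.Id, which we keep as $OAC_ID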
Why do we need jq
?
In later steps we create a WAF Web ACL to firewall
our CloudFront distribution. In order to associate the WAF Web ACL with
our distribution we need to invoke the update-distribution
API which
requires a fully fleshed out JSON payload updated with the Web ACL id.
GOTCHA: Attaching a WAF WebACL to an existing CloudFront distribution requires that you use the
update-distribution
API, notassociate-web-acl
as one might expect.
Here’s the template for our distribution configuration (some of the Bash variables used will be evident when you examine the completed script):
{
"CallerReference": "$CALLER_REFERENCE",
$ALIASES
"Origins": {
"Quantity": 1,
"Items": [
{
"Id": "S3-$BUCKET_NAME",
"DomainName": "$BUCKET_NAME.s3.amazonaws.com",
"OriginAccessControlId": "$OAC_ID",
"S3OriginConfig": {
"OriginAccessIdentity": ""
}
}
]
},
"DefaultRootObject": "$ROOT_OBJECT",
"DefaultCacheBehavior": {
"TargetOriginId": "S3-$BUCKET_NAME",
"ViewerProtocolPolicy": "redirect-to-https",
"AllowedMethods": {
"Quantity": 2,
"Items": ["GET", "HEAD"]
},
"ForwardedValues": {
"QueryString": false,
"Cookies": {
"Forward": "none"
}
},
"MinTTL": 0,
"DefaultTTL": $DEFAULT_TTL,
"MaxTTL": $MAX_TTL
},
"PriceClass": "PriceClass_100",
"Comment": "CloudFront Distribution for $ALT_DOMAIN",
"Enabled": true,
"HttpVersion": "http2",
"IsIPV6Enabled": true,
"Logging": {
"Enabled": false,
"IncludeCookies": false,
"Bucket": "",
"Prefix": ""
},
$VIEWER_CERTIFICATE
}
The create_cloudfront_distribution()
function is then used to create
the distribution.
create_cloudfront_distribution() {
# Snippet for brevity; see full script
run_command $AWS cloudfront create-distribution --distribution-config file://$CONFIG_JSON
}
Key lessons:
- Use update-distribution, not associate-web-acl, for CloudFront distributions.
- Use jq to modify the existing configuration to add the WAF Web ACL id.

Cool. We have a CloudFront distribution! But it’s wide open to the world. We needed to restrict access to our internal VPC traffic - without exposing the site publicly. AWS WAF provides this firewall capability using Web ACLs. Here’s what we need to do:
1. Find the public IP of our VPC’s NAT gateway.
2. Create a WAF IPSet containing that IP (our allow list).
3. Create a WAF Web ACL whose rule allows only the IPs in that IPSet.
4. Attach the Web ACL to the CloudFront distribution.
Keep in mind that CloudFront is designed to serve content to the public
internet. When clients in our VPC access the distribution, their
traffic needs to exit through a NAT gateway with a public IP. We’ll
use the AWS CLI to query the NAT gateway’s public IP and use that when
we create our allow list of IPs (step 1).
find_nat_ip() {
run_command $AWS ec2 describe-nat-gateways --filter "Name=tag:Environment,Values=$TAG_VALUE" --query "NatGateways[0].NatGatewayAddresses[0].PublicIp" --output text --profile $AWS_PROFILE
}
We take this IP and build our first WAF component: an IPSet. This becomes the foundation for the Web ACL we’ll attach to CloudFront.
The firewall we create will be composed of an allow list of IP addresses (step 2)…
create_ipset() {
run_command $AWS wafv2 create-ip-set \
--name "$IPSET_NAME" \
--scope CLOUDFRONT \
--region us-east-1 \
--addresses "$NAT_IP/32" \
--ip-address-version IPV4 \
--description "Allow NAT Gateway IP"
}
…that form the rules for our WAF Web ACL (step 3).
create_web_acl() {
run_command $AWS wafv2 create-web-acl \
--name "$WEB_ACL_NAME" \
--scope CLOUDFRONT \
--region us-east-1 \
--default-action Block={} \
--rules '[{"Name":"AllowNAT","Priority":0,"Action":{"Allow":{}},"Statement":{"IPSetReferenceStatement":{"ARN":"'$IPSET_ARN'"}},"VisibilityConfig":{"SampledRequestsEnabled":true,"CloudWatchMetricsEnabled":true,"MetricName":"AllowNAT"}}]' \
--visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName="$WEB_ACL_NAME"
}
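Both create calls return JSON whose Summary.ARN is what the later steps need. Capturing those ARNs might look like this (a sketch; in the full script these calls run through run_command()):

# Sketch: capture the ARNs needed to wire the pieces together
IPSET_ARN=$(create_ipset | jq -r '.Summary.ARN')
WEB_ACL_ARN=$(create_web_acl | jq -r '.Summary.ARN')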
This is where our earlier jq
surgery becomes critical - attaching
the Web ACL requires updating the entire CloudFront distribution
configuration. And that’s how we finally attach that Web ACL to our
CloudFront distribution (step 4).
DISTRIBUTION_CONFIG=$(run_command $AWS cloudfront get-distribution-config --id $DISTRIBUTION_ID)
# The returned document includes an ETag we must echo back on update
ETAG=$(echo "$DISTRIBUTION_CONFIG" | jq -r '.ETag')
# Use jq to inject WebACLId into config JSON
UPDATED_CONFIG=$(echo "$DISTRIBUTION_CONFIG" | jq --arg ACL_ARN "$WEB_ACL_ARN" '.DistributionConfig | .WebACLId=$ACL_ARN')
# Pass updated config back into update-distribution
echo "$UPDATED_CONFIG" > updated-config.json
run_command $AWS cloudfront update-distribution --id $DISTRIBUTION_ID --if-match "$ETAG" --distribution-config file://updated-config.json
At this point, our CloudFront distribution is no longer wide open. It is protected by our WAF Web ACL, restricting access to only traffic coming from our internal VPC NAT gateway.
For many internal-only sites, this simple NAT IP allow list is enough. WAF can handle more complex needs like geo-blocking, rate limiting, or request inspection - but those weren’t necessary for us. Good design isn’t about adding everything; it’s about removing everything that isn’t needed. A simple allow list was also the most secure.
When we set up our bucket, we blocked public access - an S3-wide security setting that prevents any public access to the bucket’s contents. However, this also prevents CloudFront (even with OAC) from accessing S3 objects unless we explicitly allow it. Without this policy update, requests from CloudFront would fail with Access Denied errors.
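Blocking public access at the bucket level is itself a single CLI call; something like the following sketch of the setting described above:

# Sketch: turn on the bucket's public access block
aws s3api put-public-access-block \
  --bucket "$BUCKET_NAME" \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true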
At this point, we need to allow CloudFront to access our S3
bucket. The update_bucket_policy()
function will apply the policy
shown below.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "cloudfront.amazonaws.com"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::$BUCKET_NAME/*",
"Condition": {
"StringEquals": {
"AWS:SourceArn": "arn:aws:cloudfront::$AWS_ACCOUNT:distribution/$DISTRIBUTION_ID"
}
}
}
]
}
Modern OAC best practice is to use the AWS:SourceArn condition to ensure only requests from your specific CloudFront distribution are allowed.
It’s more secure because it ties bucket access directly to a single distribution ARN, preventing other CloudFront distributions (or bad actors) from accessing your bucket.
"Condition": {
"StringEquals": { "AWS:SourceArn": "arn:aws:cloudfront::$AWS_ACCOUNT:distribution/$DISTRIBUTION_ID" }
}
With this policy in place, we’ve completed the final link in the security chain. Our S3 bucket remains private but can now securely serve content through CloudFront - protected by OAC and WAF.
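Applying the rendered policy is the simplest step of all. update_bucket_policy() amounts to something like this sketch, assuming the JSON above has been written to bucket-policy.json with the variables filled in:

update_bucket_policy() {
  # Sketch: attach the CloudFront-only read policy to the private bucket
  run_command $AWS s3api put-bucket-policy \
    --bucket "$BUCKET_NAME" \
    --policy file://bucket-policy.json \
    --profile $AWS_PROFILE
}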
We are now ready to wrap a bow around these steps in an idempotent Bash script.
- Create the S3 bucket (idempotently).
- Create the CloudFront distribution with OAC, attaching the Web ACL via a jq patch.
- Restrict access with WAF: find the NAT IP, build the IPSet and Web ACL, then apply them with jq and update-distribution.
- Update the bucket policy so only our distribution can read from the bucket.
.Each segment of our script is safe to rerun. Each is wrapped in run_command()
,
capturing results for later steps and ensuring errors are logged. We
now have a script we can commit and re-use with confidence whenever we
need a secure static site. Together, these steps form a robust,
idempotent deployment pipeline for a secure S3 + CloudFront website -
every time.
You can find the full script here.
A hallmark of a production-ready script is an ‘-h’ option. Oh wait - your script has no help or usage? I’m supposed to RTFC? It ain’t done skippy until it’s done.
Scripts should include the ability to pass options that make it a flexible utility. We may have started out writing a “one-off” but recognizing opportunities to generalize the solution turned this into another reliable tool in our toolbox.
Be careful though - not every one-off needs to be Swiss Army knife. Just because aspirin is good for a headache doesn’t mean you should take the whole bottle.
Our script now supports the necessary options to create a secure, static website with a custom domain and certificate. We even added the ability to include additional IP addresses for your allow list in addition to the VPC’s public IP.
Now, deploying a private S3-backed CloudFront site is as easy as:
Example:
./s3-static-site.sh -b my-site -t dev -d example.com -c arn:aws:acm:us-east-1:cert-id
Inputs:
- -b  the S3 bucket name
- -t  the environment tag (used to find the NAT gateway)
- -d  the custom domain to serve
- -c  the ACM certificate ARN for that domain
This single command now deploys an entire private website - reliably and repeatably. It only takes a little longer to do it right!
The process of working with ChatGPT to construct a production ready script that creates static websites took many hours. In the end, several lessons were reinforced and some gotchas discovered. Writing this blog itself was a collaborative effort that dissected both the technology and the process used to implement it. Overall, it was a productive, fun and rewarding experience. For those not familiar with ChatGPT or who are afraid to give it a try, I encourage you to explore this amazing tool.
Here are some of the things I took away from this adventure with ChatGPT.
With regard to the technology, some lessons were reinforced, some new knowledge was gained:
- Use the update-distribution API call, not associate-web-acl, when adding WAF ACLs to your distribution!

Thanks to ChatGPT for being an ever-present back seat driver on this journey. Real AWS battle scars + AI assistance = better results.
In Part III we wrap it all up as we learn more about how CloudFront and WAF actually protect your website.
This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.
If you like this content, please leave a comment or consider following me. Thanks.
After designing a secure static website on AWS using S3, CloudFront, and WAF as discussed in Part I of this series, we turned our focus to automating the deployment process. While AWS offers powerful APIs and tools, we quickly encountered several challenges that required careful consideration and problem-solving. This post explores the primary difficulties we faced and the lessons we learned while automating the provisioning of this infrastructure.
A key challenge when automating AWS resources is managing service dependencies. Our goal was to deploy a secure S3 website fronted by CloudFront, secured with HTTPS (via ACM), and restricted using WAF. Each of these services relies on others, and the deployment sequence is critical:
Missteps in the sequence can result in failed or partial deployments, which can leave your cloud environment in an incomplete state, requiring tedious manual cleanup.
AWS infrastructure often exhibits eventual consistency, meaning that newly created resources might not be immediately available. We specifically encountered this when working with ACM and CloudFront:
Handling these delays requires building polling mechanisms into your automation or using backoff strategies to avoid hitting API limits.
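In our case that mostly meant waiting on resources instead of assuming they were ready. The AWS CLI's built-in waiters cover part of this; a sketch of the kind of polling involved (resource names are illustrative):

# Wait for the ACM certificate to validate and the distribution to finish deploying
aws acm wait certificate-validated --certificate-arn "$CERT_ARN" --region us-east-1
aws cloudfront wait distribution-deployed --id "$DISTRIBUTION_ID"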
Reliable automation is not simply about executing commands; it requires designing for resilience and repeatability:
Additionally, logging the execution of deployment commands proved to be an unexpected challenge. We developed a run_command
function that captured both stdout and stderr while logging the output to a file. However, getting this function to behave correctly without duplicating output or interfering with the capture of return values required several iterations and refinements. Reliable logging during automation is critical for debugging failures and ensuring transparency when running infrastructure-as-code scripts.
While the AWS CLI and SDKs are robust, they are often verbose and require a deep understanding of each service:
Throughout this process, we found that successful AWS automation hinges on the following principles:
Automating AWS deployments unlocks efficiency and scalability, but it demands precision and robust error handling. Our experience deploying a secure S3 + CloudFront website highlighted common challenges that any AWS practitioner is likely to face. By anticipating these issues and applying resilient practices, teams can build reliable automation pipelines that simplify cloud infrastructure management.
Next up, Part IIb where we build our script for creating our static site.
This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.
If you like this content, please leave a comment or consider following me. Thanks.
While much attention is given to dynamic websites there are still many uses for the good ‘ol static website. Whether for hosting documentation, internal portals, or lightweight applications, static sites remain relevant. In my case, I wanted to host an internal CPAN repository for storing and serving Perl modules. AWS provides all of the necessary components for this task but choosing the right approach and configuring it securely and automatically can be a challenge.
Whenever you make an architectural decision various approaches are possible. It’s a best practice to document that decision in an Architectural Design Record (ADR). This type of documentation justifies your design choice, spelling out precisely how each approach either meets or fails to meet functional or non-functional requirements. In the first part of this blog series we’ll discuss the alternatives and why we ended up choosing our CloudFront based approach. This is our ADR.
| | Description | Notes |
|---|---|---|
| 1. | HTTPS website for hosting a CPAN repository | Will be used internally but we would like secure transport |
| 2. | Controlled Access | Can only be accessed from within a private subnet in our VPC |
| 3. | Scalable | Should be able to handle increasing storage without reprovisioning |
| 4. | Low-cost | Ideally less than $10/month |
| 5. | Low-maintenance | No patching or maintenance of application or configurations |
| 6. | Highly available | Should be available 24x7, content should be backed up |
Now that we’ve defined our functional and non-functional requirements let’s look at some approaches we might take in order to create a secure, scalable, low-cost, low-maintenance static website for hosting our CPAN repository.
This solution at first glance seems like the quickest shot on goal. While S3 does offer a static website hosting feature, it doesn’t support HTTPS by default, which is a major security concern and does not match our requirements. Additionally, website-enabled S3 buckets do not support private access controls - they are inherently public if enabled. Had we been able to accept an insecure HTTP site and public access this approach would have been the easiest to implement. If we wanted to accept public access but required secure transport we could have used CloudFront with the website enabled bucket either using CloudFront’s certificate or creating our own custom domain with its own certificate.
Since our goal is to create a private static site, we can however use CloudFront as a secure, caching layer in front of S3. This allows us to enforce HTTPS, control access using Origin Access Control (OAC), and integrate WAF to restrict access to our VPC. More on this approach later…
Pros:
Cons:
Analysis:
While using an S3 website-enabled bucket is the easiest way to host static content, it fails to meet security and privacy requirements due to public access and lack of HTTPS support.
Perhaps the obvious approach to hosting a private static site is to deploy a dedicated Apache or Nginx web server on an EC2 instance. This method involves setting up a lightweight Linux instance, configuring the web server, and implementing a secure upload mechanism to deploy new content.
Pros:
Cons:
Analysis:
Using a dedicated web server is a viable alternative when additional flexibility is needed, but it comes with added maintenance and cost considerations. Given our requirements for a low-maintenance, cost-effective, and scalable solution, this may not be the best approach.
A common approach I have used to securely serve static content from an S3 bucket is to use an internal proxy server (such as Nginx or Apache) running on an EC2 instance within a private VPC. In fact, this is the approach I have used to create my own private yum repository, so I know it would work effectively for my CPAN repository. The proxy server retrieves content from an S3 bucket via a VPC endpoint, ensuring that traffic never leaves AWS’s internal network. This approach requires managing an EC2 instance, handling security updates, and scaling considerations. Let’s look at the cost of an EC2 based solution.
The following cost estimates are based on AWS pricing for us-east-1:
| Item | Pricing |
|---|---|
| Instance type: t4g.nano (cheapest ARM-based instance) | Hourly cost: \$0.0052/hour |
| Monthly usage: 730 hours (assuming 24/7 uptime) | (0.0052 x 730 = \$3.80/month) |
Pros:
Cons:
Analysis:
If predictable costs and full server control are priorities, EC2 may be preferable. However, this solution requires maintenance and may not scale with heavy traffic. Moreover, to create an HA solution would require additional AWS resources.
As alluded to before, CloudFront + S3 might fit the bill. To create a secure, scalable, and cost-effective private static website, we chose to use Amazon S3 with CloudFront (sprinkling in a little AWS WAF for good measure). This architecture allows us to store our static assets in an S3 bucket while CloudFront acts as a caching and security layer in front of it. Unlike enabling public S3 static website hosting, this approach provides HTTPS support, better scalability, and fine-grained access control.
CloudFront integrates with Origin Access Control (OAC), ensuring that the S3 bucket only allows access from CloudFront and not directly from the internet. This eliminates the risk of unintended public exposure while still allowing authorized users to access content. Additionally, AWS WAF (Web Application Firewall) allows us to restrict access to only specific IP ranges or VPCs, adding another layer of security.
Let’s look at costs:
| Item | Cost | Capacity | Total |
|---|---|---|---|
| Data Transfer Out | First 10TB is \$0.085 per GB | 25GB/month of traffic | Cost for 25GB: (25 x 0.085 = \$2.13) |
| HTTP Requests | \$0.0000002 per request | 250,000 requests/month | Cost for requests: (250,000 x 0.0000002 = \$0.05) |
| Total CloudFront Cost | | | \$2.13 (Data Transfer) + \$0.05 (Requests) = \$2.18/month |
Pros:
Cons:
Analysis:
And the winner is…CloudFront + S3!
Using just a website-enabled S3 bucket fails to meet the basic requirements, so let’s eliminate that solution right off the bat. If predictable costs and full server control are priorities, using an EC2 either as a proxy or a full-blown webserver may be preferable. However, for a low-maintenance, auto-scaling solution, CloudFront + S3 is the superior choice. EC2 is slightly more expensive but avoids CloudFront’s external traffic costs. Overall, our winning approach is ideal because it scales automatically, reduces operational overhead, and provides strong security mechanisms without requiring a dedicated EC2 instance to serve content.
Now that we have our agreed upon approach (the “what”) and documented our “architectural decision”, it’s time to discuss the “how”. How should we go about constructing our project? Many engineers would default to Terraform for this type of automation, but we had specific reasons for thinking this through and looking at a different approach. We’d like:
- a deployment that is idempotent and safe to re-run
- fast iteration and debugging, without managing infrastructure state
- a self-contained script with straightforward logging and error handling
While Terraform is a popular tool for infrastructure automation, it introduces several challenges for this specific project. Here’s why we opted for a Bash script over Terraform:
State Management Complexity
Terraform relies on state files to track infrastructure resources, which introduces complexity when running and re-running deployments. State corruption or mismanagement can cause inconsistencies, making it harder to ensure a seamless idempotent deployment.
Slower Iteration and Debugging
Making changes in Terraform requires updating state, planning, and applying configurations. In contrast, Bash scripts execute AWS CLI commands immediately, allowing for rapid testing and debugging without the need for state synchronization.
Limited Control Over Execution Order
Terraform follows a declarative approach, meaning it determines execution order based on dependencies. This can be problematic when AWS services have eventual consistency issues, requiring retries or specific sequencing that Terraform does not handle well natively.
Overhead for a Simple, Self-Contained Deployment
For a relatively straightforward deployment like a private static website, Terraform introduces unnecessary complexity. A lightweight Bash script using AWS CLI is more portable, requires fewer dependencies, and avoids managing an external Terraform state backend.
Handling AWS API Throttling
AWS imposes API rate limits, and handling these properly requires implementing retry logic. While Terraform has some built-in retries, it is not as flexible as a custom retry mechanism in a Bash script, which can incorporate exponential backoff or manual intervention if needed.
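A hand-rolled retry with exponential backoff is only a few lines of Bash. This is a sketch of the pattern, not code from the deployment script itself:

# Retry a command with exponential backoff (2s, 4s, 8s, ...)
retry() {
  local attempt=1 max=5 delay=2
  until "$@"; do
    (( attempt >= max )) && { echo "Giving up after $max attempts: $*" >&2; return 1; }
    echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    (( attempt++, delay*=2 ))
  done
}

retry aws cloudfront get-distribution --id "$DISTRIBUTION_ID"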
Less Direct Logging and Error Handling
Terraform’s logs require additional parsing and interpretation, whereas a Bash script can log every AWS CLI command execution in a simple and structured format. This makes troubleshooting easier, especially when dealing with intermittent AWS errors.
Although Bash was the right choice for this project, Terraform is still useful for more complex infrastructure where:
For our case, where the goal was quick, idempotent, and self-contained automation, Bash scripting provided a simpler and more effective approach. This approach gave us the best of both worlds - automation without complexity, while still ensuring idempotency and security.
This post was drafted with the assistance of ChatGPT, but born from real AWS battle scars.
If you like this content, please leave a comment or consider following me. Thanks.