Flavio Poletti recently blogged about using Barcode::ZBar for reading QR codes. In a previous life I wrote applications for workflows that involved reading QR codes from documents and used Barcode::ZBar very effectively…highly recommended.
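If you haven't worked with it before, the basic pattern is straightforward. Here's a minimal sketch based on the module's documented interface (the qr-code.png file name is my own placeholder, and Image::Magick is used to hand the scanner raw grayscale data):

use strict;
use warnings;

use Barcode::ZBar;
use Image::Magick;

# load the scanned document (placeholder file name)
my $magick = Image::Magick->new();
$magick->Read('qr-code.png');

# ZBar wants raw 8-bit grayscale data ('Y800')
my $raw = $magick->ImageToBlob( magick => 'GRAY', depth => 8 );

my $image = Barcode::ZBar::Image->new();
$image->set_format('Y800');
$image->set_size( $magick->Get(qw(columns rows)) );
$image->set_data($raw);

my $scanner = Barcode::ZBar::ImageScanner->new();
$scanner->parse_config('enable');
$scanner->scan_image($image);

# each symbol is a decoded barcode or QR code
for my $symbol ( $image->get_symbols() ) {
    printf "%s: %s\n", $symbol->get_type(), $symbol->get_data();
}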
Back then, iirc, I had quite a struggle satisfying all of the dependencies required to get it to actually work. I spent a little time after reading his article to see if I could successfully build it once more, this time in a Docker container. The result is this project:
It was a little difficult piecing together the required packages, but in the end it works!
Next, I may create a webservice from the container and install it on a
Fargate cluster. The service will allow you to upload .png
files and
scan them for QR codes. Stay tuned!
map
The code below violates more than a few best practices, making the intent of the code difficult to understand. Let’s get started…
User
Can you refactor this snippet of Perl code?
if (@multi_elems) {
my $delim_qr1 = qr/\d+$group_delim1/o;
#my $delim_qr2 = $group_delim2;
foreach my $entry ( @log_data{@EIDs} ) {
@{$entry}{@multi_elems}
= map {
my $x = $_;
$x && ( $x =~ s/$delim_qr1//go )
? [
# XXX: if $_ =~ m/^($group_delim2)+$/
# the result is ()
# at least on perl 5.8.0
split /$group_delim2/, $x
]
: []
} @{$entry}{@multi_elems};
}
#print STDERR "$err\n" if $err;
}
ChatGPT
Here’s the refactored version of your provided Perl code. I’ve added comments to explain what the code is doing and made some minor improvements for readability:
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/o;
# Iterate through each entry in the log_data hash
foreach my $entry (@log_data{@EIDs}) {
# Update each multi_elem in the current entry
@{$entry}{@multi_elems} = map {
my $x = $_;
# Check if the value matches the delimiter regex
$x && ($x =~ s/$delim_qr1//go) ?
[
# Split the value by group_delim2
split /$group_delim2/, $x
]
: [] # If the value doesn't match, return an empty array reference
} @{$entry}{@multi_elems};
}
}
The refactored code is more readable and has comments explaining the purpose of each block. Note that the code logic has not been changed, and it should function the same as the original snippet.
User
The use of the ternary operator obfuscates the meaning
ChatGPT
I understand your concern. Here’s the refactored version of the code
using an if
statement instead of the ternary operator to improve
readability:
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/o;
# Iterate through each entry in the log_data hash
foreach my $entry (@log_data{@EIDs}) {
# Update each multi_elem in the current entry
@{$entry}{@multi_elems} = map {
my $x = $_;
my $result;
# Check if the value matches the delimiter regex
if ($x && ($x =~ s/$delim_qr1//go)) {
# Split the value by group_delim2
$result = [split /$group_delim2/, $x];
} else {
# If the value doesn't match, return an empty array reference
$result = [];
}
$result;
} @{$entry}{@multi_elems};
}
}
In this version, I replaced the ternary operator with an if-else
statement, which assigns the result to the $result
variable. This
should make the code more understandable and easier to follow.
User
Excellent but I don’t like cuddled elses
ChatGPT
Sure, I’ve updated the code to uncuddle the else
statement:
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/o;
# Iterate through each entry in the log_data hash
foreach my $entry (@log_data{@EIDs}) {
# Update each multi_elem in the current entry
@{$entry}{@multi_elems} = map {
my $x = $_;
my $result;
# Check if the value matches the delimiter regex
if ($x && ($x =~ s/$delim_qr1//go)) {
# Split the value by group_delim2
$result = [split /$group_delim2/, $x];
}
else {
# If the value doesn't match, return an empty array reference
$result = [];
}
$result;
} @{$entry}{@multi_elems};
}
}
Now, the else
statement is on a new line, making the code more
aligned with your preferred style.
User
We are using Perl 5.16. Are there any changes you might suggest?
ChatGPT
Perl 5.16 doesn’t introduce any significant changes that would impact the provided code snippet. However, I can suggest a few general improvements for better readability and maintainability:
1. Add use strict and use warnings to catch common mistakes:

use strict;
use warnings;

2. Use a foreach loop with a lexical iterator variable to avoid potential conflicts with global variables:

for my $entry (@log_data{@EIDs}) {
    # ...
}
With these suggestions, the updated code looks like this:
use strict;
use warnings;
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/o;
# Iterate through each entry in the log_data hash
for my $entry (@log_data{@EIDs}) {
# Update each multi_elem in the current entry
@{$entry}{@multi_elems} = map {
my $x = $_;
my $result;
# Check if the value matches the delimiter regex
if ($x && ($x =~ s/$delim_qr1//go)) {
# Split the value by group_delim2
$result = [split /$group_delim2/, $x];
}
else {
# If the value doesn't match, return an empty array reference
$result = [];
}
$result;
} @{$entry}{@multi_elems};
}
}
These changes should enhance the code quality and maintainability while keeping the same functionality.
User
My understanding is that the ‘o’ regex modifier may no longer be required.
ChatGPT
You’re correct. The /o
modifier for regular expressions is no longer
needed in most cases, as Perl optimizes the regex automatically if the
pattern doesn’t change during the runtime of the program. The /o
modifier was useful in older versions of Perl (prior to 5.6) when the
regex engine didn’t optimize regexes in the same way. In modern Perl,
the engine is smart enough to recognize when a regex is constant and
cache the compiled version automatically. You can remove the /o
modifier from the regex in your code snippet:
use strict;
use warnings;
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/;
# Iterate through each entry in the log_data hash
for my $entry (@log_data{@EIDs}) {
# Update each multi_elem in the current entry
@{$entry}{@multi_elems} = map {
my $x = $_;
my $result;
# Check if the value matches the delimiter regex
if ($x && ($x =~ s/$delim_qr1//g)) {
# Split the value by group_delim2
$result = [split /$group_delim2/, $x];
}
else {
# If the value doesn't match, return an empty array reference
$result = [];
}
$result;
} @{$entry}{@multi_elems};
}
}
The performance and behavior of the code should remain the same after
removing the /o
modifier.
User
Perl best practices indicates that a map function should only contain one statement. Using the map function above seems to reduce readability since the object being iterated over with some complex logic is at the end of the expression.
ChatGPT
I agree with you. Using a map
function with complex logic can reduce
readability. In this case, we can refactor the code to use a for
loop instead of the map
function to improve readability:
use strict;
use warnings;
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/;
# Iterate through each entry in the log_data hash
for my $entry (@log_data{@EIDs}) {
# Iterate through each multi_elem in the current entry
for my $multi_elem (@multi_elems) {
my $x = $entry->{$multi_elem};
my $result;
# Check if the value matches the delimiter regex
if ($x && ($x =~ s/$delim_qr1//g)) {
# Split the value by group_delim2
$result = [split /$group_delim2/, $x];
}
else {
# If the value doesn't match, return an empty array reference
$result = [];
}
# Update the multi_elem in the current entry
$entry->{$multi_elem} = $result;
}
}
}
By using a for
loop, the code becomes more readable, and it’s easier
to understand the logic being applied to the elements.
User
I think the else statement in the inner for loop can be removed and we
can assign result to an empty array reference when we define $result
ChatGPT
You’re right. We can simplify the code further by initializing
$result
to an empty array reference and updating it only when the
condition is met. Here’s the updated code:
use strict;
use warnings;
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/;
# Iterate through each entry in the log_data hash
for my $entry (@log_data{@EIDs}) {
# Iterate through each multi_elem in the current entry
for my $multi_elem (@multi_elems) {
my $x = $entry->{$multi_elem};
# Initialize the result to an empty array reference
my $result = [];
# Check if the value matches the delimiter regex
if ($x && ($x =~ s/$delim_qr1//g)) {
# Split the value by group_delim2
$result = [split /$group_delim2/, $x];
}
# Update the multi_elem in the current entry
$entry->{$multi_elem} = $result;
}
}
}
By initializing $result
to an empty array reference, we can remove
the else
statement, making the code more concise and easier to read.
User
The $result
variable also now seems unnecessary and we can
further refactor this by directly assigning $entry->{$multi_elem}
to
an empty array and using the next statement to short circuit the if
statement
ChatGPT
You’re right. We can further simplify the code by removing the
$result
variable and directly assigning the value to
$entry->{$multi_elem}
. Here’s the refactored code:
use strict;
use warnings;
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/;
# Iterate through each entry in the log_data hash
for my $entry (@log_data{@EIDs}) {
# Iterate through each multi_elem in the current entry
for my $multi_elem (@multi_elems) {
my $x = $entry->{$multi_elem};
# Initialize the multi_elem in the current entry to an empty array reference
$entry->{$multi_elem} = [];
# Check if the value matches the delimiter regex, and short circuit if not
next unless ($x && ($x =~ s/$delim_qr1//g));
# Update the multi_elem in the current entry by splitting the value by group_delim2
$entry->{$multi_elem} = [split /$group_delim2/, $x];
}
}
}
By directly assigning an empty array reference to
$entry->{$multi_elem}
and using the next
statement to short
circuit the loop if the condition is not met, the code becomes even
more concise and easier to read.
User
perl best practices suggests avoiding unless
ChatGPT
You’re right. Using unless
can sometimes make the code less
readable. Instead, we can use an if
statement with a negated
condition. Here’s the updated code:
use strict;
use warnings;
# Check if there are any multi_elems
if (@multi_elems) {
# Define the regex delimiter for group_delim1
my $delim_qr1 = qr/\d+$group_delim1/;
# Iterate through each entry in the log_data hash
for my $entry (@log_data{@EIDs}) {
# Iterate through each multi_elem in the current entry
for my $multi_elem (@multi_elems) {
my $x = $entry->{$multi_elem};
# Initialize the multi_elem in the current entry to an empty array reference
$entry->{$multi_elem} = [];
# Check if the value matches the delimiter regex, and short circuit if not
if (!($x && ($x =~ s/$delim_qr1//g))) {
next;
}
# Update the multi_elem in the current entry by splitting the value by group_delim2
$entry->{$multi_elem} = [split /$group_delim2/, $x];
}
}
}
By using an if
statement with a negated condition instead of
unless
, the code conforms better to Perl best practices and is
easier to understand.
User
How do you think this code would compare to the original in terms of performance under perl 5.16?
ChatGPT
The refactored code should have similar performance to the original
code under Perl 5.16. The changes made in the refactored version
mostly deal with readability and best practices, but the overall logic
and flow of the code remain the same. Some improvements in the
refactored code, such as removing the unnecessary else
statement and
the $result
variable, can slightly reduce memory usage and improve
the code’s readability without affecting the performance. The direct
assignment of an empty array reference and the use of the next
statement can make the code more efficient in some cases by avoiding
unnecessary assignments. However, it’s important to note that the
actual performance difference might not be significant, and it would
depend on the specific use case, data being processed, and the overall
complexity of the program. In general, focusing on readability and
maintainability is a good approach, as it will make it easier for you
and others to work with and optimize the code in the future.
Yikes! What the heck does this code do?
User
Please refactor this Perl snippet for readability
@{$_}{ @elem_ids_ref, @elem_ids_noref }
= ( ( map { [] } @elem_ids_ref ), @defaults_noref )
for @log_data{@EIDs};
ChatGPT
Here’s the refactored version of the provided Perl snippet with improved readability:
for my $entry (@log_data{@EIDs}) {
# Initialize array references for elements in @elem_ids_ref
@{$entry}{@elem_ids_ref} = map { [] } @elem_ids_ref;
# Assign the default values for elements in @elem_ids_noref
@{$entry}{@elem_ids_noref} = @defaults_noref;
}
In this refactored version, I’ve replaced the single line of code with
a more readable for
loop that iterates over each entry in
@log_data{@EIDs}
. Within the loop, I’ve separated the assignment of
array references for elements in @elem_ids_ref
and the assignment of
default values for elements in @elem_ids_noref
. This should make the
code easier to understand and maintain.
In this session I have an inkling what I want to do with this code, remove the
next
statements and make this more of a condition based dispatcher.
User
The code below uses the next statement 4 times and uses unless, which is not considered to be a best practice. How could we avoid using next in this fashion?
foreach my $elem ( $self->elements() ) {
next unless $elem->table();
next
if ( $elem->{type} eq 'File' || $elem->{type} eq 'Signature' )
&& !$elem->copy_file;
next unless $elem->$allow_hybrid_method;
next unless exists $data->{ $elem->{uniq} };
$elem->{value} = $data->{ $elem->{uniq} };
}
ChatGPT
Here’s a refactored version of the provided Perl code snippet that
avoids using next
and unless
:
foreach my $elem ($self->elements()) {
if ($elem->table()
&& !($elem->{type} eq 'File' || $elem->{type} eq 'Signature') || $elem->copy_file
&& $elem->$allow_hybrid_method
&& exists $data->{$elem->{uniq}}
) {
$elem->{value} = $data->{$elem->{uniq}};
}
}
In this version, I combined all the conditions into a single if
statement, which makes the code more straightforward and easier to
read.
User
That does seem more readable - each of the conditions is essentially a rule that results in assigning a value to a hash member - is there another way to express this functionality?
ChatGPT
Yes, you can create an array of coderefs representing each condition, and then iterate through them to check whether all conditions are met. This approach makes the code more modular and easier to extend or modify in the future:
my @conditions = (
sub { $_[0]->table() },
sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
sub { $_[0]->$allow_hybrid_method },
sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
foreach my $elem ($self->elements()) {
my $all_conditions_met = 1;
for my $condition (@conditions) {
unless ($condition->($elem)) {
$all_conditions_met = 0;
last;
}
}
if ($all_conditions_met) {
$elem->{value} = $data->{$elem->{uniq}};
}
}
In this version, we define an array of coderefs (@conditions) that represent each condition. Then, during the iteration over elements, we check whether all conditions are met by iterating through the @conditions array. If all conditions are met, we assign the value to the hash member. This approach makes it easy to add, remove, or modify conditions in the future.
User
Here we go with unless again
ChatGPT
Apologies for that oversight. Let’s rewrite the code snippet without
using unless
:
my @conditions = (
sub { $_[0]->table() },
sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
sub { $_[0]->$allow_hybrid_method },
sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
foreach my $elem ($self->elements()) {
my $all_conditions_met = 1;
for my $condition (@conditions) {
if (!$condition->($elem)) {
$all_conditions_met = 0;
last;
}
}
if ($all_conditions_met) {
$elem->{value} = $data->{$elem->{uniq}};
}
}
Now, the unless
keyword has been replaced with a negated if
statement, which makes the code more in line with the best practices.
User
The foreach loop looks very generic and could be a subroutine of its own, with an argument of the conditions
ChatGPT
You’re right, the loop can be extracted into a separate
subroutine. Here’s a refactored version of the code, where the loop is
wrapped in a subroutine called process_elements
:
sub process_elements {
my ($elements, $data, $conditions) = @_;
ELEMENT:
foreach my $elem (@$elements) {
for my $condition (@$conditions) {
next ELEMENT unless $condition->($elem);
}
$elem->{value} = $data->{$elem->{uniq}};
}
}
my @conditions = (
sub { $_[0]->table() },
sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
sub { $_[0]->$allow_hybrid_method },
sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
process_elements([$self->elements()], $data, \@conditions);
The process_elements
subroutine takes a list of elements, data, and
an array reference of conditions as its arguments. It iterates through
the elements, and for each element, it checks if all conditions are
met. If all conditions are satisfied, it assigns a value to the
element’s hash member.
User
The calling sequence of process_elements()
is specific to this use
case. How can we make it more generic?
ChatGPT
To make the process_elements
subroutine more generic, you can pass a
code reference for the action to be performed when all conditions are
met. This way, the subroutine can be used in various scenarios, not
just for the specific use case. Here’s the modified version:
sub process_elements {
my ($elements, $conditions, $action) = @_;
ELEMENT:
foreach my $elem (@$elements) {
for my $condition (@$conditions) {
next ELEMENT unless $condition->($elem);
}
$action->($elem);
}
}
my @conditions = (
sub { $_[0]->table() },
sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
sub { $_[0]->$allow_hybrid_method },
sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
my $action = sub {
my ($elem) = @_;
$elem->{value} = $data->{$elem->{uniq}};
};
process_elements([$self->elements()], \@conditions, $action);
Now the process_elements
subroutine takes an additional argument,
$action
, which is a code reference for the action to be performed
when all conditions are met. This makes the subroutine more generic
and reusable in different situations.
User
Let’s refactor process_elements()
to receive a hash of arguments.
ChatGPT
To refactor the process_elements
subroutine to receive a hash of
arguments, you can make the following changes:
sub process_elements {
my %args = @_;
my $elements = $args{elements};
my $conditions = $args{conditions};
my $action = $args{action};
ELEMENT:
foreach my $elem (@$elements) {
for my $condition (@$conditions) {
next ELEMENT unless $condition->($elem);
}
$action->($elem);
}
}
my @conditions = (
sub { $_[0]->table() },
sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
sub { $_[0]->$allow_hybrid_method },
sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
my $action = sub {
my ($elem) = @_;
$elem->{value} = $data->{$elem->{uniq}};
};
process_elements(
elements => [$self->elements()],
conditions => \@conditions,
action => $action
);
Now the process_elements
subroutine accepts a hash of arguments
instead of a list, which makes it easier to understand the purpose of
each argument when calling the subroutine.
User
Is there another reason this might be a better subroutine?
ChatGPT
Yes, there are several reasons why refactoring the process_elements
subroutine this way might be better:
Overall, the refactored version of the process_elements subroutine is more modular, flexible, and easier to understand, making it a better design choice.
You may recognize the phrase in the title as Damian Conway’s admonition in Perl Best Practices regarding magic values in your programs. It’s good advice to use names for magic numbers and values to make your code more readable for you and those that have to maintain your code.
But what about the don’t use constant part? His argument for not using the pragma essentially boils down to the fact that use constant creates package-scoped constant subroutines at compile time vs the Readonly module, which executes at runtime. So we are told to use Readonly…fair enough…but Perl::Critic tells us to use ReadonlyX because Readonly is slow and buggy.
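For the uninitiated, here's a minimal sketch of the distinction (the PI and $EULER names are just for illustration):

use strict;
use warnings;

# creates a constant subroutine PI() in the package at compile time
use constant PI => 3.14159;

# creates a read-only lexical at runtime
use Readonly;
Readonly my $EULER => 2.71828;

print 'pi is ', PI, "\n";  # constant subs don't interpolate into strings
print "e is $EULER\n";     # Readonly scalars interpolate like any variable

# $EULER = 3;  # would die: "Modification of a read-only value attempted"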
Here’s the problem though: ReadonlyX
is not exactly the same as
Readonly
and many CPAN modules use Readonly
. Case in point:
File::BOM
.
Let’s start with a little script that uses both ReadonlyX and File::BOM:
use strict;
use warnings;
use ReadonlyX;
use File::BOM;
1;
…let’s compile this and see what happens…
perl -wc foo.pl
Subroutine croak redefined at /home/rclauer/lib/perl5/Readonly.pm line 13.
Prototype mismatch: sub Readonly::Scalar ($;$) vs ($$) at /home/rclauer/lib/perl5/Readonly.pm line 257.
Prototype mismatch: sub Readonly::Scalar ($;$) vs ($$) at /home/rclauer/lib/perl5/Readonly.pm line 343.
Subroutine Scalar redefined at /home/rclauer/lib/perl5/Readonly.pm line 316.
Subroutine Array redefined at /home/rclauer/lib/perl5/Readonly.pm line 346.
Subroutine Hash redefined at /home/rclauer/lib/perl5/Readonly.pm line 364.
Subroutine Clone redefined at /home/rclauer/lib/perl5/Readonly.pm line 387.
Prototype mismatch: sub Readonly::Readonly (\[%@$]$) vs (\[$@%]@) at (eval 7) line 42.
Subroutine Readonly redefined at (eval 7) line 1.
foo.pl syntax OK
Hmmm…that looks bad, but I hear that Readonly and ReadonlyX should be “drop-in replacements” for each other? Well then perhaps I can use ReadonlyX by telling File::BOM that Readonly has already been loaded…a technique I have used before when these two modules were conflicting.
use strict;
use warnings;
BEGIN {
use Module::Loaded;
use ReadonlyX;
mark_as_loaded('Readonly');
}
use File::BOM;
1;
…let’s compile this and see if things get better…
perl -wc foo.pl
Useless use of reverse in void context at /home/rclauer/lib/perl5/File/BOM.pm line 204.
Odd number of elements in hash assignment at /home/rclauer/lib/perl5/File/BOM.pm line 187.
Compilation failed in require at foo.pl line 14.
BEGIN failed--compilation aborted at foo.pl line 14.
Nope. It appears that File::BOM makes use of features or uses Readonly in a way that is not compatible with ReadonlyX. Luckily my use of ReadonlyX is compatible with Readonly, so I can reverse the technique: use Readonly and mark ReadonlyX as loaded so that other modules that use ReadonlyX will use Readonly.
use strict;
use warnings;
BEGIN {
use Module::Loaded;
use Readonly;
mark_as_loaded('ReadonlyX');
}
use File::BOM;
1;
Of course, this only works if the downstream modules that use
ReadonlyX
use it in a way that is compatible with Readonly
…and
we’ve just introduced the slow and buggy Readonly
to our stack
which we were trying to avoid by using ReadonlyX
in the first place.
But at least we got our legacy code to work. ;-)
make
Anyone that has seen any of my projects on
GitHub knows that I love me some
Makefile
. I’ve been using make
since the days I started
programming and I doubt I’m going to give it up any time soon.
I know the kids today are haters, but make, in all its wonky glory, is a true wonder and the Leatherman of GNU tools. I continually find new tricks that extend my love and adoration for this ubiquitous tool.
My latest find is what I’ll term scriptlets. Many times
I’ll find myself creating a Makefile
that requires some
non-trivial manipulation or transformation of data. In those cases
I’ll usually try a Perl one-liner and pipe some data through a Perl
command.
Let’s suppose file foo.txt depends on bar.txt and we need to replace all instances of xyz with abc (yes, sed may be a better choice, but let’s go with this for now…). Your make recipe might look like this:
foo.txt: bar.txt
perl -npe 's/xyz/abc/g;' $< > $@
But what happens when things get a little more complicated and the
Perl one-liner is insufficient for the task? Well, maybe it’s time to
write a little Perl script. Of course, if I do that, I’ll then need to include
that script in my project, make it executable and reference it from my
Makefile
. Hmmm…is there a better way than cluttering the project
with scripts? Enter scriptlets…
Here’s a short example that removes all members of a hash where the
keys do not represent a file with an extension of .pm
.
define create_json =
use JSON::PP;
my $file_list = JSON::PP->new->decode(<>);
my @pm_files = grep { /[.]pm$/ } keys %{$file_list};
my %new_file_list = map { $_ => $file_list->{$_} } @pm_files;
print JSON::PP->new->pretty->encode(\%new_file_list);
endef
export s_create_json = $(value create_json)
foo.json: bar.json
perl -0 -e "$$s_create_json" $< > $@
…and then:
make
You can read more about the GNU make
features used above by
following these links.
Hope you find make
as useful as I do…drop me a line if you have a
favorite make
trick or tip.
Thanks for reading.
My new favorite Perl module is a very unassuming chap that makes your help and usage notes for your Perl scripts just a little bit sexier…
When you have a fairly complicated script that requires lots of help and usage information, it will probably scroll past one page of the display when you dump it to STDOUT.
Sure, your users could pipe the output through less
or more
(…and we all know that less is more, right?) but why make them do that?
Enter IO::Pager
…a clever
little module that’s quite easy to use. The pattern I’m using is to
use it, if it’s available…
my $token = eval {
  require IO::Pager;
  IO::Pager::open( *STDOUT, '|-:utf8', 'Unbuffered' );
};

print {*STDOUT} <<'END_OF_USAGE';
usage: $0 options command

Script to do blah blah blah...

Options
-------
...
END_OF_USAGE
Brian d foy introduced the Perl community to the idea of a script and module in the same file (the so-called modulino concept). Essentially, the idea is that you can create a script that provides a command line interface to your Perl module. The implementation looks something like this:
package Foo;
caller or __PACKAGE__->main();
# package guts
...
sub main {
my $foo = Foo->new;
use Getopt::Long;
# do something interesting here...
exit 0;
}
1;
This model comes in quite handy when writing scripts or just exercising your module during its creation.
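Here's a minimal, runnable sketch of the idea (the Foo package and its --name option are invented for illustration):

#!/usr/bin/env perl

package Foo;

use strict;
use warnings;

# when executed as a script there is no caller, so main() runs;
# when use'd or require'd, caller() is true and main() is skipped
caller or __PACKAGE__->main();

sub new { return bless {}, shift; }

sub greet { my ( $self, $name ) = @_; return "Hello, $name!"; }

sub main {
    require Getopt::Long;

    my $name = 'world';
    Getopt::Long::GetOptions( 'name=s' => \$name );

    print Foo->new->greet($name), "\n";

    exit 0;
}

1;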
I sometimes find it useful to then create a bash
script that invokes
my modulino. Since I might want to point to a development version
of the script, I might set my PERL5LIB
path to my local development
environment. Running the modulino script, I would then expect it to
use the version in my path. Hence the following modulino script.
#!/usr/bin/env bash
# -*- mode: sh; -*-
MODULINO="Amazon::Credentials"
MODULINO_PATH="${MODULINO//::/\/}.pm"
MODULINO_RUN=$(perl -M$MODULINO -e 'print $INC{"'$MODULINO_PATH'"}';)
test -n "$DEBUG" && echo $MODULINO_RUN
if test -z "$MODULINO_RUN"; then
echo "$MODULINO is not installed"
exit 1;
fi
perl $MODULINO_RUN "$@"
…and then
$ amazon-credentials.sh -h
amazon-credentials.sh options

Formats credentials found in env, config, SSO, role

Options
-------
--help, -h     this
--ec2          get credentials from server IAM profile
--env          get credentials from environment variables
--profile, -p  get credentials from profile in credentials configuration
--role         get credentials from SSO role
--account      use with --role, specify AWS account id

$ amazon-credentials.sh --profile=test
export AWS_ACCESS_KEY_ID=AKI*****************
export AWS_SECRET_ACCESS_KEY=****************************************

$ aws sso login
$ amazon-credentials.sh --role my-sso-role --account 01234567890
More information about modulinos can be found here:
Although I’ve been programming in Perl for over 25 years, it wasn’t until recently that I’ve had a boost in productivity as great as the one I experienced over the last year. What happened?
Stepping out from a more management oriented role at my former
employer, they needed a technical leader for a legacy Perl web
application migration to the cloud. Somehow I let on that I had some
Perl and AWS experience and so I was tabbed as the technical lead.
The project involved some heavy lifting of years of crufty Perl code
from an on-prem, dedicated Apache server environment to a
containerized AWS environment. Aside from the challenges of trying to
get legacy Perl code running in a Docker container the team had to
mitigate findings from the company’s security scans. This included
perlcritic
findings as well as findings based on other tools that
look for potential security issues.
So, the first hurdle was whittling down the perlcritic
findings and
making sure we weren’t introducing new findings.
perlcritic
to the BuildI’m a big fan of make
and autotools
so naturally our build was
based on GNU
autotools
. This
allowed for a repeatable, standards based, reliable and
extensible build system that worked locally using docker-compose
and
worked in our CI/CD pipeline to deploy Docker images for AWS Fargate.
To make sure that we maintained a baseline of low severity findings
from perlcritic
, I added a step in our Makefile
that ran
perlcritic
and errored out if any severity level exceeded 2. This
prevented any code from going into the repository that would trigger
security concerns since all pushes to the repository were scanned.
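Another way to push the same gate even further upstream is to make it part of the test suite. This isn’t our actual Makefile target - just a sketch using Test::Perl::Critic, where severity 3 corresponds to failing on anything more severe than level 2:

use strict;
use warnings;

use Test::More;

# skip gracefully on machines without the module installed
eval { require Test::Perl::Critic; };

if ($@) {
    plan skip_all => 'Test::Perl::Critic required to criticize code';
}

# fail on any finding of severity 3 or higher, using the project's profile
Test::Perl::Critic->import( -severity => 3, -profile => '.perlcriticrc' );

all_critic_ok('lib');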
My editor of choice has always been Emacs…let the editor wars begin!
I had already added perltidy
as an extension to Emacs so that
perltidy
would be run before any save. Our team standardized on a
set of perltidy
settings and added the .perltidyrc
file to the
project along with the .perlcriticrc
that configures perlcritic
reporting.
Most editors today have language-specific syntax highlighting and syntax checking built-in. Flycheck is an Emacs plugin for syntax checking. Flycheck extensions are available for almost any language and even for things like markdown, JSON and SQL.
Syntax checking as you work in your scripts is
another way to move the process of validating your code further
upstream, but what really supercharged my development efforts was
adding perlcritic
checking to Emacs. The combination of Flycheck’s
support for perltidy
, perlcritic
and Perl syntax checking has
helped me develop faster with fewer errors.
Perl Best Practices isn’t just code for “standardization”, although having some standards makes maintenance a heckuva lot easier. No, PBP can highlight potential issues in your code and prevent you from having to debug gnarly problems.
Unfortunately, it’s also a nag.
Visiting an old legacy file that someone wrote back in 2012 (before you and they were woke to PBP) is an eye-opening experience. When Flycheck gives up after 1527 issues found, you start to question whether you really want to edit that code!
Even code that has relatively few findings presents them with electric highlighting that eventually rattles your nerves. It also becomes a bit of an obsession to clean up old code and get down to 0 findings!
In the end though, the result is better code. Other programmers can read and understand it and the quality will get incrementally better.
Generally speaking then, to supercharge your development efforts move code quality enforcement as far upstream as possible, starting with your editor.
In physics there is the concept of the observer effect whereby the act of observing something alters the measurement. Recently, I’ve had the pleasure of experiencing that firsthand.
The evils of tied variables are well documented and frowned upon by our
friendly(?) mentor perlcritic
, but let’s suppose we want to be
clever…another frowned-upon attribute of Perl scripts :-(. But let’s forge ahead anyway…
We’ll create a class that we can use to benchmark our scripts. We want this to be easy, so we’ll create a class that allows us to name benchmarks and simply access a hash element when we want to stop and report the time…sort of like this:
#!/usr/bin/env perl
use feature 'say';
use Easy::Benchmark;
use Data::Dumper;
my $benchmark = Easy::Benchmark->new( 'begin', 'start' );
sleep 2;
say $benchmark->stop('start');
$benchmark->start('next');
sleep 1;
# print {*STDERR} Dumper $benchmark
say $benchmark->stop('next');
say $benchmark->stop;
Here’s the class that implements our easy benchmark.
package Easy::Benchmark;
use strict;
use warnings;
use Benchmark;
use constant ANON_BENCHMARK => '__anon_benchmark__';
########################################################################
sub new {
########################################################################
my ( $proto, @args ) = @_;
my $class = ref $proto || $proto;
my $self = bless {}, $class;
tie %{$self}, $class, @args;
return $self;
}
########################################################################
sub TIEHASH {
########################################################################
my ( $class, @args ) = @_;
my $self = bless {}, $class;
# start the clock on an anonymous benchmark
$self->start(ANON_BENCHMARK);
for (@args) {
# and any additional ones requested.
$self->start($_);
}
return $self;
}
########################################################################
sub FETCH {
########################################################################
my ( $self, $name ) = @_;
die "$self: Unknown benchmark named `$name'\n"
if !exists $self->{$name};
if ( !ref( $self->{$name} ) || ref $self->{$name} ne 'ARRAY' ) {
$self->{$name} = [ Benchmark->new, $self->{$name} ];
}
return timestr( timediff( @{ $self->{$name} } ) );
}
########################################################################
sub DELETE { return delete $_[0]->{ $_[1] }; }
sub FIRSTKEY { my $a = scalar CORE::keys %{ $_[0] }; each %{ $_[0] } }
sub NEXTKEY { each %{ $_[0] } }
sub STORE { $_[0]->{ $_[1] } = $_[2] }
########################################################################
########################################################################
sub start {
########################################################################
my ( $self, $name ) = @_;
die "must specify name of counter\n"
if !$name;
my $start = Benchmark->new;
$self->{$name} = $start;
return $start;
}
########################################################################
sub stop {
########################################################################
my ( $self, @args ) = @_;
if ( @args > 0 ) {
return $self->{ $args[0] };
}
else {
my $name = ANON_BENCHMARK;
my $value = $self->{$name};
delete $self->{$name};
return $value;
}
}
1;
Running our script produces the expected output…
perl -I . foo.pl
2 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU)
1 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
3 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU)
Let’s remove the comment on the line that dumps our $benchmark
object…and see what happens
perl -I . foo.pl
2 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
$VAR1 = bless( {
'__anon_benchmark__' => ' 2 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)',
'begin' => ' 2 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)',
'next' => ' 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)',
'start' => ' 2 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)'
}, 'Easy::Benchmark' );
0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
2 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Hmmm…I expected our last benchmark to be 3 wallclock seconds…that wasn’t what happened. By now you probably know the problem. The class uses the FETCH method to stop a timer. So, when Dumper fetches the keys of the $benchmark object, each timer is stopped prematurely. The act of observing the hash triggered an action that upset our experiment.
I am embarrassed to say it took me an hour or so to piece it all together, since I didn’t know who (or what) was accessing the hash when I dumped the object in an effort to provide some useful log messages.
Generalizing, any time you create a class that triggers actions when fetching values from a hash, be aware of the observer effect.
Thanks for reading…
This is part III of my blog on versions. Last time we discussed the question of what version of perl to use for development. Conclusion: In most cases, the choice will be made for you, especially in maintenance and legacy migration situations. If you have the luxury and ability to use a more modern version of perl, use it. Let’s continue:
- What version of perl should I be using for development?
- What minimum version of perl should I support?

What minimum version of perl should I support?

For me, this has become more than just an academic question. Several years ago I petitioned the Pause administrators to allow me to adopt Amazon::S3. It had been a long time since the module was updated and it needed a little TLC to support new features and fix some deficiencies. The adoption process took a lot longer than I had imagined. Between the time I requested co-maintainer rights and the time it was actually granted, the urgency and desire to adopt the module waned. Recently the need to update this module has re-appeared, so I began making some much needed improvements. My first release to CPAN of the updated module was 0.49, and thus began my introduction to CPAN Testers Reports.
| Version | Max Version Passing | Min Version Passing | First Failure | O/Ss With Passing Version |
|---------|---------------------|---------------------|---------------|---------------------------|
| 0.49 | 5.36.0 | 5.14.4 | 5.32.1 | darwin, freebsd, linux, mswin, openbsd, solaris |
| 0.50 | 5.37.1 | 5.14.4 | 5.34.1 | darwin, freebsd, linux, mswin, openbsd, solaris |
| 0.51 | 5.37.1 | 5.14.4 | 5.18.4 | darwin, freebsd, linux, mswin, openbsd, solaris |
| 0.52 | 5.37.1 | 5.14.4 | 5.8.9 | darwin, freebsd, linux, mswin, openbsd, solaris |
| 0.53 | 5.37.1 | 5.14.4 | NA | darwin, freebsd, linux, mswin, openbsd, solaris |
| 0.54 | 5.37.1 | 5.14.4 | 5.37.1 | darwin, freebsd, linux, mswin, openbsd, solaris |
My first release only passed (accidentally) on a few versions but seemed to pass on at least 1 version of all operating systems tested. The failures were all due to missing dependencies that I had failed to identify in Makefile.PL. Thus a new release (0.50) was required - somehow I made the same mistake. However, as you can see from the report, a new max version (5.37.1) emerges due to one of the testers adding that version to their list of perl versions to test. I say the tests that passed, passed accidentally, because I’m guessing that those testers had (for some reason) the necessary modules installed in their environment.
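For anyone who hasn’t been bitten by this: declaring the dependencies amounts to listing them in PREREQ_PM in Makefile.PL. The sketch below is illustrative (the paths and the zero-version requirements are assumptions), not the distribution’s actual Makefile.PL:

use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'Amazon::S3',
    VERSION_FROM => 'lib/Amazon/S3.pm',
    PREREQ_PM    => {
        # the modules the testers flagged as missing (illustrative)
        'XML::Simple'                => 0,
        'LWP::UserAgent::Determined' => 0,
    },
);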
That’s an important point to consider - all testing environments are not created equally. By looking at the reports, we can conclude empirically that there must be something different about these environments - why else would the module fail on one tester’s perl 5.37.1 while another’s passes? Take this report for example:
Note the failed tester and the OS version. The version of the OS is different in the failing tester’s report…however the failure was due to XML::Simple not being installed because it was missing from the list of dependencies in Makefile.PL. So why did the others pass? The conclusion must be that XML::Simple was somehow present in those environments, otherwise they would have failed there as well.
Another oddity of the CPAN Testers Reports is which versions of perl are used for testing the module. After releasing 0.50 I noticed that version 5.8.9 of perl suddenly showed up in the testers report. Hmmm…should I try to support that version? Is there a reason it shouldn’t pass on 5.8.9? What versions of perl did the original author’s 0.45 version support? Well, as it turns out, when that module was tested in 2012, it supported 5.6.2! Should I support 5.6.2? Would it actually be possible to support perl 5.6.2?
The answer as it turns out is no. Version 0.45 would probably not pass on anything below version 5.14.0 today, due to the versions of perl that are required by the module’s dependencies.
Version 0.45 specifies that any version of LWP::UserAgent::Determined will do, which in turn will require any version of LWP.
As I futilely attempted to see what the minimum version of perl my adopted module could support, I attempted to build LWP on version 5.10.0 of perl. Unfortunately, LWP would not build on 5.10.0 (actually it was the module HTML::HeadParser, part of libwww-perl, that failed unit tests) so I dutifully filed an issue assuming that the authors of LWP would want to know that the module does not build on 5.10.0.
The response to my report was, surprisingly - let’s bump the required version of perl to 5.14. So, while LWP successfully builds on 5.14.0, LWP::UserAgent::Determined does not.
After a lot of useless traversing down rabbit holes I was able to determine that installing an older version of LWP would, in fact, succeed in building LWP::UserAgent::Determined. The rub here, however, is that other modules like XML::Simple require yet other modules that specified that any version of LWP would do. If the dependency that requires any version of LWP is installed prior to installing the specific older version of LWP listed in my manifest, well, now we’ve installed an incompatible version of LWP, because specifying any version of LWP will bring in the latest version.
Now, if cpanm
had done the right thing and installed specific
versions of modules specified in the dependency list first, then the
older version of LWP
would have satisfied the requirement. This is
not to say that cpanm
did the wrong thing…it got the list of
direct dependencies from the distribution and started installing them.
To be fair, the versions of perl
I have been testing against are
old. CPAN authors should not be bound to supporting older versions of
perl
at the expense of modernizing or enhancing their modules. The
point is to simply illustrate the difficulty in releasing a CPAN
module that does support older versions of perl
.
So should I then conclude that I need to bump the minimum version of
perl
that Amazon::S3 will
support to something like 5.16.3? If I still want to support older
versions of perl
would it even be possible to find the versions of
dependent modules that do not break my build? Does that even make any
sense when modules are sometimes updated to fix bugs?
As it happens, I did manage to get a working build for perl 5.10.0 and 5.14.0. Of course, it was a forensics exercise and, to be truthful, not very satisfying.
By installing version 6.13 of LWP
and version 2.14 of
HTML::HeadParser
I managed to somehow get
Amazon::S3 to build
and complete all tests successfully.
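If you need to reproduce that kind of build, pinning exact versions is the only reliable route. Here’s a cpanfile sketch (illustrative, not from the actual distribution) - the == syntax asks the installer for exactly these releases:

# cpanfile - pin the legacy-compatible versions
requires 'LWP',              '== 6.13';
requires 'HTML::HeadParser', '== 2.14';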
The guidelines to support older versions of perl seem to be:

- test against the oldest version of perl you plan to support
- pin dependent modules to versions that still build on the oldest version of perl you plan to support

more next time…
This is part II of my blog on my version woes. Part I can be found here.
Let’s review some of the issues I identified in part I.
- What version of perl should I be using for development?
- What minimum version of perl should I support?

What version of perl should I be using for development?

It depends. For new development, it would make the most sense to use a
current version of perl
(5.38.0), but when do we truly do new
development in the Perl language these days? Even in companies with
sufficient Perl programmers it’s a risk to start new applications in a
language that increasingly is being ignored by both vendors and new
developers. Libraries for APIs are tough to come by and young
developers are not making it a priority to learn Perl. That’s a shame,
because learning some Perl is like learning vi. You’ll see it on almost every Linux-based system you touch. Its ubiquity demands some awareness.
In the old days of computing the saying was “No one ever got fired for selecting IBM.” I think one can use that same sentiment when talking about language choices for new applications: “No one ever got fired for selecting Python, C# or Java.” It would be a questionable choice to select Perl for new development today due to the obvious issues we all recognize.
So, for new development it seems that the majority of shops that might
have Perl developers might still choose to look elsewhere. Perl work
then seems to be relegated to maintaining and enhancing legacy Perl
applications that would be difficult, costly or impossible to rewrite
in a more modern language. Still, one could conceivably review the
version of perl
being used and choose a more modern one as a target
for enhancing the maintainability of the application. But that comes
with its own set of problems.
Will the application work if I migrate it to a new version of perl? Who knows? You just have to try it…but:

- the application may depend on a particular perl executable with unknown dependencies on its attributes

Even attempting to create a new development environment for legacy
Perl applications using the same version of perl
can be difficult if
not impossible. As I have discovered in my most recent job, building
CPAN modules that rely on C libraries is not a particularly easy
exercise. For example, attempting to build JavaScript::V8
required
hours of forensics and internet time travel…ultimately resulting in
this solution.
FROM amazonlinux:2

COPY epel.repo /etc/yum.repos.d/epel.repo
COPY RPM-GPG-KEY-EPEL-6 /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6

RUN yum install -y wget 'perl(App::cpanminus)'
RUN yum groupinstall -y 'Development Tools'
RUN cpanm -v App::cpanminus
RUN wget ftp://ftp.pbone.net/mirror/vault.centos.org/6.7/cr/x86_64/Packages/libicu-4.2.1-14.el6.x86_64.rpm
RUN rpm -ivh libicu-4.2.1-14.el6.x86_64.rpm
RUN yum install -y libuv
RUN yum install -y v8 v8-devel
RUN cpanm -v JavaScript::V8@0.09
Of course, the obvious question is “Why are you using
JavaScript::V8
?”.
Reminder: This is a legacy application with no documentation, no unit tests and no integration tests. Changing anything in the application is like walking in a mine field with size 14 boots.
Which perl version should you use?

Since I was unable to build a necessary dependency on a system with
version 5.36.0 of perl
, the decision of which perl
version to use
for this application was made for me…stick with the version running
in production for now.
For legacy applications my approach has been more or less trial and error…optimistically trying to update the version of
perl
and seeing where it takes you. Most of the time with legacy
applications it doesn’t take you far. I try to build and unit test (to
whatever degree possible) as much of the application as I can with a
more modern version of perl
, its core modules, and the latest version
of any CPAN dependencies. Until that process is proven
successful (or not) the application is built using the versions of
modules and of perl
that are currently running in production.
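A prerequisite for that is knowing exactly what production is running. Here’s a quick sketch using the core ExtUtils::Installed module to inventory the installed distributions and their versions on the production host:

use strict;
use warnings;

use ExtUtils::Installed;

# inventory the distributions installed on top of core perl
my $installed = ExtUtils::Installed->new();

for my $module ( sort $installed->modules() ) {
    my $version = $installed->version($module) || 'unknown';
    print "$module\t$version\n";
}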
Here are the basic steps I have been using with legacy applications.
1. Put the application in a git repository and create a baseline. Don’t try to do anything fancy at this point - just reflect the structure of the application as it is laid down on the target servers in your repository.
2. Run perlcritic to gauge the mess - save the results to measure your progress in the cleanup process.
3. Build the application with the production perl version using your original manifest.
4. Once that works, try building with a more modern perl version.

You will first encounter problems trying to build a containerized version of the as-is application, especially if the legacy application resides on a production server that has been poorly maintained. Poorly maintained being defined as one or more people having manually installed things over the years without, of course, documenting what they have installed or what additional dependencies were needed. If you can get by this step and have successfully built the application in a containerized environment, then you at least have the basis for a more modern approach to software development.
Some tools I have found useful in this process:

- scandeps
- scandeps-static.pl
- Module::CoreList
- docker
- docker-compose
- Perl::Critic
- make-a-perl
- ExtUtils::MM - for parsing module versions (see the sketch below)
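That last item deserves a quick illustration - ExtUtils::MM can extract a module’s $VERSION without compiling or loading the module, which is handy when auditing a legacy tree (the file path below is a placeholder):

use strict;
use warnings;

use ExtUtils::MM;

# parse the $VERSION assignment out of the file without loading the module
my $version = MM->parse_version('lib/Amazon/S3.pm');

print "version: $version\n";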
more next time…
parsing module versionsmore next time…