Credit: Flavio Poletti - Just Another Perl Hacker

Flavio Poletti recently blogged about using Barcode::ZBar for reading QR codes. In a previous life I wrote applications for workflows that involved reading QR codes from documents and used Barcode::ZBar very effectively…highly recommended.

Back then, IIRC, I had quite a struggle satisfying all of the dependencies required to get it to actually work. I spent a little time after reading his article to see if I could successfully build it once more, this time in a Docker container. The result is this project:

docker-barcode-zbar

It was a little difficult piecing together the required packages, but in the end it works!
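If you haven’t played with Barcode::ZBar, the scanning code itself is pleasantly small. Here is a minimal sketch, adapted from the module’s documentation, that scans a .png for QR codes; it assumes Image::Magick is available to convert the image into the raw 8-bit grayscale data ZBar expects:

 use strict;
 use warnings;

 use Barcode::ZBar;
 use Image::Magick;

 # read the image and convert it to raw 8-bit grayscale data
 my $magick = Image::Magick->new();
 $magick->Read( $ARGV[0] );
 my $raw = $magick->ImageToBlob( magick => 'GRAY', depth => 8 );

 # wrap the raw data in a ZBar image object ('Y800' = 8-bit grayscale)
 my $image = Barcode::ZBar::Image->new();
 $image->set_format('Y800');
 $image->set_size( $magick->Get(qw(columns rows)) );
 $image->set_data($raw);

 # scan and report anything ZBar recognized, QR codes included
 my $scanner = Barcode::ZBar::ImageScanner->new();
 $scanner->parse_config('enable');
 $scanner->scan_image($image);

 for my $symbol ( $image->get_symbols() ) {
   printf "%s: %s\n", $symbol->get_type(), $symbol->get_data();
 }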

Next, I may create a web service from the container and install it on a Fargate cluster. The service will allow you to upload .png files and scan them for QR codes. Stay tuned!


Session 1 - Refactoring a Block Containing map

The code below violates more than a few best practices, making its intent difficult to understand. Let’s get started…

User

Can you refactor this snippet of Perl code?

if (@multi_elems) {
      my $delim_qr1 = qr/\d+$group_delim1/o;
      #my $delim_qr2 = $group_delim2;
      foreach my $entry ( @log_data{@EIDs} ) {
        @{$entry}{@multi_elems}
          = map {
          my $x = $_;
          $x && ( $x =~ s/$delim_qr1//go )
            ? [
            # XXX: if $_ =~ m/^($group_delim2)+$/
            #      the result is ()
            #      at least on perl 5.8.0
            split /$group_delim2/, $x
            ]
            : []
          } @{$entry}{@multi_elems};
      }
      #print STDERR "$err\n" if $err;
    }

ChatGPT

Here’s the refactored version of your provided Perl code. I’ve added comments to explain what the code is doing and made some minor improvements for readability:

 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/o;
     # Iterate through each entry in the log_data hash
     foreach my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             # Check if the value matches the delimiter regex
             $x && ($x =~ s/$delim_qr1//go) ?
                 [
                     # Split the value by group_delim2
                     split /$group_delim2/, $x
                 ]
                 : [] # If the value doesn't match, return an empty array reference
         } @{$entry}{@multi_elems};
     }
 }

The refactored code is more readable and has comments explaining the purpose of each block. Note that the code logic has not been changed, and it should function the same as the original snippet.

User

The use of the ternary operator obfuscates the meaning

ChatGPT

I understand your concern. Here’s the refactored version of the code using an if statement instead of the ternary operator to improve readability:

 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/o;
     # Iterate through each entry in the log_data hash
     foreach my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//go)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             } else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             $result;
         } @{$entry}{@multi_elems};
     }
 }

In this version, I replaced the ternary operator with an if-else statement, which assigns the result to the $result variable. This should make the code more understandable and easier to follow.

User

Excellent but I don’t like cuddled elses

ChatGPT

Sure, I’ve updated the code to uncuddle the else statement:

 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/o;
     # Iterate through each entry in the log_data hash
     foreach my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//go)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             $result;
         } @{$entry}{@multi_elems};
     }
 }

Now, the else statement is on a new line, making the code more aligned with your preferred style.

User

We are using Perl 5.16. Are there any changes you might suggest?

ChatGPT

Perl 5.16 doesn’t introduce any significant changes that would impact the provided code snippet. However, I can suggest a few general improvements for better readability and maintainability:

  1. Use strict and warnings pragmas to enforce stricter code quality: use strict; use warnings;
  2. Use the foreach loop with a lexical iterator variable to avoid potential conflicts with global variables: for my $entry (@log_data{@EIDs}) { # ... }

With these suggestions, the updated code looks like this:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/o;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//go)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             $result;
         } @{$entry}{@multi_elems};
     }
 }

These changes should enhance the code quality and maintainability while keeping the same functionality.

User

My understanding is that the ‘o’ regex modifier may no longer be required.

ChatGPT

You’re correct. The /o modifier for regular expressions is no longer needed in most cases, as Perl optimizes the regex automatically if the pattern doesn’t change during the runtime of the program. The /o modifier was useful in older versions of Perl (prior to 5.6) when the regex engine didn’t optimize regexes in the same way. In modern Perl, the engine is smart enough to recognize when a regex is constant and cache the compiled version automatically.

You can remove the /o modifier from the regex in your code snippet:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//g)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             $result;
         } @{$entry}{@multi_elems};
     }
 }

The performance and behavior of the code should remain the same after removing the /o modifier.

User

Perl best practices indicates that a map function should only contain one statement. Using the map function above seems to reduce readability since the object being iterated over with some complex logic is at the end of the expression.

ChatGPT

I agree with you. Using a map function with complex logic can reduce readability. In this case, we can refactor the code to use a for loop instead of the map function to improve readability:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Iterate through each multi_elem in the current entry
         for my $multi_elem (@multi_elems) {
             my $x = $entry->{$multi_elem};
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//g)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             # Update the multi_elem in the current entry
             $entry->{$multi_elem} = $result;
         }
     }
 }

By using a for loop, the code becomes more readable, and it’s easier to understand the logic being applied to the elements.

User

I think the else statement in the inner for loop can be removed and we can assign result to an empty array reference when we define $result

ChatGPT

You’re right. We can simplify the code further by initializing $result to an empty array reference and updating it only when the condition is met. Here’s the updated code:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Iterate through each multi_elem in the current entry
         for my $multi_elem (@multi_elems) {
             my $x = $entry->{$multi_elem};
             # Initialize the result to an empty array reference
             my $result = [];
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//g)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             # Update the multi_elem in the current entry
             $entry->{$multi_elem} = $result;
         }
     }
 }

By initializing $result to an empty array reference, we can remove the else statement, making the code more concise and easier to read.

User

The $result variable also now seems unnecessary and we can further refactor this by directly assigning $entry->{$multi_elem} to an empty array and using the next statement to short circuit the if statement

ChatGPT

You’re right. We can further simplify the code by removing the $result variable and directly assigning the value to $entry->{$multi_elem}. Here’s the refactored code:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Iterate through each multi_elem in the current entry
         for my $multi_elem (@multi_elems) {
             my $x = $entry->{$multi_elem};
             # Initialize the multi_elem in the current entry to an empty array reference
             $entry->{$multi_elem} = [];
             # Check if the value matches the delimiter regex, and short circuit if not
             next unless ($x && ($x =~ s/$delim_qr1//g));
             # Update the multi_elem in the current entry by splitting the value by group_delim2
             $entry->{$multi_elem} = [split /$group_delim2/, $x];
         }
     }
 }

By directly assigning an empty array reference to $entry->{$multi_elem} and using the next statement to short circuit the loop if the condition is not met, the code becomes even more concise and easier to read.

User

perl best practices suggests avoiding unless

ChatGPT

You’re right. Using unless can sometimes make the code less readable. Instead, we can use an if statement with a negated condition. Here’s the updated code:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Iterate through each multi_elem in the current entry
         for my $multi_elem (@multi_elems) {
             my $x = $entry->{$multi_elem};
             # Initialize the multi_elem in the current entry to an empty array reference
             $entry->{$multi_elem} = [];
             # Check if the value matches the delimiter regex, and short circuit if not
             if (!($x && ($x =~ s/$delim_qr1//g))) {
                 next;
             }
             # Update the multi_elem in the current entry by splitting the value by group_delim2
             $entry->{$multi_elem} = [split /$group_delim2/, $x];
         }
     }
 }

By using an if statement with a negated condition instead of unless, the code conforms better to Perl best practices and is easier to understand.

User

How do you think this code would compare to the original in terms of performance under perl 5.16

ChatGPT

The refactored code should have similar performance to the original code under Perl 5.16. The changes made in the refactored version mostly deal with readability and best practices, but the overall logic and flow of the code remain the same.

Some improvements in the refactored code, such as removing the unnecessary else statement and the $result variable, can slightly reduce memory usage and improve the code’s readability without affecting the performance. The direct assignment of an empty array reference and the use of the next statement can make the code more efficient in some cases by avoiding unnecessary assignments.

However, it’s important to note that the actual performance difference might not be significant, and it would depend on the specific use case, the data being processed, and the overall complexity of the program. In general, focusing on readability and maintainability is a good approach, as it will make it easier for you and others to work with and optimize the code in the future.

Session 2 - Refactoring Obfuscated Code

Yikes! What the heck does this code do?

User

Please refactor this Perl snippet for readability

@{$_}{ @elem_ids_ref, @elem_ids_noref }
        = ( ( map { [] } @elem_ids_ref ), @defaults_noref )
        for @log_data{@EIDs};

ChatGPT

Here’s the refactored version of the provided Perl snippet with improved readability:

for my $entry (@log_data{@EIDs}) {
    # Initialize array references for elements in @elem_ids_ref
    @{$entry}{@elem_ids_ref} = map { [] } @elem_ids_ref;
    # Assign the default values for elements in @elem_ids_noref
    @{$entry}{@elem_ids_noref} = @defaults_noref;
}

In this refactored version, I’ve replaced the single line of code with a more readable for loop that iterates over each entry in @log_data{@EIDs}. Within the loop, I’ve separated the assignment of array references for elements in @elem_ids_ref and the assignment of default values for elements in @elem_ids_noref. This should make the code easier to understand and maintain.

Session 3 - Creating a Rules Engine from a Snippet

In this session I have an inkling of what I want to do with this code: remove the next statements and make this more of a condition-based dispatcher.

User

The code below uses the next statement 4 times and uses unless which is not considered to be a best practice. How could we avoid using next in this fashion?

foreach my $elem ( $self->elements() ) {
          next unless $elem->table();
          next
            if ( $elem->{type} eq 'File' || $elem->{type} eq 'Signature' )
            && !$elem->copy_file;
          next unless $elem->$allow_hybrid_method;
          next unless exists $data->{ $elem->{uniq} };
          $elem->{value} = $data->{ $elem->{uniq} };
        }

ChatGPT

Here’s a refactored version of the provided Perl code snippet that avoids using next and unless:

foreach my $elem ($self->elements()) {
    if ($elem->table()
        && !($elem->{type} eq 'File' || $elem->{type} eq 'Signature') || $elem->copy_file
        && $elem->$allow_hybrid_method
        && exists $data->{$elem->{uniq}}
    ) {
        $elem->{value} = $data->{$elem->{uniq}};
    }
}

In this version, I combined all the conditions into a single if statement, which makes the code more straightforward and easier to read.

User

That does seem more readable - each of the conditions is essentially a rule that results in assigning a value to a hash member - is there another way to express this functionality?

ChatGPT

Yes, you can create an array of coderefs representing each condition, and then iterate through them to check whether all conditions are met. This approach makes the code more modular and easier to extend or modify in the future:

my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
foreach my $elem ($self->elements()) {
    my $all_conditions_met = 1;
    for my $condition (@conditions) {
        unless ($condition->($elem)) {
            $all_conditions_met = 0;
            last;
        }
    }
    if ($all_conditions_met) {
        $elem->{value} = $data->{$elem->{uniq}};
    }
}

In this version, we define an array of coderefs (@conditions) that represent each condition. Then, during the iteration over elements, we check whether all conditions are met by iterating through the @conditions array. If all conditions are met, we assign the value to the hash member. This approach makes it easy to add, remove, or modify conditions in the future.

User

Here we go with unless again

ChatGPT

Apologies for that oversight. Let’s rewrite the code snippet without using unless:

my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
foreach my $elem ($self->elements()) {
    my $all_conditions_met = 1;
    for my $condition (@conditions) {
        if (!$condition->($elem)) {
            $all_conditions_met = 0;
            last;
        }
    }
    if ($all_conditions_met) {
        $elem->{value} = $data->{$elem->{uniq}};
    }
}

Now, the unless keyword has been replaced with a negated if statement, which makes the code more in line with the best practices.

User

The foreach loop looks very generic and could be a subroutine of its own, with an argument of the conditions

ChatGPT

You’re right, the loop can be extracted into a separate subroutine. Here’s a refactored version of the code, where the loop is wrapped in a subroutine called process_elements:

sub process_elements {
    my ($elements, $data, $conditions) = @_;
    ELEMENT:
    foreach my $elem (@$elements) {
        for my $condition (@$conditions) {
            next ELEMENT unless $condition->($elem);
        }
        $elem->{value} = $data->{$elem->{uniq}};
    }
}
my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
process_elements([$self->elements()], $data, \@conditions);

The process_elements subroutine takes a list of elements, data, and an array reference of conditions as its arguments. It iterates through the elements, and for each element, it checks if all conditions are met. If all conditions are satisfied, it assigns a value to the element’s hash member.

User

The calling sequence of process_elements() is specific to this use case. How can we make it more generic?

ChatGPT

To make the process_elements subroutine more generic, you can pass a code reference for the action to be performed when all conditions are met. This way, the subroutine can be used in various scenarios, not just for the specific use case. Here’s the modified version:

sub process_elements {
    my ($elements, $conditions, $action) = @_;
    ELEMENT:
    foreach my $elem (@$elements) {
        for my $condition (@$conditions) {
            next ELEMENT unless $condition->($elem);
        }
        $action->($elem);
    }
}
my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
my $action = sub {
    my ($elem) = @_;
    $elem->{value} = $data->{$elem->{uniq}};
};
process_elements([$self->elements()], \@conditions, $action);

Now the process_elements subroutine takes an additional argument, $action, which is a code reference for the action to be performed when all conditions are met. This makes the subroutine more generic and reusable in different situations.

User

Let’s refactor process_elements() to receive a hash of arguments.

ChatGPT

To refactor the process_elements subroutine to receive a hash of arguments, you can make the following changes:

sub process_elements {
    my %args = @_;
    my $elements   = $args{elements};
    my $conditions = $args{conditions};
    my $action     = $args{action};
    ELEMENT:
    foreach my $elem (@$elements) {
        for my $condition (@$conditions) {
            next ELEMENT unless $condition->($elem);
        }
        $action->($elem);
    }
}
my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
my $action = sub {
    my ($elem) = @_;
    $elem->{value} = $data->{$elem->{uniq}};
};
process_elements(
    elements   => [$self->elements()],
    conditions => \@conditions,
    action     => $action
);

Now the process_elements subroutine accepts a hash of arguments instead of a list, which makes it easier to understand the purpose of each argument when calling the subroutine.

User

Is there another reason this might be a better subroutine?

ChatGPT

Yes, there are several reasons why refactoring the process_elements subroutine this way might be better:

  1. Flexibility: The refactored version of the subroutine is more flexible, as it allows for different sets of conditions and actions to be applied to the elements. This makes it more reusable for different use cases and easier to adapt when requirements change.
  2. Readability: By separating the conditions and action from the core loop, the code is easier to read and understand. This also makes it simpler to modify or extend the conditions and actions without affecting the core loop functionality.
  3. Maintainability: Encapsulating the functionality within a separate subroutine makes it easier to maintain and test. It also allows for better code organization and reduces the likelihood of introducing bugs when making changes to the code.
  4. Scalability: The refactored version of the subroutine can be easily extended to handle additional conditions or actions as needed, without having to modify the core loop. This makes the code more scalable and future-proof.

Overall, the refactored version of the process_elements subroutine is more modular, flexible, and easier to understand, making it a better design choice.

Credit: Piledhigheranddeeper at English Wikipedia, CC BY-SA 3.0 , via Wikimedia Commons

You may recognize the phrase in the title as Damian Conway’s admonition in Perl Best Practices regarding magic values in your programs. It’s good advice to use names for magic numbers and values to make your code more readable for you and those that have to maintain your code.

But what about the don’t use constant part? His argument for not using the pragma essentially boils down to the fact that use constant creates package-scoped constant subroutines at compile time, whereas the Readonly module executes at runtime. So we are told to use Readonly…fair enough…but Perl::Critic tells us to use ReadonlyX because Readonly is slow and buggy.
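A quick illustration of the difference Conway is pointing at; the Readonly line below is the classic interface, which ReadonlyX also supports for simple scalars:

 use constant PI => 3.14159;  # compile time: installs a package sub named PI()

 use Readonly;
 Readonly my $PI => 3.14159;  # runtime: a read-only lexical

 print "pi is $PI\n";       # a lexical interpolates in strings...
 print 'pi is ', PI, "\n";  # ...a constant sub does not

 # $PI = 4;  # would die: "Modification of a read-only value attempted"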

Here’s the problem though: ReadonlyX is not exactly the same as Readonly and many CPAN modules use Readonly. Case in point: File::BOM.

Let’s start with a little script that uses both ReadonlyX and File::BOM:

use strict;
use warnings;

use ReadonlyX;
use File::BOM;

1;

…let’s compile this and see what happens…

perl  -wc foo.pl
Subroutine croak redefined at /home/rclauer/lib/perl5/Readonly.pm line 13.
Prototype mismatch: sub Readonly::Scalar ($;$) vs ($$) at /home/rclauer/lib/perl5/Readonly.pm line 257.
Prototype mismatch: sub Readonly::Scalar ($;$) vs ($$) at /home/rclauer/lib/perl5/Readonly.pm line 343.
Subroutine Scalar redefined at /home/rclauer/lib/perl5/Readonly.pm line 316.
Subroutine Array redefined at /home/rclauer/lib/perl5/Readonly.pm line 346.
Subroutine Hash redefined at /home/rclauer/lib/perl5/Readonly.pm line 364.
Subroutine Clone redefined at /home/rclauer/lib/perl5/Readonly.pm line 387.
Prototype mismatch: sub Readonly::Readonly (\[%@$]$) vs (\[$@%]@) at (eval 7) line 42.
Subroutine Readonly redefined at (eval 7) line 1.
foo.pl syntax OK

Hmmm…that looks bad, but I hear that Readonly and ReadonlyX should be “drop-in replacements” for each other? Well then, perhaps I can use ReadonlyX by telling File::BOM that Readonly has already been loaded…a technique I have used before when these two modules were conflicting.

use strict;
use warnings;

BEGIN {
  use Module::Loaded;

  use ReadonlyX;

  mark_as_loaded('Readonly');
}

use File::BOM;

1;

…let’s compile this and see if things get better…

perl  -wc foo.pl
Useless use of reverse in void context at /home/rclauer/lib/perl5/File/BOM.pm line 204.
Odd number of elements in hash assignment at /home/rclauer/lib/perl5/File/BOM.pm line 187.
Compilation failed in require at foo.pl line 14.
BEGIN failed--compilation aborted at foo.pl line 14.

Nope. It appears that File::BOM uses features of Readonly in a way that is not compatible with ReadonlyX. Luckily, my use of ReadonlyX is compatible with Readonly, so I can reverse the technique: use Readonly and mark ReadonlyX as loaded, so that other modules that use ReadonlyX will get Readonly instead.

use strict;
use warnings;

BEGIN {
  use Module::Loaded;

  use Readonly;

  mark_as_loaded('ReadonlyX');
}

use File::BOM;

1;

Of course, this only works if the downstream modules that use ReadonlyX use it in a way that is compatible with Readonly…and we’ve just introduced the slow and buggy Readonly to our stack, which is what we were trying to avoid by using ReadonlyX in the first place.

But at least we got our legacy code to work. ;-)

Credit: Aurelio A. Heckert, CC BY-SA 2.0 , via Wikimedia Commons

make

Anyone who has seen any of my projects on GitHub knows that I love me some Makefile. I’ve been using make since the days I started programming and I doubt I’m going to give it up any time soon. I know the kids today are haters, but make, in all its wonky glory, is a true wonder and the Leatherman of GNU tools. I continually find new tricks that extend my love and adoration for this ubiquitous tool.

Scriptlets?

My latest find is what I’ll term scriptlets. Many times I’ll find myself creating a Makefile that requires some non-trivial manipulation or transformation of data. In those cases I’ll usually try a Perl one-liner and pipe some data through a Perl command.

Let’s suppose file foo.txt depends on bar.txt and we need to replace all instances of xyz with abc (yes, sed may be a better choice, but let’s go with this for now…). Your make recipe might look like this:

foo.txt: bar.txt
    perl -npe 's/xyz/abc/g;' $< > $@

But what happens when things get a little more complicated and the Perl one-liner is insufficient for the task? Well, maybe it’s time to write a little Perl script. Of course, if I do that, I’ll then need to include that script in my project, make it executable and reference it from my Makefile. Hmmm…is there a better way than cluttering the project with scripts? Enter scriptlets.

Here’s a short example that removes all members of a hash where the keys do not represent a file with an extension of .pm.

define create_json =

use JSON::PP;

my $file_list = JSON::PP->new->decode(<>);

my @pm_files = grep { /[.]pm$/ } keys %{$file_list};

my %new_file_list = map { $_ => $file_list->{$_} } @pm_files;

print JSON::PP->new->pretty->encode(\%new_file_list);

endef

export s_create_json = $(value create_json)

foo.json: bar.json
        perl -0 -e "$$s_create_json" $< > $@

…and then:

make

You can read more about the GNU make features used above by following these links.

Hope you find make as useful as I do…drop me a line if you have a favorite make trick or tip.

Thanks for reading.

Credit: Edward Hands, CC BY-SA 4.0 , via Wikimedia Commons

My new favorite Perl module is a very unassuming chap that makes your help and usage notes for your Perl scripts just a little bit sexier…

When you have a fairly complicated script that requires lots of help and usage information, it will probably scroll past one page of the display when you dump it to STDOUT.

Sure, your users could pipe the output through less or more (…and we all know that less is more, right?) but why make them do that?

Enter IO::Pager…a clever little module that’s quite easy to use. The pattern I’m using is to use it if it’s available…

 my $token = eval { 
   require IO::Pager;

   IO::Pager::open( *STDOUT, '|-:utf8', 'Unbuffered' );
 };

 print {*STDOUT} <<"END_OF_USAGE";
 usage: $0 options command

 Script to do blah blah blah...

 Options
 -------
 ...

 END_OF_USAGE

brian d foy introduced the Perl community to the idea of a script and module in the same file (the so-called modulino concept). The idea is that you can create a script that provides a command line interface to your Perl module. The implementation looks something like this:

 package Foo;

 caller or __PACKAGE__->main();

 # package guts
 ...

 sub main {
  my $foo = Foo->new;

  use Getopt::Long;

  # do something interesting here...

  exit 0;
 }

 1;

This model comes in quite handy when writing scripts or just exercising your module during its creation.

I sometimes find it useful to then create a bash script that invokes my modulino. Since I might want to point to a development version of the module, I might set my PERL5LIB path to my local development environment. Running the modulino script, I would then expect it to use the version in my path. Hence the following modulino script.

 #!/usr/bin/env bash
 # -*- mode: sh; -*-

 MODULINO="Amazon::Credentials"
 MODULINO_PATH="${MODULINO//::/\/}.pm"

 MODULINO_RUN=$(perl -M$MODULINO -e 'print $INC{"'$MODULINO_PATH'"}';)

 test -n "$DEBUG" && echo $MODULINO_RUN

 if test -z "$MODULINO_RUN"; then
     echo "$MODULINO is not installed"
     exit 1;
 fi

 perl $MODULINO_RUN "$@"

…and then

 $ amazon-credentials.sh -h
 amazon-credentials.sh options

 Formats credentials found in env, config, SSO, role

 Options
 -------
 --help, -h     this
 --ec2,         get credentials from server IAM profile
 --env          get credentials from environment variables
 --profile, -p  get credentials from profile in credentials configuration
 --role         get credentials from SSO role
 --account      use with --role, specify AWS account id

  $ amazon-credentials.sh --profile=test
   export AWS_ACCESS_KEY_ID=AKI*****************
   export AWS_SECRET_ACCESS_KEY=****************************************

  $ aws sso login
  $ amazon-credentials.sh --role my-sso-role --account 01234567890

More information about modulinos can be found here:

Credit: https://nasa.gov

Although I’ve been programming in Perl for over 25 years, I’ve never had a boost in productivity as great as the one I experienced over the last year. What happened?

Projects Present Opportunities for Change

When I stepped out of a more management-oriented role at my former employer, they needed a technical leader for a legacy Perl web application migration to the cloud. Somehow I let on that I had some Perl and AWS experience, and so I was tabbed as the technical lead. The project involved some heavy lifting of years of crufty Perl code from an on-prem, dedicated Apache server environment to a containerized AWS environment. Aside from the challenges of trying to get legacy Perl code running in a Docker container, the team had to mitigate findings from the company’s security scans. This included perlcritic findings as well as findings based on other tools that look for potential security issues.

So, the first hurdle was whittling down the perlcritic findings and making sure we weren’t introducing new findings.

Adding perlcritic to the Build

I’m a big fan of make and autotools, so naturally our build was based on GNU autotools. This allowed for a repeatable, standards-based, reliable and extensible build system that worked locally using docker-compose and worked in our CI/CD pipeline to deploy Docker images for AWS Fargate.

To make sure that we maintained a baseline of low severity findings from perlcritic, I added a step in our Makefile that ran perlcritic and errored out if any severity level exceeded 2. This prevented any code from going into the repository that would trigger security concerns since all pushes to the repository were scanned.
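Our gate just invoked the perlcritic command from a make target, but the same check can be expressed with the Perl::Critic API if you want finer control over reporting. This is a sketch, not our actual build script; the severity threshold and file-list handling are illustrative:

 #!/usr/bin/env perl
 # fail the build if any file has findings of severity 3 or higher
 use strict;
 use warnings;

 use Perl::Critic;

 my $critic = Perl::Critic->new(
   -severity => 3,               # report severities 3, 4 and 5
   -profile  => '.perlcriticrc', # the project's shared configuration
 );

 my $failed = 0;

 for my $file (@ARGV) {
   my @violations = $critic->critique($file);
   next if !@violations;

   print "$file:\n", @violations;
   $failed = 1;
 }

 exit $failed;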

Moving the Checks Further Upstream

My editor of choice has always been Emacs…let the editor wars begin! I had already added perltidy as an extension to Emacs so that perltidy would be run before any save. Our team standardized on a set of perltidy settings and added the .perltidyrc file to the project along with the .perlcriticrc that configures perlcritic reporting.

Most editors today have language-specific syntax highlighting and syntax checking built in. Flycheck is an Emacs plugin for syntax checking. Flycheck extensions are available for almost any language, and even for things like markdown, JSON and SQL.

Syntax checking as you work in your scripts is another way to move the process of validating your code further upstream, but what really supercharged my development efforts was adding perlcritic checking to Emacs. The combination of Flycheck’s support for perltidy, perlcritic and Perl syntax checking has helped me develop faster with fewer errors.

Perl Best Practices isn’t just code for “standardization”, although having some standards makes maintenance a heckuva lot easier. No, PBP can highlight potential issues in your code and prevent you from having to debug gnarly problems.

Unfortunately, it’s also a nag.

flycheck

Visiting an old legacy file that someone wrote back in 2012 (before you and they were woke to PBP) is an eye-opening experience. When Flycheck gives up after 1527 issues found, you start to question whether you really want to edit that code!

Even code that has relatively few findings presents them with electric highlighting that eventually rattles your nerves. It also becomes a bit of an obsession to clean up old code and get down to 0 findings!

In the end though, the result is better code. Other programmers can read and understand it, and the quality will get incrementally better.

Generally speaking then, to supercharge your development efforts, move code quality enforcement as far upstream as possible, starting with your editor.

Credit: https://en.wikipedia.org/wiki/Observer_effect_(physics)

In physics there is the concept of the observer effect whereby the act of observing something alters the measurement. Recently, I’ve had the pleasure of experiencing that firsthand.

The evils of tied variables are well documented and frowned upon by our friendly(?) mentor perlcritic, but let’s suppose we want to be clever…another frowned-upon attribute of Perl scripts :-(. But let’s forge ahead anyway…

We’ll create a class that we can use to benchmark our scripts. We want this to be easy, so the class will allow us to name benchmarks and simply access a hash element when we want to stop and report the time…sort of like this:

 #!/usr/bin/env perl

 use feature 'say';

 use Easy::Benchmark;
 use Data::Dumper;

 my $benchmark = Easy::Benchmark->new( 'begin', 'start' );

 sleep 2;

 say $benchmark->stop('start');

 $benchmark->start('next');

 sleep 1;

 # print {*STDERR} Dumper $benchmark

 say $benchmark->stop('next');

 say $benchmark->stop;

Here’s the class that implements our easy benchmark.

 package Easy::Benchmark;

 use strict;
 use warnings;

 use Benchmark;

 use constant ANON_BENCHMARK => '__anon_benchmark__';

 ########################################################################
 sub new {
 ########################################################################
   my ( $proto, @args ) = @_;

   my $class = ref $proto || $proto;

   my $self = bless {}, $class;

   tie %{$self}, $class, @args;

   return $self;
 }

 ########################################################################
 sub TIEHASH {
 ########################################################################
   my ( $class, @args ) = @_;

   my $self = bless {}, $class;

   # start the clock on an anonymous benchmark
   $self->start(ANON_BENCHMARK);

   for (@args) {
     # and any additional ones requested.
     $self->start($_);
   }

   return $self;
 }

 ########################################################################
 sub FETCH {
 ########################################################################
   my ( $self, $name ) = @_;

   die "$self: Unknown benchmark named `$name'\n"
     if !exists $self->{$name};

   if ( !ref( $self->{$name} ) || ref $self->{$name} ne 'ARRAY' ) {
     $self->{$name} = [ Benchmark->new, $self->{$name} ];
   }

   return timestr( timediff( @{ $self->{$name} } ) );
 }

 ########################################################################
 sub DELETE   { return delete $_[0]->{ $_[1] }; }
 sub FIRSTKEY { my $a = scalar CORE::keys %{ $_[0] }; each %{ $_[0] } }
 sub NEXTKEY  { each %{ $_[0] } }
 sub STORE    { $_[0]->{ $_[1] } = $_[2] }
 ########################################################################

 ########################################################################
 sub start {
 ########################################################################
   my ( $self, $name ) = @_;

   die "must specify name of counter\n"
     if !$name;

   my $start = Benchmark->new;

   $self->{$name} = $start;

   return $start;
 }

 ########################################################################
 sub stop {
 ########################################################################
   my ( $self, @args ) = @_;

   if ( @args > 0 ) {
     return $self->{ $args[0] };
   }
   else {
     my $name = ANON_BENCHMARK;

     my $value = $self->{$name};

     delete $self->{$name};

     return $value;
   }
 }

 1;

Running our script produces the expected output…

perl -I . foo.pl
 2 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU)
 1 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
 3 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU)

Let’s remove the comment on the line that dumps our $benchmark object…and see what happens

perl -I . foo.pl
 2 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
$VAR1 = bless( {
                 '__anon_benchmark__' => ' 2 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)',
                 'begin' => ' 2 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)',
                 'next' => ' 0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)',
                 'start' => ' 2 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)'
               }, 'Easy::Benchmark' );
 0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
 2 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)

Hmmm…I expected our last benchmark to be 3 wallclock secs…that wasn’t what we got. By now you probably know the problem. The class uses the FETCH method to stop a timer. So, when Dumper fetches the values of the $benchmark object, every running timer is stopped prematurely. The act of observing the hash triggered an action that upset our experiment.

I am embarrassed to say it took me an hour or so to piece it all together, since I didn’t know who (or what) was accessing the hash while I was trying to produce some useful log messages.

Generalizing: any time you create a class that triggers actions when fetching values from a hash, be aware of the observer effect.
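If you do need to dump one of these objects for logging, one workaround (a sketch against the Easy::Benchmark class above) is to dump the object returned by tied(), which reads the underlying hash directly and bypasses the FETCH magic:

 use Data::Dumper;

 # tied() hands back the object created by TIEHASH; dumping it does not
 # go through FETCH, so no timers are stopped as a side effect
 my $inner = tied %{$benchmark};

 print {*STDERR} Dumper($inner);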

Thanks for reading…

Credit: https://medium.com/xebia-engineering/branching-and-release-strategies-119e60241345

This is part III of my blog on versions. Last time we discussed the question of what version of perl to use for development. Conclusion: in most cases, the choice will be made for you, especially in maintenance and legacy migration situations. If you have the luxury and ability to use a more modern version of perl, use it. Let’s continue:

  • [x] What version of perl should I be using for development?
  • [ ] If I publish a CPAN module, what version(s) of perl should I support?
  • [ ] What version of a dependency do I really need in my module?
  • [ ] What should I do if I receive errors in the CPAN Testers Report?
  • [ ] Should I include core modules as dependencies? (the answer is not as obvious as you might think)

If I publish a CPAN module, what version(s) of perl should I support?

For me, this has become more than just an academic question. Several years ago I petitioned the PAUSE administrators to allow me to adopt Amazon::S3. It had been a long time since the module had been updated and it needed a little TLC to support new features and fix some deficiencies. The adoption process took a lot longer than I had imagined. Between the time I requested co-maintainer rights and the time it was actually granted, the urgency and desire to adopt the module waned. Recently the need to update this module re-appeared, so I began making some much needed improvements. My first release to CPAN of the updated module was 0.49, and thus began my introduction to CPAN Testers Reports.

Version  Max Version  Min Version  First    O/Ss With Passing Version
         Passing      Passing      Failure
0.49     5.36.0       5.14.4       5.32.1   darwin, freebsd, linux, mswin, openbsd, solaris
0.50     5.37.1       5.14.4       5.34.1   darwin, freebsd, linux, mswin, openbsd, solaris
0.51     5.37.1       5.14.4       5.18.4   darwin, freebsd, linux, mswin, openbsd, solaris
0.52     5.37.1       5.14.4       5.8.9    darwin, freebsd, linux, mswin, openbsd, solaris
0.53     5.37.1       5.14.4       NA       darwin, freebsd, linux, mswin, openbsd, solaris
0.54     5.37.1       5.14.4       5.37.1   darwin, freebsd, linux, mswin, openbsd, solaris

My first release only passed (accidentally) on a few versions, but seemed to pass on at least one version of all operating systems tested. The failures were all due to missing dependencies that I had failed to identify in Makefile.PL. Thus a new release (0.50) was required. Somehow I made the same mistake; however, as you can see from the report, a new max version (5.37.1) emerged due to one of the testers adding that version to their list of perl versions to test. I say the tests that passed, passed accidentally, because I’m guessing that those testers had (for some reason) the necessary modules already installed in their environments.

That’s an important point to consider - all testing environments are not created equal. By looking at the reports, we can conclude empirically that there must be something different about these environments if the module fails on one tester’s perl 5.37.1 and passes on another’s. Take this report, for example:

CPAN Testers Matrix: Amazon-S3 0.54

Note the failed tester and the OS version. The version of the OS is different in the failing tester’s report…however, the failure was due to XML::Simple not being installed because it was missing from the list of dependencies in Makefile.PL. So why did the others pass? The conclusion must be that XML::Simple was somehow present in those environments; otherwise they would have failed there as well.

Another oddity of the CPAN Testers Reports is which versions of perl are used for testing the module. After releasing 0.50 I noticed that version 5.8.9 of perl suddenly showed up in the testers report. Hmmm…should I try to support that version? Is there a reason it shouldn’t pass on 5.8.9? What versions of perl did the original author’s 0.45 version support? Well, as it turns out, when that module was tested in 2012, it supported 5.6.2! Should I support 5.6.2? Would it actually be possible to support perl 5.6.2?

The answer, as it turns out, is no. Version 0.45 would probably not even pass on perl 5.14.0 today, due to the versions of perl required by the module’s dependencies.

Version 0.45 specifies that any version of LWP::UserAgent::Determined will do, which in turn will accept any version of LWP.

Makefile.PL

As I futilely attempted to see what minimum version of perl my adopted module could support, I tried to build LWP on version 5.10.0 of perl. Unfortunately, LWP would not build on 5.10.0 (actually it was HTML::HeadParser, part of the libwww-perl project, that failed unit tests), so I dutifully filed an issue, assuming that the authors of LWP would want to know that the module does not build on 5.10.0.

HTML::HeadParser t/threads.t fails on perl 5.10.0 #28

The response to my report was, surprisingly: let’s bump the required version of perl to 5.14. So, while LWP successfully builds on 5.14.0, LWP::UserAgent::Determined does not.

LWP::UserAgent::Determined

After a lot of useless traversing down rabbit holes, I was able to determine that installing an older version of LWP would, in fact, allow LWP::UserAgent::Determined to build. The rub here, however, is that other modules like XML::Simple require yet other modules that specified that any version of LWP would do. If a dependency that accepts any version of LWP is installed prior to the specific older version of LWP listed in my manifest, well, now we’ve installed an incompatible version of LWP, because specifying any version of LWP will bring in the latest version.

Now, if cpanm had done the right thing and installed specific versions of modules specified in the dependency list first, then the older version of LWP would have satisfied the requirement. This is not to say that cpanm did the wrong thing…it got the list of direct dependencies from the distribution and started installing them.

To be fair, the versions of perl I have been testing against are old. CPAN authors should not be bound to supporting older versions of perl at the expense of modernizing or enhancing their modules. The point is simply to illustrate the difficulty of releasing a CPAN module that does support older versions of perl.

So should I then conclude that I need to bump the minimum version of perl that Amazon::S3 will support to something like 5.16.3? If I still want to support older versions of perl, would it even be possible to find the versions of dependent modules that do not break my build? Does that even make any sense when modules are sometimes updated to fix bugs?

As it happens, I did manage to get a working build for perl 5.10.0 and 5.14.0. Of course, it was a forensics exercise and, to be truthful, not very satisfying.

By installing version 6.13 of LWP and version 2.14 of HTML::HeadParser I managed to somehow get Amazon::S3 to build and complete all tests successfully.

The guidelines for supporting older versions of perl seem to be:

  1. Only use features from the minimum version of perl you plan to support
  2. Verify that any dependent modules also conform to #1 and can be built on the minimum version of perl you plan to support
  3. Verify that your unit tests are comprehensive enough so that if there are bugs in older dependent modules, they do not affect your code
  4. Be specific about the versions of dependent modules required by your module (see the Makefile.PL sketch after this list)
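For guideline 4, being specific means pinning versions in Makefile.PL rather than accepting whatever happens to be current. Here’s a sketch of what that might look like for this distribution; the module names come from this post, but the pinned versions are illustrative rather than a tested manifest:

 use ExtUtils::MakeMaker;

 WriteMakefile(
   NAME             => 'Amazon::S3',
   VERSION_FROM     => 'lib/Amazon/S3.pm',
   MIN_PERL_VERSION => '5.010',  # guideline 1: declare your floor explicitly
   PREREQ_PM        => {
     # guideline 4: the versions you actually built and tested against,
     # not just "any version will do"
     'LWP'                        => '6.13',
     'LWP::UserAgent::Determined' => '1.07',
     'XML::Simple'                => '2.25',
   },
 );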

more next time…

Credit: https://medium.com/xebia-engineering/branching-and-release-strategies-119e60241345

This is part II of my blog on my version woes. Part I can be found here.

Let’s review some of the issues I identified in part I.

  • What version of perl should I be using for development?
  • If I publish a CPAN module, what version(s) of perl should I support?
  • What version of a dependency do I really need in my module?
  • What should I do if I receive errors in the CPAN Testers Report?
  • Should I include core modules as dependencies? (the answer is not as obvious as you might think)

What version of perl should I be using for development?

It depends. For new development, it would make the most sense to use a current version of perl (5.38.0), but when do we truly do new development in the Perl language these days? Even in companies with sufficient Perl programmers, it’s a risk to start new applications in a language that is increasingly being ignored by both vendors and new developers. Libraries for APIs are tough to come by and young developers are not making it a priority to learn Perl. That’s a shame, because learning some Perl is like learning vi. You’ll see it on almost every Linux-based system you touch. Its ubiquity demands some awareness.

In the old days of computing the saying was “No one ever got fired for selecting IBM.” I think one can apply the same sentiment to language choices for new applications: “No one ever got fired for selecting Python, C# or Java.” It would be a questionable choice to select Perl for new development today due to the obvious issues we all recognize.

  • Lack of vendor support for new APIs
  • Scarcity and aging of competent Perl developers
  • Cost of Perl developers
  • Unknown, confused trajectory for the language
  • All of the inherent deficiencies of the language itself (see Curtis Poe’s excellent video describing Corrina’s attempt to address these issues)

So, for new development it seems that even shops that have Perl developers might still choose to look elsewhere. Perl work then seems to be relegated to maintaining and enhancing legacy Perl applications that would be difficult, costly or impossible to rewrite in a more modern language. Still, one could conceivably review the version of perl being used and choose a more modern one as a target for enhancing the maintainability of the application. But that comes with its own set of problems.

Will the application work if I migrate it to a new version of perl?

Who knows? You just have to try it…but:

  • Should I go straight from 5.x to 5.38.0?
  • There are no unit tests for the scripts and modules that comprise the application
  • There are no integration tests that test the features of the application
  • The application relies on older CPAN modules with undocumented version dependencies (and worse unknown dependencies for building said modules)
  • The application also relies on core modules, some of which may have been removed from the current version of Perl
  • The application uses the system perl executable with unknown dependencies on its attributes

Even attempting to create a new development environment for legacy Perl applications using the same version of perl can be difficult, if not impossible. As I discovered in my most recent job, building CPAN modules that rely on C libraries is not a particularly easy exercise. For example, attempting to build JavaScript::V8 required hours of forensics and internet time travel…ultimately resulting in this solution.

FROM amazonlinux:2
COPY epel.repo /etc/yum.repos.d/epel.repo
COPY RPM-GPG-KEY-EPEL-6 /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
RUN yum install -y wget 'perl(App::cpanminus)'
RUN yum groupinstall -y 'Development Tools'
RUN cpanm -v App::cpanminus
RUN wget ftp://ftp.pbone.net/mirror/vault.centos.org/6.7/cr/x86_64/Packages/libicu-4.2.1-14.el6.x86_64.rpm
RUN rpm -ivh libicu-4.2.1-14.el6.x86_64.rpm
RUN yum install -y libuv
RUN yum install -y v8 v8-devel 
RUN cpanm -v JavaScript::V8@0.09

Of course, the obvious question is “Why are you using JavaScript::V8?”.

Reminder: This is a legacy application with no documentation, no unit tests and no integration tests. Changing anything in the application is like walking in a minefield with size 14 boots.

So, what perl version should you use?

Since I was unable to build a necessary dependency on a system with version 5.36.0 of perl, the decision of which perl version to use for this application was made for me…stick with the version running in production for now.

For legacy applications my approach has been more or less trial and error…optimistically trying to update the version of perl and seeing where it takes you. Most of the time with legacy applications it doesn’t take you far. I try to build and unit test (to whatever degree possible) as much of the application as I can with a more modern version of perl, its core modules, and the latest version of any CPAN dependencies. Until that process is proven successful (or not) the application is built using the versions of modules and of perl that are currently running in production.

Here are the basic steps I have been using with legacy applications.

  1. Identify the artifacts of the application (not always as easy as it sounds)
  2. If the source is not yet under source control, create a git repository and create a baseline. Don’t try to do anything fancy at this point - just reflect the structure of the application as it is laid down on the target servers in your repository
  3. Identify all dependencies (core and CPAN modules). Create a manifest of the dependencies and versions being used in the production environment (a starting point for this step is sketched after this list).
  4. Run perlcritic to gauge the mess - save the results to measure your progress in the cleanup process
  5. Try to build a containerized (Docker) version of the application in the currently running (older) perl version using your original manifest.
  6. Do a gap analysis between versions being used and the latest version of the modules.
  7. Try to build the application and all dependencies in the newer target version using the latest version of core and CPAN modules.
  8. Test components and/or the application as a whole if possible, identifying version problems and incompatibilities with the target perl version.
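For step 3, a starting point is to inventory what is actually installed on the production host. This sketch uses ExtUtils::Installed to walk the perl library directories and report each installed distribution and its version; note that it won’t see modules installed by hand outside the normal locations, which is exactly the pathology described below:

 #!/usr/bin/env perl
 # dump a manifest of installed distributions and their versions
 use strict;
 use warnings;

 use ExtUtils::Installed;

 my $installed = ExtUtils::Installed->new();

 for my $module ( sort $installed->modules() ) {
   my $version = eval { $installed->version($module) } || 'unknown';
   printf "%s %s\n", $module, $version;
 }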

You will first encounter problems trying to build a containerized version of the as-is application, especially if the legacy application resides on a production server that has been poorly maintained. “Poorly maintained” being defined as one or more people having manually installed things over the years without, of course, documenting what they installed or what additional dependencies were needed. If you can get by this step and have successfully built the application in a containerized environment, then you at least have the basis for a more modern approach to software development.

  • development environments isolated from test or production systems that are relatively easy to create
  • a build system that can be woven into your CI/CD pipeline
  • documentation of dependencies
  • a stable base for creating unit tests

Some Useful Tools for Legacy Applications

more next time…