Session #1 - refactoring a block containing map

The code below violates more than a few best practices making the intent of the code difficult to understand. Let’s get started…

User

Can you refactor this snippet of Perl code?

if (@multi_elems) {
      my $delim_qr1 = qr/\d+$group_delim1/o;
      #my $delim_qr2 = $group_delim2;
      foreach my $entry ( @log_data{@EIDs} ) {
        @{$entry}{@multi_elems}
          = map {
          my $x = $_;
          $x && ( $x =~ s/$delim_qr1//go )
            ? [
            # XXX: if $_ =~ m/^($group_delim2)+$/
            #      the result is ()
            #      at least on perl 5.8.0
            split /$group_delim2/, $x
            ]
            : []
          } @{$entry}{@multi_elems};
      }
      #print STDERR "$err\n" if $err;
    }

ChatGPT

Here’s the refactored version of your provided Perl code. I’ve added comments to explain what the code is doing and made some minor improvements for readability:

 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/o;
     # Iterate through each entry in the log_data hash
     foreach my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             # Check if the value matches the delimiter regex
             $x && ($x =~ s/$delim_qr1//go) ?
                 [
                     # Split the value by group_delim2
                     split /$group_delim2/, $x
                 ]
                 : [] # If the value doesn't match, return an empty array reference
         } @{$entry}{@multi_elems};
     }
 }

The refactored code is more readable and has comments explaining the purpose of each block. Note that the code logic has not been changed, and it should function the same as the original snippet.

User

The use of the ternary operator obfuscates the meaning

ChatGPT

I understand your concern. Here’s the refactored version of the code using an if statement instead of the ternary operator to improve readability:

 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/o;
     # Iterate through each entry in the log_data hash
     foreach my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//go)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             } else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             $result;
         } @{$entry}{@multi_elems};
     }
 }

In this version, I replaced the ternary operator with an if-else statement, which assigns the result to the $result variable. This should make the code more understandable and easier to follow.

User

Excellent but I don’t like cuddled elses

ChatGPT

Sure, I’ve updated the code to uncuddle the else statement:

 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/o;
     # Iterate through each entry in the log_data hash
     foreach my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//go)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             $result;
         } @{$entry}{@multi_elems};
     }
 }

Now, the else statement is on a new line, making the code more aligned with your preferred style.

User

We are using Perl 5.16. Are there any changes you might suggest?

ChatGPT

Perl 5.16 doesn’t introduce any significant changes that would impact the provided code snippet. However, I can suggest a few general improvements for better readability and maintainability:

  1. Use strict and warnings pragmas to enforce stricter code quality: use strict; use warnings;
  2. Use the foreach loop with a lexical iterator variable to avoid potential conflicts with global variables: for my $entry (@log_data{@EIDs}) { # ... }

With these suggestions, the updated code looks like this:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/o;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//go)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             $result;
         } @{$entry}{@multi_elems};
     }
 }

These changes should enhance the code quality and maintainability while keeping the same functionality.

User

My understanding is that the ‘o’ regex modifier may no longer be required.

ChatGPT

You’re correct. The /o modifier for regular expressions is no longer needed in most cases, as Perl optimizes the regex automatically if the pattern doesn’t change during the runtime of the program. The /o modifier was useful in older versions of Perl (prior to 5.6) when the regex engine didn’t optimize regexes in the same way. In modern Perl, the engine is smart enough to recognize when a regex is constant and cache the compiled version automatically. You can remove the /o modifier from the regex in your code snippet:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Update each multi_elem in the current entry
         @{$entry}{@multi_elems} = map {
             my $x = $_;
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//g)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             $result;
         } @{$entry}{@multi_elems};
     }
 }

The performance and behavior of the code should remain the same after removing the /o modifier.

User

Perl best practices indicates that a map function should only contain one statement. Using the map function above seems to reduce readability since the object being interated over with some complex logic is at the end of the expression.

ChatGPT

I agree with you. Using a map function with complex logic can reduce readability. In this case, we can refactor the code to use a for loop instead of the map function to improve readability:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Iterate through each multi_elem in the current entry
         for my $multi_elem (@multi_elems) {
             my $x = $entry->{$multi_elem};
             my $result;
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//g)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             else {
                 # If the value doesn't match, return an empty array reference
                 $result = [];
             }
             # Update the multi_elem in the current entry
             $entry->{$multi_elem} = $result;
         }
     }
 }

By using a for loop, the code becomes more readable, and it’s easier to understand the logic being applied to the elements.

User

I think the else statement in the inner for loop can be removed and we can assign result to an empty array reference when we define $result

ChatGPT

You’re right. We can simplify the code further by initializing $result to an empty array reference and updating it only when the condition is met. Here’s the updated code:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Iterate through each multi_elem in the current entry
         for my $multi_elem (@multi_elems) {
             my $x = $entry->{$multi_elem};
             # Initialize the result to an empty array reference
             my $result = [];
             # Check if the value matches the delimiter regex
             if ($x && ($x =~ s/$delim_qr1//g)) {
                 # Split the value by group_delim2
                 $result = [split /$group_delim2/, $x];
             }
             # Update the multi_elem in the current entry
             $entry->{$multi_elem} = $result;
         }
     }
 }

By initializing $result to an empty array reference, we can remove the else statement, making the code more concise and easier to read.

User

The $result variable also now seems unnecessary and we can further refactor this by directly assigning $entry->{$multi_elem} to an empty array and using the next statement to short circuit the if statement

ChatGPT

You’re right. We can further simplify the code by removing the $result variable and directly assigning the value to $entry->{$multi_elem}. Here’s the refactored code:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Iterate through each multi_elem in the current entry
         for my $multi_elem (@multi_elems) {
             my $x = $entry->{$multi_elem};
             # Initialize the multi_elem in the current entry to an empty array reference
             $entry->{$multi_elem} = [];
             # Check if the value matches the delimiter regex, and short circuit if not
             next unless ($x && ($x =~ s/$delim_qr1//g));
             # Update the multi_elem in the current entry by splitting the value by group_delim2
             $entry->{$multi_elem} = [split /$group_delim2/, $x];
         }
     }
 }

By directly assigning an empty array reference to $entry->{$multi_elem} and using the next statement to short circuit the loop if the condition is not met, the code becomes even more concise and easier to read.

User

perl best practices suggests avoiding unless

ChatGPT

You’re right. Using unless can sometimes make the code less readable. Instead, we can use an if statement with a negated condition. Here’s the updated code:

 use strict;
 use warnings;
 # Check if there are any multi_elems
 if (@multi_elems) {
     # Define the regex delimiter for group_delim1
     my $delim_qr1 = qr/\d+$group_delim1/;
     # Iterate through each entry in the log_data hash
     for my $entry (@log_data{@EIDs}) {
         # Iterate through each multi_elem in the current entry
         for my $multi_elem (@multi_elems) {
             my $x = $entry->{$multi_elem};
             # Initialize the multi_elem in the current entry to an empty array reference
             $entry->{$multi_elem} = [];
             # Check if the value matches the delimiter regex, and short circuit if not
             if (!($x && ($x =~ s/$delim_qr1//g))) {
                 next;
             }
             # Update the multi_elem in the current entry by splitting the value by group_delim2
             $entry->{$multi_elem} = [split /$group_delim2/, $x];
         }
     }
 }

By using an if statement with a negated condition instead of unless, the code conforms better to Perl best practices and is easier to understand.

User

How do you think this code would compare to the original in terms of performance under perl 5.16

ChatGPT

The refactored code should have similar performance to the original code under Perl 5.16. The changes made in the refactored version mostly deal with readability and best practices, but the overall logic and flow of the code remain the same. Some improvements in the refactored code, such as removing the unnecessary else statement and the $result variable, can slightly reduce memory usage and improve the code’s readability without affecting the performance. The direct assignment of an empty array reference and the use of the next statement can make the code more efficient in some cases by avoiding unnecessary assignments. However, it’s important to note that the actual performance difference might not be significant, and it would depend on the specific use case, data being processed, and the overall complexity of the program. In general, focusing on readability and maintainability is a good approach, as it will make it easier for you and others to work with and optimize the code in the future.

Session 2 - Refactoring Obfuscated Code

Yikes! What the heck does this code do?

User

Please refactor this Perl snippet for readability

@{$_}{ @elem_ids_ref, @elem_ids_noref }
        = ( ( map { [] } @elem_ids_ref ), @defaults_noref )
        for @log_data{@EIDs};

ChatGPT

Here’s the refactored version of the provided Perl snippet with improved readability:

for my $entry (@log_data{@EIDs}) {
    # Initialize array references for elements in @elem_ids_ref
    @{$entry}{@elem_ids_ref} = map { [] } @elem_ids_ref;
    # Assign the default values for elements in @elem_ids_noref
    @{$entry}{@elem_ids_noref} = @defaults_noref;
}

In this refactored version, I’ve replaced the single line of code with a more readable for loop that iterates over each entry in @log_data{@EIDs}. Within the loop, I’ve separated the assignment of array references for elements in @elem_ids_ref and the assignment of default values for elements in @elem_ids_noref. This should make the code easier to understand and maintain.

Session 3 - Creating a Rules Engine from a Snippet

In this session I have an inkling what I want to do with this code, remove the next statements and make this more of a condition based dispatcher.

User

The code below uses the next statement 4 times and uses unless which is not considered to be a best practice. How could we avoid using next in this fashion?

foreach my $elem ( $self->elements() ) {
          next unless $elem->table();
          next
            if ( $elem->{type} eq 'File' || $elem->{type} eq 'Signature' )
            && !$elem->copy_file;
          next unless $elem->$allow_hybrid_method;
          next unless exists $data->{ $elem->{uniq} };
          $elem->{value} = $data->{ $elem->{uniq} };
        }

ChatGPT

Here’s a refactored version of the provided Perl code snippet that avoids using next and unless:

foreach my $elem ($self->elements()) {
    if ($elem->table()
        && !($elem->{type} eq 'File' || $elem->{type} eq 'Signature') || $elem->copy_file
        && $elem->$allow_hybrid_method
        && exists $data->{$elem->{uniq}}
    ) {
        $elem->{value} = $data->{$elem->{uniq}};
    }
}

In this version, I combined all the conditions into a single if statement, which makes the code more straightforward and easier to read.

User

That does seem more readable - each of the conditions is essentially a rule that results in assigning a value to a hash member - is there another way to express this functionality?

ChatGPT

Yes, you can create an array of coderefs representing each condition, and then iterate through them to check whether all conditions are met. This approach makes the code more modular and easier to extend or modify in the future:

my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
foreach my $elem ($self->elements()) {
    my $all_conditions_met = 1;
    for my $condition (@conditions) {
        unless ($condition->($elem)) {
            $all_conditions_met = 0;
            last;
        }
    }
    if ($all_conditions_met) {
        $elem->{value} = $data->{$elem->{uniq}};
    }
}

In this version, we define an array of coderefs (@conditions) that represent each condition. Then, during the iteration over elements, we check whether all conditions are met by iterating through the @conditions array. If all conditions are met, we assign the value to the hash member. This approach makes it easy to add, remove, or modify conditions in the future.

User

Here we go with unless again

ChatGPT

Apologies for that oversight. Let’s rewrite the code snippet without using unless:

my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
foreach my $elem ($self->elements()) {
    my $all_conditions_met = 1;
    for my $condition (@conditions) {
        if (!$condition->($elem)) {
            $all_conditions_met = 0;
            last;
        }
    }
    if ($all_conditions_met) {
        $elem->{value} = $data->{$elem->{uniq}};
    }
}

Now, the unless keyword has been replaced with a negated if statement, which makes the code more in line with the best practices.

User

The foreach loop looks very generic and could be a subroutine of its own, with an argument of the conditions

ChatGPT

You’re right, the loop can be extracted into a separate subroutine. Here’s a refactored version of the code, where the loop is wrapped in a subroutine called process_elements:

sub process_elements {
    my ($elements, $data, $conditions) = @_;
    ELEMENT:
    foreach my $elem (@$elements) {
        for my $condition (@$conditions) {
            next ELEMENT unless $condition->($elem);
        }
        $elem->{value} = $data->{$elem->{uniq}};
    }
}
my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
process_elements([$self->elements()], $data, \@conditions);

The process_elements subroutine takes a list of elements, data, and an array reference of conditions as its arguments. It iterates through the elements, and for each element, it checks if all conditions are met. If all conditions are satisfied, it assigns a value to the element’s hash member.

User

The calling sequence of process_elements() is specific to this use case. How can we make it more generic?

ChatGPT

To make the process_elements subroutine more generic, you can pass a code reference for the action to be performed when all conditions are met. This way, the subroutine can be used in various scenarios, not just for the specific use case. Here’s the modified version:

sub process_elements {
    my ($elements, $conditions, $action) = @_;
    ELEMENT:
    foreach my $elem (@$elements) {
        for my $condition (@$conditions) {
            next ELEMENT unless $condition->($elem);
        }
        $action->($elem);
    }
}
my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
my $action = sub {
    my ($elem) = @_;
    $elem->{value} = $data->{$elem->{uniq}};
};
process_elements([$self->elements()], \@conditions, $action);

Now the process_elements subroutine takes an additional argument, $action, which is a code reference for the action to be performed when all conditions are met. This makes the subroutine more generic and reusable in different situations.

User

Let’s refactor process_elements() to receive a hash of arguments.

ChatGPT

To refactor the process_elements subroutine to receive a hash of arguments, you can make the following changes:

sub process_elements {
    my %args = @_;
    my $elements   = $args{elements};
    my $conditions = $args{conditions};
    my $action     = $args{action};
    ELEMENT:
    foreach my $elem (@$elements) {
        for my $condition (@$conditions) {
            next ELEMENT unless $condition->($elem);
        }
        $action->($elem);
    }
}
my @conditions = (
    sub { $_[0]->table() },
    sub { $_[0]->{type} ne 'File' && $_[0]->{type} ne 'Signature' || $_[0]->copy_file },
    sub { $_[0]->$allow_hybrid_method },
    sub { exists $_[0]->{uniq} && exists $data->{ $_[0]->{uniq} } },
);
my $action = sub {
    my ($elem) = @_;
    $elem->{value} = $data->{$elem->{uniq}};
};
process_elements(
    elements   => [$self->elements()],
    conditions => \@conditions,
    action     => $action
);

Now the process_elements subroutine accepts a hash of arguments instead of a list, which makes it easier to understand the purpose of each argument when calling the subroutine.

User

Is there another reason this might be a better subroutine?

ChatGPT

Yes, there are several reasons why refactoring the process_elements subroutine this way might be better:

  1. Flexibility: The refactored version of the subroutine is more flexible, as it allows for different sets of conditions and actions to be applied to the elements. This makes it more reusable for different use cases and easier to adapt when requirements change.
  2. Readability: By separating the conditions and action from the core loop, the code is easier to read and understand. This also makes it simpler to modify or extend the conditions and actions without affecting the core loop functionality.
  3. Maintainability: Encapsulating the functionality within a separate subroutine makes it easier to maintain and test. It also allows for better code organization and reduces the likelihood of introducing bugs when making changes to the code.
  4. Scalability: The refactored version of the subroutine can be easily extended to handle additional conditions or actions as needed, without having to modify the core loop. This makes the code more scalable and future-proof.

Overall, the refactored version of the process_elements subroutine is more modular, flexible, and easier to understand, making it a better design choice.


Next post: Containerizing a Barcode Reader

Previous post: Use named constants but don’t use constant.