LinuxChix Perl Course Part 17: grep and map

1) Introduction
2) grep - filters a list
3) map - transforms the values of a list
4) What "grep" and "map" have in common
5) Exercises
6) Answer to Previous Exercise
7) Past Information
8) Credits
9) Licensing

-----------------------------------

1) Introduction

Before we finish looking at arrays in Perl, I thought we should take a
quick look at two handy Perl functions: "grep" and "map". Both functions
are technically operators because they allow you to do magical things
that a function can't do, but syntactically they look like functions, so
we refer to them as functions here.

Let me add that this will be the last e-mail before January. It's that
busy-busy-busy time of year and I'm afraid that I have no more time to
write about Perl than you have to read about it.

-----------------------------------

2) grep - filters a list

The "grep" function returns only the elements of a list that meet a
certain condition:

@positive_numbers = grep($_ > 0, @numbers);

As you can see, each element is referred to as "$_". This (plus the fact
that parentheses are optional) allows you to write commands that look
similar to invocations of the Unix "grep" program:

@non_blank_lines = grep /\S/, @lines;

In addition, you can specify a code block rather than a single condition:

@non_blank_lines = grep { /\S/ } @lines; # Equivalent to the above.

Obviously it doesn't matter in this case, but code blocks are helpful
when you want a complex filter with multiple lines of code. The result
of the code block is the result of the last statement executed:

# All positive numbers can be used as exponents,
# but negative exponents must be integers.
@can_be_used_as_exponent = grep {
    if ( $_ < 0 ) {
        ! /\./; # No decimal point -> integer.
    }
    else {
        1; # Always true.
    }
} @array;
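
To try the three forms side by side, here is a small self-contained
sketch. The sample data is made up purely for illustration; only the
"grep" calls themselves come from the text above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data, for illustration only.
my @numbers = (3, -1, 0, 7, -5);
my @lines   = ("first line", "", "   ", "last line");

# Expression form with parentheses: note the comma after the condition.
my @positive_numbers = grep($_ > 0, @numbers);

# Expression form without parentheses, and the block form.
# There is no comma after a code block.
my @non_blank_expr  = grep /\S/, @lines;
my @non_blank_block = grep { /\S/ } @lines;

print "Positive: @positive_numbers\n";
print "Non-blank (expression): ", join(' | ', @non_blank_expr), "\n";
print "Non-blank (block): ", join(' | ', @non_blank_block), "\n";
```

Both non-blank lists should come out identical, since the two forms
differ only in syntax.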

-----------------------------------

3) map - transforms the values of a list

The "map" function applies a transformation to each element of a list
and returns the result, leaving the original list unchanged (unless you
mess it up; more on that in a moment).

@lines_with_newlines = map( $_ . "\n", @lines_without_newlines);

As with "grep", each value in the list is referred to as "$_".

"map" can also take a block of code:

# Replace "x@y.z" with "x at y dot z" to confuse spammers.
@disguised_addresses = map {
    my $email = $_;
    $email =~ s/\@/ at /;
    $email =~ s/\./ dot /g;
    $email;
} @email_addresses;

Note that it's important not to change "$_" because that would change
the original "@email_addresses" (and you wouldn't get what you wanted in
"@disguised_addresses").

"map" need not be a one-to-one mapping. For example, in the following code:

@words = map m/\b(\w+)\b/g, @lines; # Spaces are for clarity.

the regular expression splits a string into a list of words. The "map"
function returns the result of joining all the small lists. If a line
contains no words, the regular expression will return an empty list, and
that's okay.
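
As a quick check of the one-to-many behaviour, the following sketch runs
the same expression over three made-up lines, one of them empty. The
sample lines are an assumption for illustration only.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample input: the middle line contains no words.
my @lines = ("the quick brown fox", "", "jumps over");

# Each line contributes zero or more words to the result.
my @words = map m/\b(\w+)\b/g, @lines;

print scalar(@words), " words: @words\n";
```

Three input lines produce six words here, and the empty line simply
contributes nothing.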

-----------------------------------

4) What "grep" and "map" have in common

"grep" and "map" have a lot in common. They both "magically" take a
piece of code (either an expression or a code block) as a parameter. You
need to put a comma after an expression but shouldn't put a comma after
a code block.

Changing "$_" in "grep" or "map" will change the original list. This
isn't generally a good idea because it makes the code hard to read.
Remember that "map" builds a list of results by evaluating an
expression, NOT by setting "$_".

A consequence of this is that you should not use a bare "s///" as the
result of "map". The "s///" operator changes "$_" in place and returns
the number of substitutions it made rather than the new string, so you
won't get what you would expect if you use it with "map" (and you
CERTAINLY shouldn't use it with "grep").
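
Here is a minimal sketch of that pitfall next to the copy-first approach
used in the "map" section. The sample strings and variable names are
made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data.
my @original = ("a.b", "c.d");

# Pitfall: s/// edits $_ (which is an alias into @original) and
# returns the number of substitutions, not the new string.
my @counts = map { s/\./ dot /g } @original;

print "Counts: @counts\n";
print "Original is now: ", join(', ', @original), "\n";

# Safer: work on a copy and return the copy as the last statement.
my @source  = ("a.b", "c.d");
my @changed = map { my $copy = $_; $copy =~ s/\./ dot /g; $copy } @source;

print "Source is still: ", join(', ', @source), "\n";
print "Changed: ", join(', ', @changed), "\n";
```

On Perl 5.14 and later, the "/r" modifier makes "s///" return the changed
copy instead of modifying "$_", but the copy-first approach works on any
Perl 5.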

-----------------------------------

5) Exercises

a) Write some Perl code that, given a list of numbers, generates a list
of square roots of those numbers. (The square root function in Perl is
"sqrt".)

b) Modify the code to filter out any negative numbers. The result should
be as though the negative numbers were never in the original list.

c) Write a Perl program that reads two files and outputs only the lines
that are common to both of them.

-----------------------------------

6) Answer to Previous Exercise

The following program reads the password file and outputs a list of
usernames and UIDs, ordered by username:

#!/usr/bin/perl -w
use strict;

open FILE, '< /etc/passwd' or die "Couldn't open file: $!";
my @data = sort(<FILE>);
close FILE;

my @result;
foreach (@data) {
    my @fields = split(/:/); # Equivalent to split(/:/, $_)
    push @result, $fields[0] . ' -> ' . $fields[2];
}

print join("\n",@result) . "\n";

The above program is a nice review of Perl functions. But of course,
There Is More Than One Way To Do It, and we could replace the bottom
half with:

foreach (@data) {
    s/^(.*?):.*?:(\d*):.*$/$1 -> $2/;
}
print @data; # Each line still ends with its own newline.

Or to make the program really short:

$_ = join '', @data;
s/^(.*?):.*?:(\d*):.*$/$1 -> $2/gm;
print; # Prints "$_"
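
Since this part is about "map", here is one more possible variation, as
a sketch only. To keep it self-contained it uses two made-up lines in
password-file format instead of reading /etc/passwd; the sample users
are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample lines in /etc/passwd format.
my @lines = (
    "root:x:0:0:root:/root:/bin/bash\n",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n",
);
my @data = sort @lines;

# Build each "username -> UID" string without touching $_.
my @result = map {
    my @fields = split(/:/); # Equivalent to split(/:/, $_)
    $fields[0] . ' -> ' . $fields[2];
} @data;

print join("\n", @result) . "\n";
```

The "foreach" loop and the "push" collapse into a single "map", and
"@data" is left unchanged.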

-----------------------------------

7) Past Information

Part 16: Array Functions
http://linuxchix.org/pipermail/courses/2003-November/001359.html

Part 15: More About Lists
http://linuxchix.org/pipermail/courses/2003-November/001351.html

Part 14: Arrays
http://linuxchix.org/pipermail/courses/2003-October/001350.html

Part 13: Perl Style
http://linuxchix.org/pipermail/courses/2003-October/001349.html

Part 12: Side Effects with Perl Variables
http://linuxchix.org/pipermail/courses/2003-October/001347.html

Part 11: Perl Variables
http://linuxchix.org/pipermail/courses/2003-October/001345.html

Parts 1-10: see the end of:
http://linuxchix.org/pipermail/courses/2003-October/001345.html

-----------------------------------

8) Credits

Works cited:
a) man perlfunc
b) Kirrily Robert, Paul Fenwick and Jacinta Richardson's
"Intermediate Perl", which you can find (along with their
"Introduction to Perl") at:
http://www.perltraining.com.au/notes.html

Thanks to Jacinta Richardson for fact checking.

-----------------------------------

9) Licensing

This course (i.e., all parts of it) is copyright 2003 by Alice Wood and
Dan Richter, and is released under the same license as Perl itself
(Artistic License or GPL, your choice). This is the license of choice to
make it easy for other people to integrate your Perl code/documentation
into their own projects. It is not generally used in projects unrelated
to Perl.