06 Part 5: The Transliteration Operator

LinuxChix Perl Course Part 5: the "tr///" operator

Contents
1) Introduction
2) The tr/// operator
3) Options
4) Answers to Previous Exercises
5) Exercises
6) Past Information

-----------------------------------

1) Introduction

It seems that the original plan for this course was to start with features
common to every language, then perhaps get into features unique to Perl.
After thinking long and hard, I have decided that it would be better to do
the opposite: start with features that are unique to Perl. This is surely the
best thing for those of you who already program in other languages, and I
don't think it's detrimental to anyone who is learning Perl as a first
language. (Though learning Perl as a first language might in itself be
detrimental!)

So now we're going to look at a Perl language construct that doesn't exist in
most other languages, called "tr". It's generally written with trailing
slashes, for reasons that we will see shortly. Some people call this language
construct a "function," but I believe that "operator" is the correct term,
and it is the term that I will use.

-----------------------------------

2) The tr/// operator

Try running the following Perl program:

#!/usr/bin/perl -w
use strict;

my $text = 'some cheese';
$text =~ tr/ce/XY/;
print "$text\n";

What happened to $text?

The "tr///" operator transliterates a string one character at a time: each
character that appears in the first list is replaced by the character at the
same position in the second list.

Examples:

$x =~ tr/a/b/; # Replace each "a" with a "b".
$x =~ tr/ /_/; # Convert spaces to underscores.
$x =~ tr/aeiou/AEIOU/; # Capitalise vowels.
$x =~ tr/79/97/; # Exchange "7" and "9".
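
To make the "exchange" example concrete, here is a small sketch (the variable
names and sample strings are invented for illustration). It suggests why
swapping works: each character is looked up once, so the two replacements
cannot trip over each other.

```perl
#!/usr/bin/perl -w
use strict;

# Each character is looked up once, so the 7 -> 9 and 9 -> 7
# replacements cannot interfere with each other.
my $digits = '1979';
$digits =~ tr/79/97/;
print "$digits\n";    # 1797

# Characters that are not in the first list are left alone.
my $vowels = 'some cheese';
$vowels =~ tr/aeiou/AEIOU/;
print "$vowels\n";    # sOmE chEEsE
```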

The only characters that have special meaning to "tr///" are the backslash
and the dash. The latter indicates a range of characters:

$x =~ tr/0-9/QERTYUIOPX/; # Digits to letters.
$x =~ tr/A-Z/a-z/; # Convert to lowercase.
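
A short sketch of ranges and the backslash in action (sample strings and
variable names are made up for this example). The last case assumes you want
a literal dash, which is what the backslash is for here.

```perl
#!/usr/bin/perl -w
use strict;

# A range on each side: a..f line up with A..F.
my $hex = 'deadbeef';
$hex =~ tr/a-f/A-F/;
print "$hex\n";     # DEADBEEF

# A range on the left, an explicit list on the right.
my $year = '2003';
$year =~ tr/0-9/QERTYUIOPX/;
print "$year\n";    # RQQT

# A backslash turns the dash back into an ordinary character.
my $name = 'one-two-three';
$name =~ tr/\-/_/;
print "$name\n";    # one_two_three
```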

Actually, the slash also has a special meaning to "tr///". The slash is
called the "delimiter", because it indicates the "limit" on the list of
characters to substitute. However, we can use most kinds of punctuation in
place of the slash. For example:

$x =~ tr!aeiou!AEIOU!;
$x =~ tr:aeiou:AEIOU:;

Note that we can also use bracketing characters such as parentheses or angle
brackets. The syntax changes a little, because brackets come in pairs: each
list gets its own opening and closing bracket:

$x =~ tr(aeiou)(AEIOU);
$x =~ tr<aeiou><AEIOU>;

The semantics (meaning) don't change; only the syntax (way of writing it)
changes. But even though the delimiter is arbitrary, we still talk about it as
"tr///".

"tr///" returns the number of replacements it made:

my $salary = '$1,000,000.00'; # Dollar sign: use single quote!
my $ego = ($salary =~ tr/0/0/); # Count the zeros in salary.
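
Here is the counting idea as a complete program you can run, as a sketch
only. The variable names $zeros and $commas are invented for this example.
The second count relies on the documented behaviour that an empty second list
makes Perl reuse the first list.

```perl
#!/usr/bin/perl -w
use strict;

my $salary = '$1,000,000.00';

# Replacing "0" with "0" leaves the string alone but still counts.
my $zeros = ($salary =~ tr/0/0/);

# With an empty second list, Perl reuses the first list, so this
# also counts without changing anything.
my $commas = ($salary =~ tr/,//);

print "$zeros zeros, $commas commas in $salary\n";
# 8 zeros, 2 commas in $1,000,000.00
```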

One more thing: "tr///" has an alias: "y///". This is to please users of the
program "sed", which uses the "y///" command to do basically what "tr///"
does. In Perl, "tr///" and "y///" do exactly the same thing; use whichever
you like. Remember: there is more than one way to do it (TIMTOWTDI).

$text =~ tr/0-9/A-J/; # Convert digits to letters.
$text =~ y/0-9/A-J/; # Does exactly the same thing.

-----------------------------------

3) Options

"tr///" can take the following options:
c Complement (invert) the searchlist.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters.

These options are specified after the final delimiter, like this:

$x =~ tr/abc/xyz/s; # Note the "s" at the end.
$x =~ tr(abc)(xyz)s; # Same thing, but with parentheses.
$x =~ tr/abc/xy/ds; # Multiple options.
# In the last case, the "z" is missing. You'll see why shortly.

In the last example, we specified both the "d" and the "s" options. The order
of the options isn't important: we could have used "sd" instead of "ds".

Examples:

#!/usr/bin/perl -w
use strict;

##### The "s" option #####
my $text = 'good cheese';
$text =~ tr/eo/eu/s;
print "$text\n";
# Output is: gud chese

##### The "d" option #####
my $big = 'vowels are useful';
$big =~ tr/aeiou/AEI/d;
print "$big\n";
# Output is: vwEls ArE sEfl
# The first three vowels are made uppercase.
# The other two, which have no replacement
# character, are deleted because of the "d".
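
Two more small cases, offered as a sketch with made-up sample strings: "s"
used with an empty second list (Perl then reuses the first list, so nothing
is changed except that runs are squashed), and "d" and "s" used together.

```perl
#!/usr/bin/perl -w
use strict;

##### "s" with an empty second list #####
my $spaced = 'too    many   spaces';
$spaced =~ tr/ //s;
print "$spaced\n";    # too many spaces

##### "d" and "s" together #####
my $letters = 'aabbcc';
$letters =~ tr/abc/xy/ds;
print "$letters\n";   # xy
# "aa" becomes one "x", "bb" becomes one "y", and the
# "c" characters have no replacement, so they are deleted.
```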

We won't demonstrate the remaining option, "c", because it's rather
complicated. You can learn more about it using "man perlop" (look under "tr").
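
For readers who would like a quick taste anyway, here is a minimal sketch of
"c" in combination with the other options. The sample strings and variable
names are invented. In each case, "c" means "every character that is NOT in
the first list".

```perl
#!/usr/bin/perl -w
use strict;

# "c" + "d": delete everything that is NOT a digit.
my $phone = '(555) 123-4567';
$phone =~ tr/0-9//cd;
print "$phone\n";     # 5551234567

# "c" + "s": turn each run of non-letters into one space.
my $words = 'one, two;  three!';
$words =~ tr/a-zA-Z/ /cs;
print "[$words]\n";   # [one two three ]  (the "!" became a space too)

# "c" alone with an empty second list: count the non-digits.
my $price = '$19.99';
my $others = ($price =~ tr/0-9//c);
print "$others\n";    # 2
```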

-----------------------------------

4) Answers to Previous Exercises

There was only one previous exercise: a program that reads numbers and
outputs the average. Here is one such program:

#!/usr/bin/perl -w
use strict;

my $line;
my $sum = 0;
my $n = 0;
while ( defined($line = <STDIN>) ) {
    $sum += $line;
    $n++;
}

my $average = $sum / $n;

print "The average is $sum/$n = $average\n";

-----------------------------------

5) Exercises

a) I constantly get spam for "V1agra" or "un1vers1ty dipl0mas". Write a
program that helps my spam filter by converting all 1's to i's and all 0's to
o's.

b) Julius Caesar is said to have used the famous Caesar Cypher to encrypt his
communications with Rome. When encrypting, the Caesar Cypher substitutes
letter-for-letter like this:
A -> C
B -> D
C -> E
...
X -> Z
Y -> A
Z -> B

Write a program that performs a Caesar Cypher on its input.

As a test, if you feed the program "Caesar Cypher", you should get "Ecguct
Earjgt". Be especially careful that the "y" in "Cypher" maps to an "a".

-----------------------------------

6) Past Information

Part 1: Getting Started
http://linuxchix.org/pipermail/courses/2003-March/001147.html

Part 2: Scalar Data
http://linuxchix.org/pipermail/courses/2003-March/001153.html

Part 3: User Input
http://linuxchix.org/pipermail/courses/2003-April/001170.html

Part 4: Control Structures
http://linuxchix.org/pipermail/courses/2003-April/001184.html

Part 4.5, a review with a little new information at the end:
http://linuxchix.org/pipermail/courses/2003-July/001297.html