LinuxChix Perl Course Part 4: Simple File Access

1) Introduction
2) The "die" Command
3) An Example Program
4) Exercises
5) Answers to Previous Exercises
6) Acknowledgements
7) Licensing

----------------------------------------

1) Introduction

Last week we saw how to read input from the keyboard. This week we
are going to expand on that topic to read from and write to files.

----------------------------------------

2) The "die" Command

Before getting into file access, we'll explain a handy command:
"die". This command does exactly what its name suggests: it kills
your program with an error message. Example:

die;

Of course, you'll usually want to make the command conditional and
give a helpful error message. You could use it inside an "if"
statement, but we're going to use it with "or" instead:

chomp($line) or die "Expected newline at end of $line";

The "or" operator works much like "||" in shell scripts: the command
on the right runs only if the one on the left fails. Here, "chomp"
returns the number of characters it removed, so the "die" command is
executed only if "chomp" returns zero (there was no newline to
remove).
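
To make the comparison with "if" concrete, here is a small sketch that
performs the same check both ways. The variable names and the sample
text are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

my $first  = "with newline\n";
my $second = "with newline\n";

# Written with "if": die when chomp removed nothing.
if ( chomp($first) == 0 ) {
    die "Expected newline at end of $first";
}

# Written with "or": the same test on a single line.
chomp($second) or die "Expected newline at end of $second";

print "Both lines were chomped: [$first] [$second]\n";
```

Both strings end in a newline, so neither "die" runs and the program
prints "Both lines were chomped: [with newline] [with newline]".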

By the way, if you want to terminate a program without an error, use
"exit".

----------------------------------------

3) An Example Program

The following example program creates a duplicate of a file, but with
line numbers.

#!/usr/bin/perl -w
use strict;

# Open file for input (file name preceded by "<").
open MY_INPUT, "< file1.txt" or die "Couldn't open input file: $!";

# Open file for output (file name preceded by ">").
open MY_OUTPUT, "> file2.txt" or die "Couldn't open output file: $!";

# Read from input file and write to output file.
my $counter = 1;
while ( defined( my $line = <MY_INPUT> ) ) {
    print MY_OUTPUT "$counter: $line";
    $counter++;
}

# Close the files.
close MY_INPUT;
close MY_OUTPUT;

Four lines of this program are particularly interesting: the two
"open" statements, the "while" loop, and the "print" statement. Let's
look at each one in turn.

The open statement takes two arguments. The first is a file handle,
which by convention is written in all CAPS. Note that, also by
convention, we don't put quotes around it; it's called a "bareword".
STDIN is a predefined file handle, but in this case we're declaring a
new one.

The second parameter to "open" is the name of the file. It should be
preceded by "<" if we're opening it for reading and by ">" if we're
opening it for writing (or ">>" if we're appending). Any spaces
before or after the filename are trimmed off. If you need to open a
file whose name begins or ends with spaces, or starts with a
character such as "<" or ">", use "sysopen" instead.
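
The three prefixes can be tried out with a short sketch like the one
below. The file name "demo.txt" and the variable $last are made up for
illustration; the program creates the file, appends to it, reads it
back and then deletes it.

```perl
#!/usr/bin/perl -w
use strict;

# "demo.txt" is a made-up file name used only for this sketch.

# ">" creates the file, or empties it if it already exists.
open MY_OUTPUT, "> demo.txt" or die "Couldn't create demo.txt: $!";
print MY_OUTPUT "first line\n";
close MY_OUTPUT;

# ">>" keeps what is already there and adds to the end.
open MY_OUTPUT, ">> demo.txt" or die "Couldn't append to demo.txt: $!";
print MY_OUTPUT "second line\n";
close MY_OUTPUT;

# "<" opens the file for reading only.
open MY_INPUT, "< demo.txt" or die "Couldn't read demo.txt: $!";
my $last = "";
while ( defined( my $line = <MY_INPUT> ) ) {
    print "Read: $line";
    $last = $line;    # remember the most recent line
}
close MY_INPUT;

# Remove the scratch file again.
unlink "demo.txt";
```

Because the second "open" used ">>" rather than ">", both lines
survive and are read back in order. Perl also accepts a three-argument
form, open(MY_INPUT, "<", $name), which takes the file name exactly as
given and does not trim spaces from it.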

If the "open" statement fails, the "die" statement is executed.
Notice the "$!": this special variable contains an explanation of the
last error encountered.
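
As a rough illustration of "$!", the sketch below tries to open a file
that is assumed not to exist and prints the reason instead of dying.
The file name "no_such_file.txt" and the variable $message are made up
for this example.

```perl
#!/usr/bin/perl -w
use strict;

# "no_such_file.txt" is a made-up name; we assume no such file exists.
my $message = "";
if ( open MISSING, "< no_such_file.txt" ) {
    close MISSING;
    $message = "Unexpectedly opened the file.";
}
else {
    # $! holds the system's explanation of why "open" failed.
    $message = "Couldn't open input file: $!";
}
print "$message\n";
```

On most Unix-like systems the explanation reads something like "No
such file or directory", but the exact wording depends on the system.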

The "while" loop continues until "<MY_INPUT>" returns the undefined
value, which is the signal that we've reached the end of the file.
(Note: in Perl the body of a "while" loop must be enclosed in braces,
even if it is only one statement. The same is true of "if".)
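
To see the end-of-file signal on its own, here is a sketch that reads
twice from a file containing a single line. The file name
"one_line.txt" is made up for illustration; the program creates and
deletes it itself.

```perl
#!/usr/bin/perl -w
use strict;

# "one_line.txt" is a made-up file name used only for this sketch.
open MY_OUTPUT, "> one_line.txt" or die "Couldn't create file: $!";
print MY_OUTPUT "only line\n";
close MY_OUTPUT;

open MY_INPUT, "< one_line.txt" or die "Couldn't open file: $!";
my $first  = <MY_INPUT>;    # the one line in the file
my $second = <MY_INPUT>;    # nothing left to read
close MY_INPUT;
unlink "one_line.txt";

print "First read: $first";
if ( defined($second) ) {
    print "Second read: $second";
}
else {
    print "Second read returned undef: end of file.\n";
}
```

The first read returns the line and the second returns the undefined
value, which is exactly the condition the "while" loop tests for.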

Finally, the "print" statement outputs its string to MY_OUTPUT. Note
that there is no comma after MY_OUTPUT. Also note that we didn't have
to add newlines because we never removed the newlines from the input.
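
The point about newlines can be checked with a short sketch. It prints
to the predefined file handle STDOUT so that it needs no files, and
the sample text is made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Stand-in for a line read from a file: it still ends in a newline.
my $line = "kept newline\n";

# No comma after the file handle; the newline comes from $line itself.
print STDOUT "1: $line";

# After chomp the newline is gone, so we have to add it back.
chomp($line);
print STDOUT "2: $line\n";
```

Both "print" statements produce a complete line of output: the first
relies on the newline already in $line, the second supplies its own.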

----------------------------------------

4) Exercises

a) Write a Perl program that counts the number of lines in a file.
(Just hard-code the name of the file.)

b) Write a Perl program that reads a file, strips blank lines and
outputs the result to another file. To test if a line is (not) blank,
use:

if ( $line ne "\n" ) {
    # ...
}

Of course, if you chomp() the newline, compare with the empty string
instead.

c) If you don't specify "<" or ">" in the open statement, the file
will be opened for reading. Why is it still important to include a
"<"?

d) Upon reaching the end of a file, the line input operator "<>"
returns the undefined value, which evaluates to false. So why is it
necessary to include "defined" in the while loop? Why can't we
instead write it like this:

while ( my $line = <INPUT> ) # Don't do this! Use "defined"!

e) Try running the following program:

#!/usr/bin/perl -w
use strict;

open MY_OUTPUT, "| grep 'foo' " or die "Couldn't run grep: $!";

print MY_OUTPUT "A line with 'foo' passes.\n";
print MY_OUTPUT "A line without it doesn't pass.\n";

close MY_OUTPUT;

What does the vertical bar mean to the "open" statement? What happens
if you put the vertical bar after the command rather than before?

f) Try running the following program:

#!/usr/bin/perl -w
use strict;

open MY_INPUT, "< file1.txt" or die "Couldn't open input file: $!";

local $/ = undef;
my $content = <MY_INPUT>;
print "Content is: [$content]\n";

close MY_INPUT;

What effect does it have to set "$/" to the undefined value?

----------------------------------------

5) Answers to Previous Exercises

a) and b) The following program converts dollars to euros:

#!/usr/bin/perl -w
use strict;

print "How many euros in a dollar? ";
my $conversion_factor = <STDIN>;

print "Give me a number of dollars (no dollar sign, please). ";
chomp(my $amount = <STDIN>);

my $answer = $conversion_factor * $amount;
print "$amount dollars equals $answer euros.\n";

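Notice that I chomped $amount (removed the newline) because it's
printed in the middle of the last line, where a newline would split
the sentence. I could have chomped $conversion_factor as well, but
that wasn't necessary in this case: it is only used in the
multiplication, where the trailing newline is ignored.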
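
The difference can be seen without typing anything in. The sketch
below uses a fixed string as a stand-in for what <STDIN> would return;
the value 0.5 and the amount 10 are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Stand-in for what <STDIN> would return if the user typed 0.5.
my $conversion_factor = "0.5\n";

# In arithmetic, the trailing newline is ignored.
my $answer = $conversion_factor * 10;
print "Answer: $answer\n";

# In output, the newline is still there and breaks the line.
print "Unchomped: [$conversion_factor]\n";

chomp($conversion_factor);
print "Chomped: [$conversion_factor]\n";
```

The multiplication gives 5 either way, but the unchomped value pushes
the closing bracket onto a line of its own when it is printed.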

c) The special variable $/ controls where the <> operator breaks
"lines", and what "chomp" removes. If you set it to a comma, then
<STDIN> will ignore newlines and break at a comma.
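
Here is a sketch of that behaviour. It reads from a small scratch file
rather than from STDIN so that it can run unattended; the file name
"fields.txt", its contents and the variable names are made up for
illustration.

```perl
#!/usr/bin/perl -w
use strict;

# "fields.txt" is a made-up file name used only for this sketch.
open MY_OUTPUT, "> fields.txt" or die "Couldn't create fields.txt: $!";
print MY_OUTPUT "red,green\nblue,yellow";
close MY_OUTPUT;

open MY_INPUT, "< fields.txt" or die "Couldn't open fields.txt: $!";

# From here on, a "line" ends at a comma instead of a newline.
local $/ = ",";

my $pieces = 0;
while ( defined( my $piece = <MY_INPUT> ) ) {
    chomp($piece);    # chomp now removes a trailing comma
    print "Piece: [$piece]\n";
    $pieces++;
}

close MY_INPUT;
unlink "fields.txt";
```

The file is read as three pieces: "red", then "green" and "blue"
together with the newline still between them, then "yellow".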

----------------------------------------

6) Acknowledgements

A big thank you to Jacinta Richardson for suggestions and
proofreading. More advanced Perl users might want to check out the
free material from Perl Training Australia
<http://www.perltraining.com.au/>, which she is a part of.

Other contributors include Meryll Larkin.

----------------------------------------

7) Licensing

This course (i.e., all parts of it) is copyright 2003-2005 by Dan
Richter and Alice Wood, and is released under the same license as
Perl itself (Artistic License or GPL, your choice). This is the
license of choice to make it easy for other people to integrate your
Perl code/documentation into their own projects. It is not generally
used in projects unrelated to Perl.