Perl, a powerful scripting language

Perl is a very high-level programming language originally developed in the 1980s by Larry Wall. Perl is now being developed by a group of individuals known as the Perl5-porters under the watchful eye of Larry.

 

What is Perl?

Perl is commonly expanded as “Practical Extraction and Report Language,” but it is much more than a reporting language. It combines the most useful features of the Unix shells, awk, sed, grep, and C, and it is especially strong at pattern matching.

 

Perl is free software, distributed under the Artistic License or the GNU General Public License. It is an interpreted language, used primarily for scripting, and it runs on many platforms.

 

Although designed for Unix, Perl is renowned for its portability and also runs on DOS, Windows, Macintosh, etc.

 

Who uses Perl?

System administrators, Web developers, database administrators, application developers in fields such as bioinformatics, etc.

 

Perl at the Command Line

% perl -e 'print "Hello, world\n";'

 

How to execute a Perl script?

% cat firstPerl
#!/usr/local/bin/perl         # the first line of the script
print "Hello, world\n";       # each statement ends with a semicolon ;

% perl -c firstPerl           # -c checks the syntax without running the script

To run the script (no separate compilation step is needed):
% perl firstPerl
OR
% ./firstPerl                 # after chmod +x firstPerl

 

Quotes in Perl

Quoting rules in Perl are similar to those of the C shell. Perl uses single quotes (all characters are treated as literals), double quotes (like single quotes, except that variables and backslash escapes such as \n are interpreted), the backslash \ (to quote a single character), and backquotes `` (for executing commands).
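The four quoting forms can be compared side by side. This is a minimal sketch: the variable names and the sample value are made up for illustration, and the backquote line assumes an echo command is available on the system.

```perl
use strict;
use warnings;

my $user = "world";               # hypothetical sample value

my $single  = 'Hello, $user\n';   # single quotes: everything is literal
my $double  = "Hello, $user\n";   # double quotes: $user and \n are interpreted
my $escaped = "Cost: \$5";        # backslash makes the dollar sign literal
my $output  = `echo hi`;          # backquotes: run a command, capture its output

print "$single\n";
print $double;
print "$escaped\n";
print $output;
```

The single-quoted string keeps the characters $user and \n exactly as typed, which is why an extra newline is added when it is printed.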

 

Special literals

__LINE__    # represents the current line number

__FILE__    # represents the current filename

__END__     # represents the logical end of the script

 

#!/usr/local/bin/perl
print "Hello, world\n";
print "We are on line number ", __LINE__, ".\n";
print "The name of this file is ", __FILE__, ".\n";      # the name of the current file
__END__
And this part, after __END__, will be ignored by Perl.

 

The printf function

printf("%-15s%-20s\n", "Jack", "Sprat"); # the - flag left-justifies each field
printf "Hello, my name is %s!\n", "Sam";
printf "The number in decimal is %d\n", 100;
printf "The formatted floating point number is %8.2f\n", 14.3456;

 

Printing without quotes – the Here Document

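The Perl here document is a line-oriented form of quoting. It requires the << operator followed by a terminating string and a semicolon; when the terminating string is unquoted, there can be no space after the <<. The quoted text runs until a line that contains only the terminating string.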

 

$price = 100;
print <<EOF;            # start of here document; without quotes it acts like double quotes
The price is $price.
EOF
# The variable above is expanded. The terminator EOF must stand alone on its line, with NO surrounding spaces.

print <<'FINAL';        # terminator enclosed in single quotes: the variable is not expanded
The price is $price.
FINAL

print <<"" x 4;         # empty terminator: the text ends at the first blank line; it is printed 4 times
Hello, there!

# The blank line after the text is necessary. Recent versions of Perl reject the older form: print << x 4;

print <<`END`;          # terminator in backquotes: the lines are executed as Unix commands
echo hi there
date
END

 

Here documents are used extensively in CGI scripts for enclosing large chunks of HTML tags for printing.
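As a rough illustration of that use, the sketch below builds a small HTML page in a here document and stores it in a variable before printing it. The page title, the variable names, and the END_HTML terminator are all made up for the example.

```perl
use strict;
use warnings;

my $title = "Price list";   # hypothetical page title
my $price = 100;

# A here document can be assigned to a variable as well as printed.
# The double-quoted terminator means variables are expanded.
my $html = <<"END_HTML";
<html>
<head><title>$title</title></head>
<body><p>The price is \$$price.</p></body>
</html>
END_HTML

print $html;
```

The backslash in \$$price keeps the first dollar sign literal, so only $price is expanded.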

 

Perl Variables

As in shell scripts, Perl variables do not have to be declared before they are used. Perl has three types of variables: scalars (preceded by $), lists or arrays (preceded by @), and associative arrays or hashes (preceded by %). For example, $name, @name, and %name are all different variables.

·           Variables are case sensitive.

·           Since reserved words and filehandles are not preceded by a special character, variable names will not conflict with reserved words or filehandles.
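A short sketch of these naming rules follows. The values are invented for illustration; the point is only that the three variables called name do not interfere with one another, and that $Name is yet another variable.

```perl
use strict;
use warnings;

my $name = "Tommy";                             # scalar
my @name = ("Tom", "Dick", "Harry");            # array
my %name = (first => "Tom", last => "Jones");   # hash

my $Name = "someone else";   # not the same as $name: names are case sensitive

print "$name\n";             # the scalar
print "$name[0]\n";          # element 0 of @name
print "$name{first}\n";      # value stored in %name under the key first
print "$Name\n";
```

Note that a single element of @name or %name is itself a scalar, so it is written with a leading $.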

 

$salary = 50000;              # scalar variable
@months = ("Mar", "Apr", 5);  # a Perl list can store different types of data
print "$salary\n";
print "@months\n";
print "$months[0], $months[1]\n";   # array subscripts start at 0

print "The last subscript of months is $#months\n";   # $#months is 2

$sym = "net";
print "${sym}work\n";         # curly braces separate the variable name from the text that follows

$name = "Tommy";
print "OK\n" if defined $name;  # defined tests whether a variable has a value

undef $name;            # this function undefines an already defined variable

@months=();             # assigns the empty list (empties the array)
@digits=(0..10);        # range operator; will contain 0, 1, 2, ..., 10
@letters=('A'..'Z');

 

### Array slice ###
@names=('Tom', 'Dick', 'Harry', 'Pete', 'Smith');
$count = @names;        # the number of elements, 5, is assigned to $count
@people = @names;       # the names list is copied to @people
@friends = @names[1,2,3];           # an array slice; @names[1..3] is equivalent
($enemy[0], $enemy[2])=@names;      # the enemy array is created with values
print "@enemy\n";       # @enemy is now (Tom, undefined, Dick)

 

@matrix=([1,2],[3,4],[5,6]);        # 3x2 two-dimensional array
print "Row 0, Column 0 is $matrix[0][0].\n";
@record=("Adams", [2,1],            # [2,1] is a nested one-dimensional array with data 2 and 1
         "Edwards", [1,0,3],
         "Howard", [3,4,5,6]);
print "In the first row $record[0]\n";    # Adams
print "In the third row $record[5][2]\n"; # 5
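Each row of such an array is stored as a reference to another array. The sketch below, which reuses the 3x2 matrix from the example, shows one way to count rows and columns and to walk through the rows; the helper variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my @matrix = ([1, 2], [3, 4], [5, 6]);

my $rows = @matrix;               # array in scalar context: number of rows (3)
my $cols = @{ $matrix[0] };       # dereference row 0 to count its columns (2)
my $last = $matrix[$#matrix][1];  # last row, column 1 (6)

my @flat;
foreach my $row (@matrix) {
    push @flat, @$row;            # @$row expands the row into its elements
}

print "$rows x $cols, last = $last, flat = @flat\n";
```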

 

Associative Arrays (Hashes)

%states = ('CA' => 'California', 'TX' => 'Texas', 'MT' => 'Montana');   # hash

# the first string of each pair is called the key, and the second string is called the value
print "$states{'CA'}, $states{'MT'}\n";

%days=('Mon', 'Monday', 'Tue', 'Tuesday', 'Wed', undef);   # a plain list of key, value pairs; Wed has no value yet
$days{'Wed'}="Wednesday";     # the value Wednesday is assigned to the key Wed
$days{5} = "Friday";          # the value Friday is assigned to the key 5

# Array of Hashes
@band=({name=>"Tom Jones", age=>30, city=>"New York"},
       {name=>"Michael Jack", age=>40, city=>"LA"},);
print "The total number of members: ", $#band + 1, "\n";
print "First member name is $band[0]{name} \n";

 

Reading from STDIN

Perl provides three predefined filehandles: STDIN, STDOUT, and STDERR.

 

print "What is your name? ";  # the string is sent to STDOUT by default
$name = <STDIN>;              # one line of input is read and assigned to $name
@all=<STDIN>;                 # lines are read into the array until end of file (Ctrl-d)
$course{$course_num}=<STDIN>; # a line is stored in the hash under the key $course_num
$num=read(STDIN, $indata, 100); # reads up to 100 bytes into $indata; returns the number of bytes read
$answer=getc;                 # reads one character at a time

 

The chop and the chomp functions

The chop function removes the last character of a scalar variable, or the last character of each element of an array. It is used primarily for removing the newline from a line of input.

 

The chomp function (introduced in Perl 5) is similar to chop except that it removes the last character only if that character is the newline.

 

print "What is your name? ";
$name = <STDIN>;
chop($name);                  # removes the last character, whatever it is, and returns it

chomp($name=<STDIN>);         # alternatively: reads a line and removes the last character only if it is a newline
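The difference between the two functions shows up only when the string does not end in a newline. A minimal sketch, using fixed strings instead of keyboard input so that it can be run unattended (the variable names are made up for the example):

```perl
use strict;
use warnings;

my $with_newline    = "Tommy\n";
my $without_newline = "Tommy";

chomp(my $chomped_1 = $with_newline);     # newline removed: "Tommy"
chomp(my $chomped_2 = $without_newline);  # nothing to remove: "Tommy"
chop(my $chopped_1  = $with_newline);     # newline removed: "Tommy"
chop(my $chopped_2  = $without_newline);  # last letter removed: "Tomm"

print "$chomped_1 $chomped_2 $chopped_1 $chopped_2\n";
```

Because chop removes a character unconditionally, chomp is usually the safer choice for cleaning up input lines.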

 

The join function

The join function joins the elements of a list into a single string, separating the elements with a given delimiter. It is the opposite of split.

 

Format: join(delimiter, list)

 

$name="John";
$birthdate="1/1/2000";
$place="LA";
print join(":", $name, $birthdate, $place), "\n";     # John:1/1/2000:LA

 

The split function

The split function splits up a string by some delimiter (whitespace by default) and returns an array.

 

$line="a,b,c,d";
@letter=split(',', $line);
print "The characters in the line are @letter\n";      # a b c d
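Since join and split are opposites, a record can be packed into one string and unpacked again. The sketch below reuses the sample fields from the join example; the last line illustrates the special single-space delimiter, which splits on runs of whitespace and ignores leading whitespace.

```perl
use strict;
use warnings;

my @fields = ("John", "1/1/2000", "LA");

my $record = join(":", @fields);        # "John:1/1/2000:LA"
my @again  = split(/:/, $record);       # back to the three original fields

my @words  = split(' ', "  a b   c ");  # ("a", "b", "c")

print "$record\n";
print scalar(@again), " fields, ", scalar(@words), " words\n";
```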

 

The pop and push functions

The pop function pops off the last element of an array and returns it. The array size is decreased by one. The push function pushes values onto the end of an array, increasing the size of the array.

 

@list=("timmy", "tommy");        # sample list, assumed here for illustration
$boy=pop(@list);                 # returns tommy, which is removed from the list
push(@list, "bobby", "tomb");    # bobby and tomb are added to the end of @list

 

The shift, unshift, splice, and split functions

The shift function shifts off and returns the first element of an array, decreasing the size of the array; unshift adds elements to the front of an array. The splice function removes and replaces elements in an array. The general format is:

splice(array, offset, length, list)

The split function splits up a string by some delimiter (whitespace by default) and returns an array. The general format is:

split(/delimiter/, expr)

 

@names=("bob", "dan", "tom");
$man=shift @names;      # returns "bob"; @names now contains ("dan", "tom")
unshift(@names, "Liz", "bean"); # adds to the front; @names is now ("Liz", "bean", "dan", "tom")

@newnames=splice(@names,1,3,"yellow","orange"); # @newnames is ("bean", "dan", "tom")
print "the spliced array is @names \n";   # @names is now ("Liz", "yellow", "orange")

$line = "a b c d e";
@letter=split(' ', $line); # @letter contains (a, b, c, d, e)

 

The sort and reverse functions

The sort function returns a sorted copy of a list (in string order by default). The reverse function returns the elements of a list in reverse order.

@string=("a", "d", "f", "c", "b");
@string_sorted=sort(@string);       # a b c d f
@string_reverse=reverse(@string);   # b c f d a (the original order reversed, not sorted)

sub numeric {$a <=> $b;}            # comparison subroutine for numeric order
@sorted_num=sort numeric 10, 5, 6, 0, 1; # 0 1 5 6 10
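The comparison can also be written inline as a block, which makes the difference between string order and numeric order easy to see. A minimal sketch with made-up sample numbers:

```perl
use strict;
use warnings;

my @nums = (10, 5, 6, 0, 1, 100);

my @as_strings = sort @nums;                 # default string order: 0 1 10 100 5 6
my @ascending  = sort { $a <=> $b } @nums;   # numeric order: 0 1 5 6 10 100
my @descending = sort { $b <=> $a } @nums;   # swap $a and $b to reverse the order

print "@as_strings\n";
print "@ascending\n";
print "@descending\n";
```

The default sort compares elements as strings, so "10" and "100" sort before "5".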

 

Associative Array Functions and foreach Loop

%weekdays=('1'=>'Mon', '2'=>'Tue', '3'=>'Wed', '4'=>'Thu', '5'=>'Fri');
foreach $key (keys(%weekdays))
{print "$key ";}              # the keys 1 2 3 4 5, in no guaranteed order

foreach $value (values(%weekdays))
{print "$value ";}            # the values Mon Tue Wed Thu Fri, in the same order as the keys

while (($key, $value) = each %weekdays)
{print "$key = $value \n";}   # prints each pair of key and value

delete $weekdays{1};          # removes the element with the key 1, which is Mon
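Because a hash has no guaranteed order, a common approach is to sort the keys before looping. The following sketch reuses the weekday data; the @lines, $removed, and $left names are assumptions made for the example.

```perl
use strict;
use warnings;

my %weekdays = (1 => 'Mon', 2 => 'Tue', 3 => 'Wed', 4 => 'Thu', 5 => 'Fri');

my @lines;
foreach my $key (sort { $a <=> $b } keys %weekdays) {   # keys in numeric order
    push @lines, "$key = $weekdays{$key}";
}
print "$_\n" foreach @lines;

my $removed = delete $weekdays{1};   # delete returns the value it removed
my $left    = keys %weekdays;        # keys in scalar context: number of pairs left

print "removed $removed, $left left\n";
```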

 

Special Associative Array

%ENV is an associative array that contains the environment variables handed to Perl from the parent shell.

 

foreach $key (keys(%ENV)) {print "$key\n";}

print "your home directory is $ENV{'HOME'}";

 

The grep function

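The grep function evaluates the expression for each element of the list. In list context it returns the elements for which the expression is true; in scalar context it returns the number of such elements.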

Format: grep(expr, list)

 

@list=("tomatoes", "tomorrow", "potatoes", "phantom", "tommy");
$count=grep(/tom/i, @list);   # scalar context: the number of times the expression was true (4)
@items=grep(/tom/i, @list);   # list context: the array of elements for which it was true
                              # the i modifier makes the match case-insensitive
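The expression given to grep does not have to be a pattern match; any test on $_ can be used, and it can be written as a block. A small sketch using the same word list (the result variable names are made up):

```perl
use strict;
use warnings;

my @list = ("tomatoes", "tomorrow", "potatoes", "phantom", "tommy");

my $count  = grep { /tom/i } @list;             # scalar context: 4 matches
my @starts = grep { /^tom/ } @list;             # words that begin with tom
my @eight  = grep { length($_) == 8 } @list;    # words exactly 8 letters long

print "$count matches\n";
print "@starts\n";
print "@eight\n";
```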

Perl operators

Perl converts operands between strings and numbers automatically, as each operator requires. Most operators are very similar to those of the C language.

 

Arithmetic: +, -, *, /, % (modulus), ++, --
Assignment: =, +=, -=
Numeric comparison: == (equal to), != (not equal to), >, >=, <, <=, <=> (signed return: -1, 0, or 1)
Logical: && (logical and), || (logical or)
Other: .. (range operator), x (string repetition), . (string concatenation), ?: (ternary conditional)
String comparison: eq (equal to), ne (not equal to), cmp (signed return), gt (greater than), ge (greater than or equal), lt (less than), le (less than or equal)

 

$price = ($age > 60) ? 0 : 5.55; # if $age > 60 then 0, else 5.55 is assigned to $price
print "The numbers: ", 1..10;   # range operator ..
$z = "kid";
print $z x 5, "\n";           # prints kidkidkidkidkid
print $z . "nap", "\n";       # concatenates "kid" and "nap"
$num1 <=> $num2               # returns -1, 0, or 1
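Because the operator decides whether the operands are treated as numbers or as strings, the same values can compare differently. The sketch below is only an illustration; the variable names are made up.

```perl
use strict;
use warnings;

my $numeric_order = (10 <=> 9);        # 1: as numbers, 10 is greater than 9
my $string_order  = ("10" cmp "9");    # -1: as strings, "1" sorts before "9"

my $same_number = ("1.0" == 1)   ? "yes" : "no";   # yes: both are the number 1
my $same_string = ("1.0" eq "1") ? "yes" : "no";   # no: the strings differ

print "$numeric_order $string_order $same_number $same_string\n";
```

Choosing == for strings or eq for numbers is a frequent source of bugs, so it helps to pick the operator according to the kind of data being compared.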

 

Random number generation

srand time;                   # sets the seed value
print "Random number: ", int(rand 6);  # a random integer in the range 0 to 5
$roll = int(rand 6) + 1;               # a random integer in the range 1 to 6
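One way to convince yourself of the range is to roll many times and look for values outside 1 to 6. In this sketch the seed 42 and the count of 1000 rolls are arbitrary choices made for the example.

```perl
use strict;
use warnings;

srand(42);                    # fixed, arbitrary seed so that runs are repeatable

my @rolls;
foreach (1 .. 1000) {
    push @rolls, int(rand 6) + 1;   # rand 6 is at least 0 and less than 6
}

my $out_of_range = grep { $_ < 1 || $_ > 6 } @rolls;   # count of bad rolls

print scalar(@rolls), " rolls, $out_of_range out of range\n";
```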

 

Regular Expressions

The regular expression operators are used for matching patterns in searches and for replacements in substitution operations.

The m// operator (m/pattern/ or /pattern/) is used for matching patterns, and the s/// operator (s/old/new/) is used for substituting one pattern for another. The m is optional if the delimiter is the forward slash (the default), so m/good/ is equivalent to /good/.

 

/abc/   Matches any string that contains the pattern 'abc'.
m?abc?  Matches only the first occurrence, until reset is called. (Older versions of Perl also accepted ?abc? without the m.)

$_ = "xabcy";
print "found it\n" if /abc/; # prints the message 'found it';
                             # $_ is the default target for pattern matching.

 

Modifiers: i (ignore case), m (treat the string as multiple lines), g (match globally, i.e., find all occurrences; returns a list of matches in list context and true or false in scalar context), s (treat the string as a single line even when it contains embedded newlines), e (evaluate the replacement side as an expression).

 

$_ = "I lost my gloves in the clover, Love.";
@list=/love/g;    # love love
@list=/love/gi;   # love love Love

 

$ cat sample.dat

Steve Blenheim

Norma Cord

Jon DeLoach

 

$ perl -ne 's/Norma/Jane/; print;' sample.dat    # will replace Norma by Jane

Steve Blenheim

Jane Cord

Jon DeLoach

 

$ perl -ne 'print if s/Jon/Tom/;' sample.dat     # prints only the lines where a substitution was made

Tom DeLoach

 

$_=50;
s/$_/$&*2/e;      # the special variable $& holds the string that was matched
print "The new value is $_\n";      # will print 100
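Modifiers can be combined. As a sketch, the g and e modifiers together double every number in a string, and g with i collects every match regardless of case. The sample sentence about fruit is invented for the example.

```perl
use strict;
use warnings;

$_ = "2 apples and 3 pears";
s/\d+/$&*2/ge;             # g: every number; e: replacement is evaluated as code
my $doubled = $_;          # "4 apples and 6 pears"

$_ = "I lost my gloves in the clover, Love.";
my @hits  = /love/gi;      # list context: every match, ignoring case
my $count = @hits;         # 3

print "$doubled\n";
print "$count matches: @hits\n";
```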

 

Pattern Binding Operators

If you have a string that is not stored in the $_ variable and need to perform matches or substitutions on that string, then the pattern binding operators are used. They are also used with the tr function for string translations.

 

General Formats:

$var =~ /expr/        # true if $var contains the pattern; returns 1 for true, the empty string for false
$var !~ /expr/        # true if $var does not contain the pattern
$var =~ s/old/new/    # replaces the first occurrence of old with new
$var =~ s/old/new/g   # replaces all occurrences of old with new
$var =~ tr/a-z/A-Z/   # translates all lowercase letters to uppercase
$var =~ /$pattern/    # a variable can be used in the search pattern
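The formats above can be tried on a single string. In this sketch the phone number is a made-up sample, and each result is stored in its own variable so the original string stays unchanged.

```perl
use strict;
use warnings;

my $phone = "400-555-1234";    # made-up sample value

my $in_400  = ($phone =~ /^400-/) ? "yes" : "no";   # yes: the pattern matches
my $not_800 = ($phone !~ /^800-/) ? "yes" : "no";   # yes: the pattern does not match

(my $first_only = $phone) =~ s/-/ /;     # first occurrence only: "400 555-1234"
(my $digits     = $phone) =~ s/-//g;     # all occurrences: "4005551234"
(my $upper      = "perl") =~ tr/a-z/A-Z/;   # "PERL"

my $pattern     = "555";
my $has_pattern = ($phone =~ /$pattern/) ? "yes" : "no";   # variable used as pattern

print "$in_400 $not_800 $first_only $digits $upper $has_pattern\n";
```

Writing (my $copy = $original) =~ s/.../.../ copies the string first and then changes only the copy.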

 

$ cat test.pl           # Perl code

while(<>) {

      ($name, $phone, $address) = split(/:/, $_);

      print "$name\n" if $phone =~ /400-/;

}

$ perl test.pl customer.dat   # assume customer.dat contains customer data

 

# The while loop reads the file named on the command line one line at a time, storing each line in the $_ variable. Each line is split on the colon (:) and the resulting fields are assigned to the list ($name, $phone, $address). The pattern /400-/ is then matched against the $phone variable; if it matches, the value of $name is printed.

 

Metacharacters

.            matches any character except newline.
[a-z0-9]     matches any single character in the set.
[^a-z0-9]    matches any single character not in the set.
\d           matches one digit.
\D           matches a non-digit; equivalent to [^0-9].
\w           matches a word character (alphanumeric or underscore).
\W           matches a non-word character.
\s           matches a whitespace character (space, tab, or newline).
\S           matches a non-whitespace character.
^            matches at the beginning of the line.
$            matches at the end of the line.
\A           matches at the beginning of the string.
\Z           matches at the end of the string (or before a final newline).
x?           matches 0 or 1 x.
x*           matches 0 or more x's.
x+           matches 1 or more x's.
x{m,n}       matches at least m x's and no more than n x's.
a|b|c        matches a or b or c.
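A few of these metacharacters can be checked against one sample line. The line itself and the variable names are invented for this sketch; each test stores 1 if the pattern matches and 0 if it does not.

```perl
use strict;
use warnings;

my $line = "Order 66: 3 items\n";    # hypothetical sample line

my $starts_upper = ($line =~ /^[A-Z]/)     ? 1 : 0;  # 1: the line begins with "O"
my $two_digits   = ($line =~ /\d{2}/)      ? 1 : 0;  # 1: "66"
my $three_digits = ($line =~ /\d{3}/)      ? 1 : 0;  # 0: no run of three digits
my $ends_items   = ($line =~ /items$/)     ? 1 : 0;  # 1: $ matches before the final newline
my $optional     = ($line =~ /Orders? 66/) ? 1 : 0;  # 1: the s is optional
my $alternation  = ($line =~ /pear|item/)  ? 1 : 0;  # 1: the second alternative matches

print "$starts_upper $two_digits $three_digits $ends_items $optional $alternation\n";
```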

 

Examples

/^a...c/  # searches at the beginning of the line for an 'a', followed by any three characters,
          # followed by a 'c'. For example, it will match 'abbbc', 'a123c', 'aAx3c', etc.
print if /[A-Z][a-z]eve/;  # find A-Z, followed by a-z, followed by 'eve'
print if /2\d\d/;          # find a '2' followed by two digits
print if /5+/;          # find one or more 5's
print if /5{1,3}/;      # find at least one 5 but not more than three in a row
print if /10*/;         # find 1 followed by 0 or more 0's
print if /5{3}/;        # find three consecutive 5's
print if /5{1,}/;       # find one or more consecutive 5's

 

tr/a-z/A-Z/; # each lowercase letter a-z is replaced by the corresponding uppercase letter A-Z

 

Control structures and compound statements (block)

Simple if modifier: expr2 if expr1; # if expr1 is true, execute expr2

$x = 10;

print $x if $x > 5; # will print 10

 

A compound statement (block) consists of a group of statements surrounded by curly braces. Unlike C, Perl requires the braces after if, else, while, etc., even when the block contains only one statement.

 

Conditional constructs

if (expr) { block }

 

if (expr)
    { block }
else
    { block }

 

if (expr1)
    { block 1 }
elsif (expr2)
    { block 2 }
...
else
    { block n }

 

unless (expr) { block }

 

unless (expr) { block } else { block }

 

unless (expr1) { block } elsif (expr2) { block } ... else { block }

 

# example

$hour = 10;
if ($hour <= 10)
      {print "Good morning\n";}
elsif ($hour == 12)
      {print "Lunch time\n";}
else {print "Good night\n";}
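The same kind of decision can be wrapped in a subroutine so that several hours can be tried at once. This is only a sketch: the greeting name, the cutoff of 12 for the morning, and the range check with unless are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a greeting for an hour between 0 and 23.
sub greeting {
    my ($hour) = @_;

    unless ($hour >= 0 && $hour <= 23) {   # runs only when the test is false
        return "Invalid hour";
    }

    if ($hour < 12) {
        return "Good morning";
    }
    elsif ($hour == 12) {
        return "Lunch time";
    }
    else {
        return "Good night";
    }
}

print greeting($_), "\n" foreach (10, 12, 22, 30);
```

Note that every block is enclosed in braces, even though each one holds a single statement.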

 

Loop constructs

while modifier: expr2 while expr1; # repeatedly executes expr2

   # as long as expr1 is true.
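Two small uses of the while modifier are sketched below; the counter and the sample list are made up for the example. In both cases the condition on the right is tested before each execution of the statement on the left.

```perl
use strict;
use warnings;

my $count = 0;
$count++ while $count < 5;          # stops when $count reaches 5

my @stack = (1, 2, 3);
my $sum   = 0;
$sum += pop @stack while @stack;    # an empty array is false, which ends the loop

print "count = $count, sum = $sum\n";
```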