Lesson Two: Regular Expressions

One of the coolest things in Perl is regular expressions. Essentially, this is a very sophisticated way to search through a variable and find a match. While regular expressions may look cryptic, it doesn't take long to start understanding them. Check this out:

    #!/usr/bin/perl
    $line = "Love is blindness, I don't want to see";
    print "blind is in the phrase\n" if ( $line =~ /blind/ );
    print "love isn't in the phrase\n" unless ( $line =~ /love/ );
    print "ignoring case, love is in the phrase\n" if ( $line =~ /love/i );

You'll want to run the above example to understand exactly what is going on here. First, we establish a variable with a phrase in it. Next, we print "blind is in the phrase" only if $line contains the string "blind," which of course it does. But in the next line, "love" is not in the phrase because "love" and "Love" are different strings. Notice the use of "unless" which does the opposite of an "if" in this case. In the last line, we add case insensitivity to the match and, hence, get a match.

We can also selectively replace things in a variable:

    #!/usr/bin/perl
    $line = "Love is blindness, I don't want to see";
    $line =~ s/want to/wanna/;
    print "$line\n";

In this case, we are changing "want to" to "wanna." Preceding the regular expression with an "s" swaps the first instance of "want to" with what's on the other side of the replace, which is the word "wanna." What would happen if we had another occurrence of the string "want to" in our sample? In that case, it wouldn't have been replaced unless we added a "g" to the end of the replace statement. For instance, the output of the following line:

    $line =~ s/s/z/g;

which replaces all "s" characters with "z" characters, looks like this:

    Love iz blindnezz, I don't want to zee

How about a more practical example? Let's say that you have a whole pile of MP3 files in a directory, and you want to get rid of the spaces in the names.
    #!/usr/bin/perl
    @files = qx {ls *.mp3};               # one array element per line of ls output
    foreach $original ( @files ) {
        chomp $original;                  # strip the trailing newline
        $modified = $original;
        $modified =~ s/ /_/g;             # every space becomes an underscore
        print "renaming '$original' to '$modified'\n";
        qx {mv '$original' '$modified'};
    }

First, we create an array of all the files ending in .mp3 with the line

    @files = qx {ls *.mp3};

qx executes everything between the { and } marks as a system command and sends each line of the results to @files. Then we do a foreach loop through all the elements in @files. Let's say that the first element of the array is "love is blindness.mp3\n". In the first iteration of the loop, the string $original is set to "love is blindness.mp3\n" and then a chomp operation is done on the string, killing the \n. Then $modified is set to the contents of $original and has all its spaces replaced with underscores (s/ /_/g). Next we print what we are going to do and then execute a move command with the qx line. Because we are inside a foreach loop, the process will repeat until all the mp3 files are renamed with underscores for spaces. Voila!

What happens if we have a bunch of illegal characters such as ( and ) in an mp3 name, and we want to convert those to underscores? With regular expressions come a whole slew of characters with special meanings. Here are a few:

    \w - Word character (a letter, a digit, or an underscore)
    \W - Non-word character
    \t - Tab character
    \d - A digit (0-9)
    \D - A non-digit

So we can do this:

    #!/usr/bin/perl
    @files = qx {ls *.mp3};
    foreach $original ( @files ) {
        chomp $original;
        $modified = $original;
        $modified =~ s/\.mp3$//;          # lop off the extension (literal dot, end of name only)
        $modified =~ s/\W/_/g;            # every non-word character becomes an underscore
        print "renaming '$original' to '$modified.mp3'\n";
        qx {mv '$original' '$modified.mp3'};
    }

Note that we use \W to select all non-word characters and replace them with an underscore. But the dot in ".mp3" is also considered a non-word character, so we lopped the extension off in the line above and added it back when we did the print and qx command. In that line, the backslash makes the dot match a literal "." (a bare dot matches any character), and the $ anchors the match to the end of the name, so only a trailing ".mp3" is removed.
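To try the \W cleanup without touching any files, the two substitutions can be pulled into a small subroutine and run on literal strings. This is only a sketch: the name clean_name and the sample filenames are made up for illustration, and it assumes the extension is stripped with an escaped, anchored pattern (\.mp3$) so that only a literal ".mp3" at the end of the name is removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the lesson's script): return a "safe"
# version of an MP3 filename with every non-word character replaced.
sub clean_name {
    my ($original) = @_;
    my $modified = $original;
    $modified =~ s/\.mp3$//;    # strip the extension: \. is a literal dot, $ anchors to the end
    $modified =~ s/\W/_/g;      # every remaining non-word character becomes an underscore
    return "$modified.mp3";     # put the extension back
}

# Sample names are invented for this demonstration; nothing on disk is renamed.
foreach my $original ( "love is blindness.mp3", "one (live).mp3" ) {
    my $modified = clean_name($original);
    print "'$original' would become '$modified'\n";
}
```

Running it prints the proposed new name for each sample, which makes it easy to check a pattern before letting it loose on a directory. If the names look right, the same subroutine could be called inside the foreach loop from the script above, and Perl's built-in rename function is one alternative to shelling out to mv.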
Obviously, this little script could become as complicated as you want: adding numbers to filenames, deleting multiple underscore marks, and so on.

Another common use...