One of our primary motivations for working on the server side of the web is the ability to have some kind of persistence. On the client side of the relationship, there is no such thing as long-term storage. As soon as the browser moves off the page, any data associated with that page goes away. Even with technology such as cookies, there is no way to share data between users with only client-side technology. If the web is ever to be a commercially viable enterprise, it must have the capacity to accept information from users and store it in databases, and it must also be able to read that data back and act on it. Such activities are nearly impossible using client-side technology (at least in a browser-independent way), but they are not terribly difficult to implement on the server.
Remember that client-side languages are almost always stripped of file-handling capability for security reasons. On the server, we use full-blown programming languages, which usually have the capacity to deal with files. Perl in particular has extremely strong support for file handling, and makes it reasonably easy to create new files, add things to files, and retrieve data from files.
Of course, you have already manipulated many files as a user. Whenever you go into a text editor, you are creating and manipulating files. In the operating system, you routinely manipulate files as well. Programming languages such as perl have functions that communicate with the operating system so that as a programmer you can directly manipulate the files.
Most programming languages have two primary ways of organizing file data. Sequential access files are files with no specific internal organization. You can only add data to the end of a sequential file, and you can only read it in sequence (from beginning to end). Sequential access files are easy to work with, but they can be extremely inefficient for large amounts of data.
Random access files are designed to fit a careful organizational scheme. Most formal databases use a variation of random access. Random access files take a little more care to set up, because every record in the file must be the same length. In exchange, you can go directly to any given record in the file, since its position can be computed from its record number.
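To make the idea of fixed-length records concrete, here is a minimal sketch of random access in perl. The 32-byte record layout (a 20-character name and a 12-character phone field) is an assumption made for this example, and it writes to a temporary file from the standard File::Temp module rather than a real data file. Because every record is the same length, the position of record n is simply n times the record length, so seek can jump straight to it:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumed layout for this sketch: a 20-byte name followed by a 12-byte phone.
# pack's "A" format pads each field with spaces, so every record is 32 bytes.
my $RECORD_FORMAT = 'A20 A12';
my $RECORD_LENGTH = 32;

# In scalar context tempfile() returns a read/write handle to a scratch file
# that File::Temp removes automatically.
my $fh = tempfile();
binmode $fh;

# Write three fixed-length records one after another.
for my $person (['Alice', '555-1234'], ['Bob', '555-9876'], ['Carol', '555-4321']) {
    print $fh pack($RECORD_FORMAT, @$person);
}

# Jump straight to record $n (counting from 0) without reading the others.
sub read_record {
    my ($handle, $n) = @_;
    seek($handle, $n * $RECORD_LENGTH, 0) or die "seek failed: $!";
    my $bytes = read($handle, my $buffer, $RECORD_LENGTH);
    die "record $n not found\n" unless $bytes && $bytes == $RECORD_LENGTH;
    return unpack($RECORD_FORMAT, $buffer);   # "A" strips the padding again
}

my ($name, $phone) = read_record($fh, 2);
print "Record 2: $name, $phone\n";
close $fh;
```

A real random access database would also need to update records in place and to deal with values too long for their slots; pack's A format simply truncates them.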
For more complex data problems, there are a number of Database Management Systems (DBMS) available. These are highly optimized random access databases with features like indices, rapid searching and querying, and report engines. Usually these run as separate programs that you send commands or requests to.
Web programmers choose the best type of data storage for the particular job at hand. In this unit, we will be focusing on sequential access files, because the skills learned here can be applied to the other kinds of database schemes as well. Perl's support for sequential access data is sufficient for most of the types of problems you will encounter. When speed or amounts of data become problems, most web authors will jump straight to the use of a DBMS.
To open up a file in perl, you use the (surprise!) open command. The basic form of this command is extremely simple:
open FILEHANDLE, "filename";
The first argument is the filehandle. This name is used to refer to the file throughout its existence in the program. Perl programmers traditionally write filehandles in all uppercase.
The second parameter is the filename. This is the filename of the file you wish to open. If you just give a name, the file is expected to be in the current directory. In the CGI / unix world, it is often best to give a complete path from the root directory, because there is often confusion about where the 'current directory' is. The filename can be any file that is available on that computer, so in a unix system, this could be any file you can access, even those that belong to other users! (if they have given you appropriate permission)
Generally, sequential access allows three distinct modes of communicating with a file. The mode is determined when the file is opened, and does not change until the file is closed and reopened in a new mode.
The default mode opens a file for input. A file open for input can be read FROM, but not written TO. (Input and output are always described from the point of view of the program.) Of course, it does not make much sense to open a file for input if the file does not already exist; in that case the open simply fails. Once a file is open for input, you will be able to extract data from the file by reading it through the filehandle. (We'll explain this procedure shortly.)
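Because open fails when an input file is missing (or when you lack permission to read it), it is worth checking its return value. This sketch uses the three-argument form of open, in which the mode ('<' for input, '>' for output, '>>' for append) is a separate argument instead of being part of the filename string; it behaves the same as the two-argument form used in this unit. The helper try_open and the file name are hypothetical, and the name was chosen so that the open is expected to fail. When open fails, perl stores the reason in the special variable $!:

```perl
use strict;
use warnings;

# Hypothetical helper: try to open a file for input and report what happened.
sub try_open {
    my ($file) = @_;
    if (open(my $in, '<', $file)) {
        close $in;
        return "opened $file";
    }
    return "could not open $file: $!";   # $! holds the system's error message
}

# A made-up file name, chosen so the open is expected to fail.
print try_open('no-such-file-for-this-example.txt'), "\n";

# The short idiom most perl scripts use in place of the if statement:
#   open(my $in, '<', $file) or die "Can't open $file: $!";
```

In a CGI program, a die message normally goes to the web server's error log rather than to the browser, so you may prefer to print a friendly error page instead.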
Output mode creates the file if it does not already exist. If the file does exist, its contents will be destroyed and overwritten! You should be extremely careful not to open files for output unless that is what you really mean. Once a file is open for output, we can print to the filehandle, and each print adds information to the end of the file. To open a file for output, you put a > symbol inside the quotes that describe the file name. Note that you still need the quotes when your filename is stored in a variable (as it usually will be), because the > symbol has to be part of the string. Here are some examples:
open THEFILE, ">/home/aharris/public_html/guest.dat";
$file = "/home/aharris/public_html/guest.dat";
open THEFILE, ">$file";
Opening for append is a form of non-destructive output. If the file does not exist, it is created. If it does exist, any new data is added to the end of the file without destroying what was already there. To specify that a file will be opened in append mode, you put two '>' symbols at the beginning of the file name in the open statement. Here are some examples:
open THEFILE, ">>/home/aharris/public_html/guest.dat";
$file = "/home/aharris/public_html/guest.dat";
open THEFILE, ">>$file";
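The difference between output and append mode is easy to see by writing to the same file twice with each. This sketch uses the three-argument form of open and prints to the filehandle the same way as the examples later in this unit. The scratch directory comes from the standard File::Temp module, and guest.dat, write_line and slurp are hypothetical names used only for this example:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $file = "$dir/guest.dat";          # hypothetical data file for the demo

# Write one line to $path using the given mode ('>' or '>>').
sub write_line {
    my ($mode, $path, $text) = @_;
    open(my $out, $mode, $path) or die "Can't open $path: $!";
    print $out "$text\n";
    close($out) or die "Can't close $path: $!";
}

# Return the whole file as one string.
sub slurp {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Can't open $path: $!";
    local $/;                         # no record separator: read everything
    my $contents = <$in>;
    close $in;
    return $contents;
}

write_line('>',  $file, 'first');
write_line('>',  $file, 'second');    # '>' wipes out 'first'
my $after_output = slurp($file);

write_line('>>', $file, 'third');     # '>>' keeps 'second' and adds to the end
my $after_append = slurp($file);

print "After output mode:\n$after_output";
print "After append mode:\n$after_append";
```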
To close a file, you simply use the close function followed by the filehandle of the file you want to close. It is important to remember to close your files when you are finished with them, and before you reopen them in another mode. In addition, data is not guaranteed to reach the disk until the close statement occurs. To maximize efficiency, data is often written to a temporary memory buffer, and the disk drive is only activated when the buffer is full or the close statement has been encountered. If something terrible happens after you have written to a file, but before the buffer has been copied, you will lose your data. (This is why beginning computer instructors always tell students to close all programs before turning off the computer.)
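Since close is the point where any buffered data is finally handed to the operating system, its return value is worth checking too; a write error such as a full disk may only show up when close fails. If you need the data written out before you are done with the file, the flush method from the standard IO::Handle module empties perl's buffer without closing the file. In this sketch the file name log.txt is hypothetical, and the scratch directory comes from File::Temp:

```perl
use strict;
use warnings;
use IO::Handle;                    # provides the flush method
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/log.txt";         # hypothetical file name for this sketch

open(my $out, '>', $file) or die "Can't open $file: $!";
print $out "important data\n";

# Hand anything still sitting in perl's buffer to the operating system now,
# without closing the file.
$out->flush or die "Can't flush $file: $!";
my $size_after_flush = -s $file;   # the 15 bytes are now visible to readers

# close also flushes, and it returns false if that final write failed.
close($out) or die "Can't close $file: $!";
print "Bytes written: ", -s $file, "\n";
```

Neither flush nor close forces the operating system to copy its own cache to the physical disk; they only guarantee that perl itself is no longer holding the data.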
Once a file is open, we use the filehandle to copy information to and from the file. The output command is actually a variant of the print statement you used in command-line perl. You simply add the filehandle of a file open for output or append after the word print and before the data to print. Note that there is no comma between the filehandle and the data. Here's a simple example:
open MYFILE, ">practice.txt";
print MYFILE "Hello there!!!\n";
close MYFILE;
You can print anything to the file that you can print to the screen, so you can send literal string values, variables, or expressions (like concatenations) that return string values. You can also print multi-line strings.
Here's a (non-cgi) example that asks the user for some input and adds it to the end of the 'practice.txt' file:
print "Tell me something to add to the file: \n";
$userValue = <STDIN>;
open MYFILE, ">>practice.txt";
print MYFILE $userValue;
close MYFILE;
As you can see, printing things to a file is very similar to printing to the command line. In fact, perl treats the standard output as a file. When you do not specify a filehandle in the print statement, perl uses a special filehandle called STDOUT, which is routed to the terminal in a command-line environment. When we run in a CGI environment, STDOUT is routed through the web server to the client's browser.
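Because perl treats standard output as just another filehandle, a subroutine can print to whichever filehandle it is given. In this sketch the subroutine print_greeting is hypothetical; \*STDOUT passes the standard output handle, and the second call uses an in-memory file (a filehandle opened on a scalar variable, supported in perl 5.8 and later) to stand in for a file on disk:

```perl
use strict;
use warnings;

# Hypothetical helper: print a greeting to whatever filehandle is passed in.
sub print_greeting {
    my ($fh, $name) = @_;
    print {$fh} "Hello, $name!\n";   # braces mark $fh as the filehandle
}

# 1. Standard output: the same place a plain print statement goes.
print_greeting(\*STDOUT, 'terminal');

# 2. A "file" opened for output; here it lives in the scalar $buffer.
my $buffer = '';
open(my $mem, '>', \$buffer) or die "Can't open in-memory file: $!";
print_greeting($mem, 'file');
close $mem;

print "The file now holds: $buffer";
```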
Reading from a file works the same way. Consider this line from the previous example:
$userValue = <STDIN>;
What this really means is 'grab the next line from the file called STDIN, which is usually the keyboard.' We generally know that we have reached the end of an input when the user presses the Enter key. To grab the next 'chunk' of information from a file that is open in input mode, we simply replace STDIN inside the angle brackets with the filehandle that we want to read from:
$line = <MYFILE>;
This will read characters from the file up to and including the next newline character. It also has an important side effect: the statement as a whole returns the line it read, which perl treats as 'true' in a while condition. If we were not able to read a line (because we were at the end of the file), it returns the undefined value, which counts as 'false.' Most files consist of many lines, so perl programmers often use ($line = <MYFILE>) as the condition of a while loop. Here's an example:
open MYFILE, "practice.txt" or die "Can't open practice.txt: $!"; # stop with a message if the open fails
while ($line = <MYFILE>){
print $line;
} # end input loop
close MYFILE;
The while loop line does 'double duty.' It grabs the next line from the file and stores it in the variable $line. It also returns a value which is true as long as there was another line to read. It is very easy to misread this code (especially if you come from the BASIC tradition, where assignment and equality both use the = operator). We are NOT checking to see if $line is equal to <MYFILE>; we are assigning a value to $line and checking to see that there was another line to get.
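Here is a slightly fuller version of the reading loop, as a sketch that builds its own small file in a temporary directory (practice.txt here is just a stand-in). It uses chomp to strip each trailing newline, and it shows one detail worth knowing: in a while condition, perl actually tests whether the value read is defined, not whether it is true, so a final line containing only "0" is still read rather than ending the loop early:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/practice.txt";    # stand-in for the tutorial's practice.txt

# Create a small file to read back; the last line is "0" with no newline.
open(my $out, '>', $file) or die "Can't open $file: $!";
print $out "first line\nsecond line\n0";
close $out;

open(my $in, '<', $file) or die "Can't open $file: $!";
my @lines;
# Perl quietly treats this condition as defined($line = <$in>).
while (my $line = <$in>) {
    chomp $line;                   # remove the trailing newline, if any
    push @lines, $line;
}
close $in;

print scalar(@lines), " lines read; the last one is '$lines[-1]'\n";
```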
As an example, we will look at a simple but extremely useful utility for password protecting a document. If you have some kind of document on your site such as an exam or other sensitive materials, you might wish to password-protect the document. Most web servers have a procedure for doing this automatically, but unless you are the system administrator, you will not be able to do it easily through the server. We can write a script that will do the same thing. You will be able to write a normal web page and store it as a text file, but you do not need to give the world permission to read it. In fact, it does not even need to be in your normal public_html directory. We will write a program which will ask the user for a password. If the password is correct, it will open up the file and display it. If the password is not correct, the user will be notified, but the target file will still be protected.
#!/usr/local/bin/perl
#gateway.cgi
#password program
use CGI ':standard';
$file = "files.html";
$secret = "perlIsCool";
print header();
if (!param){
print start_html("Protected File!!");
print h1("The file you requested is protected");
print start_form();
print password_field("password");
print submit();
print end_form();
print end_html();
} else {
$password = param("password");
if ($password eq $secret){
#open the file
open THEFILE, "$file" or die "Can't open $file: $!";
while ($line = <THEFILE>){
print $line;
} # end while
close THEFILE;
} else {
#they got it wrong
print start_html("Incorrect");
print h1("That was not the correct password");
print end_html();
} #end 'right password' if
} # end 'any parameters' if
It is often useful to write to a file as well. A common problem is to get information from a form and add it to a very simple database. Often, we choose to write databases with tabs between the values, because many common programs, including spreadsheets and DBMS software, can read these 'tab-delimited' records. Below, you will see a simple example of a program that does this. For this program, we will write the data to a tab-delimited file, but we will not worry about displaying the data. We will presume that the site author will be reading the data with a spreadsheet or database program.
#!/usr/local/bin/perl
#maillist.cgi
#a program for generating a simple mailing list data file
#note that we are NOT yet sending email through this list.
use CGI ':standard';
if (!param){
print header();
print start_html("Join the mailing list");
print h1("Thank you for your interest");
print "Please enter the following information:<br> \n";
print start_form();
print "<table border = 1> \n";
print Tr(
td("Name"),
td(textfield("name")));
print "\n";
print Tr(
td("Email address"),
td(textfield("email")));
print "\n";
print Tr(
td({-colspan=>2,
-align=>'center'},
submit("ok"))
);
print "\n";
print "</table> \n";
print end_form();
print end_html();
} else {
$name = param('name');
$email = param('email');
print header();
print start_html("Thank you");
print h1("Thank you, $name, for submitting");
open LISTFILE, ">>ml.dat" or die "Can't open ml.dat: $!";
print LISTFILE $name;
print LISTFILE "\t";
print LISTFILE $email;
print LISTFILE "\n";
close LISTFILE;
print end_html();
} # end 'no params if'
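One weakness of the mailing list program is that it trusts the user's input: a tab or newline typed into the name field would shift the columns or split the record in two. Below is a minimal sketch of one way to guard against that. The helper make_record is hypothetical, and replacing the offending characters with spaces is an assumption; you might instead reject the input. It also uses join, which builds the whole tab-delimited line in one step:

```perl
use strict;
use warnings;

# Hypothetical helper: build one tab-delimited record ending in a newline.
# Tabs and line breaks inside a field would corrupt the file format, so this
# sketch replaces each run of them with a single space.
sub make_record {
    my @fields = @_;
    for my $field (@fields) {
        $field = '' unless defined $field;
        $field =~ s/[\t\r\n]+/ /g;
    }
    return join("\t", @fields) . "\n";
}

my $record = make_record("Ann\tLee", 'ann@example.com');
print $record;
```

With a helper like this, the four print LISTFILE statements in the else branch could become a single print LISTFILE make_record($name, $email);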
Finally, we will examine a reasonably complex program that combines much of what we have learned, and adds a few twists. This program is an online address book with a simple menu that allows users to add a record, view all records, or search for a record. Take a look at the code, and then we will look at some special features:
#!/usr/local/bin/perl
#address.cgi
#simple DB demo using CGI.PM
use CGI ':standard';
$fileName = "/home/aharris/public_html/webprog/perl/address.dat";
print header;
if (!param){
# if this is the first time here,
&mainMenu;
} else {
#find out what they want to do
$mode = param('mode');
if ($mode eq "View Data"){
&viewData;
} elsif ($mode eq "Add a record"){
&addRecord;
} elsif ($mode eq "Search"){
&search;
} elsif ($mode eq "Process Add"){
&processAdd;
} elsif ($mode eq "Process Search"){
&processSearch;
} elsif ($mode eq "Main Menu"){
&mainMenu;
} else {
print start_html();
print "<h1>you chose ";
print param('mode');
print "!!</h1>";
print end_html;
} # end 'what mode' if
} # end 'no parameters' if
This part of the code always runs first. It has two main functions. First, it sets up all the basic information needed by the rest of the program: it includes the CGI module, sets up the file variable, and prints the HTTP header. The other main duty of this code fragment is to look at the value of the 'mode' parameter, and use it to farm out program control to some other function. Any request to this program will either have no parameters at all, or it will have a mode parameter that tells it which function to go to. This is a common way to make more complex CGI programs work: every form the program generates contains a field called mode (usually a hidden field) which tells the program the NEXT function to run. Note the use of the if/elsif structure to control program flow. Note also that the names beginning with an ampersand (&) are calls to subprograms. It makes a lot of sense to break this program into subs, because each time through the program, you will be calling one of many smaller programs. Most of the real action in the program happens in the various subs.
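A common alternative to a long if/elsif chain is a dispatch table: a hash that maps each mode value to a reference to the subroutine that handles it. The sketch below shows the idea with stand-in subroutines that only return a label (the real ones print HTML), so it illustrates the technique rather than serving as a drop-in replacement for the address book's code:

```perl
use strict;
use warnings;

# Stand-ins for the address book's subs; each returns a label instead of HTML.
sub mainMenu   { return 'main menu' }
sub viewData   { return 'view data' }
sub addRecord  { return 'add form' }
sub processAdd { return 'record added' }

# Map each value of the 'mode' parameter to the sub that handles it.
my %handler_for = (
    'Main Menu'    => \&mainMenu,
    'View Data'    => \&viewData,
    'Add a record' => \&addRecord,
    'Process Add'  => \&processAdd,
);

sub dispatch {
    my ($mode) = @_;
    # No mode at all (the first visit) falls back to the main menu.
    return mainMenu() unless defined $mode;
    my $handler = $handler_for{$mode}
        or return "unknown mode: $mode";
    return $handler->();
}

print dispatch(undef), "\n";
print dispatch('Process Add'), "\n";
```

Adding a new screen then means writing its sub and adding one line to the hash, rather than another elsif clause.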
sub mainMenu(){
print start_html("Welcome to the address book!");
print h1("Welcome to the address book!");
print "Please choose an option: <br> \n";
print start_form;
print radio_group( -name=> 'mode',
-values=> ['View Data', 'Add a record', 'Search'],
-default=> 'View Data',
-linebreak=>1);
print submit();
print end_form;
print end_html;
}# end mainMenu
mainMenu is a subprogram that displays the main menu on the screen. It is called in two distinct settings: when the program is called with no parameters, as it always is the first time, or when the mode parameter is set to Main Menu.
The radio group called mode is used to let the user choose which subprogram they want. Note that not all the modes are listed here, because some are only reached from other screens. This will make more sense as you see the rest of the program. It's also worth noting that I used the cgi.pm special syntax to create the radio group, but it could have been done just as easily with a series of print statements containing the HTML. As you can see, the cgi.pm syntax takes a little getting used to, but it is much cleaner to read and write.
When the user submits the form generated by this page, mode will be set to whatever the user chose.
sub viewData(){
print "<h1>Now viewing data</h1>";
#open for input
open (DATA, "$fileName");
print "<table border = 1> \n";
print "<tr> \n";
print " <th>Name</th>\n";
print " <th>Address</th>\n";
print " <th>Phone</th>\n";
print "</tr> \n";
while ($line = <DATA>){
($name, $address, $phone) = split(/\t/, $line);
print "<tr> \n";
print " <td>$name</td> \n";
print " <td>$address</td> \n";
print " <td>$phone</td> \n";
print "</tr> \n";
} # end while
close DATA;
print "</table> \n";
&showMainButton;
print end_html();
} # end viewData
This sub starts by opening up the file for input. We also generate a table and write in the first row with the table headings. We then use a while loop to grab data values from the file. As we grab each line of data, we use the split function to break the line into three variables, based on the location of tab characters (\t). Of course, this presupposes that we had three values on the line, with tab characters between them. We will need to be sure this is the case when we store data in another routine. We could actually separate values by any character we wanted, but the use of the tab is very common, and most spreadsheet and database programs can easily read this format.
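The split function has one quirk that matters for tab-delimited data. In viewData the newline is left on $line, so it ends up at the end of $phone. If you strip it with chomp (for example, to compare or reuse the phone number), be aware that split discards empty fields at the end of the line by default, so a record with a blank address and phone yields fewer than three values. Passing a negative limit as the third argument keeps them. The sample record below is made up:

```perl
use strict;
use warnings;

# A record whose address and phone were left blank when it was added.
my $line = "Dana\t\t\n";
chomp $line;                                  # now "Dana\t\t"

my @default_split = split(/\t/, $line);       # trailing empty fields dropped
my @kept_split    = split(/\t/, $line, -1);   # negative limit keeps them

print scalar(@default_split), " field(s) by default\n";
print scalar(@kept_split), " field(s) with a limit of -1\n";
```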
The &showMainButton line calls a utility subprogram that displays a special form. We will look at that form in detail later, but note for now that it produces a separate form with a button on it. When that button is pressed, program control returns to the main menu.
sub addRecord(){
#this sub requests the new values, but does not add them yet.
print "<h1>Now adding a record</h1>";
print start_form();
print "<input type = hidden \n";
print " name = mode \n";
print " value = 'Process Add'> \n";
print " \n";
print "<table border = 1> \n";
print "<tr> \n";
print " <th>Name</th> \n";
print " <td><input type = text \n";
print " name = txtName> \n";
print "</tr> \n";
print " \n";
print "<tr> \n";
print " <th>Address</th> \n";
print " <td><input type = text \n";
print " name = txtAddress> \n";
print "</tr> \n";
print " \n";
print "<tr> \n";
print " <th>Phone</th> \n";
print " <td><input type = text \n";
print " name = txtPhone> \n";
print "</tr> \n";
print " \n";
print "</table> \n";
print submit();
print end_form();
&showMainButton;
print end_html();
} # end addRecord
As you can see, there is no significant logic in this sub at all. All we are doing is setting up a form to INPUT the data for the new record. In order to add that data to the file, we will need to run the entire program again. We add a hidden field called mode with the value 'Process Add'. This will cause the program to go to processAdd the next time it is run.
The main part of the code is the form containing text fields for name, address, and phone. We will use these fields in the next function.
Another interesting thing about this procedure is that we simply wrote the HTML first, and made sure it looked OK as a standard HTML page. Then we used a keyboard macro to insert 'print "' at the beginning of each line and the closing quote and semicolon at the end. This is a quick and easy way to convert HTML to perl, but it sometimes results in long and unwieldy code.
sub processAdd(){
print "<h1>Now processing the record addition</h1> \n";
$name = param('txtName');
$address = param ('txtAddress');
$phone = param('txtPhone');
print "Name = $name<br> \n";
print "Address = $address <br> \n";
print "Phone = $phone <br> \n";
$currentRec = $name . "\t" . $address . "\t" . $phone ."\n";
#open up the file for append
open (DATA, ">>$fileName");
print DATA $currentRec;
close DATA;
&showMainButton;
print end_html;
} # end processAdd
This procedure is only called after an add, so we know that the form will contain fields for name, address and phone. We simply copy these into separate variables. We then form a string that contains the variables concatenated with tab ("\t") characters between them. This is the tab-delimited format that the other procedures expect when they read the data. The newline character at the end is necessary, because that is what separates the records (or lines).
We then open the file for append and add the current record to the end of the file. Finally, we close the file and add the code for setting up the 'main menu' button.
sub search(){
print "<h1>Now searching</h1>";
print start_form();
print " <input type = hidden \n";
print " name = mode \n";
print " value = 'Process Search'> \n";
print " What name are you searching for? \n";
print "<br> \n";
print " <input type = text \n";
print " name = txtName> \n";
print " <br>\n";
print submit();
print end_form();
&showMainButton;
print end_html();
} # end search
Searching is another example of a 'two-step' function. We need to ask the user for some information (the name to search for), so we need to set up a form that will ask for that value. We'll need to call another procedure to actually search for the value, so the mode field sets up the next procedure to call.
sub processSearch(){
print "<h1>Now processing the record search</h1> \n";
$searchString = param('txtName');
$foundIt = "false";
open(DATA, "$fileName");
while ($line = <DATA>){
($name, $address, $phone) = split (/\t/, $line);
if ($name eq $searchString){
$foundIt = "true";
print "
<table border = 1>
<tr>
<th>Name</th>
<td>$name</td>
</tr>
<tr>
<th>Address</th>
<td>$address</td>
</tr>
<tr>
<th>Phone</th>
<td>$phone</td>
</tr>
</table>
";
} # end if
} # end while
if ($foundIt eq "false"){
print "There was no matching record. \n";
} # end if
close DATA;
&showMainButton;
print end_html();
} # end processSearch
This code is intended to be called by the search form. It accepts the name from the previous form, and stores it in the $searchString variable. We then open the file for INPUT, and grab one record at a time. Each record is split into its components, and we look for a record where $name is equal to $searchString. If we find such a record, we print out an HTML table with the record information in it. If no record matched by the end of the file, we print out an appropriate message.
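The search logic can also be separated from the HTML that displays it, which makes it easier to test and reuse. In this sketch, find_records is a hypothetical helper that returns every matching record as an array reference, and the empty list it returns when nothing matches plays the role of the $foundIt flag. It builds a small sample file in a temporary directory; the names and addresses are invented:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return every record whose name exactly matches $wanted.
sub find_records {
    my ($path, $wanted) = @_;
    my @matches;
    open(my $in, '<', $path) or die "Can't open $path: $!";
    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '';                       # skip blank lines
        my ($name, $address, $phone) = split(/\t/, $line, -1);
        push @matches, [$name, $address, $phone] if $name eq $wanted;
    }
    close $in;
    return @matches;
}

# Build a small sample file in the address book's tab-delimited format.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/address.dat";
open(my $out, '>', $file) or die "Can't open $file: $!";
print $out "Ann\t12 Oak St\t555-0100\n";
print $out "Bob\t9 Elm Ave\t555-0199\n";
close $out;

my @found = find_records($file, 'Bob');
if (@found) {
    print "Found $_->[0] at $_->[1], phone $_->[2]\n" for @found;
} else {
    print "There was no matching record.\n";
}
```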
You might also note that we used a multi-line print statement. This is again a handy strategy when we want to take a nearly complete HTML fragment and translate it into perl. When using this technique, be extremely careful that the HTML contains no double-quote characters, since the first one would end the string early. Variable interpolation still works inside the string, as you have seen.
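Perl also offers the here-document, which avoids the quote problem entirely. Everything between the <<"END_HTML" line and a line containing only END_HTML becomes one string; with the label in double quotes, variables are interpolated just as in a double-quoted string, and quote marks inside the HTML need no escaping. END_HTML is an arbitrary label chosen for this sketch, and the record values are invented:

```perl
use strict;
use warnings;

my ($name, $address, $phone) = ('Ann', '12 Oak St', '555-0100');

# Here-document: the string runs until the line that says only END_HTML.
# The double-quoted label turns on variable interpolation.
my $html = <<"END_HTML";
<table border="1">
  <tr><th>Name</th><td>$name</td></tr>
  <tr><th>Address</th><td>$address</td></tr>
  <tr><th>Phone</th><td>$phone</td></tr>
</table>
END_HTML

print $html;
```

The closing label must sit alone on its line, starting in the first column (perl 5.26 and later also accept an indented form written <<~).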
sub showMainButton(){
#main menu button
print start_form();
print "<input type = hidden \n";
print " name = mode \n";
print " value = 'Main Menu'> \n";
print submit('Return to main menu');
print end_form();
} # end showMainButton
This is a good example of a utility routine. We use it here because several of the other screens could use a button that would immediately return the user to the main menu. Rather than re-writing the code several times, we placed it into a function and called the function whenever we needed it. This idea is called encapsulation. It is nice because the code need only be tested once, and we don't need to worry about its details any other time.