Perl/CGI

Date: July 2000

You may have heard of CGI. You may have heard of Perl. You may know that one of the ways of implementing CGI is Perl. But what the heck is CGI? And Perl? Is that even spelled correctly?

CGI stands for Common Gateway Interface, a specification that originated at NCSA. It specifies how data is passed from a web browser to a program running behind the webserver. That program receives the data, acts on it, and produces its output: a webpage (HTML document). Here's the diagram:

                      data              data  
Browser (form input) ------> Webserver ------> Program

                  HTML              HTML
Program (stdout) ------> Webserver ------> Browser (display)

CGI only specifies that the data taken from the browser (HTTP client) should be given to the program on its standard input (C programmers and Unix command line gurus will be familiar with the standard streams) or in an environment variable. The program should print the HTML (strictly, an HTTP response: a few header lines, a blank line, then the HTML) to its standard output, from where the webserver will pick it up and pass it to the browser (HTTP client).
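To make the output half of that contract concrete, here is a minimal sketch of a CGI program in Perl. It ignores its input and only prints a response to standard output. The function name build_response and the page text are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the complete response the webserver expects on standard output:
# a header line, one blank line, then the HTML body.
# (build_response is a hypothetical helper, not part of any specification.)
sub build_response {
    my ($title) = @_;
    my $body = "<html><head><title>$title</title></head>"
             . "<body><h1>$title</h1></body></html>\n";
    return "Content-Type: text/html\r\n\r\n" . $body;
}

# Everything printed to standard output goes back to the webserver.
print build_response('Hello from CGI');
```

The blank line after the Content-Type header matters: it is how the webserver tells where the headers end and the page begins.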

The form input is sent in one of two ways: the GET method and the POST method. With the GET method, the data is sent to the server as part of the request URL:

http://www.example.com/cgi-bin/cgiscript.cgi?param=data

Note here that CGI scripts, as they are called, are usually kept in a directory called cgi-bin or a variant.

cgiscript.cgi is the name of the script. The script itself may be a compiled C program or a Perl script (which is a plain text file interpreted by the Perl interpreter -- more on that later). A Perl script is often named with an extension of `.pl', as in cgiscript.pl, but that is not required; it depends on how the server is set up.

Now for the interesting part: everything following the question mark is called the query string. It consists of parameter=value pairs, with each parameter of the form set to a value. More than one pair can be included by delimiting the pairs with the `&' sign. The entire query string is passed to the CGI program in an environment variable (QUERY_STRING). The query string is URL-encoded: special characters are replaced with %nn, where nn is the ASCII code of the character as two hex digits. The space character, for example, is 20h and is encoded as %20.
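The decoding described above can be sketched in a few lines of plain Perl. The function names url_decode and parse_query are invented for this example, and the sketch makes two assumptions: a `+' is treated as a space (browsers commonly encode spaces in form data that way, in addition to %20), and a parameter that appears more than once keeps only its last value.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: '+' means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a query string into a hash of parameter => value.
# Simplification: a repeated parameter keeps only its last value.
sub parse_query {
    my ($query) = @_;
    my %params;
    for my $pair (split /&/, $query) {
        next if $pair eq '';
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $params{ url_decode($name) } = url_decode($value);
    }
    return %params;
}

# The webserver puts the query string in the QUERY_STRING environment variable.
my %form = parse_query(defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '');
print "$_ = $form{$_}\n" for sort keys %form;
```

Note that the string is split on `&' and `=' before anything is decoded. Decoding first would break any value that itself contains an encoded `&' or `='.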

In case of the POST method, the data is passed on the standard input of the CGI program. This means the program must read the standard input instead of an environment variable. In this case, too, the data is URL-encoded.
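A rough sketch of how a script might pick between the two sources follows. It relies on two more environment variables the webserver sets for a CGI program: REQUEST_METHOD and, for POST, CONTENT_LENGTH (the number of bytes waiting on standard input). The function name read_form_data is made up for this example, and passing the environment and filehandle as arguments is just a choice that makes the function easy to try out.

```perl
use strict;
use warnings;

# Return the raw, still URL-encoded form data for the current request.
# $env is a hash reference of environment variables; $in is an input filehandle.
# (read_form_data is a hypothetical helper written for this article.)
sub read_form_data {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data = '';
        # Read exactly CONTENT_LENGTH bytes; the server may not signal end-of-file.
        while (length($data) < $length) {
            my $got = read($in, $data, $length - length($data), length($data));
            last unless $got;    # stop on error or early end of input
        }
        return $data;
    }
    # GET: the data is in the query string.
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

my $raw = read_form_data(\%ENV, \*STDIN);
print "Content-Type: text/plain\r\n\r\n";
print "Raw form data: $raw\n";
```

Either way the script ends up with the same kind of string, parameter=value pairs joined by `&', which still has to be split and decoded.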

Perl makes things easy. You get a standard module called CGI.pm (downloadable from [1]) which takes care of decoding the data from both methods and puts it into a convenient object (the query object). Then the script simply manipulates this data and comes up with the HTML response page. The CGI module also helps to generate HTML. It provides functions and objects for producing the standard HTML elements like form elements, headers, etc.
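Here is a small sketch of what that looks like in practice. The form parameter name `param', the page title, and the helper render_page are assumptions made for the example; new, param, header, start_html, h1, escapeHTML and end_html are CGI.pm's own. One caveat: CGI.pm is no longer bundled with Perl 5.22 and later, so on a recent Perl it may need to be installed from CPAN first.

```perl
use strict;
use warnings;
use CGI;    # not bundled with Perl 5.22 and later; install from CPAN if missing

# Build the whole response from a query object.
# (render_page is a hypothetical helper written for this article.)
sub render_page {
    my ($q) = @_;
    # param() returns the decoded value whether it arrived by GET or POST.
    my $value = $q->param('param');
    $value = '(nothing sent)' unless defined $value;
    return $q->header('text/html')
         . $q->start_html('CGI.pm demo')
         . $q->h1('You sent: ' . $q->escapeHTML($value))
         . $q->end_html;
}

# CGI->new reads the request from the environment and standard input.
print render_page(CGI->new);
```

The call to escapeHTML is there because the value comes straight from the user; without it, any HTML the user typed into the form would be echoed into the page as markup.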

One point that needs to be cleared up is that CGI is not Perl and Perl is not CGI. CGI is merely a specification. The underlying program can be anything. Perl is a scripting language that is used for many things besides CGI. The mailing list management software Majordomo[2], for example, is written in Perl but most of it is not meant for CGI use.

References:

Copyright © Satya 1999. All Rights Reserved.

