Work: Counting on the Net

Jeffreys Copeland & Haemer

(RS/magazine, February 1997)




Innumeracy

Okay, we can't stand it any more.

We've all laughed at articles about innumeracy, secure in the knowledge that we're in a profession where we learned to count, do ``big-O'' calculations, and make back-of-the-envelope estimates. Yes, we are computer professionals.

Why is it, then, that some of the articles we've seen about IPv6, the new addressing scheme for the Internet, make statements about addressing power that seem, well, like the authors can't tell the difference between a googol and a googolplex?

One recent report we read claims that IPv6 will let us individually address every proton on earth. A still-more-recent book boldly states that the number of possible IPv6 addresses will be larger than the number of molecules in the universe.

Really?

We could go searching the net for statistics, but a pencil and used envelope-back are enough to sanity-check these statements. (By the way, we'll leave their authors anonymous. They're friends of ours.)

Let's attack the first claim. We'll begin by estimating the volume of the earth.

A kilometer was originally defined to be 1/10,000 of the distance from the equator to the North Pole. Even though it's now defined in terms of a meter, which is, in turn, defined in terms of light wavelengths, the original definition is close enough for our purposes. Taking the circumference of the earth to be around 4x10**4 kilometers, we can calculate the volume from high-school geometry. Since the circumference is 2{pi}r, the radius is about (2/{pi})x10**4 km., or (2/{pi})x10**9 cm. The volume of a sphere is (4/3){pi}r**3, so the earth's volume is about (32/(3{pi}**2))x10**27 cubic centimeters. The constants nearly cancel, and leave us with an estimate of about 10**27 cubic centimeters.

How much does something that big weigh? Well, since 1cc of water weighs about a gram, if the earth were all water it would weigh 10**27 grams. The earth's water floats on the surface, so the rest of the planet must be denser, and that's a safe underestimate. We suggest that readers who are curious about how far off we are surf the net to find a more precise estimate. (See, for example, the fine ``Nine Planets Tour,'' at http://www.seds.org/billa/tnp.)

Now, how many protons would there be in that much mass? For this, we need high-school chemistry. Well, we know that one mole of Hydrogen weighs a gram. Electrons are pretty light, so to a first approximation, we can say that 6.02x10**23 (Avogadro's number) protons weigh a gram. Of course, something at the other end of the periodic table, like Uranium, will have a lot of its mass in neutrons, but even Uranium is about a third protons. Since protons and neutrons have about the same mass, a gram of the most abundant elements, like Iron and Oxygen, which are about half protons, would still have about 3x10**23 protons, which we can use as a safe lower bound.

Combining this with our earlier estimate of the weight of the earth, we have a lower bound of 10**27 g x 3x10**23 protons/g = 3x10**50 protons.
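For readers who have lost their envelope, the arithmetic so far can be replayed in a few lines of Perl. This is only a sketch of the estimate above: the variable names are ours, invented for illustration, and the inputs are the same round numbers used in the text.

```perl
use strict;
use warnings;

my $pi = 4 * atan2(1, 1);

# Circumference of the earth: about 4x10**4 km, which is 4x10**9 cm.
my $circumference_cm = 4e4 * 1e5;

# circumference = 2*pi*r, so r = circumference / (2*pi)
my $radius_cm = $circumference_cm / (2 * $pi);

# Volume of a sphere: (4/3)*pi*r**3
my $volume_cc = (4 / 3) * $pi * $radius_cm**3;

# Pretend the earth is all water (1 gram per cc): a deliberate underestimate.
my $mass_g = $volume_cc * 1;

# Roughly half of Avogadro's number of protons per gram.
my $protons_per_g = 3e23;
my $protons       = $mass_g * $protons_per_g;

printf "volume  ~ %.1e cc\n", $volume_cc;
printf "protons ~ %.1e\n",    $protons;
```

The volume comes out a little over 10**27 cc and the proton count a little over 3x10**50, which is as much agreement as an envelope deserves.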

Now, how much addressing power does IPv6 give us?

Let's go back a minute to see what problem IPv6 is solving. Currently, every IP address has four bytes. These bytes are the four numbers you see, separated by periods, if you look in your host table, or when you run a networking application like ping or telnet. My machine, for example, is 161.33.16.21, and ftp.uu.net is 192.48.96.9.

Since this is only 32 bits of address space, an IP network can never have more than 2**32 addresses. (Just like there can never be more than 10**7 seven-digit phone numbers.) Basically, the net is growing fast enough that we're going to run out of addresses. The problem's worse than that, because of the way addresses are assigned -- people and corporations are actually assigned big blocks of addresses, which eats up the address space even faster -- but even if that weren't true, most folks agree that 32 bits just isn't going to be enough. The web is growing too fast.
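To see that a dotted address really is just one 32-bit number, here is a small sketch using Perl's built-in pack and unpack. The function names dotted_to_int and int_to_dotted are ours, made up for this illustration.

```perl
use strict;
use warnings;

# Pack four decimal bytes into one unsigned 32-bit integer.
# ("N" is a 32-bit value in big-endian, or network, byte order.)
sub dotted_to_int {
    my ($dotted) = @_;
    return unpack('N', pack('C4', split(/\./, $dotted)));
}

# And back again: one 32-bit integer into four dotted bytes.
sub int_to_dotted {
    my ($n) = @_;
    return join('.', unpack('C4', pack('N', $n)));
}

my $n = dotted_to_int('192.48.96.9');
print "192.48.96.9 is address number $n of ", 2**32, "\n";
print "round trip: ", int_to_dotted($n), "\n";
```

Every address that will ever fit in the current scheme is some integer between 0 and 2**32 - 1, and that is the whole supply.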

The new addressing scheme, IPv6, will put overload off a bit by making addresses four times as large -- 128 bits.

Okay, IPv6 will let us address a lot more things. (For an analogy, imagine adding an area code to phone numbers that's three times as big as the phone number.) But could it let us assign separate IP addresses to every proton on earth?

Nope. 2**10 is about 10**3, so 2**128 is about 10**39, which means we fall short by a factor of at least 3x10**11. A mistake of this magnitude is roughly comparable to confusing your net worth with that of Bill Gates.
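The same comparison can be done in logarithms, which keeps the numbers small enough to read. This is a sketch only: log_base10 is a helper we define here, and 3x10**50 is the lower bound derived above.

```perl
use strict;
use warnings;

# Base-10 logarithm, built from Perl's natural log.
sub log_base10 { return log($_[0]) / log(10) }

my $address_digits = 128 * log_base10(2);   # log10 of 2**128
my $proton_digits  = log_base10(3e50);      # log10 of the proton lower bound

printf "IPv6 addresses ~ 10**%.1f\n", $address_digits;
printf "protons        ~ 10**%.1f\n", $proton_digits;
printf "shortfall      ~ 10**%.1f\n", $proton_digits - $address_digits;
```

Rounding 2**128 up to 10**39, as the text does, is generous to IPv6. Done more carefully, the shortfall is closer to 10**12, so ``at least 3x10**11'' is safe.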

The difference is ... well ... astronomical.

How about the number of molecules in the universe? Most molecules in the universe are hydrogen atoms, which means that we could just divide the number of protons on the earth by the weight of the earth, and then multiply by the weight of the universe. Unfortunately, we don't yet have a good, back-of-the-envelope estimate for the weight of the universe. If you have a good suggestion, please send us email at <jeff@rd.qms.com> or <jsh@rd.qms.com>.

If you don't have one, but are as amused by back-of-the-envelope calculations as we are, see Jon Bentley's March, 1984, CACM column.



Counting Electoral Votes

How do you feel about the results of the recent presidential election? The Jeffreys are split: ``Is too!'' ``Is not!''

Despite the fact that the other Jeff is wrong, we both had a good time speculating about which states were going to go to which candidate. Unfortunately, it sometimes got a little hard to track. ``Okay, now if Colorado goes for Dole, but Arizona goes for Clinton ....'' What we really needed was a spreadsheet.

A garden-variety spreadsheet would have worked, but the spreadsheet interface seemed ugly. What we really wanted was something that would show us a map, with outlines of the states, and then let us click on the individual states to assign them to candidates. We wanted a 2-dimensional, visual spreadsheet that would color the states, so we could see which candidates had which ones, and that would keep track of the running electoral totals for each candidate at the same time.

We'll go through how we built it both because it's fun, and because it lets us illustrate how to build a CGI application. Our spreadsheet interface will be a web page.

This month, we'll build a simple, radio-button form that assigns a state to a candidate.

Next month, we'll expand on that and build a map.



CPAN

Have you heard a lot about the virtues of ``code reuse''? So have we. Have you seen much of it? We haven't either. At least, until recently.

Recently, we sat in on a C++ course, which began with the admission that code reuse was one of the most oversold virtues of the object-oriented approach. Despite its allure, code reuse has been largely confined to the stdio.h model: libraries that are guaranteed to be distributed with whatever language you are using will be widely used, and treated as opaque black boxes with well-defined interfaces. Everything else gets built from scratch each time.

In the Perl world, that's changed. A quick visit to http://www.perl.com/cpan/ will introduce you to the Comprehensive Perl Archive Network -- a vast and growing array of reusable Perl modules (classes) that are becoming de-facto building blocks for Perl programmers all over the world.

To give you a feeling for the convenience and ease-of-use these modules offer, we recently built a Netscape-based news reader in an afternoon, out of an NNTP module and a CGI module. It's hard for us to imagine doing that in either raw Perl, or in any other language.

All of these modules are user-contributed, and the whole of CPAN is a volunteer effort. Despite -- well, no, probably because -- of this, CPAN is growing at an astonishing rate.


CGI.pm

For our application, we'll use the module CGI.pm, contributed by Lincoln Stein, which lets us write CGI applications with no muss, and no fuss.

Our script, shown in Figure 1, will create and manage a form that looks like the one in Figure 2.

We begin with relatively normal code:

#!/usr/local/bin/perl -w

require 5.003;
use strict;         # Perl's equivalent of "lint"

These lines are good ways to start any Perl application. The opening, ``shebang'' line invokes our perl interpreter with the all-important -w flag, which warns us of a wide array of programming errors. There is never a good reason to start your perl programs without this flag.

The next line states which version of perl we expect to be running. We have 5.003 installed, so we just require it, guarding against the possibility that a user will stumble over an older revision lurking somewhere in their path.

The last of these lines is not necessary, but we like to use it anyway. The pragma use strict warns us about very nit-picky problems: undeclared functions, incompletely scoped variable names, and other things of that ilk. Raw perl, like raw C, is very easy to write but gives you a lot of rope to hang yourself with. The strict pragma, like C's lint program, warns about things that could get you into trouble. We like to use it, even though it makes us do some work that we wouldn't otherwise have to do, because we don't always really know as much about what we're doing as we'd like to pretend.
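Here is a small sketch of the sort of trouble the strict pragma catches. The misspelled variable $totl is our invention. We compile the offending fragment from a string with eval, so that the complaint can be printed instead of stopping the script.

```perl
use strict;
use warnings;

# Compile a tiny program from a string, so the failure can be caught
# rather than halting this script.
my $ok = eval q{
    use strict;
    my $total = 0;
    $totl = $total + 1;   # typo: $totl was never declared
    1;
};

print $ok ? "compiled\n" : "strict says: $@";
```

Without the pragma, Perl would quietly create a new global variable named $totl, and the bug would surface much later, if at all.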

The next few lines are the equivalents of C #include directives. These lines:

# pull in modules we need
use CGI qw(:all use_named_parameters);

# now some defined constants
BEGIN {
  # full names, abbrevs, electoral votes of states
  require "states.pl";
  # list of all candidates
  require "candidates.pl";
}

are, in effect, Perl for

#include "CGI.pm"
#include "states.pl"
#include "candidates.pl"

Why two different syntaxes, use and require? Basically, because CGI.pm is a full-blown module while the other two are just files full of defined constants.

The syntax use Module LIST lets us specify a list of symbols that we can use from a class without a package qualifier. The statement use CGI qw(:all use_named_parameters); lets us use nearly every function in CGI.pm without qualification. Instead of having to say CGI::hr, to get a horizontal rule, we can say just hr.
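The import list works the same way for any module. As a sketch that runs even where CGI.pm isn't installed, here is the same idea with List::Util, a module that ships with Perl. It is our stand-in, not something the election script uses, and the vote counts are sample numbers.

```perl
use strict;
use warnings;

use List::Util qw(sum max);   # import just these two names

my @votes = (8, 8, 54);       # sample numbers, for illustration only

print "qualified:   ", List::Util::sum(@votes), "\n";
print "unqualified: ", sum(@votes), "\n";
print "largest:     ", max(@votes), "\n";
```

Both spellings call the same function. The import list only decides which names we are allowed to use without the package prefix.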

We enclose the other two require statements in a BEGIN { ... } block to get them included at compile time. (Perl programs are processed in two steps. First, the programs are checked for syntax and compiled into a sort of byte-code. Then the ``compiled'' programs are run.) We didn't really need to do this to get the program to work correctly -- the constants aren't needed until run-time -- but without this, our use strict; pragma makes the compiler complain bitterly about undeclared variables, and blocks further compilation.
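The two-step processing is easy to see in a toy program. This sketch has nothing to do with states.pl; the array @order is our own invention, used only to record which statement runs first.

```perl
use strict;
use warnings;

our @order;    # a package variable, visible in both phases

# This statement comes first in the file, but runs second.
push @order, 'run time';

# A BEGIN block runs as soon as the compiler has finished parsing it,
# before any ordinary statement in the file is executed.
BEGIN { push @order, 'compile time' }

print join(', ', @order), "\n";
```

The BEGIN block's entry lands in the list first, even though it appears later in the file. That is why anything the compiler needs to know about has to be pulled in from inside one.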

Skipping down a few lines, let's go directly to the statement

my $state = param('state') || 'CO';

When you give a web browser, like Netscape, a URL to visit, the http server on the target machine can tell if the location is a CGI executable, instead of a simple text file. If so, it executes the program, passing in a variety of information, and parses and interprets the output. The information is typically passed in with a peculiar syntax, which the application must parse. For example, if you use AltaVista to search for the nine planets tour, it might generate a location that looks like this:

http://altavista.digital.com/cgi-bin/query?pg=q&what=web&fmt=.&q=%2B%22nine+planets+tour%22

This means that you're invoking an application called query, with an array of named arguments, concatenated into a single string passed in as the environment variable QUERY_STRING. The application must parse that string, pg=q&what=web&fmt=.&q=%2B%22nine+planets+tour%22, and then use the resulting information.
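To give a feeling for the work involved, here is a minimal sketch of parsing such a string by hand. The function names url_decode and parse_query are ours, and the sketch ignores complications such as repeated parameter names and forms submitted on standard input.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means a space,
# and %XX is a character given as two hex digits.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "name=value&name=value" into a hash of decoded pairs.
sub parse_query {
    my ($query) = @_;
    my %param;
    foreach my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $param{ url_decode($name) } = url_decode($value);
    }
    return %param;
}

my %param = parse_query('pg=q&what=web&fmt=.&q=%2B%22nine+planets+tour%22');
print "$_ = $param{$_}\n" foreach sort keys %param;
```

Note that the plus signs have to be turned into spaces before the %XX escapes are decoded. Otherwise the literal plus sign hiding in %2B would become a space too.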

CGI.pm takes a lot of the work out of this for you. The call

my $state = param('state') || 'CO';

looks through both standard input and the environment, parses what it sees, finds the value of the parameter state, and puts it into the scalar variable $state. (If the variable is unset or empty, we use `CO' as a default.)

Whew.

The rest of the program is a single print statement. Each line after the print calls a function from CGI.pm that generates a string containing the proper HTML for that piece of the form. Here's what they do:

  1. header,
    

    Okay, we lied. This doesn't actually generate HTML; it generates the header that tells the server how to interpret the output of the CGI program.

  2. start_html('States'),
    ...
    end_html,
    

    These generate all the little HTML tags that a web page needs for starting and ending; things like this: <HTML><HEAD><TITLE>States</TITLE></HEAD><BODY>. (We won't show you more raw HTML. We're using CGI.pm so we don't have to.)

  3. h1("Select a candidate to cast $state_names{$state}'s votes for:"),
    

    This emits a first-level header, translating the 'state' parameter into a human-readable state name along the way.

  4. startform('GET'),
    ...
    endform,
    

    These bracket a form. Forms are HTML's way of packaging user input up into something that you can pass to a CGI application. The result routinely looks as ugly as the query we showed you above. Luckily, you're using CGI.pm and won't have to look at it.

  5. radio_group(
      name=>'candidate',
      'values'=>[@candidates],
      default=>'-',
      linebreak=>'true',
    ),
    
    This call emits code for the radio buttons on the form. All the functions in CGI.pm accept self-identifying parameters. We're taking advantage of them here, both because it makes the code easier to read, and because it saves us from having to remember what order to put the arguments in. We read the list of candidates from candidates.pl, so that we don't have to change the application for every election. The linebreak parameter puts each candidate on a different line (we think that looks nicer), and the default parameter says ``don't make any candidate a default choice.'' Even if he is the one people should be voting for.

  6. submit(name=>"Cast vote"),
    

    This emits code for the submit button. Every form needs at least one. We could have more than one, in which case we could identify which one the user had pressed by its name. By default, the name and the button label are the same, but CGI.pm will let you do almost any odd thing you want. Perl programmers tend to lack a prescriptive mind-set.

  7. hidden(-name=>'state', value=>$state), # remember chosen state
    

    This one's a little subtle. What happens when you press the submit button we created? Because we didn't specify an action in the call to startform, the server invokes the default action: it re-invokes this same CGI script with the new arguments, taken from the filled-out form. (In this case, our form is ``filled out'' with button pushes.)

    But wait. Re-invoking the script with new arguments means that we've lost any information that we had when we started up this form the first time. If we want to remember what state we're working on, we have to save the information somewhere. We can either tuck the information away into a file somewhere, then read the file again on restart, or we can actually put the information into the filled-out form. We don't, however, want to put the information anywhere that the user has to see, or that the user could accidentally change by typing in the wrong spot. The function call hidden emits code for an invisible, but filled-out field whose value is transmitted to the application on submission of the form. Here's where we take the state information we were passed, and pass it on in turn.
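The mechanism is less mysterious than it sounds. As a hedged sketch, here is roughly the kind of markup a hidden field amounts to, built by hand. The function names escape_html and hidden_field are ours, and the exact text that CGI.pm's own hidden function emits may differ.

```perl
use strict;
use warnings;

# Escape the characters that are special inside an HTML attribute.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build an invisible form field carrying one name/value pair.
sub hidden_field {
    my ($name, $value) = @_;
    return sprintf '<input type="hidden" name="%s" value="%s">',
        escape_html($name), escape_html($value);
}

my $state = 'CO';    # stands in for the value the script was passed
print hidden_field('state', $state), "\n";
```

When the form is submitted, the browser sends state=CO back along with the user's choice of candidate, so the next invocation of the script can pick up where this one left off.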

    We'd like to expand on this a bit more, but the column's already too long, so let's defer further discussion to next time, when we'll tie this to a map.

    For now, happy trails.


    The program in Figure 1, ns.cgi, is:
    #!/usr/local/bin/perl -w
    # $Id: ns.cgi,v 1.1 1997/02/14 18:42:52 jeff Exp $
    
    require 5.003;
    use strict;         # Perl's equivalent of "lint"
    
    # pull in modules we need
    use CGI qw(:all use_named_parameters);
    
    # now some defined constants
    BEGIN {
      # full names, abbrevs, electoral votes of states
      require "states.pl";
      # list of all candidates
      require "candidates.pl";
    }
    
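    # For testing from the command line: if a file name is given,
    # read the form parameters from that file.
    if (@ARGV) {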
      use FileHandle;
      my $params = shift;
      my $fh = new FileHandle $params
        or die "Couldn't open $params: $!";
      $CGI::Q = new CGI($fh);
    }
    
    use_named_parameters(1);
    
    my $state = param('state') || 'CO';
    
    print
      header,
      start_html('States'),
    
      h1("Select a candidate to cast $state_names{$state}'s votes for:"),
    
      startform('GET'),
      radio_group(
        name=>'candidate',
        'values'=>[@candidates],
        default=>'-',
        linebreak=>'true',
      ),
      submit(name=>"Cast vote"),
      # remember chosen state
      hidden(-name=>'state', value=>$state),
      endform,
      hr,
      "$state_names{$state} has $elec{$state} electoral votes\n",
      end_html,
    
      "\n";
    

    Click here for figure 2.