CSCI 220: HTML Authoring & WWW Programming


Session 5 - Hyperlinks and Protocols


Protocols

  1. Discussion
  2. Using them
  3. The major protocols

What Are Hyperlinks?

Hyperlinks are the second major component of the WWW. The first we've already discussed: the marriage of graphics, sound, video & text to create dynamic, visually appealing documents which are more like "real-world" documents. With hyperlinks, the entire Internet is just a click of a mouse button away.

A hyperlink is a piece of highlighted text in an HTML document that represents a link to another HTML document. The target might be another document in the same directory as the current one, or it might be on an entirely different site halfway across the globe. To view a document referenced by a hyperlink (also called a "hot-link" by many), simply click the highlighted text (or image, but more on that later) and you are automatically taken to the new document.

Each site which hosts a Web page (or, more commonly, several pages created by different people or organizations) has a "main" machine which acts as a "host" for those documents. This machine is called a "server" because it serves web documents to people requesting them. Klingon is an example of a web server. As you are probably already aware, klingon does more than just serve up web documents: it receives and distributes email, maintains your files and directories, acts as an FTP and telnet host, and performs numerous other activities. This is fairly common among Internet machines. Most Internet "nodes" are actually local area network (LAN) servers which also happen to be linked to the Internet.

These machines run special software which accepts requests from the outside and, assuming the requests don't violate security, returns the requested information. These programs run continuously behind the scenes and are called "daemons". Each daemon (or server program) is specifically designed to handle a single protocol (remember protocols from our first day of classes?). There are daemons which handle email, daemons which handle FTP, daemons which handle telnet/rlogin sessions and so on. Klingon runs a daemon called httpd which is responsible for accepting outside requests for web pages, finding the page, formatting the response properly, and sending that page (and/or any error messages, accompanying images, Java applets, etc.) back to the requesting party.

One important thing to note is that the connection between the Web server (e.g. klingon) and the requesting machine is not persistent: the two machines are connected only long enough to pass a request and its response. That is, you are not in continuous communication like you are with telnet or FTP. Web servers are like email servers, only in reverse. They receive a request from some machine, look up the documents requested, and send them to the return address. Meanwhile, the requesting machine is waiting patiently, tapping its feet, for the server to send the requested information. When the requesting machine, called the client, receives the information, it formats the document and displays it on the screen.

What does all of this have to do with hyperlinks? Quite a bit actually. When you click on that highlighted text, your machine becomes a client. It sends out a document request to the Web server referenced in the hypertext anchor and waits for the server to respond. When the response is received, the page is formatted and displayed on your screen. You have just been "taken" to a new Web site! It looks simple enough, but there is a lot going on behind the scenes.

What Are URLs & How Do They Fit Into The Picture?

As we discussed in the first lecture, URL stands for Uniform Resource Locator, and essentially represents the "address" of a Web page. It is very similar to an email address except that it contains some additional information necessary to format and deliver HTML documents. A generic URL looks something like this:

http://www.swedish_chef.com

The "http://" indicates the protocol being used. Here we are requesting a Web document via the Hypertext Transfer Protocol. As we will see later, we can use this syntax to link to resources served by other protocols, such as gopher. The second major component of the URL, "www.swedish_chef.com", is the domain of the Web site. That is, it is the name of the machine that hosts the requested web documents. It is important to note that many web site names (klingon being a perfect example) do not start with "www". It is not the prefix that indicates that a particular document is a web document; rather, it is the "http://" which indicates that we are requesting a hypertext document. Thus, with the possible exception of the added "www", the domain name of a web site is usually the same as the domain name associated with its email addresses. As with email, you will notice that the domain name consists of a series of character strings, separated by periods, which identify the location from greatest to least specificity. Again, as with email addresses, most URLs end with one of the following suffixes:

  1. .com = commercial organization
  2. .edu = educational institution
  3. .gov = government agency (state and Federal)
  4. .net = network
  5. .org = organization (usually non-profit)

Most browsers are now "smart" enough that all you need to specify is the "core" URL (e.g. www.swedish_chef.com). You don't need to add the "http://"; the browser assumes that you are requesting a hypertext document. In some cases you don't even need the full URL: just typing the organization name or other keyword will get you there, because the browser automatically adds a "www" prefix and a ".com" suffix to the request. Thus, to get to Microsoft's home page, all we need to type is "microsoft" and we are there!

Using the syntax shown earlier (http://www.swedish_chef.com, with no file name), we will automatically be sent the document called "index.html" located in the swedish_chef.com public_html directory. That is, we will be sent the swedish_chef.com "home page."

If we want a specific document, we need to make a specific reference to it. For example:

http://www.swedish_chef.com/bork/bork.html

requests the document called "bork.html" located in the bork subdirectory of swedish_chef.com's public_html directory.

One important note is that you do not have to use the domain name to reference a web site: you can also use its IP address, since both identify the same machine. Each of the four numbers in an IP address falls between 0 and 255. Thus if swedish_chef.com had an IP address of 192.0.2.10, we could just as easily type "http://192.0.2.10" to request the www.swedish_chef.com home page.

Finally, as you are probably already aware, most URL references are not to different Web sites, but other documents within your own Web site. There are two ways that you can reference documents within your own Web site:

  1. Absolute Referencing

    http://www.swedish_chef.com/directory1/directory2/filename

  2. Relative Referencing

    1. Files in the same directory as the present file. There are two equally valid ways to reference a file in the same directory:
      1. <a href = "./filename"></a>
      2. <a href = "filename"></a>

    2. Files in a sub-directory of the present directory
      1. <a href = "./directory1/directory2/filename"></a>
      2. <a href = "directory1/directory2/filename"></a>

    3. Files off of the parent directory of the present file
      1. <a href = "../filename"></a>
      2. <a href = "../directory1/directory2/filename"></a>

      • To reference files more than one directory back, repeat the "../" once for each level (e.g. "../../filename"). Once a path climbs several levels, absolute referencing is often easier to read and maintain.
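
As a rough sketch of the referencing forms listed above, assume a page stored as bork/bork.html under public_html. The directory layout and the file names chef.html and recipes/meatballs.html are hypothetical, made up only for this illustration.

```html
<!-- Hypothetical layout, for illustration only:
       public_html/index.html
       public_html/bork/bork.html          (the page containing these links)
       public_html/bork/chef.html
       public_html/bork/recipes/meatballs.html -->

<!-- Same directory as bork.html -->
<a href = "chef.html">About the chef</a>

<!-- Sub-directory of the present directory -->
<a href = "recipes/meatballs.html">Meatball recipe</a>

<!-- Parent directory of the present file -->
<a href = "../index.html">Back to the home page</a>

<!-- The same home page, by absolute reference -->
<a href = "http://www.swedish_chef.com/index.html">Home page (absolute)</a>
```

One reason to prefer the relative forms within your own site: if the whole directory tree is moved to a different server, the relative links keep working, while the absolute one would have to be edited.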

Site Maintenance

The use of referencing schemes brings us to another important point: site maintenance. As your Web site grows, you will soon find that using a single directory to hold all of your image, HTML and other files (including Lord knows how many test files) becomes increasingly cumbersome. Thus, it is a good idea to create separate directories for images and for the HTML files representing different "areas" of your Web site. All of these directories can simply be sub-directories of the public_html directory. Setting them up takes a little thought at first, but it will make your life a lot easier in the long run. In fact, as we will see later, if there are certain pages that you want to restrict to certain individuals or groups, creating separate directories becomes a necessity. At a minimum, you should have a separate sub-directory to hold all of your graphics files. You would be surprised how many images you collect when creating a fairly large-scale site.

Types & Syntax

There are two primary types of hyperlinks:

  1. External = any document other than the one you are currently looking at

    1. WWW
      1. <a href = "http://www.nowhere.com#name">descriptive text</a>
    2. Email
      1. <a href = "mailto:username@nowhere.com">descriptive text</a>
    3. FTP
      1. <a href = "ftp://ftp.nowhere.com">descriptive text</a>
    4. Usenet News
      1. <a href = "news:rec.food.recipes">descriptive text</a>
    5. Telnet
      1. <a href = "telnet://nowhere.com">descriptive text</a> or
      2. <a href = "telnet://username@nowhere.com">descriptive text</a>
    6. Gopher
      1. <a href = "gopher://nowhere.com">descriptive text</a>
    7. WAIS
      1. <a href = "wais://nowhere.com">descriptive text</a>

  2. Internal = a reference to another part of the current document
    1. <a name = "name"></a> marks a named location (the target) in the document
    2. <a href = "#name">descriptive text</a> links to that named location
      • Before frames, this used to be a popular way to make navigation of large documents more manageable. A name would be assigned to various parts of the document, and a table of contents with hot links to each named element would be at the top of the document.
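
The table-of-contents arrangement described above might look something like the following sketch. The section names "ingredients" and "directions" are hypothetical. A link whose href begins with "#" jumps to the anchor carrying the matching name in the same document.

```html
<!-- Table of contents at the top of the page;
     each href names one of the targets defined further down -->
<a href = "#ingredients">Ingredients</a>
<a href = "#directions">Directions</a>

<!-- Named targets, placed where each section begins -->
<a name = "ingredients"></a>
<h2>Ingredients</h2>
<p>...</p>

<a name = "directions"></a>
<h2>Directions</h2>
<p>...</p>
```

The same names can be reached from another page by appending them to that page's URL, as in the "http://www.nowhere.com#name" form shown for external WWW links.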

Making Clickable Images

As we discussed in the previous lecture, one of the popular types of graphics is the button. One of the most popular uses of the button is the forward/backward arrow, which, when clicked, will take you to the previous or next page. Some of the more popular uses of "hot" graphics include:

  1. Forward/backward arrows
  2. Return to home page
  3. Mail
  4. Clickable logos of organizations

Graphic images can be made "hot" by placing them between <a href = ></a> tags. That is, the syntax is as follows:

<a href = "URL"><img src = "imagefilename"></a>

Rather than highlighted text, the result will be an image which, when clicked, will send the person to the designated URL. You will also notice that the above syntax results in an image which has a small blue border around its edges. The border indicates that it is a "hot" image linked to a URL. To eliminate the border, use the "border = 0" attribute in the <img> tag. If you want to change the border's size, use "border = #", where # is the width in pixels. There are two schools of thought on the border issue:

  1. Keep the Border

  2. Eliminate The Border

Regardless of whether you decide to keep or eliminate the border, it is a good idea to add some descriptive text which makes it apparent what the image's function is. If one places the words "Click here to go to the Swedish Chef Home Page" next to an image, it is difficult to confuse the purpose of the image. This makes things easier on the novice as well as the fairly common text-only Web surfer. In the same vein, it is a good idea to use the "alt = description" attribute with hot images. If the image is not loaded, the descriptive text will appear in its place. Because the alternative text sits between the anchor tags, it is linked to the same location as the image it substitutes for. Why would you want to do this? In this era of increasing graphical overhead, many people are actually returning to text-only browsers, such as Lynx, to speed along the information search process. By providing alternative text descriptions of hot images, you ensure that all visitors to your page are privy to the same information, and you make their experience as productive as possible no matter what level of technology they are using.
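
Putting these suggestions together, a hot image might be written roughly as follows. The file name images/home.gif is hypothetical, and whether to keep "border = 0" depends on which school of thought you follow.

```html
<!-- images/home.gif is a hypothetical file in a hypothetical images sub-directory -->

<!-- Hot image: alt text is shown when the image is not loaded,
     and border = "0" removes the blue link border -->
<a href = "http://www.swedish_chef.com"><img src = "images/home.gif" alt = "Swedish Chef Home Page" border = "0"></a>

<!-- Descriptive text next to the image, linked to the same place -->
<a href = "http://www.swedish_chef.com">Click here to go to the Swedish Chef Home Page</a>
```

In a text-only browser such as Lynx, the visitor would see the alt text as a link in place of the image, so both links lead to the same page.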

Error Messages

Because the directory and file names in a URL are case sensitive on many servers, if you get a URL Not Found message, double-check your typing to see if you entered an upper-case letter where a lower-case letter should be (or vice versa). Oftentimes these "Not Found" errors come from a typographical error rather than a non-existent web site.

Also, on particularly busy sites you may get a "Not Found" error because the server did not respond in time. Your browser became impatient and gave up waiting. It took a "sour grapes" attitude and said "It's not my fault, that page doesn't exist!" If you know that the site exists and you typed the URL correctly, try again later. The site is probably just busy, and its response would have been slow anyway.

Ever wonder what all of those pesky error messages with the funny numbers really meant? Here are two good sites which list the major errors returned by a web server and what they mean:

  1. What do errors I encounter mean?
  2. The Errors.

Lab assignment

Create your home page.

This page should be called index.html. It should be located in the public_html subdirectory. It should have appropriate permissions set. Include at least one example of each of the following:
Note that once you have an index.html file located in your public_html directory, a URL call to that directory will no longer return a listing of the directory's contents. Instead, it will simply call up this file. Be sure you have links to all your earlier assignments available on this home page.

This is also a good place to start putting some of your favorite links. You may want to set up your browser so it automatically starts at your home page. By doing this, you give yourself a customized 'table of contents' for your browser.



© 1996, Ted D. Biggs, Andy Harris,
Dept. of Computer and Information Science,
Indiana University, Purdue University, Indianapolis
email: tbiggs@klingon.iupucs.iupui.edu