Cyberspace geography visualization
The HTTP URL takes the form
http://<host>:<port>/<path>?<searchpart>
where, if <port> is omitted, the port defaults to 80. A URL containing a <searchpart> element is a query and can be discarded. An example of an HTTP URL is
http://www.w3.org/hypertext/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html
which is the location of the HTTP Internet draft.
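The URL rules above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name split_http_url and the constant DEFAULT_HTTP_PORT are hypothetical, and the standard urllib.parse module does the actual splitting.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # from the text: used when <port> is omitted


def split_http_url(url):
    """Return (host, port, path) for an HTTP URL, or None if it is skipped.

    Skipped: non-HTTP schemes, URLs without a host, URLs with an invalid
    port, and URLs that carry a <searchpart> (the part after '?').
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        return None
    if parts.query:  # a searchpart is present, so discard the URL
        return None
    try:
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return None
    if port is None:
        port = DEFAULT_HTTP_PORT
    path = parts.path or "/"  # an empty path means the server root
    return parts.hostname, port, path
```

Applied to the example URL above, this sketch yields the host www.w3.org, port 80 and the full document path, while a URL ending in something like ?keyword is rejected.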
Depending on its MIME (Multipurpose Internet Mail Extensions) type, a resource can be text, hypertext, a picture, a sound, etc. Because we are interested only in information containing hyperlinks, we can focus our attention on hypertext documents, which are currently available only in HTML. Note that VRML (Virtual Reality Markup Language), although still in experimental testing, should soon bring hyperlink functionality to three-dimensional immersive environments.
HTML is an application of SGML (Standard Generalized Markup Language). It permits the anchoring of parts of documents, in either textual or pictorial form, to other resources by giving their URLs. A URL can be specified in its absolute form or as a relative address. This is an example of a simple HTML document with one hypertext link:
<HTML>
<HEAD>
<TITLE>This is the title</TITLE>
</HEAD>
<BODY>
<P>This is a paragraph with one <A HREF="http://www.eit.com/web/www.guide/">hypertext link</A>.</P>
</BODY>
</HTML>
To fetch a resource using HTTP, a connection to the specified host has to be established over TCP (Transmission Control Protocol). The request for a resource can then be made by sending a GET command followed by the URL. A response header is returned with information on the MIME type. As discussed before, we will limit ourselves to the text/html Content-Type. This header is normally followed by the data in the format of a MIME message body. Because no assurance is given that a resource exists, the fetching step should be particularly tolerant of errors.
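The fetching step might be sketched as follows, using Python's standard socket module. The function names (build_request, parse_response, fetch_html) and the 10-second timeout are our own assumptions, and so is the Host header, which HTTP/1.0 does not require but many servers expect. Any network or format error simply yields None, in keeping with the tolerance asked for above.

```python
import socket


def build_request(host, path):
    """Build a minimal HTTP/1.0 GET request (the Host header is an assumption)."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response(raw):
    """Split a raw response into (status, content_type, body), or None if malformed."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return None  # no blank line between header and body
    lines = head.decode("latin-1").split("\r\n")
    status_parts = lines[0].split(None, 2)
    if (len(status_parts) < 2 or not status_parts[0].startswith("HTTP/")
            or not status_parts[1].isdigit()):
        return None
    content_type = ""
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if colon and name.strip().lower() == "content-type":
            # Drop parameters such as "; charset=..." and normalise case.
            content_type = value.split(";", 1)[0].strip().lower()
    return int(status_parts[1]), content_type, body


def fetch_html(host, port, path, timeout=10.0):
    """Return the body of a text/html resource, or None on any failure."""
    try:
        request = build_request(host, path)
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            chunks = []
            while True:  # an HTTP/1.0 server closes the connection when done
                chunk = conn.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    except (OSError, UnicodeError):
        return None  # unreachable host, timeout, or non-ASCII path
    parsed = parse_response(b"".join(chunks))
    if parsed is None:
        return None
    status, content_type, body = parsed
    if status != 200 or content_type != "text/html":
        return None  # only successfully returned hypertext is of interest
    return body
```

Keeping parse_response separate from the socket code means the header handling can be exercised on canned responses without touching the network.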
After the HTML document has been successfully fetched, its content can be parsed. Each discovered anchor can be put in a queue of URLs to fetch: at the end of the queue for a breadth-first search, or at the top for a depth-first search.
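As an illustration of the parsing step, the following sketch collects the anchors of a document with Python's standard html.parser module and resolves relative addresses against the URL of the document itself. The names AnchorCollector and extract_links are hypothetical, and dropping the "#fragment" part of a link is our own assumption, made so that two links into the same document count as one resource.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class AnchorCollector(HTMLParser):
    """Collect the HREF of every <A> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...>
        # and <a href=...> are handled alike.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)  # resolve relative form
                url, _fragment = urldefrag(absolute)      # assumption: drop "#..."
                self.links.append(url)


def extract_links(html_text, base_url):
    """Return the anchored URLs of a document, in document order."""
    collector = AnchorCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.links
```

The returned list keeps document order, so the caller remains free to place the links at either end of the queue.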
The next URL to fetch can then be popped from the top of the queue and this process can continue until a specified number of resources has been fetched or until the queue is empty.
Each successfully fetched URL, with all of its anchored links, can then be stored.
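The whole traversal can be sketched around a double-ended queue. In this illustration, crawl and its parameters are hypothetical names, and fetch_links stands for any callable that returns the absolute URLs anchored in a document, or None when the fetch fails. The set of already queued URLs is our own addition: the description above does not say how repeated links are handled, but without it a cycle of links would be followed forever.

```python
from collections import deque


def crawl(start_url, fetch_links, max_resources=100, breadth_first=True):
    """Traverse links from start_url and return {url: [anchored urls]}.

    fetch_links(url) is a hypothetical callable returning the list of
    absolute URLs anchored in the document, or None if the fetch failed.
    """
    queue = deque([start_url])
    seen = {start_url}  # assumption: never queue the same URL twice
    graph = {}          # each fetched URL with all of its anchored links
    while queue and len(graph) < max_resources:
        url = queue.popleft()  # the next URL always comes from the top
        links = fetch_links(url)
        if links is None:
            continue  # tolerate missing or unreadable resources
        graph[url] = links
        # Keep document order, but drop duplicates and known URLs.
        new_links = [u for u in dict.fromkeys(links) if u not in seen]
        seen.update(new_links)
        if breadth_first:
            queue.extend(new_links)  # end of the queue
        else:
            queue.extendleft(reversed(new_links))  # top of the queue
    return graph
```

Because the fetching function is passed in, the traversal order can be checked against a small in-memory dictionary of links before it is pointed at real hosts.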