From The Internet: A Guide for Chemists
Edited by Steven M. Bachrach
Published 1996 by the American Chemical Society: ISBN 0-8412-3223-7
History of the Internet
By Laurence R. Dusold
In This Chapter
This chapter gives a description and brief history of the Internet and instructions for connecting to it.
What Is the Internet?
The Internet is a matrix of networks and computer systems linked together around the world.1 Networks have no political borders or boundaries on the exchange of information. Networks are connected by gateways that effectively remove barriers so that one type of network can "talk" to a different type of network. The Internet is the largest of these different networks in terms of the number of sites interconnected and the number of users.
In this chapter, the Internet's history and evolution into what today is popularly known as the information superhighway will be described. Also, the protocol running on the Internet, the transmission control protocol/Internet protocol (TCP/IP),2-5 will be described informally in the context of its evolution in the history of the Internet.
The most important reason for all of the interest in the Internet is not the network itself or how it works technically, but the information and data exchange available to the users. The explosive growth of the Internet can be seen in Figure 1.1.6
The First Network
In 1969 the Defense Advanced Research Projects Agency (DARPA) funded the establishment of a few connected sites. These first few sites eventually evolved into the ARPANET. The military intent was to build a network that continued to work in the event of a war if parts of the network were broken. DARPA's goal in this original network design was to show the feasibility of a packet-switched network that would have no single point of failure. The "packet" contained a short piece of electronic data or information that would later be put back together with other packets at the receiving computer into the complete message. All of the connection points in the network were equal, with no central administration. Paul Baran described what was to become the ARPANET in a Rand Corporation paper7 that was made public in 1964.
By the end of 1969 the first four sites were running on the infant ARPANET. These sites were at the University of California at Los Angeles, Stanford Research Institute, the University of California at Santa Barbara, and the University of Utah. These four research sites had computer centers that were doing network research. Each site had an interface message processor (IMP) computer connected to the network. The IMP computers were Honeywell minicomputers with 12K (K is kilobyte) of memory, which is very small when compared with today's standard personal computer (PC) on a desktop. The IMP computers broke down the message traffic from the sending site into packets, and at the receiving site another IMP reassembled the packets into the original message. The original speed of the links between the four sites was 50,000 bits per second (or 50 Kbps; here K means thousand). The original network research studied different ways the network could fail when it got too many fragments of messages at one time. (This was later called network congestion.)
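The packet idea described above can be sketched in a few lines of Python. This is only an illustration, not the IMP software: the function names, the 8-byte payload size, and the (sequence number, payload) layout are assumptions made for the example.

```python
import random

def split_into_packets(message: bytes, payload_size: int = 8):
    """Break a message into numbered packets of at most payload_size bytes."""
    return [
        (seq, message[start:start + payload_size])
        for seq, start in enumerate(range(0, len(message), payload_size))
    ]

def reassemble(packets):
    """Rebuild the message by ordering packets on their sequence number."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Packets may arrive in any order."
packets = split_into_packets(message)
random.shuffle(packets)  # simulate packets arriving out of order
assert reassemble(packets) == message
```

Because every packet carries its own sequence number, the receiver can restore the message no matter which route or order the packets took.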
The success of the early ARPANET soon became apparent when it started to grow beyond the original four sites. In 1971, 15 interconnected sites were using the IMP minicomputers. By 1972 the ARPANET had increased to 37 sites. An interesting discovery was made during these early days of the ARPANET. Although the U.S. Department of Defense (DOD) had intended the network to be used mainly for computer research into making robust networks and for remote computing between sites, the users also started to exchange messages with one another because it was so convenient. The research exchange was getting done, but the volume of traffic devoted to personal messages and what in the future would be called electronic mail (e-mail) greatly exceeded anyone's imagination.
Request for Comments (RFC)
During the very early stages of the development of the ARPANET there were no network standards, or universally agreed upon ways of doing things between sites. So, an informal process evolved, the request for comments or RFC. Fortuitously, the RFC process helped to accelerate the growth of the Internet. The draft RFCs would suggest possible standards or ways of communicating and exchanging information, using software and protocols that everyone could probably agree upon. They were published as drafts, commented upon, and finally accepted as networking standards. When the draft RFC became final it would be assigned a number so that it could be referenced as an Internet Standard.
The first RFC was published in 1969 by Steve Crocker.8 Today there are about 2000 RFCs.9 Because draft standards were proposed first and feedback was requested from everyone on the network, the standards became more widely accepted by everyone participating in the evolving Internet. Today the RFCs have become the de facto network standards that every major vendor must follow to sell network hardware and software.
A series of RFCs grouped together, known as the for-your-information (FYI) series, covers topics on everything from getting started to ethics. The titles of the current FYIs available on the Internet are summarized in Appendix 1.
Protocols for Networking
Some of the computers that were being interconnected in the early 1970s on the ARPANET were Honeywell, IBM, DEC, and Xerox Data Systems. Any computer that happened to be at one of the participating sites might get "connected". No specific hardware or brand was required, other than the IMP computer to connect to the other computers in the expanding ARPANET. In October 1972 the first public demonstration of the young ARPANET took place at the International Conference on Computer Communications held in Washington, DC. The public demonstration worked better than the organizers had imagined.
During this time frame, the first electronic mail, or e-mail, program was written, and the concept of e-mail distribution lists (today's list servers) also evolved. At this stage TCP/IP had not been invented, so it was not being used yet for the growing ARPANET. A network control program (NCP) was being used, but it had many limitations by today's TCP/IP internetwork standards. In May 1974 Vinton Cerf and Robert Kahn published the first paper on the emerging TCP protocols, entitled "A Protocol for Packet Network Intercommunication", in IEEE Transactions on Communications. RFCs soon followed on the emerging TCP protocol. TCP is described in RFC-793.10 TCP allows for the correct and error-free sending and receiving of packets of data on a computer network.
The Internet protocol (IP) is concerned with the transmission of packets between the source and destination. IP is not concerned with error recovery, just the successful movement of the packets from the originating site to the destination through the network. IP is described in RFC-791.11
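One ingredient of error-free delivery is detecting that a packet was damaged in transit. The sketch below computes a 16-bit ones'-complement checksum of the kind used by the TCP/IP protocols. It is a simplified illustration: real TCP applies the checksum to its header and data together and adds sequence numbers, acknowledgments, and retransmission, none of which are shown. The function names are assumptions made for the example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum of the given bytes."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def is_intact(data: bytes, checksum: int) -> bool:
    """True if the data still matches the checksum computed by the sender."""
    return internet_checksum(data) == checksum

sent = b"hello"
checksum = internet_checksum(sent)
print(is_intact(b"hello", checksum))  # undamaged packet
print(is_intact(b"hellp", checksum))  # one byte changed in transit
```

A receiver that finds a mismatch discards the packet, and the sender's retransmission logic is then responsible for delivering a good copy.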
TCP and IP are independent of the physical medium that they can run on. TCP/IP can be run on many other media, but the most popular are twisted-pair copper wire and fiber backbone cabling in buildings and campuses.
The operation of the ARPANET was moved to the Defense Communications Agency in 1975. In August 1975 the American Chemical Society Division of Computers in Chemistry sponsored a symposium on Computer Networking and Chemistry at the 170th national meeting in Chicago, Illinois. The ACS published the proceedings of this symposium in the ACS Symposium Series as No. 19.12 The ARPANET was being used by chemists for chemical research.
A version of the UNIX operating system developed at the University of California at Berkeley included networking software. One of the aspects that made UNIX very popular was the transportability of the software. In 1977 UNIX was distributed by AT&T Bell Labs, and in 1979 Usenet News was started using the new UNIX software, between Duke University and the University of North Carolina. Usenet News can be loosely described as a worldwide bulletin board system where anyone on the Internet can read any posting, and anyone else can reply to any posting with an answer or comment. Because Usenet News is so "open" and unrestricted, it is one of the top three sources of network traffic on the Internet today.
In 1981 Bitnet started at the City University of New York with a connection to Yale University. Although the Bitnet network started as a network job entry (NJE) protocol network only, its e-mail contribution to the ARPANET through gateways was significant in the 1980s. These and other described networks are important because their data would be carried by the future Internet, and that would in turn fuel the further expansion of the ever-growing Internet.
CSNET (Computer Science Network) was also formed in 1981. In 1983 the European Academic and Research Network (EARN) was established, and it operated the same as the Bitnet network in the United States, to which it was also interconnected. EARN is linked to Bitnet as a cooperative network.
In 1989 CSNET merged with Bitnet. By 1994 Bitnet evolved into Bitnet-II and used TCP/IP as the connection protocol over the Internet. Bitnet-II designated a little over a dozen "core nodes" that electronically connected other Bitnet members in the United States. All of the Bitnet traffic is encapsulated in TCP/IP using special software, VMNET, written at Princeton University.
Today most of the Bitnet traffic is carried on the Internet. Bitnet, which is run by the Corporation for Research and Education Networking (CREN), has more than 2000 host computers in more than 30 countries. Bitnet host computers are usually larger computers (i.e., IBM or DEC mainframes, not PCs or workstations) having up to several thousand users per host at the larger sites. The Bitnet-II traffic on the Internet has consistently ranked in the top 14 by byte volume. It peaked at more than 170 billion bytes in one month in 1994 when this chapter was being written. Figure 1.2 shows a world map diagram of the Bitnet network.
Internet Starts Running
In 1982 the Defense Communications Agency (DCA) and DARPA established that TCP/IP was to be the standard connection protocol used on the ARPANET. During the same year, DOD established the Defense Data Network as the umbrella network for all of the DOD. The conversion of the ARPANET to TCP/IP was completed by January 1, 1983, and the term "Internet" came into use from the term "the Internet protocol" (IP). In 1983 gateways connecting other networks, such as Bitnet, running protocols different from TCP/IP started to appear. Thus other networks were being connected to the ARPANET, and the ARPANET was evolving into a major carrier of TCP/IP traffic across the United States.
This tremendous growth caused a major event to happen in 1983: the ARPANET split into two networks, MILNET and ARPANET. For many this is the point that marks the beginning of the true Internet, because non-DOD sites were allowed to connect to the ARPANET after the MILNET split from the ARPANET, and TCP/IP was the accepted protocol running on the network.
The next major milestone in the evolution of the ARPANET occurred in 1984 when the number of interconnected hosts finally exceeded the 1000 mark. That same year the first domain name server (DNS) was put on the ARPANET. This server relieved users of the need to know the exact IP numbers and routes from one system to another. Also in 1984 national networks were started in Japan (JUNET or Japan UNIX Network) and in the United Kingdom (JANET or Joint Academic and Research Network).
The next significant event was in 1986, when the NSFNET finally came into being. A network connecting five supercomputer centers was established by the National Science Foundation (NSF) to enhance computer and network research capabilities in the United States. NSFNET initially started with link speeds of 56 Kbps between the sites. The connection computers initially were LSI-11 computers from DEC that performed work on the network packets and other network functions that were evolving. The five sites were at the John von Neumann Computer Center at Princeton University, Pittsburgh Supercomputing Center at the University of Pittsburgh, the San Diego Supercomputer Center at the University of California at San Diego, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, and the Cornell Theory Center at Cornell University. NSFNET allowed widespread connection of regional networks and universities.
In 1987 NSF signed a cooperative agreement with Merit Network, Inc., allowing Merit to manage the NSFNET and to upgrade the network, because it was already becoming overloaded. By the summer of 1988 Merit, in cooperation with IBM providing high-speed routers and MCI providing the high-speed circuits, was able to upgrade the initial 56-Kbps NSFNET to a T-1 (1.544 million bits per second) speed network. The number of computers connected to the ARPANET and NSFNET now exceeded 10,000 for the first time, and a period of very rapid expansion of these and other interconnected networks was occurring. The T-1 Internet has interconnections in the United States to the National Aeronautics and Space Administration (NASA) Science Internet (NSINET) and the Department of Energy's Energy Science Network (ESNET), in addition to about 500 different computer networks.
The Internet Worm
In May 1988 Clifford Stoll wrote a paper, "Stalking the Wily Hacker",13 in which he showed that what had started as his attempt to track down a billing error in 1986 turned into an almost 2-year project. He traced an individual who had gotten into more than 30 computers on the Internet. On November 2, 1988, about 6000 computers on the Internet came under attack by a software program known as a "worm". For the first time, major parts of the network "broke" due to a security problem. Everyone was shocked at how quickly the worm spread, even though it affected only those sites running some variation of the UNIX operating system. Both events helped make it clear to the growing networks that network security should be considered and made a part of future expansion. Even with the negative publicity in 1988, it took only 2 years from the time the networks passed the 10,000 mark of connected computers in 1987 to exceed the 100,000 connected computers in 1989.
In June 1989, a special edition of Communications of the ACM, devoted in its entirety to the Internet Worm incident, carried a paper by Eugene Spafford entitled "The Internet Worm: Crisis and Aftermath".14 This special edition chronicled the events around the worm, how it was handled, and what should be taught to students about ethics related to the Internet. The topics covered in this special edition remain relevant on the Internet today. Estimates at the time suggested that no more than 6000 computers running UNIX were affected by the worm, or very roughly less than 10% of the Internet.
One of the interesting points made was that the Internet survived this attack by the worm so well because there was diversity in the Internet. Not all of the network computers were supplied by one vendor, they were not all running the same operating system, and they used different implementations of TCP/IP. The vast number of different brands and different implementations actually was a strong point in the assessment of the aftermath of the worm. The hardware and software diversity in fact contributed to the robustness of the Internet to survive attacks based upon a given hardware or software bug or loophole.
Nevertheless, because of these security incidents, RFC-1087, "Ethics and the Internet", was published.15 Also, as a direct result of these and other incidents, most judicious sites today run antivirus software on PCs and operate their TCP/IP networks behind "firewalls".16 Firewalls are computers and software that provide a way for an organization to put a barrier between itself and the Internet in order to keep out unwanted accesses and to enforce some type of access policy. Firewalls can significantly increase the complexity of the network for end users but improve the security of operating a corporate or a university network attached to the Internet. Scientists operating behind firewalls should not become lax in their concern for network security, because firewalls do not provide absolute security, but just increase the level of network security of the university or corporation using them.
NREN, the National Research and Education Network
Other events of interest in the evolution of the Internet that happened in 1989 were the formation of the Internet Engineering Task Force (IETF) under the Internet Activities Board (IAB), and the publication of "The Federal High-Performance Computing Program" by the Executive Office of the President, Office of Science and Technology, on September 8, 1989. In this document the National Research and Education Network (NREN) was formally proposed and a time line established for the next decade of the NSFNET and the Internet in the United States.
The Congress of the United States, Office of Technology Assessment, also published a background paper in September 1989 entitled, "High-Performance Computing and Networking for Science". Thus, both the Executive and Legislative branches of the U.S. Government were in basic agreement with the concept of establishing an NREN in the United States.
In 1991 the Gore Bill was passed by Congress and signed by the President on December 9. The purpose of this legislation, the High-Performance Computing Act of 1991, was to connect all schools, universities, and government agencies on one network. It established the development of a National Research and Education Network as part of its provisions.
In 1990 the National Science Foundation published its program guideline NSF-90-7, which promoted the continued expansion of the Internet. This program allowed any college or university in the United States to connect with the NSFNET. An excerpt from the NSF guideline follows:
The NSFNET Program supports high-speed, wide-area computer communication networks with a goal of establishing an Internet or network of networks with three levels: a national backbone network, midlevel networks usually based around some geographical region of the country, and campus or local networks at educational and research institutions. The individual institutions are connected to a midlevel network in the appropriate geographical region. The midlevel network is in turn attached to the high-speed national backbone network, usually at its network operation center. The backbone is connected to other national networks including the Defense Research Internet, NASA Science Network, and the Energy Sciences Network; these interconnected networks and many others worldwide comprise the Internet.
The gateway to the NSFNET should be available to all researchers, faculty, and scholars at the institution. Ideally the institution will have installed a high-speed campus network and have adopted the TCP/IP protocols as standard.
This announcement was significant because no restrictions were placed on who could be connected as long as the Acceptable-Use Policy (AUP) of the National Science Foundation was followed. Generally under the AUP, any use that enhances education or the goals of the educational research community is allowed. Also, research and education users can use NSFNET to correspond with commercial companies and receive commercial services and products electronically. However, for-profit companies cannot use the backbone for advertising or other commercial purposes for which the companies initiate the contact. Universities and research organizations did not need to have any tie into computer network research or other government-funded projects of any type in order to connect to the NSFNET backbone or to the regional network to the NSFNET backbone in the United States.
Searching the Internet
In 1990 the ARPANET was officially decommissioned and ceased to exist. The other major occurrence in 1990 was that the software to search throughout the Internet for the location of electronic files emerged. This software, named Archie, is used to find files on the Internet so that users can use file-transfer protocol (FTP) to get a copy of the desired software. Archie was written at McGill University in Canada by Peter Deutsch, Alan Emtage, and Bill Heelan. Archie is now also offered as a commercial product with appropriate support available.
The regional networks throughout the United States allowed smaller colleges and universities to connect to the rapidly growing Internet. Figure 1.3 is a U.S. map of regional interconnection points in 1991. These providers serve local universities and industrial sites with research facilities in the United States.
The Commercial Internet Exchange (CIX) Association, which is a nonprofit trade association of public data internetwork providers, was formed in 1991 with for-profit members. The CIX had interconnects with the regional networks, and each carried the other's traffic without adding surcharges or traffic-based charges between CIX members. By joining the CIX, all members were assured that no restriction would be placed on the type of traffic routed between the member networks. Thus there was no fear of violating the NSFNET acceptable-use guidelines, which banned commercial Internet traffic from the government-funded NSFNET.
In addition to the for-profit organizations getting together in 1991, the other significant events were that Gopher software, written by Paul Lindner and Mark McCahill, was released by the University of Minnesota, and WAIS (Wide Area Information Servers) software, written by Brewster Kahle, was released on the Internet.
Gopher software allowed Internet users to use a client program running on their local computers or workstations to go to a Gopher server and review ordered lists of information. Gopher made such an impact that client software soon existed for everything from IBM mainframes and DEC VAXes to PCs running both DOS and Windows and UNIX workstations. Gopher within 2 years would be contributing to the Internet traffic at the compound growth rate of almost 1000%! Gopher traffic on the NSFNET backbone in the United States was 374 billion bytes in January 1994, and it increased to 864 billion bytes by October 1994. Gopher will be discussed in detail in a later chapter in this book, because it is a tool that contributes significantly to the ease of getting useful information from the Internet.
WAIS provides the network user a way to search through collections of indexed data on the Internet. WAIS is a text-searching system that is distributed over the Internet, and it works because other users have indexed the words occurring in a document. When a user requests a WAIS search, the client contacts all known servers that have relevant information and gives back results ranked in order of most likely to match to least likely to match the search request.17
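The idea of searching pre-indexed words and returning ranked results can be sketched as follows. This is not the WAIS protocol or its actual ranking method: the single in-memory index, the simple word-count score, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_index(documents):
    """Map each word to a Counter of {document name: occurrences}."""
    index = defaultdict(Counter)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word][name] += 1
    return index

def search(index, query):
    """Return document names ranked from most to least likely match."""
    scores = Counter()
    for word in query.lower().split():
        for name, count in index.get(word, {}).items():
            scores[name] += count
    # Highest score first; ties broken alphabetically by document name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked]

# Hypothetical documents, invented for the example.
documents = {
    "notes-a": "benzene ring benzene",
    "notes-b": "benzene solvent",
    "notes-c": "protein folding",
}
index = build_index(documents)
print(search(index, "benzene"))  # ['notes-a', 'notes-b']
```

The expensive step, indexing, is done once in advance, so each search only has to look up the query words and sort the matching documents.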
Because of this continued growth, the NSFNET was upgraded again in 1992. This time the upgrade was to a national T-3 (45 million bits per second) backbone network. The applications that were evolving on the Internet, and thus the NSFNET backbone in the United States, were helped by the upgrades and in turn fueled further larger increases in the amount of network traffic being carried by the NSFNET.
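To put the link speeds quoted in this chapter in perspective, the short calculation below estimates how long a file would take to cross each link. It is a rough sketch: the 1-million-byte file size is an assumption, K and M are taken as thousand and million as in the text, and protocol overhead and congestion are ignored.

```python
# Link speeds in bits per second, as quoted in this chapter.
LINK_SPEEDS_BPS = {
    "ARPANET, 1969 (50 Kbps)": 50_000,
    "NSFNET, 1986 (56 Kbps)": 56_000,
    "NSFNET T-1 (1.544 Mbps)": 1_544_000,
    "NSFNET T-3 (45 Mbps)": 45_000_000,
}

def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Idealized transfer time: 8 bits per byte, no overhead."""
    return size_bytes * 8 / bits_per_second

FILE_SIZE_BYTES = 1_000_000  # assumed file size for the comparison

for name, speed in LINK_SPEEDS_BPS.items():
    print(f"{name}: {transfer_seconds(FILE_SIZE_BYTES, speed):.2f} s")
```

Under these assumptions the same file that needed 160 seconds on an original ARPANET link would cross a T-3 link in well under a second.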
Other events in the history of the Internet that occurred during 1992 were that the number of host computers exceeded the 1,000,000 mark, the Internet Society (ISOC) was formed, and the Internet Activities Board (IAB) became a part of the ISOC. The Internet Society was formed as an international professional organization to foster the evolution and use of Internet technology, standardization, etc. The Society currently also publishes bimonthly On the Internet (ISSN 1081-3969) and sponsors international meetings on the Internet.18
Another development, which began in 1989 and finally became public in 1992, was the first public distribution of the World Wide Web (WWW) client software on the Internet from CERN, the European Laboratory for Particle Physics, which is the home of the Web, in Geneva, Switzerland. The World Wide Web concept was designed and implemented by Tim Berners-Lee at CERN.
A WWW client is computer software that runs on a PC, workstation, or on the userid of a mainframe computer. This software is concerned with receiving and presenting the data and information to the user. Examples of WWW client programs are Netscape, Mosaic, Lynx, and Charlotte. Netscape runs on PCs and Charlotte runs on mainframes. A WWW server also is computer software that runs on PCs, workstations, or mainframes, but it is concerned with information retrieval on behalf of the WWW clients that request the data. This software has a collection of files, data, possibly graphics, and even multimedia. Upon receiving a request from a WWW client program, the server gives out the data from its computer system onto the Internet in a very standardized format. The standard format for data and information dissemination on the Internet is HyperText Transfer Protocol (HTTP).
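The request-and-response exchange between a WWW client and server can be sketched without any network connection. The example below formats the text a client might send for a simple HTTP GET request and parses a hand-written sample response. The host name example.org, the sample response, and the function names are assumptions made for illustration; a real client would also open a TCP connection to the server.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.0 GET request as a client would send it."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")

def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ")
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return int(code), headers, body

request = build_get_request("example.org", "/index.html")
print(request.decode("ascii"))

# A hand-written sample of what a server might send back.
sample = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>"
status, headers, body = parse_response(sample)
print(status, headers["content-type"], body)
```

The client names the resource it wants, and the server replies with a status line, descriptive headers, and then the data itself.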
In 1993 the Internet Network Information Center (InterNIC) was started with funding from the NSF to provide information, registration, and directory services for the Internet itself, including all the standards and RFCs. NSF's awardees were General Atomics (CERFNET), Network Solutions, Inc., and AT&T.
Also in February 1993, the National Center for Supercomputing Applications (NCSA) in Illinois (Marc Andreessen's code) released the first graphical viewer, Mosaic, for the WWW.
Because of the graphical user interface (GUI) and the ability to display images, play sounds, and get text information, the growth of the World Wide Web has been phenomenal. In April 1994 the NSFNET backbone traffic from the WWW surpassed that from Gopher. The rate of increase seems unbelievable until one looks at the NSFNET traffic figures for the WWW: in January 1994 the traffic was 269 billion bytes, and by October 1994 it had increased to 2.152 trillion bytes! In terms of total bytes on the NSFNET, only file-transfer protocol (FTP) and network news transfer protocol (NNTP), which supports Usenet News, are currently higher.
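The traffic figures quoted in this chapter for Gopher and the WWW can be turned into growth factors with a short calculation. This is a back-of-the-envelope sketch: it assumes 9 months between the January and October 1994 measurements and steady month-to-month compounding, neither of which is stated in the source data.

```python
def growth_factor(start_bytes: float, end_bytes: float) -> float:
    """How many times larger the traffic became over the period."""
    return end_bytes / start_bytes

def monthly_rate(start_bytes: float, end_bytes: float, months: int) -> float:
    """Equivalent steady monthly growth rate over the period."""
    return (end_bytes / start_bytes) ** (1 / months) - 1

MONTHS = 9  # January to October 1994 (assumed interval)

# NSFNET backbone traffic in bytes, as quoted in this chapter.
traffic = {
    "Gopher": (374e9, 864e9),
    "WWW": (269e9, 2.152e12),
}

for name, (january, october) in traffic.items():
    factor = growth_factor(january, october)
    rate = monthly_rate(january, october, MONTHS)
    print(f"{name}: {factor:.2f}x overall, {rate:.1%} per month")
```

Gopher traffic a little more than doubled over the period, whereas WWW traffic grew eightfold.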
Future of the Internet
Because of this continued, nonstop growth of the Internet, in February 1994 the NSF formally announced the next stage for NSFNET, which had a major impact on the Internet in the United States at the end of May 1995, when the T-3 NSFNET backbone was decommissioned. An excerpt of a press release from February 14, 1994, is given in Appendix 2.19 NSF's vision for the future was described at a briefing by NSF to explain the material in the press release.
Clearly, the Internet community is in the midst of a revolution in computer technology made possible by using workstations, PCs, mainframes, and supercomputers all tied together, with the Internet as the "glue". The very high speed backbone in the United States will also make feasible new applications that today are beyond imagination.
Uniform Resource Locators (URLs)
The URL or uniform resource locator originated with the World Wide Web project and Tim Berners-Lee. It allows any Internet resource to be specified and tells the reader how to get the information and where the resource is located. For example,
If you are using a WWW client, a useful starting point to search for information on the Internet is at URL http://www.cfsan.fda.gov/referenc.html. More details are given in Chapters 4 and 11.
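The parts of a URL can be picked apart with the Python standard library. The sketch below splits the URL quoted above into the access method (how to get the information), the host computer (where the resource is located), and the path of the resource on that host. Whether this address still answers today is not assumed; no network access is made.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.cfsan.fda.gov/referenc.html")

print(parts.scheme)  # http              (how to get the information)
print(parts.netloc)  # www.cfsan.fda.gov (which host holds the resource)
print(parts.path)    # /referenc.html    (which resource on that host)
```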
How To Get Connected
If you want to start out with e-mail and a limited amount of Gopher, Usenet News, and World Wide Web access time per month on the Internet, several choices are available with Windows-based or Mac-based (Apple Macintosh) graphical user interfaces. The obvious big three are CompuServe (1-800-487-9197), America Online (1-800-827-6364), and Prodigy (1-800-776-3449). The big three providers all give 5 or more hours per month of access to the World Wide Web and essentially unlimited e-mail messages per month for less than $10. If you are affiliated with a small college or university that is not yet connected, a good starting point is the book Internet: Getting Started.20 All of the names, addresses, and phone numbers of the regional networks and major international network providers are listed.
If you are interested in personal access, with more access hours than the big three provide, many software packages come with a "Network Provider" in the box. Before signing up with one of these services, you should fully understand all of the connection costs, telephone long-distance charges, and so on. For many Internet users, being restricted to a base number of "prime time" hours and unlimited evening or weekend use is the only choice because of the costs. Two dial-up providers worth checking out (at the time this was written) are NetCruiser from Netcom (1-800-353-6600) and the Pipeline, now sold nationally by Performance Systems International, Inc. (1-800-827-7482).
An excellent starting place for PC-based Internet dial-up access from home or office is Connecting to the Internet: An O'Reilly Buyer's Guide.21
Furthermore, more than a dozen FreeNets are now operating in North America; they provide e-mail access and some other, sometimes limited, Internet services.22 You are asked to agree not to use the FreeNet for any commercial purposes and to abide by the local fair-use rules that may be in effect in order to make the resources available to as many users as possible. An additional source of general information on the Internet and its resources can be found in reference 17. Excellent additional resources, which were published after this book chapter was written, are given in reference 23. The next evolutionary step for the Internet is already running and is called INTERNET-2.24