1.0 Intro

What is a computer network?

A computer network is a structure that links devices together for the purpose of communication, allowing computers to talk to each other. It can be modelled as a graph: each device is a node, and a physical connection between two devices is an edge. These connections can be implemented in a variety of ways, including physical cables, fibre-optic cables, satellites, and phone lines.
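
As a minimal sketch of this graph model, the snippet below represents a small hypothetical network as an adjacency list in Python; the device names are invented purely for illustration.

    # A minimal sketch of the graph model: devices are nodes, physical links are edges.
    # The device names here are hypothetical, purely for illustration.
    network = {
        "laptop":      {"home-router"},
        "phone":       {"home-router"},
        "home-router": {"laptop", "phone", "isp-router"},
        "isp-router":  {"home-router", "web-server"},
        "web-server":  {"isp-router"},
    }

    def are_linked(a: str, b: str) -> bool:
        """Return True if there is a direct physical link (edge) between two devices."""
        return b in network.get(a, set())

    print(are_linked("laptop", "home-router"))  # True
    print(are_linked("laptop", "web-server"))   # False: no direct link, traffic must be routed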

The current structure of the internet

Today's Internet is still a network of interconnected networks. Although almost every computer in the world can reach almost every other computer, different parts of the Internet are run by different organisations. High-speed backbone links were created to carry large amounts of data, and smaller networks connect to these backbones, allowing any user on any network to exchange data with any other user. These backbone links and individual networks can be owned by companies, universities, or nation states. The Internet achieved ARPA's original goal that if part of its infrastructure is destroyed, data can still flow through the remaining networks (in principle), although recent centralisation tendencies may undermine this. By the early 2000s, the Internet had become indispensable global infrastructure. Today, the Internet connects tens of billions of devices, including computers, smartphones, televisions, printers, fridges, watches, and most crucially... water bottles. These devices are connected using a wide range of different types of links, such as copper cables, fibre-optic cables, satellites, and phone lines.

What is a protocol?

No matter what medium is used (fibre, satellite links, etc.), the physical connections send data as a sequence of bits. Network communication is possible only if computers "speak" a common language and know how to interpret those bits. These common languages are known as protocols. On the Internet, it's "protocols all the way down". A protocol is a set of predefined rules that tell computers how to interpret the stream of bits being transmitted. Loading even a simple web page involves many protocols working together. Protocols are the lifeblood of the internet.

Simplified Example:

  • IP – for consistent addressing of entities on the internet.
  • BGP – for finding the best routes across the internet.
  • SMTP – for sending and relaying emails between mail servers.
  • FTP – for sending and receiving files.
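
To make "rules for interpreting a stream of bits" concrete, here is a minimal sketch of a toy, entirely made-up protocol: a one-byte message type, a two-byte length, then the payload. Real protocols are far richer, but the principle is the same: both sides agree on the format in advance.

    import struct

    # A made-up toy protocol, for illustration only:
    #   1 byte  message type
    #   2 bytes payload length (network byte order)
    #   N bytes payload
    HEADER = ">BH"  # struct format: unsigned char + unsigned short, big-endian

    def encode(msg_type: int, payload: bytes) -> bytes:
        """Turn structured data into a flat sequence of bytes (the bits on the wire)."""
        return struct.pack(HEADER, msg_type, len(payload)) + payload

    def decode(data: bytes) -> tuple[int, bytes]:
        """Both sides agree on the same rules, so the receiver can reinterpret the bits."""
        msg_type, length = struct.unpack_from(HEADER, data)
        header_size = struct.calcsize(HEADER)
        return msg_type, data[header_size:header_size + length]

    wire = encode(1, b"hello")
    print(wire)          # b'\x01\x00\x05hello'
    print(decode(wire))  # (1, b'hello')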

Purpose of TCP/IP

Every computer and network on the Internet uses the same core set of protocols – the Transmission Control Protocol/Internet Protocol (TCP/IP). TCP/IP was developed by DARPA to guarantee the proper transmission of data, since the physical layer in the network may be unreliable. IP is responsible for providing consistent addressing for entities on the internet. TCP is responsible for error-free delivery of streams of data. In TCP/IP, a stream of data is split into packets which are sent individually over the network. For transmission that does not need delivery guarantees (e.g., video streaming), the User Datagram Protocol (UDP) can be used instead of TCP. UDP transmission has lower overhead and latency because it omits TCP's acknowledgement and retransmission machinery, at the cost of offering no delivery guarantees.

The TCP/IP model is split into four layers: the Application layer, the Transport layer, the Internet layer, and the Network access layer. The Transport layer is responsible for converting the stream of data to and from a sequence of packets and for detecting and recovering from packets that are lost or corrupted in transit. The Internet layer is responsible for transmitting a single packet from the source device to the destination device across the network.
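
As a rough illustration of the difference, the sketch below uses Python's standard socket module: a TCP socket provides a reliable byte stream, while a UDP socket sends individual datagrams with no guarantees. The loopback address and the self-addressed datagram are just a convenient way to keep the example self-contained.

    import socket

    # TCP: connection-oriented, reliable byte stream (SOCK_STREAM). The transport layer
    # splits the stream into packets, acknowledges them, and retransmits anything lost,
    # so the application just sees ordered bytes arriving.
    tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # UDP: connectionless datagrams (SOCK_DGRAM) with no delivery or ordering guarantees,
    # which is why it suits latency-sensitive traffic such as video streaming.
    udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # Self-contained demonstration: send one UDP datagram to ourselves over loopback.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))          # port 0 = let the OS pick a free port
    udp_sock.sendto(b"one datagram", receiver.getsockname())
    data, sender = receiver.recvfrom(1024)
    print(data, "from", sender)

    for s in (tcp_sock, udp_sock, receiver):
        s.close()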

Client/server architecture

Most communication on the internet takes the form of a client-server relationship. The server is a computer whose address is known and which stores information on its file system. The client sends a request for information to the server via an agreed protocol (FTP, SMTP, etc.), and the server transmits the requested information back to the client.

Advantages:

  • Multiple clients can use a single server.
  • New clients can join the system without having to be registered in advance.
  • We have a single, central source of information (although this has become increasingly less true in the last 10-15 years).

Disadvantages:

  • There is a single point of failure – the server.
  • If there are too many clients, the server may become overloaded with requests.

To get around these disadvantages, many duplicate servers containing the same content can be used. The cost is that more work must be done to keep the copies of the information synchronised. The client knows who the server is, but the server can ideally handle the request and then forget about the client; this statelessness is what allows the architecture to scale.

Simplified Example: When you type a website address into your browser, your browser is the client, sending a request to the server that hosts the website. The server receives the request, finds the relevant web page information, and sends it back to your browser (the client) to display.
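
A minimal sketch of the client side of this exchange, using Python's standard http.client module; the host example.com is just a placeholder test server.

    import http.client

    # The client asks a known server for a resource and reads the reply.
    conn = http.client.HTTPConnection("example.com", 80, timeout=10)
    conn.request("GET", "/")                  # the request, in the agreed protocol (HTTP)
    response = conn.getresponse()             # the server's reply
    print(response.status, response.reason)   # e.g. 200 OK
    html = response.read()                    # the HTML a browser would render
    print(html[:80])
    conn.close()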

World Wide Web

The World Wide Web is essentially the fragment of the Internet accessible through web browsers. It is built on top of the Internet: the Internet is the network that allows information transfer, while the World Wide Web is a set of applications built on top of it, providing a standardised way to access information stored on that network. It originated at CERN in 1989, invented by Tim Berners-Lee and colleagues, and was publicised in 1991. It included the first browser and combined hyperlinked documents with existing infrastructure such as the Domain Name System (DNS). In 1993, the release of the Mosaic browser (the first multimedia browser) led to an explosion in internet use, and the World Wide Web was called the "killer app" of the 90s. It is a unique engineering environment with no single owner or controlling authority, and many consider it one of the most impressive pieces of infrastructure ever built by humanity. Hundreds of standards define interaction over web pages. Today, due to the popularity of search, URLs are nearly irrelevant to users.

Brief Historical Evolution:

  • Early internet: Mainly used by technical people in universities and research labs. Accessing information was difficult (needed IP address), and there was no uniform way of visualising information.
  • World Wide Web: Combined domain names (via DNS) with browsers, making information access easy (no need to remember IP addresses; the browser handles rendering). Seamless navigation between different pieces of information was enabled by hyperlinks.
  • Static Web: Late 20th century, users primarily accessed static information on servers in a "read-only" mode.
  • Social Web: Early 21st century, technology allowed users to store their own information on servers and create user-generated content (e.g., social media, blogs).
  • Personalised Web: Over roughly the last five years, the web has started responding personally to users' interests, with tailored search results and services. More recently, large language models aggregate information and extract its semantic content on the user's behalf.

Purpose of DNS

DNS stands for Domain Name System. Its purpose is to resolve human-readable domain names (as found in URLs) to IP addresses. With DNS, users no longer need to remember numerical IP addresses; they can use easier-to-remember domain names to reach specific servers on the Internet. Your computer and router use DNS to determine which server on the Internet a request needs to be sent to.

Simplified Example: When you visit google.com, your computer doesn't connect directly to something called "google.com". It sends a request to a DNS server asking: "What is the IP address for google.com?" The DNS server returns a numerical IP address (e.g., 172.217.160.142), and your computer then uses this IP address to connect to Google's servers.
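
As a small illustration, Python's standard library can ask the operating system's resolver the same question; the address printed will vary by time and location.

    import socket

    # The same question the browser asks a DNS resolver:
    # "what IP address corresponds to this domain name?"
    ip_address = socket.gethostbyname("google.com")
    print(ip_address)   # a numerical IPv4 address, e.g. something like 172.217.x.x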

URL structure

URL stands for Uniform Resource Locator. It is a human-readable format used for addressing World Wide Web servers.

A typical URL structure example is: http://www.domain.edu.au:1000/path/to/file?parameters=true#fragment.

The URL contains the following parts:

  • Protocol: The protocol (scheme) being used, typically http, https, ftp, etc.
  • Domain name: The domain name, mapped to an IP address by a DNS server. E.g., www.domain.edu.au.
  • Port number: Servers have ports numbered 0-65535; HTTP defaults to port 80, so the port is usually omitted from the URL. E.g., :1000.
  • Path: The path to the file to be retrieved or the code/function to be executed on the server, e.g., /path/to/file. This often points to code rather than a physical file.
  • Parameters: Parameters of the request (the query string), specified as key-value pairs. These parameters can change how the server renders content or what information it sends. E.g., parameters=true.
  • Fragment: The fragment, anchoring to a location in a page. E.g., #fragment.

In addition to the URL itself, the browser sends hidden information with each request, such as the browser name (User-Agent) and cookies, as part of the HTTP request headers.
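
A small sketch using Python's standard urllib.parse module to split the example URL above into these parts:

    from urllib.parse import urlparse, parse_qs

    # Parse the example URL from above into its named parts.
    url = "http://www.domain.edu.au:1000/path/to/file?parameters=true#fragment"
    parts = urlparse(url)

    print(parts.scheme)           # 'http'               (the protocol)
    print(parts.hostname)         # 'www.domain.edu.au'  (the domain name)
    print(parts.port)             # 1000
    print(parts.path)             # '/path/to/file'
    print(parse_qs(parts.query))  # {'parameters': ['true']}
    print(parts.fragment)         # 'fragment'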

Client-server architecture in the World Wide Web

In the World Wide Web, the client-server architecture is almost universally used. Communication is by an agreed protocol, e.g., HTTP (the HyperText Transfer Protocol). The servers store the web pages (information), typically as HTML files (although a web page usually consists of multiple files, including CSS, JavaScript, images, etc.). The user requests a web page through the browser, a program running locally on their computer that acts as the client. The browser locates the correct server and communicates the request. The server retrieves the web page from its local file system or constructs it based on information and parameters, then transmits the files (usually text files containing HTML instructions) back to the browser. The browser receives the files and uses them to render the web page. Modern World Wide Web servers are typically many computers stacked in data centres, capable of handling massive numbers of requests and scaling with demand.
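
As a minimal sketch of the server side, Python's built-in http.server module can serve files from the local file system in exactly this request/response style (the port number is an arbitrary choice for this sketch):

    from http.server import HTTPServer, SimpleHTTPRequestHandler

    # SimpleHTTPRequestHandler maps the path in each request URL to a file in the
    # current directory, i.e. the "retrieve the web page from its local file system"
    # step described above.
    server = HTTPServer(("127.0.0.1", 8000), SimpleHTTPRequestHandler)
    print("Serving on http://127.0.0.1:8000/ (Ctrl+C to stop)")
    server.serve_forever()

Pointing a browser (the client) at that address then exercises the full request/response cycle on a single machine.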