This post is a summary of one of my Udacity classes, where I picked up some important basic information about HTTP. Here is the course link.
Basic
An HTTP transaction always involves a client and a server. The browser is the most common client we use. The basic workflow of HTTP is that your browser sends HTTP requests to a web server, and the web server sends responses back to your browser.
Browsers are the most common and most complicated user interface for web technology, but they are not the only web clients around. HTTP is powerful and widely supported in software, so it is a common choice for programs that need to talk to each other across the network, even if they don’t look anything like a web browser.
When you enter a URL that points to a directory, many web servers respond with the file named index.html in that directory if there is one, and the browser presents the content of that index.html.
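To make the request and response cycle concrete, here is a minimal sketch using only Python's standard library. It is not code from the course. The helper name fetch_root_from_temp_server is made up for illustration, and the sketch assumes Python's built-in SimpleHTTPRequestHandler as the web server, which is one of the servers that answers a directory request with index.html.

```python
import http.client
import http.server
import tempfile
import threading
from functools import partial
from pathlib import Path


def fetch_root_from_temp_server(html: str) -> tuple:
    """Serve a temporary directory holding index.html, then request "/" from it."""
    with tempfile.TemporaryDirectory() as directory:
        Path(directory, "index.html").write_text(html, encoding="utf-8")

        # SimpleHTTPRequestHandler answers a directory request with index.html if present.
        handler = partial(http.server.SimpleHTTPRequestHandler, directory=directory)
        # Port 0 asks the operating system for any free port.
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        port = server.server_address[1]
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            # The client side: send one GET request and read the response.
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/")
            response = conn.getresponse()
            status = response.status
            body = response.read().decode("utf-8")
            conn.close()
        finally:
            server.shutdown()
            server.server_close()
        return status, body


if __name__ == "__main__":
    print(fetch_root_from_temp_server("<h1>hello</h1>"))
```

Here the program plays both roles: a background thread runs the server, and the main thread acts as a client that is not a browser at all.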
1. What is a server?
A server is just a program that accepts connections from other programs on the network.
When you start a server program, it waits for clients to connect to it. When a connection comes in, the server runs some code to handle it. A connection is like a channel, for example a phone call: the client sends requests over the connection, and the server sends responses back.
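The idea of "a program that accepts connections" can be sketched with plain sockets. This is only an illustration under my own assumptions: the function names serve_one_client and ask_server are hypothetical, the "protocol" is a toy one (the server just repeats what it received), and everything runs on 127.0.0.1.

```python
import socket
import threading


def serve_one_client(listener: socket.socket) -> None:
    """Wait for one connection, read a request, and send a response back."""
    conn, _addr = listener.accept()            # blocks until a client connects
    with conn:
        request = conn.recv(1024)              # the client's request bytes
        conn.sendall(b"you said: " + request)  # the server's response


def ask_server(message: bytes) -> bytes:
    """Start a one-shot server, connect to it as a client, and return its reply."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_one_client, args=(listener,))
    worker.start()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)    # signal that the request is complete
            chunks = []
            while True:
                chunk = client.recv(1024)
                if not chunk:                  # empty read: server closed the connection
                    break
                chunks.append(chunk)
    finally:
        worker.join(timeout=5)
        listener.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(ask_server(b"hello"))
```

A real web server does the same thing in a loop and speaks HTTP over the connection instead of echoing bytes.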
2. URI
A web address is also called a URI for Uniform Resource Identifier.
You put it into the browser to tell the browser where to go.
You probably already know the term URL, or Uniform Resource Locator. The two are pretty close to the same thing. A URL is a URI for a resource on the network, and URI is the slightly more precise term.
A URI is a name for a resource, and a resource can take many forms: a page, some data, an API, and so on.
Here is an example of a URI: https://en.wikipedia.org/wiki/Fish (this is a common example of a web address URI; there are many other styles of URI).
The URI has three visible parts:
- https is the scheme
- en.wikipedia.org is the hostname
- and /wiki/Fish is the path
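As a quick check of the three parts listed above, Python's standard urllib.parse module can split the same URI. This is just a sketch for illustration; the variable name parts is my own.

```python
from urllib.parse import urlsplit

# Split the example URI from the text into its visible parts.
parts = urlsplit("https://en.wikipedia.org/wiki/Fish")

print(parts.scheme)    # https
print(parts.hostname)  # en.wikipedia.org
print(parts.path)      # /wiki/Fish
```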
Scheme
The first part is the scheme, which tells the client how to go about accessing the resource. Examples are http, https and ftp. HTTP and HTTPS URIs point to resources served by a web server.
Hostname
The hostname tells the client which server to connect to. A hostname can only appear after a URI scheme that supports it, such as http or https. In these URIs there will always be a “://” between the scheme and the hostname. In fact, the “:” goes after the scheme, and the “//” goes before the hostname. For example, mailto URIs don’t have a hostname part, so mailto:example@gmail.com, a well-formed URI, has only the “:”.
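The difference between a URI with a “//” hostname part and one without can be seen with urllib.parse as well. The helper has_hostname below is a hypothetical name I chose for this sketch.

```python
from urllib.parse import urlsplit


def has_hostname(uri: str) -> bool:
    """Return True when the URI carries a hostname after a "//"."""
    return urlsplit(uri).hostname is not None


if __name__ == "__main__":
    print(has_hostname("https://en.wikipedia.org/wiki/Fish"))
    print(has_hostname("mailto:example@gmail.com"))
    # For the mailto URI, everything after the ":" ends up in the path.
    print(urlsplit("mailto:example@gmail.com").path)
```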
Path
In an HTTP URI, the next thing that appears is the path, which identifies a particular resource on a server. A server can have many resources on it. The path tells the server which resource the client is looking for.
When you write a URI without a path, such as http://www.baidu.com, the browser fills in the default path, which is written with a single slash. That’s why http://www.baidu.com is the same as http://www.baidu.com/ (with the slash on the end).
The slash after the hostname is also called the root. This is not the root of your computer’s whole file system; it’s just the root of the resources served by the web server.
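Here is a small sketch of that default-path rule. Note that urllib.parse itself reports an empty path for a URI with no path, so the helper request_path (a name I made up) fills in the “/” the way the text describes a browser doing.

```python
from urllib.parse import urlsplit


def request_path(uri: str) -> str:
    """Return the path a client would ask for, defaulting to the root "/"."""
    return urlsplit(uri).path or "/"


if __name__ == "__main__":
    print(request_path("http://www.baidu.com"))
    print(request_path("http://www.baidu.com/"))
    print(request_path("https://en.wikipedia.org/wiki/Fish"))
```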
Relative URI reference
<a href="a.png">a.png</a>
The above HTML fragment links to a picture named a.png, but the ‘href’ attribute doesn’t include the hostname or port of the server. This is a relative URI reference: it’s relative to the context in which it appears, specifically the page it is on. The browser can figure out the full URI from that context.
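How a relative reference gets resolved against its page can be sketched with urllib.parse.urljoin. The page address https://example.com/gallery/index.html is an assumption made up for this example; it is not from the course.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the <a href="a.png"> link.
page = "https://example.com/gallery/index.html"

# A bare file name is resolved relative to the page's directory.
print(urljoin(page, "a.png"))   # https://example.com/gallery/a.png

# A reference starting with "/" is resolved relative to the server's root.
print(urljoin(page, "/a.png"))  # https://example.com/a.png
```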
Other URI parts
https://en.wikipedia.org/wiki/Oxygen#Discovery
The part of the URI after the # sign is called a fragment. The browser doesn’t even send it to the web server. It lets a link point to a specific named part of a resource; in HTML pages it links to an element by id.
https://www.google.com/search?q=fish
The ?q=fish is the query part of the URI. Unlike the fragment, this does get sent to the server.
A URI can have a few other parts; you can read this for more information:
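To see which parts reach the server, here is a sketch that rebuilds what a client would send from a full URI: the path plus the query, with the fragment left out. The function name request_target is my own choice for the example.

```python
from urllib.parse import parse_qs, urlsplit


def request_target(uri: str) -> str:
    """Build the part of the URI that is sent to the server: path plus query."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # parts.fragment is deliberately left out


if __name__ == "__main__":
    print(request_target("https://en.wikipedia.org/wiki/Oxygen#Discovery"))
    print(request_target("https://www.google.com/search?q=fish"))
    # The query string itself can be decoded into names and values.
    print(parse_qs(urlsplit("https://www.google.com/search?q=fish").query))
```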
https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Generic_syntax
3. Hostname and Port
Hostnames
The internet identifies computers by IP address, and the traffic sent and received over the internet is labeled with IP addresses. In order to connect to a web server such as www.baidu.com, a client needs to translate the hostname into an IP address.
Your operating system’s network configuration uses the Domain Name System (DNS), a set of servers maintained by Internet Service Providers (ISPs) and other network users, to look up hostnames and get back IP addresses. Two command-line programs for looking up hostnames in DNS are host and nslookup:
$ host www.baidu.com
$ nslookup www.baidu.com
nslookup also gives the IP address of the DNS server it asked.
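A program can do the same lookup through the operating system's resolver. The sketch below uses socket.getaddrinfo from the standard library; the helper name resolve is hypothetical, and the addresses you get back for a public hostname depend on your network, so no output is shown for it.

```python
import socket


def resolve(hostname: str) -> list:
    """Return the distinct IP addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Needs network access; the result varies by location and over time.
    print(resolve("www.baidu.com"))
```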
Why is it called a hostname? In network terminology, a host is a computer on the network, one that could host services.
Localhost
The IPv4 address 127.0.0.1 and the IPv6 address ::1 are special addresses that mean “this computer itself”. The hostname localhost refers to these special addresses.
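Python's ipaddress module knows about these special addresses, which it calls loopback addresses. A minimal sketch, with is_this_computer as a made-up helper name and 8.8.8.8 used only as an example of an ordinary public address:

```python
import ipaddress


def is_this_computer(address: str) -> bool:
    """Return True for loopback addresses such as 127.0.0.1 and ::1."""
    return ipaddress.ip_address(address).is_loopback


if __name__ == "__main__":
    print(is_this_computer("127.0.0.1"))  # True
    print(is_this_computer("::1"))        # True
    print(is_this_computer("8.8.8.8"))    # False
```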
Port
When you start a server on your own computer, it’s usual to enter a URI like “localhost:8000”, giving a port number after the hostname, but most of the web addresses you see in the wild don’t have a port number on them. This is because the client usually figures out the port number from the URI scheme.
For instance, HTTP URIs imply a port number of 80, whereas HTTPS URIs imply a port number of 443.
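The rule "use the port in the URI if there is one, otherwise the scheme's default" can be sketched like this. The table DEFAULT_PORTS and the function effective_port are names I introduced for illustration, and the table covers only the two schemes mentioned above.

```python
from urllib.parse import urlsplit

# Default ports implied by the scheme (only the two discussed in the text).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(uri: str) -> int:
    """Return the explicit port if the URI has one, else the scheme's default."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    print(effective_port("http://localhost:8000"))               # 8000
    print(effective_port("http://www.baidu.com"))                # 80
    print(effective_port("https://en.wikipedia.org/wiki/Fish"))  # 443
```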
What is a port number, anyway? On the network, IP addresses distinguish computers, and port numbers distinguish programs on those computers, so the operating system can decide which program should handle incoming data.
We say that a server “listens on” a port, such as 8000. “Listening” means that when the server starts up, it tells its operating system that it wants to receive connections from clients on a particular port number. When a client (such as a web browser) “connects to” that port and sends a request, the operating system knows to forward that request to the server that’s listening on that port.
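Here is a rough sketch of what “listening on” a port looks like with plain sockets. It assumes everything runs on 127.0.0.1 and lets the operating system pick a free port instead of hard-coding 8000; the function name listen_and_connect and the keys of the returned dictionary are made up for the example.

```python
import socket


def listen_and_connect() -> dict:
    """Listen on a free port, show a second bind fails, then accept one client."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))  # port 0: let the operating system choose a free port
    server.listen()                # tell the OS we want connections on that port
    port = server.getsockname()[1]

    # A second socket asking for the same address and port is turned away by the OS.
    rival = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        rival.bind(("127.0.0.1", port))
        port_was_free = True
    except OSError:
        port_was_free = False
    finally:
        rival.close()

    # A client connecting to that port is handed to the listening socket.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, peer = server.accept()
    result = {
        "port": port,
        "port_was_free": port_was_free,
        "peer_matches_client": peer == client.getsockname(),
    }
    conn.close()
    client.close()
    server.close()
    return result


if __name__ == "__main__":
    print(listen_and_connect())
```

The failed second bind is the operating system keeping its bookkeeping straight: once one program is listening on a port, connections to that port belong to that program.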