When you need to locate someone's home, you need their house address. If you want to call your friend, you need your friend's phone number. Without that information, finding that house or calling your friend is not possible. Further, if you're provided an address or phone number, you can immediately tell one from the other, due to the uniformity of how an address is formatted vs. how a phone number is formatted.
There's a similar concept for finding and accessing servers on the Internet. When you want to check Facebook's games page, you start by launching your web browser and navigating to http://www.facebook.com/games. The web browser makes an HTTP request to this address, and the resource is returned to your browser. The address you entered, http://www.facebook.com/games, is known as a Uniform Resource Locator or URL. A URL is like the address or phone number you need in order to visit or communicate with your friend. A URL is the most frequently used form of the more general concept of a Uniform Resource Identifier or URI, which specifies how resources are located. This section looks at what a URL is, what its components are, and what it means to you as a web developer.
A URL such as "http://www.example.com:88/home?item=book" is composed of several components. We can break this URL into five parts:
http: The scheme. It always comes before the colon and two forward slashes and tells the web client how to access the resource. In this case it tells the web client to use the Hypertext Transfer Protocol or HTTP to make a request. Other popular URL schemes include ftp, mailto, and git. You may sometimes see this part of the URL referred to as the protocol. There is a connection between the two, in that the scheme can indicate which protocol (or system of rules) should be used to access the resource; in the context of a URL, however, the correct term for this component is the scheme.
www.example.com: The host. It tells the client where the resource is hosted or located.
:88 : The port or port number. It is only required if you want to use a port other than the default.
/home: The path. It shows what local resource is being requested. This part of the URL is optional.
?item=book : The query string, which is made up of query parameters. It is used to send data to the server. This part of the URL is also optional.
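As a rough illustration of the breakdown above, the following Python sketch splits the same example URL using the standard library's `urllib.parse.urlsplit`. The `components` dictionary is just a name chosen for this example. Note that the parser reports the query string without its leading `?`.

```python
from urllib.parse import urlsplit

url = "http://www.example.com:88/home?item=book"
parts = urlsplit(url)

# Collect the five components discussed above into a plain dictionary.
components = {
    "scheme": parts.scheme,   # "http"
    "host": parts.hostname,   # "www.example.com"
    "port": parts.port,       # 88 (an int, or None if the URL omits it)
    "path": parts.path,       # "/home"
    "query": parts.query,     # "item=book" (the "?" delimiter is dropped)
}

for name, value in components.items():
    print(f"{name}: {value}")
```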
Sometimes, the path can point to a specific resource on the host. For instance, www.example.com/home/index.html points to an HTML file located on the example.com server.
Sometimes, we may want to include the port number on which the host listens for HTTP requests. In a URL such as http://localhost:3000/profile, the host is listening for HTTP requests on port 3000. The default port number for HTTP is port 80. Even though this port number is rarely written out, it is assumed whenever a URL omits the port. To use anything other than the default, you have to specify it in the URL.
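The default-port rule can be sketched in a few lines of Python. The helper name `effective_port` and the `DEFAULT_PORTS` table are assumptions made for this example. The HTTPS entry (443) goes slightly beyond what the text above covers and is included only for completeness.

```python
from urllib.parse import urlsplit

# Well-known default ports per scheme (443 for https is an addition
# beyond the text above, which only discusses HTTP).
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit port in the URL always wins.
        return parts.port
    # Otherwise fall back to the scheme's default (None if unknown).
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://localhost:3000/profile"))  # explicit port
print(effective_port("http://www.example.com/home"))    # default port
```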
A simple URL with a query string might look like http://www.example.com?search=ruby&results=10.
Let's take that apart:
|Query String Component||Description|
|?||This is a reserved character that marks the start of the query string|
|search=ruby||This is a parameter name/value pair.|
|&||This is a reserved character, used when adding more parameters to the query string.|
|results=10||This is also a parameter name/value pair.|
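To see the same breakdown programmatically, here is a minimal Python sketch that feeds the query string from the table (without its leading `?`) to the standard library's `parse_qsl`. The variable names are chosen for this example only.

```python
from urllib.parse import parse_qsl

# The query string from the table, minus the leading "?" marker.
query = "search=ruby&results=10"

# parse_qsl splits on "&" and then on "=", yielding (name, value) pairs.
pairs = parse_qsl(query)
params = dict(pairs)

for name, value in pairs:
    print(f"{name} -> {value}")
```

Values always arrive as strings, so a server that wants the number of results as an integer has to convert `params["results"]` itself.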
Now let's take a look at an example. Suppose we had the following URL:
In the above example, name/value pairs such as color=white are passed to the server in the URL. This asks the www.phoneshop.com server to narrow its results to a product that matches 32gb and the color white. How the server uses these parameters is up to the server-side application.
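One way a server-side application might use such parameters is to treat each name/value pair as a filter. The sketch below is purely hypothetical: the `PRODUCTS` catalog, the `size` parameter name, and the `filter_products` helper are all invented for illustration and do not describe any real shop's behavior.

```python
from urllib.parse import parse_qsl

# Hypothetical in-memory catalog; a real application would query a database.
PRODUCTS = [
    {"name": "phone-a", "size": "32gb", "color": "white"},
    {"name": "phone-b", "size": "64gb", "color": "white"},
    {"name": "phone-c", "size": "32gb", "color": "black"},
]

def filter_products(query_string, products):
    """Keep only products whose fields match every query parameter."""
    filters = dict(parse_qsl(query_string))
    return [
        product
        for product in products
        if all(product.get(key) == value for key, value in filters.items())
    ]

matches = filter_products("size=32gb&color=white", PRODUCTS)
print([product["name"] for product in matches])
```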
Because query strings are passed in through the URL, they are most commonly used in HTTP GET requests. We'll talk about the different HTTP requests later in the book, but for now just know that whenever you type a URL into the address bar of your browser, you're issuing an HTTP GET request. Most links also issue HTTP GET requests, though there are some minor exceptions.
Query strings are great for passing additional information to the server. However, there are some limits to their use:
Spaces and special characters such as & cannot be used directly in query strings. They must be URL encoded, which we'll talk about next.
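The problem with an unencoded `&` can be shown with a short Python sketch. The search phrase `salt & pepper` is an invented value for this example. The standard library's `urlencode` escapes the value so that the `&` inside it is no longer mistaken for a parameter delimiter. Note that `urlencode` writes spaces as `+`, the convention used for form data.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical search phrase that contains a reserved character.
params = {"search": "salt & pepper", "results": 10}

# Naive string building: the "&" inside the value looks like a delimiter.
naive = "search=salt & pepper&results=10"

# urlencode escapes each value before joining the pairs with "&".
encoded = urlencode(params)

print(encoded)
print(parse_qs(naive))    # the search value is cut short
print(parse_qs(encoded))  # the search value survives intact
```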
URLs are designed to accept only certain characters in the standard 128-character ASCII character set. Reserved or unsafe ASCII characters which are not being used for their intended purpose, as well as characters not in this set, have to be encoded. URL encoding serves the purpose of replacing these non-conforming characters with a % symbol followed by two hexadecimal digits that represent the ASCII code of the character.
Below are some popular encoded characters and example URLs:
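As an illustration, this Python sketch prints the encoded form of a few characters and then builds one example URL. The helper `percent_encode_char` is written for this example and only handles single ASCII characters. It is compared against the standard library's `quote`, called with `safe=""` so that nothing is left unencoded. The value `book & pen` is an invented sample.

```python
from urllib.parse import quote

def percent_encode_char(ch):
    """Encode one ASCII character as '%' plus two uppercase hex digits."""
    return f"%{ord(ch):02X}"

# A few characters that need encoding when used as data.
for ch in [" ", "&", "%", ":"]:
    manual = percent_encode_char(ch)
    library = quote(ch, safe="")
    print(repr(ch), manual, library)

# An example URL whose query value contains spaces and an ampersand.
example_url = "http://www.example.com/home?item=" + quote("book & pen", safe="")
print(example_url)
```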
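Characters must be encoded if:
They have no corresponding character within the standard ASCII character set.
They are unsafe. For example, % is unsafe because it is used for encoding other characters. Other unsafe characters include spaces, quotation marks, and ~, among others.
They are reserved for a special purpose within the URL. Characters such as & and : are reserved and must be encoded when they are not serving that purpose. For example, & is reserved for use as a query string delimiter, and : is reserved to delimit host/port components and user/password.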
So what characters can be used safely within a URL? Only alphanumeric characters, the special characters $-_.+!*'(), and reserved characters used for their reserved purposes can appear unencoded within a URL. If a reserved character is not being used for its reserved purpose, it has to be encoded.
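The "reserved purpose" rule can be illustrated with the `safe` argument of Python's `urllib.parse.quote`, which lists the characters to leave unencoded. The path and the value `rock/pop` below are invented for this sketch. A `/` that separates path segments is doing its reserved job and stays as-is, while a `/` that is merely part of a data value gets encoded.

```python
from urllib.parse import quote, unquote

# "/" separates path segments here, so it is left alone (safe="/" is
# quote's default); only the space is encoded.
path = quote("/home/my file.html", safe="/")

# Here "/" is part of a data value, not a separator, so it is encoded too.
value = quote("rock/pop", safe="")

print(path)
print(value)
print(unquote(value))  # decoding restores the original text
```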
In this chapter, we've discussed what a URL is. We also looked at the components of a URL and concluded by exploring URL encoding. We'll dive a little deeper into requests and responses, and what they consist of, after the preparations chapter.