Encode & Decode
Wiki
Percent-encoding
Percent-encoding, also known as URL encoding, is a method to encode arbitrary data in a Uniform Resource Identifier (URI) using only the limited US-ASCII characters legal within a URI. Although it is known as URL encoding, it is also used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the application/x-www-form-urlencoded
media type, as is often used in the submission of HTML form data in HTTP requests.
Types of URI characters
The characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding). Reserved characters are those characters that sometimes have special meaning. For example, forward slash characters are used to separate different parts of a URL (or more generally, a URI). Unreserved characters have no such meanings. Using percent-encoding, reserved characters are represented using special character sequences. The sets of reserved and unreserved characters and the circumstances under which certain reserved characters have special meaning have changed slightly with each revision of specifications that govern URIs and URI schemes.
Reserved characters
When a character from the reserved set (a "reserved character") has a special meaning (a "reserved purpose") in a certain context, and a URI scheme says that it is necessary to use that character for some other purpose, then the character must be percent-encoded. Percent-encoding a reserved character involves converting the character to its corresponding byte value in ASCII and then representing that value as a pair of hexadecimal digits (if there is a single hex digit, a leading zero are added). The digits, preceded by a percent sign (%
) as an escape character, are then used in the URI in place of the reserved character.
(For a non-ASCII character, it is typically converted to its byte sequence in UTF-8, and then each byte value is represented as above.)
The reserved character /
, for example, if used in the "path" component of a URI, has the special meaning of being a delimiter between path segments. If, according to a given URI scheme, /
needs to be in a path segment, then the three characters %2F
or %2f
must be used in the segment instead of a raw /
.
␣ |
! |
# |
$ |
% |
& |
' |
( |
) |
* |
+ |
, |
/ |
: |
; |
= |
? |
@ |
[ |
] |
%20 |
%21 |
%23 |
%24 |
%25 |
%26 |
%27 |
%28 |
%29 |
%2A |
%2B |
%2C |
%2F |
%3A |
%3B |
%3D |
%3F |
%40 |
%5B |
%5D |
Reserved characters that have no reserved purpose in a particular context may also be percent-encoded but are not semantically different from those that are not.