What is an internationalized domain name (IDN)?
According to the International Telecommunication Union (ITU), more than three billion people use the World Wide Web, increasingly in their native languages. This shift was made possible in part by the introduction of internationalized domain names in 2003. Here is how IDN domains work.
The IETF (Internet Engineering Task Force) defines IDNs as domain names that contain characters outside the ASCII character set, such as umlauts, accented letters or characters from non-Latin scripts. However, the Domain Name System (DNS), which is responsible for translating domain names into IP addresses, cannot process these names directly, because it is based on the limited ASCII character set.
To make IDNs understandable for the DNS and other internet protocols, the internet standard Internationalizing Domain Names in Applications (IDNA) was created in 2003. It defines a standardized translation from Unicode to ASCII, thereby enabling the use of non-ASCII characters in domain names.
How does IDNA work?
Much of the internet’s infrastructure supports only the ASCII character set. To ensure that internationalized domain names can still be processed, each IDN written in Unicode is translated into an ASCII-compatible encoding (ACE) string. Users see and type URLs containing accented characters or umlauts, while servers continue to process the addresses in their ASCII-compatible form. This procedure is specified in the IDNA2003 internet standard and in its revision IDNA2008, which was approved in 2010. The translation from Unicode to ASCII takes place on the client side (in the browser, email program, etc.) and uses a standardized encoding called Punycode.
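As a rough illustration of this client-side translation, the following sketch uses the "idna" codec from Python's standard library. Note the assumptions: that codec implements the 2003 version of the standard only, the helper names to_ace and to_unicode are made up for this example, and bücher.example is just a placeholder domain.

```python
# Minimal sketch: convert between the Unicode form a user sees and the
# ASCII-compatible (ACE) form that is sent to the DNS.
# Assumption: Python's built-in "idna" codec, which follows IDNA2003.

def to_ace(domain: str) -> str:
    """Translate a Unicode domain name into its ACE string."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """Translate an ACE string back into the Unicode form for display."""
    return ace_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    ace = to_ace("bücher.example")
    print(ace)              # xn--bcher-kva.example
    print(to_unicode(ace))  # bücher.example
```

A browser or mail client performs an equivalent step before it queries the DNS, and the reverse step before it displays the name.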
Punycode
Punycode, standardized in RFC 3492, was developed to represent Unicode character strings unambiguously and losslessly using ASCII characters. The ASCII characters of a name are kept in their original order and followed by a hyphen as a separator; the non-ASCII characters are removed and appended after the hyphen in encoded form. This code sequence contains information about each Unicode character as well as its position in the domain name. Additionally, each ACE string created in this way is marked with the prefix xn--. This signals that the character sequence is an IDN that has been encoded according to the IDNA and Punycode standards. See our article on Punycode for a detailed explanation of the encoding process as well as some examples.
With an online IDN domain converter, you can convert IDNs to their corresponding ACE strings using Punycode.
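The same conversion can be sketched for a single label (the part of a domain name between two dots) with the "punycode" codec from Python's standard library. The function names below are hypothetical, and the sketch deliberately skips the validation and normalization steps that IDNA adds on top of Punycode.

```python
# Minimal sketch: Punycode-encode one label and add the ACE prefix.
# Assumption: no IDNA validation or normalization is applied here.

ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Return the ACE form of a label; pure ASCII labels stay unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def ace_to_label(ace: str) -> str:
    """Decode an ACE label back to Unicode; other labels stay unchanged."""
    if ace.startswith(ACE_PREFIX):
        return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return ace


if __name__ == "__main__":
    print(label_to_ace("bücher"))          # xn--bcher-kva
    print(label_to_ace("münchen"))         # xn--mnchen-3ya
    print(ace_to_label("xn--bcher-kva"))   # bücher
```

In "xn--bcher-kva", the ASCII letters "bcher" come first, and the part after the last hyphen ("kva") encodes both the character "ü" and its position.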
Differences between IDNA2003 and IDNA2008
Under the original 2003 procedure, internationalized domain names were normalized prior to Punycode encoding using the nameprep method. This method converted capital letters into lowercase letters, removed control characters and mapped equivalent characters to a unified form. Nameprep was removed from the process when IDNA2008 was introduced. The protocol itself no longer specifies any normalization. Instead, it recommends an algorithm that converts capital letters into lowercase ones.
This change also benefits users in the German-speaking world. Under IDNA2003, the Unicode character “ß”, which is common in German, was defined as equivalent to “ss”. Domains such as www.fußball-ergebnisse.de were thus automatically normalized to www.fussball-ergebnisse.de in the nameprep process. This is no longer the case under IDNA2008. Since 2010, “ß” has been interpreted as a character in its own right (“Latin small letter sharp s”) and can be registered as part of an IDN domain.
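The 2003 behavior can be observed with the "idna" codec in Python's standard library, which implements IDNA2003 including nameprep. This is only a small demonstration; the helper name to_ascii_2003 and the placeholder domain BÜCHER.example are assumptions made for this example.

```python
# Minimal sketch: nameprep normalization as applied by IDNA2003.
# Assumption: Python's built-in "idna" codec (IDNA2003 only).

def to_ascii_2003(domain: str) -> str:
    """Convert a domain name to ASCII the way IDNA2003 prescribes."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    # Capital letters in an internationalized label are lowercased.
    print(to_ascii_2003("BÜCHER.example"))             # xn--bcher-kva.example
    # "ß" is mapped to "ss", so the result is a plain ASCII name.
    print(to_ascii_2003("www.fußball-ergebnisse.de"))  # www.fussball-ergebnisse.de
```

Under IDNA2008, the second name would instead keep its “ß” and be encoded as an xn-- label.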
In addition, around 8,000 characters that were permitted in domain names under IDNA2003 are no longer supported under IDNA2008. Four characters, including “ß”, have been interpreted differently since the standard was revised. For a detailed discussion of the differences between IDNA2003 and IDNA2008, see Unicode Technical Standard #46. The following table provides a summary of the main differences:
IDNA2003 | IDNA2008 |
---|---|
Nameprep procedure required | No normalization specified |
Valid for Unicode 3.2 | Valid for Unicode versions from 5.2 onwards |
Strict rules for right-to-left scripts | Clearer rules for right-to-left scripts |
Upper-case letters are converted to lower-case letters by nameprep | Converting upper-case letters to lower-case letters is only recommended |
Around 8,000 more characters permitted | Many symbols are prohibited, e.g., graphic symbols that do not belong to any alphabet, as well as some punctuation |
Some Unicode characters are remapped, e.g., “ß” to “ss” | “Remapping” removed from some Unicode characters, as this could lead to irregularities |
What problems are there with IDNs?
By now, all common internet applications should be able to handle IDNs. However, problems with internationalized domain names still occur because the switch from IDNA2003 to IDNA2008 has not been implemented consistently. One example that is problematic for German is the different interpretation of “ß”. Because IDNA2003 always converts “ß” to “ss”, ß domains registered under IDNA2008 are often unreachable from systems that still convert according to the older standard. Instead, users are directed to the corresponding domain containing “ss”. Website operators can work around this problem by registering both variants and redirecting the secondary domain to the preferred spelling using a domain redirect.
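To check whether a given name is affected, a website operator could compare the ASCII form produced under each interpretation. The following is only a sketch under stated assumptions: the IDNA2003 side uses Python's built-in "idna" codec, while the IDNA2008 side is merely approximated by lowercasing and Punycode-encoding each label without any of the validation the real standard requires. All function names and the .example domains are hypothetical.

```python
# Minimal sketch: find out whether a domain resolves to different ASCII
# names under IDNA2003 and under an approximation of IDNA2008.

def ace_idna2003(domain: str) -> str:
    """ASCII form according to IDNA2003 (built-in codec, with nameprep)."""
    return domain.encode("idna").decode("ascii")


def ace_without_remapping(domain: str) -> str:
    """Approximation of IDNA2008: lowercase, then Punycode per label.

    Assumption: no validation of permitted characters is performed.
    """
    labels = []
    for label in domain.lower().split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def names_to_register(domain: str) -> list:
    """Return every distinct ASCII name a client might look up."""
    return sorted({ace_idna2003(domain), ace_without_remapping(domain)})


if __name__ == "__main__":
    print(names_to_register("fußball.example"))
    # ['fussball.example', 'xn--fuball-cta.example']
    print(names_to_register("bücher.example"))
    # ['xn--bcher-kva.example']
```

If the function returns two names, both would need to be registered, with one redirecting to the other, for the site to be reachable from old and new clients alike.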