On Thu, 25 Nov 2010, Andrew Sullivan wrote:
> So what aside from [...] do you want?
Something like this:

Abstract

This memo clarifies the syntax of top-level domain labels in the domain name system as specified in RFC 1123, and how this syntax relates to the allocation policy for TLDs. It describes the current [...blah...]

Background

[RFC0952] defines a host name in the first paragraph under "ASSUMPTIONS", as follows:

    A "name" ... is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of "domain style names". (See RFC-921, "Domain Name System Implementation Schedule", for background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period.

[RFC1123] section 2.1 reaffirms this definition, but makes one change to the syntax:

    The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax.

In addition, the DISCUSSION in Section 2.1 says:

    'However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.' [Section 2.1]

Some implementers may have understood the above phrase "will be alphabetic" to be a protocol restriction. This is incorrect. It is in fact a description of the TLD allocation policy at that time.

The TLD allocation policy has since had two significant syntactic changes. On 16 November 2000 the first long TLD (.museum) was allocated, and it was added to the root zone in June 2001. In October 2007, the first IDNA test TLDs were added to the root zone. These were the first TLDs with non-alphabetic characters.
ICANN approved a policy for allocating IDNA ccTLDs in October 2009 and the first production IDNA TLDs were added to the root zone in January 2010.

Deployed software that checks DNS top-level labels for conformance with past allocation policy is likely to reject domain names allocated after a policy change.

Syntax of TLD labels - protocol level

All labels of a domain name have the same syntax. The syntax of TLDs is not specially restricted at the protocol level.

    domain  = *(label ".") label ["."]
    label   = let-dig [ldh-str]
    let-dig = ALPHA / DIGIT
    ldh-str = *( ALPHA / DIGIT / "-" ) let-dig

A label can be up to 63 characters long. A domain name can be up to 255 characters long.

A domain name as a whole shall not match the dotted quad representation of an IPv4 address.

    IPv4   = 3(digits ".") digits
    digits = 1*DIGIT

Syntax of TLD labels - allocation policy

The syntax of allocated TLDs is restricted in order to ensure that no domain name can match an IPv4 dotted quad, and for compatibility with past practice and deployed software. The policy is subject to change by ICANN. This section describes the syntax of domain names permitted by the current allocation policy.

IDNA encodes Unicode strings within the syntax permitted for domain name labels. The Unicode string used by applications is known as a U-Label; its corresponding encoding in the DNS is known as an A-Label. The terms A-Label and U-Label are used in this document as defined in [RFC5890].

Valid A-Labels always contain non-alphabetic characters. In order to accommodate the wish to express TLD names in scripts other than the ASCII subset of Latin, it is necessary to allow non-alphabetic characters in the corresponding TLD DNS-Labels. Following past practice, the U-Label form of a TLD name is restricted by applying rules analogous to those already imposed on ASCII TLD DNS-Labels.

ASCII TLDs have the following syntax:

    TLD = 1*63(ALPHA)

IDNA TLDs obey the following requirements:

1. the DNS-Label is a valid A-Label according to [RFC5890];
2. the derived property value of all code points, as defined by [RFC5890], is PVALID;
3. the general category of all code points is one of { Ll, Lo, Lm, Mn }.

[... etc etc ...]

Tony.
--
f.anthony.n.finch <d...@dotat.at> http://dotat.at/
HUMBER THAMES DOVER WIGHT PORTLAND: NORTH BACKING WEST OR NORTHWEST, 5 TO 7, DECREASING 4 OR 5, OCCASIONALLY 6 LATER IN HUMBER AND THAMES. MODERATE OR ROUGH. RAIN THEN FAIR. GOOD.
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
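
To make the split between the protocol-level syntax and the allocation policy in the draft above more concrete, here is a minimal Python sketch. It is only an illustration, not part of the proposed memo: the function names is_valid_domain and is_ascii_tld are hypothetical, the 255-character limit is applied to the textual name exactly as the draft words it, and the dotted-quad check is done after dropping an optional trailing dot, which is an assumption. The label xn--p1ai is used only as an example of an A-Label.

```python
import re

# label = let-dig [ldh-str] ; ldh-str = *( ALPHA / DIGIT / "-" ) let-dig
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
# IPv4 = 3(digits ".") digits ; digits = 1*DIGIT
DOTTED_QUAD_RE = re.compile(r"(?:[0-9]+\.){3}[0-9]+")
# Allocation policy for ASCII TLDs: TLD = 1*63(ALPHA)
ASCII_TLD_RE = re.compile(r"[A-Za-z]{1,63}")


def is_valid_domain(name: str) -> bool:
    """Check a textual domain name against the protocol-level grammar."""
    # Length limit taken literally from the draft text (an assumption
    # about how the limit maps onto the textual form).
    if len(name) > 255:
        return False
    # The grammar allows one optional trailing dot: label ["."]
    if name.endswith("."):
        name = name[:-1]
    # The name as a whole must not look like an IPv4 dotted quad.
    if DOTTED_QUAD_RE.fullmatch(name):
        return False
    # Every label, including the TLD, follows the same syntax.
    return all(
        len(label) <= 63 and LABEL_RE.fullmatch(label) is not None
        for label in name.split(".")
    )


def is_ascii_tld(label: str) -> bool:
    """Check a label against the ASCII TLD allocation-policy rule only."""
    return ASCII_TLD_RE.fullmatch(label) is not None


if __name__ == "__main__":
    for name in ("example.museum", "1.2.3.4", "example.xn--p1ai"):
        tld = name.rstrip(".").split(".")[-1]
        print(name, is_valid_domain(name), is_ascii_tld(tld))
```

A checker that applies is_ascii_tld to every TLD behaves like the deployed software the draft warns about: it rejects a name such as example.xn--p1ai even though is_valid_domain accepts it at the protocol level. The three IDNA requirements (valid A-Label, PVALID, general category) are not covered by this sketch.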