Generally I like this, and it is a much more thorough treatment than what I was working on. I think it needs a little wordsmithing, and I'm not clear if Tony's intention in the last section is to indicate that there should be a protocol restriction on non-IDN labels (which obviously I oppose). Otherwise I think this is definitely a step in the right direction.

Doug


On 11/26/2010 08:53, Tony Finch wrote:
On Thu, 25 Nov 2010, Andrew Sullivan wrote:

So what aside from [...] do you want?

Something like this:


Abstract

This memo clarifies the syntax of top-level domain labels in the domain
name system as specified in RFC 1123, and how this syntax relates to the
allocation policy for TLDs. It describes the current

[...blah...]

Background

[RFC0952] defines a host name in the first paragraph under "ASSUMPTIONS",
as follows:

       A "name" ... is a text string up to 24 characters drawn from the
       alphabet (A-Z), digits (0-9), minus sign (-), and period (.).
       Note that periods are only allowed when they serve to delimit
       components of "domain style names".  (See RFC-921, "Domain Name
       System Implementation Schedule", for background).  No blank or
       space characters are permitted as part of a name.  No distinction
       is made between upper and lower case.  The first character must be
       an alpha character.  The last character must not be a minus sign
       or period.

[RFC1123] section 2.1 reaffirms this definition, but makes one change
to the syntax:

       The syntax of a legal Internet host name was specified in RFC-952
       [DNS:4].  One aspect of host name syntax is hereby changed: the
       restriction on the first character is relaxed to allow either a
       letter or a digit.  Host software MUST support this more liberal
       syntax.

In addition, the DISCUSSION in Section 2.1 says:

       'However, a valid host name can never have the dotted-decimal form
       #.#.#.#, since at least the highest-level component label will be
       alphabetic.'  [Section 2.1]

Some implementers may have understood the above phrase "will be
alphabetic" to be a protocol restriction. This is incorrect. It is in fact
a description of the TLD allocation policy at that time.

The TLD allocation policy has since had two significant syntactic changes.

On 16 November 2000 the first long TLD (.museum) was allocated, and it
was added to the root zone in June 2001.

In October 2007, the first IDNA test TLDs were added to the root zone.
These were the first TLDs with non-alphabetic characters. ICANN approved a
policy for allocating IDNA ccTLDs in October 2009 and the first production
IDNA TLDs were added to the root zone in January 2010.

Deployed software that checks DNS top-level labels for conformance with
past allocation policy is likely to reject domain names allocated after a
policy change.
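
As a rough illustration of the failure mode, here is a sketch in Python of the kind of check that would have matched the older allocation policy. The function name and the 2-to-4 letter rule are hypothetical, chosen only to show the effect; they are not taken from any particular implementation.

```python
import re

# Hypothetical legacy check: assumes every TLD is 2 to 4 ASCII letters,
# which happened to hold for the TLDs allocated before .museum.
LEGACY_TLD_RE = re.compile(r"[A-Za-z]{2,4}")

def legacy_tld_check(tld: str) -> bool:
    """Return True if the TLD passes the (outdated) policy-based check."""
    return LEGACY_TLD_RE.fullmatch(tld) is not None

for tld in ("com", "museum", "xn--p1ai"):
    print(tld, legacy_tld_check(tld))
```

A check like this accepts "com" but rejects both a long TLD and an A-Label, even though all three are syntactically valid labels at the protocol level.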


Syntax of TLD labels - protocol level

All labels of a domain name have the same syntax. The syntax of TLDs is
not specially restricted at the protocol level.

    domain  = *(label ".") label ["."]

    label   = let-dig [ldh-str]

    let-dig = ALPHA / DIGIT

    ldh-str = *( ALPHA / DIGIT / "-" ) let-dig

A label can be up to 63 characters long. A domain name can be up to 255
characters long.
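
The grammar and length limits above could be checked along the following lines. This is a minimal Python sketch, not a reference implementation: the function names are made up for illustration, and it counts characters of the textual form as the text above does, rather than octets of the wire format.

```python
import re

# label = let-dig [ldh-str]: starts and ends with a letter or digit,
# with letters, digits and hyphens allowed in between.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def is_valid_label(label: str) -> bool:
    """Check one label against the let-dig/ldh-str grammar and the 63 limit."""
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None

def is_valid_domain(name: str) -> bool:
    """Check a whole name: *(label ".") label ["."], at most 255 characters."""
    if len(name) > 255:
        return False
    if name.endswith("."):
        name = name[:-1]  # the optional trailing dot
    return all(is_valid_label(label) for label in name.split("."))
```

Note that nothing here treats the last label differently from the others, which is the point of this section.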

A domain name as a whole shall not match the dotted quad representation of
an IPv4 address.

    IPv4    = 3(digits ".") digits

    digits  = 1*DIGIT
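
A sketch of the dotted quad test in Python, following the #.#.#.# form quoted from RFC 1123 (four runs of digits separated by three dots). The function name is illustrative only. The check is purely syntactic: it does not verify that each number is in the range 0-255, and it does not consider a name with a trailing dot.

```python
import re

# Four groups of one or more digits separated by dots: #.#.#.#
DOTTED_QUAD_RE = re.compile(r"(?:[0-9]+\.){3}[0-9]+")

def looks_like_dotted_quad(name: str) -> bool:
    """Return True if the whole name has the dotted-decimal form #.#.#.#."""
    return DOTTED_QUAD_RE.fullmatch(name) is not None
```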


Syntax of TLD labels - allocation policy

The syntax of allocated TLDs is restricted in order to ensure that no
domain name can match an IPv4 dotted quad, and for compatibility with past
practice and deployed software. The policy is subject to change by ICANN.
This section describes the syntax of domain names permitted by the current
allocation policy.

IDNA encodes Unicode strings within the syntax permitted for domain name
labels. The Unicode string used by applications is known as a U-Label;
its corresponding encoding in the DNS is known as an A-Label. The terms
A-Label and U-Label are used in this document as defined in [RFC5890].
Valid A-Labels always contain non-alphabetic characters.

In order to accommodate the wish to express TLD names in scripts other
than the ASCII subset of Latin, it is necessary to allow non-alphabetic
characters in the corresponding TLD DNS-Labels.  Following past practice,
the U-label form of a TLD name is restricted by applying rules analogous
to those already imposed on ASCII TLD DNS-Labels.

ASCII TLDs have the following syntax:

    TLD = 1*63(ALPHA)
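
For illustration, the ASCII TLD rule above could be written in Python as follows. The function name is hypothetical. This is a check against the current allocation policy, not against the protocol, so a failure here does not mean the label is invalid in the DNS.

```python
import re

# TLD = 1*63(ALPHA): between 1 and 63 ASCII letters, nothing else.
ASCII_TLD_RE = re.compile(r"[A-Za-z]{1,63}")

def is_policy_ascii_tld(tld: str) -> bool:
    """Return True if the label matches the ASCII TLD allocation syntax."""
    return ASCII_TLD_RE.fullmatch(tld) is not None
```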

IDNA TLDs obey the following requirements:

    1.  the DNS-Label is a valid A-Label according to [RFC5890];

    2.  the derived property value of all code points, as defined by
        [RFC5890], is PVALID;

    3.  the general category of all code points is one of { Ll, Lo, Lm, Mn }.
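
Requirement 3 can be sketched in Python using only the standard library. This is a partial illustration under stated assumptions: the function names are made up, the A-Label is decoded with Python's built-in "punycode" codec without the full validity checks of requirement 1, and the PVALID test of requirement 2 is not implemented at all, since it needs the IDNA derived property tables.

```python
import unicodedata

# General categories permitted by requirement 3.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn"}

def u_label_from_a_label(a_label: str) -> str:
    """Decode an A-Label ("xn--...") to its Unicode form (no IDNA validation)."""
    if not a_label.lower().startswith("xn--"):
        raise ValueError("not an A-Label: missing xn-- prefix")
    return a_label[4:].encode("ascii").decode("punycode")

def has_allowed_categories(u_label: str) -> bool:
    """Return True if every code point is in one of Ll, Lo, Lm, Mn."""
    return bool(u_label) and all(
        unicodedata.category(ch) in ALLOWED_CATEGORIES for ch in u_label
    )

def passes_category_rule(a_label: str) -> bool:
    """Apply requirement 3 only to an A-Label; requirements 1 and 2 are skipped."""
    return has_allowed_categories(u_label_from_a_label(a_label))
```

Under this rule digits (category Nd) and uppercase letters (category Lu) are excluded from the U-Label, which mirrors the letters-only rule for ASCII TLDs.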


[... etc etc ...]

Tony.



--

        Nothin' ever doesn't change, but nothin' changes much.
                        -- OK Go

        Breadth of IT experience, and depth of knowledge in the DNS.
        Yours for the right price.  :)  http://SupersetSolutions.com/

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
