--On Tuesday, February 7, 2023 19:31 -0700 Peter Saint-Andre
<stpe...@stpeter.im> wrote:

>...
>>>     2.  An "internationalized domain name", i.e., a DNS
>>>     domain name that includes at least one label containing
>>>         appropriately encoded Unicode code points outside
>>>         the traditional US-ASCII range and conforming to the
>>>         processing and validity checks specified for
>>>         "IDNA2008" in [IDNA-DEFS] and the associated
>>>         documents.  In particular, it contains at least one
>>>         U-label or A-label, but otherwise may contain any
>>>         mixture of NR-LDH labels, A-labels, or U-labels.
>> 
>> This is confusing 
> 
> What specifically do you think is confusing? We tried to get
> it right, but clearly didn't succeed...

Patrik might have other reasons, but, IMO, this would be ok if
you just stopped before "In particular".  If you think that
additional explanation along those lines is desirable (I suspect
it might be), it needs rewording, because neither NR-LDH labels
nor A-labels contain code points outside the traditional ASCII
code range.

(You can ignore me when I kick this dead horse but, outside
Content-type registrations, the odd sentence in the third
paragraph of Section 2.1 of RFC 5890, and a few other,
IETF-specific, contexts, there is no such thing as "US-ASCII").

>> and it seems people misunderstand the big change we went
>> through in the IETF from IDNA2003 to IDNA2008.
>> 
>> In IDNA2008 we have:
>> 
>> - Got rid of mapping, i.e. mapping like case folding is
>> something happening in the application layer, and has nothing
>> to do with "domain names".
>> - Have a 1:1 mapping between A-label and U-label.
>> - In theory, because of this, we can have an A-label and a
>> U-label for domain names that include Unicode code points
>> not allowed by IDNA2008 (or code points not allowed by other
>> policy rules, for example the ones a registry has).
>> 
>> I strongly recommend you have similar rules here. Separate
>> potential mapping from comparison of domain names which in
>> turn must be separated from policy for what code points are
>> allowed.
> 
> When you say "have similar rules here", are you suggesting
> that we define such rules outside the context of IDNA2008
> (e.g., in a way that would be valid for both IDNA2008 and
> IDNA2003 + UTS-46?) I think it would be a challenge to get
> that right and I'm not confident that a document about
> certificate matching is the correct place to do so.

Not possible because they are contradictory.  With IDNA2003 and
UTS#46, mapping from a native character ("Unicode") form to an
ASCII one is included in the protocol and the mapping is not
reversible, i.e., mapping from the ASCII form back to the
Unicode one may not reproduce whatever the user thought went in.
With IDNA2008, there are no mappings within the protocol
precisely in order to make an identity relationship between
U-labels and A-labels well-defined.  

Patrik's memory is probably better than mine, but my vague
recollection is that cases like certificate matching are among
those we were explicitly concerned about when we decided the
transformations had to be reversible.  Remember that there are
cases (very few for Latin-derived scripts, but many for some
other scripts) in which a native character form is valid for
both IDNA2003 and IDNA2008 but produces different ACE labels.
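
To make that concrete, here is a minimal Python sketch, not a
recommendation.  It assumes Python's built-in "idna" codec, which
implements IDNA2003 (with Nameprep mapping), and it uses the raw
"punycode" codec as a stand-in for the IDNA2008 U-label/A-label
conversion.  That stand-in performs none of the IDNA2008 validity
checks, and the function names are mine, purely for illustration.

```python
# IDNA2003 maps before it encodes, so the conversion is not reversible.
# The Punycode step underlying IDNA2008 A-labels involves no mapping,
# so converting there and back reproduces the input.

def idna2003_to_ascii(label: str) -> str:
    """ToASCII as implemented by the stdlib "idna" codec (IDNA2003)."""
    return label.encode("idna").decode("ascii")

def idna2003_to_unicode(ace: str) -> str:
    """ToUnicode as implemented by the stdlib "idna" codec (IDNA2003)."""
    return ace.encode("ascii").decode("idna")

def raw_a_label(u_label: str) -> str:
    """Punycode-encode and add the ACE prefix; NO IDNA2008 validity checks."""
    return "xn--" + u_label.encode("punycode").decode("ascii")

def raw_u_label(a_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode; NO IDNA2008 validity checks."""
    if not a_label.lower().startswith("xn--"):
        raise ValueError("not an ACE-prefixed label")
    return a_label[4:].encode("ascii").decode("punycode")

# U+00DF (sharp s) is mapped to "ss" by IDNA2003 but is PVALID in IDNA2008.
label = "fa\u00df"
old = idna2003_to_ascii(label)   # mapped, then encoded
new = raw_a_label(label)         # encoded without mapping

print("IDNA2003:", old, "->", idna2003_to_unicode(old))
print("unmapped:", new, "->", raw_u_label(new))
```

The same native-character input yields two different ASCII forms,
and only the unmapped one can be converted back to what the user
typed.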

>> Ok, onwards...
>> 
>>>     If the DNS domain name portion of a reference identifier
>>>     is a traditional domain name, then matching of the
>>>     reference identifier against the presented identifier
>>>     MUST be performed by comparing the set of domain name
>>>     labels using a case-insensitive ASCII comparison, as
>>>     clarified by [DNS-CASE].  For example, WWW.Example.Com
>>>     would be lower-cased to www.example.com for comparison
>>>     purposes.  Each label MUST match in order for the names
>>>     to be considered to match, except as supplemented by the
>>>     rule about checking of wildcard labels given below.
>>> 
>>>     If the DNS domain name portion of a reference identifier
>>>     is an internationalized domain name, then the client
>>>     MUST convert any U-labels [IDNA-DEFS] in the domain name
>>>     to A-labels before checking the domain name or comparing
>>>     it with others.  In accordance with [IDNA-PROTO],
>>>     A-labels MUST be compared as case-insensitive ASCII.
>>>     Each label MUST match in order for the domain names to
>>>     be considered to match, except as supplemented by the
>>>     rule about checking of wildcard labels given below.
>> 
>> All of the above can be replaced by just saying that "A
>> domain name is to be compared using case-insensitive matching
>> according to what DNS uses, and because of this that includes
>> domain names that have A-labels in them" and reference
>> IDNA2008.

> It seems that we should at least say that U-labels need to be
> converted to A-labels first, no? Or do you think that is
> implied by referencing the DNS rules (which don't allow
> U-labels natively)?

I think it is implied by that but that, if your audience is not
as familiar with DNS rules as some of us are, it would not hurt
to point that out.  And, as I think I mentioned in an earlier
note, if you start talking about "DNS rules", you need to
somehow refer to the Preferred Syntax because the DNS permits
almost any string of octets in a label.
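
For what it is worth, the comparison being discussed can be
sketched in a few lines of Python.  This is only an illustration
under my own assumptions: the function names are hypothetical, the
inputs are assumed to have had any U-labels converted to A-labels
already, wildcard handling is left out entirely, and the syntax
check is a plain LDH-label check rather than anything taken from
the draft.

```python
import string

# Fold only A-Z to a-z.  str.lower() would also change non-ASCII
# characters, which is not what a DNS-style comparison does.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_LDH = set(string.ascii_letters + string.digits + "-")

def ascii_lower(label: str) -> str:
    return label.translate(_ASCII_FOLD)

def is_ldh_label(label: str) -> bool:
    """Letters, digits, hyphen; 1 to 63 characters; no edge hyphens."""
    return (
        0 < len(label) <= 63
        and set(label) <= _LDH
        and not label.startswith("-")
        and not label.endswith("-")
    )

def split_labels(name: str) -> list:
    # Drop a single trailing dot (fully qualified form), then split.
    if name.endswith("."):
        name = name[:-1]
    return name.split(".")

def names_match(reference: str, presented: str) -> bool:
    """Label-by-label, ASCII case-insensitive match of two names.

    Any label outside LDH syntax (including an unconverted U-label)
    makes the match fail rather than being compared as raw octets.
    """
    ref, pres = split_labels(reference), split_labels(presented)
    if not all(is_ldh_label(l) for l in ref + pres):
        return False
    return [ascii_lower(l) for l in ref] == [ascii_lower(l) for l in pres]

print(names_match("WWW.Example.Com", "www.example.com"))
```

Rejecting non-LDH labels outright is a design choice of the
sketch; it is one way of expressing "Preferred Syntax only", not
the only one.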

>...
>> The differences from UTS-46 are specifically two things:
>> 
>> - UTS-46 also includes rules for mapping that IDNA2008 does
>> not include. The mapping that might be performed according to
>> UTS-46 is "out of scope" for IDNA2008.

And some code points that UTS-46 (and IDNA2003) map to other
code points are themselves PVALID in IDNA2008.  That difference
is one of the main sources of "same input ('Unicode') string,
different ACE string".

>> - What code points are allowed in the ultimate domain name is
>> slightly different.
>> 
>> But, we have people using domain names (i.e. in the wild)
>> which are allowed in neither UTS-46 nor IDNA2008.
>> 
>> And then there are people using the algorithm in IDNA2008
>> applied to versions of Unicode that the IETF has not approved
>> yet.
>> 
>> So, once again, not "either or". It is "a little bit of
>> everything".
 
> I see what you mean. However, that makes it more difficult to
> specify recommended behavior.

Yep.
 
> As one example, it seems possible that these differences could
> lead to someone using domain names in the wild that include
> DISALLOWED code points (e.g., because the definition of which
> code points are DISALLOWED can vary across Unicode versions).
> Thus if we say that applications MUST NOT match on DISALLOWED
> code points, behavior could be inconsistent.

Now I think I see where you were going above.  But consider the
most obvious case which is code points that are DISALLOWED for a
given version of Unicode because they are Unassigned in that
version.  If assigned in a future one, they might then become
PVALID, permanently DISALLOWED, or either of the CONTEXTx
categories.  What can you do?  IDNA2008 more or less requires
that a version of an application be tied to a specific version
of Unicode.  UTS#46, IMO, sort of dances around the problem -- a
dance that is made easier by its being defined by a collection
of per-Unicode-version tables rather than rules -- but,
ultimately, essentially says that, if a label makes it into the
DNS, you are supposed to trust it.  To the extent to which your
document is ultimately about trust and security, that is not a
really good idea.  It is an especially bad idea with code points
that are undefined in one version and assigned meanings in the
next because the per-version Unicode tables on which UTS#46
relies are no more, and maybe less, constrained about what can
be said about an undefined code point when it is assigned than
IDNA2008 is.
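
As a small illustration of the version dependence (and not of any
solution), the following Python sketch reports which code points
in a label are unassigned in the Unicode version a particular
runtime happens to be built with.  The function name is
hypothetical, and General_Category "Cn" is only a rough proxy: it
also covers noncharacters, and it is not the full IDNA2008
derived-property calculation.

```python
import unicodedata

def unassigned_code_points(label: str) -> list:
    """Code points in `label` with General_Category Cn (unassigned or
    noncharacter) in the Unicode version of this interpreter's
    unicodedata module.  The answer can differ between runtimes."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if unicodedata.category(ch) == "Cn"
    ]

# The Unicode version this particular runtime is tied to.
print("Unicode data version:", unicodedata.unidata_version)
print(unassigned_code_points("example"))
```

Two applications built against different Unicode versions can
return different lists for the same label, which is exactly the
inconsistency described above.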

Do I have a simple-to-apply solution to the above?  Nope.


>...
> Do you have opinions on Corey's suggestion to use P-labels
> instead of U-labels and to reference the CA/Browser Forum
> specifications?
> 
> https://mailarchive.ietf.org/arch/msg/uta/r5uJRGUzCC55XH4XSnwtMB2YWPA/

I've already responded at some length to that question, so refer
it to Patrik.

best,
  john
