Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2015-01-05 Thread Jonathan Vanasco
On Dec 29, 2014, at 5:36 PM, Mike Cardwell wrote: > So the system I've settled with is storing both the originally supplied > representation, *and* the lower cased punycode encoded version in a separate > column for indexing/search. This seems really hackish to me though. I actually do the same

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Andrew Sullivan
On Tue, Dec 30, 2014 at 12:53:42AM +, Mike Cardwell wrote: > > Hmm. How did you get the original, then? > > The "original" in my case, is the hostname which the end user supplied. > Essentially, when I display it back to them, I want to make sure it is > displayed the same way that it was whe

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Mike Cardwell
* on the Mon, Dec 29, 2014 at 07:25:59PM -0500, Andrew Sullivan wrote: >> This is exactly the same method that we commonly use for performing case >> insensitive text searches using lower() indexes. > > Hmm. How did you get the original, then? The "original" in my case, is the hostname which the

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Mike Cardwell
* on the Mon, Dec 29, 2014 at 07:22:21PM -0500, Andrew Sullivan wrote: >> can't just encode it with punycode and then store the ascii result. For >> example, >> these two are the same hostnames thanks to unicode case folding [1]: >> >> tesst.ëxämplé.com >> teßt.ëxämplé.com > > Well, in IDNA

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Mike Cardwell
* on the Mon, Dec 29, 2014 at 05:01:44PM -0700, David G Johnston wrote: >> CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames >> (lower(punycode_encode(hostname))); >> >> That would prevent adding more than one representation for the same >> hostname >> to the column. > > Except two differen

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Andrew Sullivan
On Tue, Dec 30, 2014 at 12:18:58AM +, Mike Cardwell wrote: > > This is exactly the same method that we commonly use for performing case > insensitive text searches using lower() indexes. Hmm. How did you get the original, then? If you have the original Unicode version, why don't you switch

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Andrew Sullivan
[resending, because somehow this got routed through my work address the first time.] Hi, I didn't have time to write a short note, so I wrote a long one instead. Sorry. On Mon, Dec 29, 2014 at 10:36:42PM +, Mike Cardwell wrote: > can't just encode it with punycode and then store the ascii

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Mike Cardwell
* on the Mon, Dec 29, 2014 at 07:00:05PM -0500, Andrew Sullivan wrote: >> CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames >> (lower(punycode_encode(hostname))); > > This wouldn't work to get the original back if oyu have any IDNA200 > data, because puncode-encoding the UTF-8 under IDNA200

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Tom Lane
David G Johnston writes: > It was also mentioned that using the Perl encoding function was > non-performant; which is why caching the data into a memoization table has > value. I find it hard to believe that the standards folk would have chosen a hostname encoding method that was so inefficient t

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread David G Johnston
On Mon, Dec 29, 2014 at 4:51 PM, Mike Cardwell [via PostgreSQL] < ml-node+s1045698n583236...@n5.nabble.com> wrote: > * on the Mon, Dec 29, 2014 at 04:23:22PM -0700, David G Johnston wrote: > > >>>WHERE hostname='nißan.com ' > >>> > >> > >> _IF_ Postgres had a punycode functi

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Andrew Sullivan
On Mon, Dec 29, 2014 at 11:50:54PM +, Mike Cardwell wrote: > > CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames > (lower(punycode_encode(hostname))); This wouldn't work to get the original back if oyu have any IDNA2003 data, because puncode-encoding the UTF-8 under IDNA2003 and then p

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Mike Cardwell
* on the Mon, Dec 29, 2014 at 04:23:22PM -0700, David G Johnston wrote: >>>WHERE hostname='nißan.com' >>> >> >> _IF_ Postgres had a punycode function, then you could use: >> WHERE punycode(hostname) = punycode('nißan.com') > > If the OP wraps what he is doing up into a function that is what

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread David G Johnston
Andy Colson wrote > On 12/29/2014 4:36 PM, Mike Cardwell wrote: >> I'd like to store hostnames in a postgres database and I want to fully >> support >> IDNs (Internationalised Domain Names) >> >> I want to be able to recover the original representation of the hostname, >> so I >> can't just encode

Re: [GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Andy Colson
On 12/29/2014 4:36 PM, Mike Cardwell wrote: I'd like to store hostnames in a postgres database and I want to fully support IDNs (Internationalised Domain Names) I want to be able to recover the original representation of the hostname, so I can't just encode it with punycode and then store the as

[GENERAL] Hostnames, IDNs, Punycode and Unicode Case Folding

2014-12-29 Thread Mike Cardwell
I'd like to store hostnames in a postgres database and I want to fully support IDNs (Internationalised Domain Names) I want to be able to recover the original representation of the hostname, so I can't just encode it with punycode and then store the ascii result. For example, these two are the sam