Re: [PATCH] Expand character set for ltree labels

2023-01-06 Thread Andrew Dunstan
On 2023-01-06 Fr 11:29, Tom Lane wrote:
> Andrew Dunstan writes:
>> Regardless of the punycode issue, allowing hyphens in ltree labels seems
>> quite reasonable. I haven't reviewed the patch yet, but if it's OK I
>> intend to commit it.
> No objection to allowing hyphens. If we're going to incr

Re: [PATCH] Expand character set for ltree labels

2023-01-06 Thread Tom Lane
Andrew Dunstan writes:
> On 2022-10-05 We 18:05, Garen Torikian wrote:
>> Therefore I have updated the patch with three much smaller changes:
>>
>> * Support for `-` in addition to `_`
>> * Expanding the limit to 512 chars (from the existing 256); again it's
>> not uncommon for non-English string

Re: [PATCH] Expand character set for ltree labels

2023-01-06 Thread Andrew Dunstan
On 2022-10-05 We 18:05, Garen Torikian wrote:
> After digging into it, you are completely correct. I had to do a bit
> more reading to understand the relationships between UTF-8 and wchar,
> but ultimately the existing locale support works for my use case.
>
> Therefore I have updated the patch w

Re: [PATCH] Expand character set for ltree labels

2023-01-05 Thread vignesh C
On Wed, 4 Jan 2023 at 00:27, Garen Torikian wrote:
>
> Sure. Rebased onto HEAD.
>
There is one more merge conflict, please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID eb5ad4ff05fd382ac98cab60b82f7fd6ce4cfeb8 ===
=== applying patch ./0003-Expand-character-set-for-ltr

Re: [PATCH] Expand character set for ltree labels

2023-01-03 Thread Garen Torikian
Sure. Rebased onto HEAD.
On Tue, Jan 3, 2023 at 7:27 AM vignesh C wrote:
> On Thu, 6 Oct 2022 at 03:35, Garen Torikian wrote:
> >
> > After digging into it, you are completely correct. I had to do a bit
> more reading to understand the relationships between UTF-8 and wchar, but
> ultimately the

Re: [PATCH] Expand character set for ltree labels

2023-01-03 Thread vignesh C
On Thu, 6 Oct 2022 at 03:35, Garen Torikian wrote:
>
> After digging into it, you are completely correct. I had to do a bit more
> reading to understand the relationships between UTF-8 and wchar, but
> ultimately the existing locale support works for my use case.
>
> Therefore I have updated the

Re: [PATCH] Expand character set for ltree labels

2022-11-03 Thread Ian Lawrence Barwick
On Thu, 6 Oct 2022 at 7:05, Garen Torikian wrote:
>
> After digging into it, you are completely correct. I had to do a bit more
> reading to understand the relationships between UTF-8 and wchar, but
> ultimately the existing locale support works for my use case.
>
> Therefore I have updated the patch with thre

Re: [PATCH] Expand character set for ltree labels

2022-10-05 Thread Garen Torikian
After digging into it, you are completely correct. I had to do a bit more reading to understand the relationships between UTF-8 and wchar, but ultimately the existing locale support works for my use case.
Therefore I have updated the patch with three much smaller changes:
* Support for `-` in add
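
As a rough illustration of the rule discussed in this thread, a label-character test built on iswalpha() and iswdigit(), with `_` and `-` also accepted, might look like the sketch below. The helper names is_label_char and is_valid_label are hypothetical and are not taken from the ltree source; the label length limit is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not the ltree implementation): accept whatever
 * the current locale classifies as a letter or digit, plus '_' and '-'.
 */
static bool
is_label_char(wint_t c)
{
    return iswalpha(c) || iswdigit(c) || c == L'_' || c == L'-';
}

/*
 * Hypothetical helper: a label is treated as valid when it is non-empty
 * and every character passes is_label_char(). No length limit is checked.
 */
static bool
is_valid_label(const wchar_t *label)
{
    if (label == NULL || label[0] == L'\0')
        return false;

    for (const wchar_t *p = label; *p != L'\0'; p++)
    {
        if (!is_label_char((wint_t) *p))
            return false;
    }
    return true;
}
```

Because iswalpha() and iswdigit() consult the current locale, which non-ASCII characters pass depends on the locale in effect; in the default "C" locale only ASCII letters and digits are guaranteed to be accepted.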

Re: [PATCH] Expand character set for ltree labels

2022-10-05 Thread Tom Lane
Garen Torikian writes:
>> Perhaps the docs are a bit unclear about that, but it's not
>> restricted to ASCII alphanumerics. AFAICS the code will accept
>> whatever iswalpha() and iswdigit() will accept in the database's
>> default locale.
> Sorry but I don't think that is correct. Here is the si

Re: [PATCH] Expand character set for ltree labels

2022-10-05 Thread Garen Torikian
Hi Tom,
> Perhaps the docs are a bit unclear about that, but it's not
> restricted to ASCII alphanumerics. AFAICS the code will accept
> whatever iswalpha() and iswdigit() will accept in the database's
> default locale.
Sorry but I don't think that is correct. Here is the single definition check

Re: [PATCH] Expand character set for ltree labels

2022-10-05 Thread Tom Lane
Garen Torikian writes:
> I am submitting a patch to expand the label requirements for ltree.
> The current format is restricted to alphanumeric characters, plus _.
> Unfortunately, for non-English labels, this set is insufficient.
Hm? Perhaps the docs are a bit unclear about that, but it's not

Re: [PATCH] Expand character set for ltree labels

2022-10-04 Thread Garen Torikian
No, not quite. Valid Punycode characters are `[A-Za-z0-9-]`. This proposal includes `-`, as well as `#` and `;` for HTML entities. I double-checked the RFC to see the valid Punycode characters and the set above is indeed correct: https://datatracker.ietf.org/doc/html/draft-ietf-idn-punycode-02#se

Re: [PATCH] Expand character set for ltree labels

2022-10-04 Thread Nathan Bossart
On Tue, Oct 04, 2022 at 12:54:46PM -0400, Garen Torikian wrote:
> The punycode range of characters is the exact same set as the existing
> ltree range, with the addition of a hyphen (-). Within this system, any
> human language can be encoded using just A-Za-z0-9-.
IIUC ASCII characters like '!' a