On Fri, 2023-12-15 at 16:48 -0800, Jeremy Schneider wrote:
> This goes back to my other thread (which sadly got very little
> discussion): PostgreSQL really needs to be safe by /default/
Doesn't a built-in provider help create a safer option?
The built-in provider's version of Unicode will be cons
On Sat, Dec 16, 2023 at 1:48 PM Jeremy Schneider
wrote:
> On 12/14/23 7:12 AM, Jeff Davis wrote:
> > The concern over unassigned code points is misplaced. The application
> > may be aware of newly-assigned code points, and there's no way they
> > will be mapped correctly in Postgres if the provide
On 12/14/23 7:12 AM, Jeff Davis wrote:
> The concern over unassigned code points is misplaced. The application
> may be aware of newly-assigned code points, and there's no way they
> will be mapped correctly in Postgres if the provider is not aware of
> those code points. The user can either procee
On Tue, 2023-12-12 at 14:35 -0800, Jeremy Schneider wrote:
> Is someone able to test out upper & lower functions on U+A7BA ...
> U+A7BF
> across a few libs/versions?
Those code points are unassigned in Unicode 11.0 and assigned in
Unicode 12.0.
In ICU 63-2 (based on Unicode 11.0), they just get m
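For anyone who wants to try this locally, here is a rough Python sketch (not from the thread) that reports whether U+A7BA ... U+A7BF are assigned and case-mapped in the Unicode tables bundled with the running interpreter. The helper name case_report is made up for illustration, and Python's unicodedata module only stands in for libc or ICU here, so it shows the general pattern rather than the behavior of any particular provider.

```python
import unicodedata


def case_report(first, last):
    """Return (label, assigned, case_mapped) for each code point in the range."""
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        # General category "Cn" means the code point is unassigned in the
        # Unicode version this interpreter was built with.
        assigned = unicodedata.category(ch) != "Cn"
        # A code point is case-mapped if upper() or lower() changes it.
        case_mapped = ch.lower() != ch or ch.upper() != ch
        rows.append((f"U+{cp:04X}", assigned, case_mapped))
    return rows


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    for label, assigned, case_mapped in case_report(0xA7BA, 0xA7BF):
        print(
            label,
            "assigned" if assigned else "unassigned",
            "case-mapped" if case_mapped else "unchanged by upper/lower",
        )
```

The output depends on the interpreter's Unicode version, which is the point: the same text can case-map differently depending on which version of Unicode the library knows about.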
On 12/12/23 1:39 PM, Jeff Davis wrote:
> On Sun, 2023-12-10 at 10:39 +1300, Thomas Munro wrote:
>> Unless you also
>> implement built-in case mapping, you'd still have to call libc or ICU
>> for that, right?
>
> We can do built-in case mapping, see:
>
> https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24
On Sun, 2023-12-10 at 10:39 +1300, Thomas Munro wrote:
>
> How would you specify what you want?
One proposal would be to have a builtin collation provider:
https://postgr.es/m/9d63548c4d86b0f820e1ff15a83f93ed9ded4543.ca...@j-davis.com
I don't think there are very many ctype options, but they c
On Sat, Dec 2, 2023 at 9:49 AM Jeff Davis wrote:
> Your definition is too wide in my opinion, because it mixes together
> different sources of variation that are best left separate:
> a. region/language
> b. technical requirements
> c. versioning
> d. implementation variance
>
> (a) is not a t
On Thu, Nov 30, 2023 at 1:23 PM Jeff Davis wrote:
> Character classification is not localized at all in libc or ICU as far
> as I can tell.
Really? POSIX isalpha()/isalpha_l() and friends clearly depend on a
locale. See e.g. d522b05c for a case where that broke something.
Perhaps you mean glibc w
Jeff Davis writes:
> The problem seems to be confusion between pg_wchar and a unicode code
> point in pg_wc_isalpha() and related functions.
Yeah, that's an ancient sore spot: we don't really know what the
representation of wchar is. We assume it's Unicode code points
for UTF8 locales, but libc
The following query:
SELECT U&'\017D' ~ '[[:alpha:]]' collate "en-US-x-icu";
returns true if the server encoding is UTF8, and false if the server
encoding is LATIN9. That's a bug -- any behavior involving ICU should
be encoding-independent.
The problem seems to be confusion between pg_wchar
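A small Python sketch of how such a mismatch could arise, under the assumption (mine, not stated in the message above) that for a single-byte server encoding the value handed to the classification function is the raw encoded byte rather than the Unicode code point. The function names are hypothetical, and Python's unicodedata stands in for ICU's character classification.

```python
import unicodedata


def is_alpha_code_point(value):
    """Treat value as a Unicode code point and test for a letter category."""
    return unicodedata.category(chr(value)).startswith("L")


def classify(ch):
    """Classify ch once by its code point and once by its LATIN9 byte value."""
    code_point = ord(ch)
    # ISO-8859-15 (LATIN9) encodes each supported character as one byte.
    latin9_byte = ch.encode("iso8859_15")[0]
    return (
        code_point,
        is_alpha_code_point(code_point),
        latin9_byte,
        # Assumption: the byte is misread as if it were a code point.
        is_alpha_code_point(latin9_byte),
    )


if __name__ == "__main__":
    cp, cp_alpha, byte, byte_alpha = classify("\u017D")
    print(f"code point U+{cp:04X}: alpha={cp_alpha}")
    print(f"LATIN9 byte 0x{byte:02X} read as U+{byte:04X}: alpha={byte_alpha}")
```

In LATIN9, U+017D is encoded as byte 0xB4, and U+00B4 is ACUTE ACCENT, which is not a letter. Reading the byte as a code point would therefore flip the [[:alpha:]] result, matching the true/false difference in the query above.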