Re: [HACKERS] [pgsql-packagers] Palle Girgensohn's ICU patch

Palle Girgensohn Sun, 19 Apr 2015 04:47:31 -0700

> On 27 Nov 2014, at 10:15, Dave Page <[email protected]> wrote:
> 
> 
> 
> On Thu, Nov 27, 2014 at 9:09 AM, Jakob Egger <[email protected]> wrote:
> On 26 Nov 2014, at 17:46, Geoff Montee <[email protected]> wrote:
> > This topic reminds me of a thread from a couple months ago:
> >
> > http://www.postgresql.org/message-id/[email protected]
> >
> > It sounds like adding ICU support to core may also allow for adding
> > collation versioning to indexes.
> 
> Reading through this thread it becomes clear to me that adding support for 
> ICU is more important than I thought, and the only problem is that no one has 
> yet volunteered for it :)
> 
> I've started looking through the PostgreSQL source and Palle's patch to 
> estimate what needs to be done.
> 
> MINIMUM TODO
> ============
> 
> * Add support for per-column collations in varstr_cmp() in varlena.c. 
> Currently the patch creates a single ICU collator for the default collation 
> and stores it in a static variable. We would need to change this to create 
> a collator for each collation and store them in a hash table, similar to 
> pg_newlocale_from_collation() / lookup_collation_cache().
> 
> * There's a new feature in trunk for faster sorting using SortSupport, so we 
> would also need to patch bttextfastcmp_locale() in varlena.c.
> 
> These two changes would allow using ICU for collation. This has two major 
> advantages:
> 1) Systems with broken strcoll like OS X and FreeBSD can take advantage of 
> ICU to offer proper text sorting
> 2) You can link with a specific version of ICU to avoid index corruption and 
> duplicate keys caused by changing implementations of the glibc strcoll 
> function
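
For illustration only, the per-collation cache described above could look
roughly like the sketch below. This is not code from the patch. Everything
in it is a hypothetical stand-in: CollOid stands in for PostgreSQL's Oid, the
opener callback is where a real patch would call ICU's ucol_open(), and a
small fixed-size table with linear probing replaces the backend's hash table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: keep one collator handle per collation OID
 * instead of a single static collator for the default collation. */

#define COLLATOR_CACHE_SIZE 64          /* power of two, arbitrary for the sketch */

typedef unsigned int CollOid;                  /* stand-in for PostgreSQL's Oid */
typedef void *(*CollatorOpenFn)(CollOid oid);  /* would wrap ucol_open() in a real patch */

typedef struct
{
    CollOid oid;        /* 0 marks an empty slot */
    void   *collator;   /* opaque handle, e.g. a UCollator pointer */
} CollatorCacheEntry;

static CollatorCacheEntry collator_cache[COLLATOR_CACHE_SIZE];

/* Return the cached collator for oid, opening it on first use.
 * Returns NULL if the opener fails or the table is full. */
static void *
lookup_collator(CollOid oid, CollatorOpenFn open_fn)
{
    assert(oid != 0 && open_fn != NULL);

    for (size_t probe = 0; probe < COLLATOR_CACHE_SIZE; probe++)
    {
        /* open addressing with linear probing */
        CollatorCacheEntry *entry =
            &collator_cache[(oid + probe) & (COLLATOR_CACHE_SIZE - 1)];

        if (entry->oid == oid)
            return entry->collator;     /* cache hit */

        if (entry->oid == 0)
        {
            void *collator = open_fn(oid);

            if (collator == NULL)
                return NULL;            /* leave the slot empty on failure */
            entry->oid = oid;
            entry->collator = collator;
            return collator;
        }
    }
    return NULL;                        /* cache full */
}

/* Demo opener, for illustration only: counts its calls and hands out
 * pointers into a static array instead of real ICU collators. */
static int demo_open_calls = 0;
static int demo_handles[COLLATOR_CACHE_SIZE];

static void *
demo_open(CollOid oid)
{
    (void) oid;
    demo_open_calls++;
    return &demo_handles[(demo_open_calls - 1) % COLLATOR_CACHE_SIZE];
}
```

A comparison function would call lookup_collator() with the collation of the
column being compared, so the (expensive) open happens once per collation
rather than once per comparison.
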
> 
> 
> NEXT STEPS: Support for more collations
> =======================================
> 
> ICU offers a lot more collations than the OS. For example, besides "de_CH" it 
> also offers "de_CH@collation=phonebook". Adding support for these is a bit 
> more involved.
> 
> * initdb would need to be extended to also look for collations offered by ICU 
> and add them to the pg_collation catalog.
> 
> * A special case for LC_COLLATE must be added to check_locale() in the 
> backend, get_canonical_locale_name() in pg_upgrade, and check_locale_name() 
> in initdb to support collations provided by ICU
> 
> * pg_perm_setlocale() must get a special case to handle ICU collations
> 
> * the locale handling code in PL/Perl must be modified (when using an ICU 
> collation as default collation, we must decide what collation to send to perl)
> 
> * convert_string_datum() in selfuncs.c could be patched to use ICU instead of 
> strxfrm. However, as far as I understand, this is not absolutely required as 
> this is only used by the query planner and would in the worst case prevent 
> some optimisation in corner cases
> 
> These changes would probably have an even bigger impact, because then people 
> would no longer be limited to the collations supported by the locales 
> installed on their OS.
> 
> NEXT STEPS: Collation versioning in indices
> ===========================================
> 
> Since ICU provides reliable versioning of collations, this would allow us to 
> finally prevent index corruption caused by changing implementations of 
> strcoll. I haven't looked at this in detail, but I assume that this would be 
> a small change with potentially big impact.
> 
> Ideally, PostgreSQL would detect when the collation is a different version 
> than the one used to create the index, and stop using the index until it is 
> rebuilt.
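
As a rough sketch of that check (an assumption about how it might be done,
not an existing PostgreSQL feature): record the collation version when the
index is built, compare it with the version the collator reports now, and
mark the index unusable on a mismatch. The struct and function names are
hypothetical. The 4-byte version mirrors the layout of ICU's UVersionInfo,
which ucol_getVersion() fills in; here the caller passes it in as a plain
array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define COLL_VERSION_LEN 4  /* same length as ICU's UVersionInfo */

/* Hypothetical per-index bookkeeping for the sketch. */
typedef struct
{
    const char *index_name;
    uint8_t     built_with[COLL_VERSION_LEN];  /* version recorded at index build */
    bool        usable;
} IndexCollationInfo;

/* Compare the recorded version with the collator's current version and
 * update idx->usable accordingly. Returns the new value of idx->usable. */
static bool
check_index_collation(IndexCollationInfo *idx,
                      const uint8_t current[COLL_VERSION_LEN])
{
    assert(idx != NULL);
    idx->usable = (memcmp(idx->built_with, current, COLL_VERSION_LEN) == 0);
    return idx->usable;
}

/* After a rebuild, record the version the index was rebuilt with. */
static void
mark_index_rebuilt(IndexCollationInfo *idx,
                   const uint8_t current[COLL_VERSION_LEN])
{
    assert(idx != NULL);
    memcpy(idx->built_with, current, COLL_VERSION_LEN);
    idx->usable = true;
}
```

The planner would then skip any index whose usable flag is false until
mark_index_rebuilt() has been called for it.
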
> 
> 
> I'll take a shot at the MINIMUM TODO as outlined above.
> 
> 
> We've already included ICU support in our Postgres Plus Advanced Server 
> product. Before you spend too much time on this, give me a few days to see if 
> we can get that change contributed back. The people I need to speak to are 
> OOO for Thanksgiving at the moment though, so it may be a few days.
> 
> --



Hi,

Just poking this old thread again. What happened here? Is anyone putting work 
into this area at the moment?

Palle
