> 27 nov 2014 kl. 10:15 skrev Dave Page <dp...@pgadmin.org>: > > > > On Thu, Nov 27, 2014 at 9:09 AM, Jakob Egger <ja...@eggerapps.at> wrote: > Am 26.11.2014 um 17:46 schrieb Geoff Montee <geoff.mon...@gmail.com>: > > This topic reminds me of a thread from a couple months ago: > > > > http://www.postgresql.org/message-id/f8268db6-b50f-429f-8289-da8ffa5f2...@tripadvisor.com > > > > It sounds like adding ICU support to core may also allow for adding > > collation versioning to indexes. > > Reading through this thread it becomes clear to me that adding support for > ICU is more important than I thought, and the only problem is that no one has > yet volunteered for it :) > > I've started looking through the PostgreSQL source and Palle's patch to > estimate what needs to be done. > > MINIMUM TODO > ============ > > * Add support for per-column collations in varstr_comp() in varlena.c. > Currently the patch creates a single ICU collator for the default collation > and stores it in a static variable. We would need to change this to create > collators for each collation and store them in a hash table similar to > pg_newlocale_from_collation() / lookup_collation_cache() > > * There's a new feature in trunk for faster sorting using SortSupport, so we > would also need to also patch bttextfastcmp_locale() in varlena.c > > These two changes would allow using ICU for collation. This has two major > advantages: > 1) Systems with broken strcoll like OS X and FreeBSD can take advantage of > ICU to offer proper text sorting > 2) You can link with a specific version of ICU to avoid index corruption and > duplicate keys caused by changing implementations of the glibc strcoll > function > > > NEXT STEPS: Support for more collations > ======================================= > > ICU offers a lot more collations than the OS. For example, besides "de_CH" it > also offers "de_CH@collation=phonebook". Adding support for these is a bit > more involved. > > * initdb would need to be extended to also look for collations offered by ICU > and add them to the pg_collation catalog. > > * A special case for LC_COLLATE must be added to check_locale() in the > backend, get_canonical_locale_name() in pg_upgrade, check_locale_name() in > initdb to support collations provided by ICU > > * pg_perm_setlocale() must get a special case to handle ICU collations > > * the local handling code in pgperl must be modified (when using a ICU > collation as default collation, we must decide what collation to send to perl) > > * convert_string_datum() in selfuncs.c could be patched to use ICU instead of > strxfrm. However, as far as I understand, this is not absolutely required as > this is only used by the query planner and would in the worst case prevent > some optimisation in corner cases > > These changes would probably have an even bigger impact, because then people > would no longer be limited to the collations supported by the locales > installed on their OS. > > NEXT STEPS: Collation versioning in indices > =========================================== > > Since ICU provides reliable versioning of collations, this would allow us to > finally prevent index corruption caused by changing implementations of > strcoll. I haven't looked at this in detail, but I assume that this would be > a small change with potentially big impact. > > Ideally, PostgreSQL would detect when the collation is a different version > than the one used to create the index, and stop using the index until it is > rebuilt. > > > I'll take a shot at the MINIMUM TODO as outlined above. > > > We've already included ICU support in our Postgres Plus Advanced Server > product. Before you spend too much time on this, give me a few days to see if > we can get that change contributed back. The people I need to speak to are > OOO for Thanksgiving at the moment though, so it may be a few days. > > --
Hi, Just poking this old thread again. What happened here, is anyone putting work into this area at the moment? Palle
signature.asc
Description: Message signed with OpenPGP using GPGMail