On 20 June 2012 17:41, Tom Lane <t...@sss.pgh.pa.us> wrote: > Peter Geoghegan <pe...@2ndquadrant.com> writes: >> No, I'm suggesting it would probably be at least a bit of a win here >> to cache the constant, and only have to do a strxfrm() + strcmp() per >> comparison. > > Um, have you got any hard evidence to support that notion? The > traditional advice is that strcoll is faster than using strxfrm unless > the same strings are to be compared repeatedly. I'm not convinced that > saving the strxfrm on just one side will swing the balance.
Fair enough, but I'd suggest that that traditional advice assumes that strcoll()'s space-efficiency and general avoidance of dynamic allocation is an important factor, which, for us, it clearly isn't, since we can re-use a single buffer in the manner of Robert's text sortsupport patch for each and every per-tuple strxfrm() blob (when traversing an index). This is a direct quote from glibc's strcoll_l.c : Perform the first pass over the string and while doing this find and store the weights for each character. Since we want this to be as fast as possible we are using `alloca' to store the temporary values. But since there is no limit on the length of the string we have to use `malloc' if the string is too long. We should be very conservative here. Here, alloca is used to allocate space in a stack frame. I believe that this is an entirely inappropriate trade-off for Postgres to be making. strxfrm(), in constrast, leaves buffer sizing and management up to the caller. That has to be a big part of the problem here. >> The fact is that this is likely to be a fairly significant >> performance win, because strxfrm() is quite simply the way you're >> supposed to do collation-aware sorting, and is documented as such. For >> that reason, C standard library implementations should not be expected >> to emphasize its performance - they assume that you're using strxfrm() >> + their highly optimised strcmp() > > Have you got any evidence in support of this claim, or is it just > wishful thinking about what's likely to be inside libc? According to the single-unix specification's strcoll() documentation, "The strxfrm() and strcmp() functions should be used for sorting large lists". If that isn't convincing enough for you, there is the fact that glibc's strcmp() is clearly highly optimised for each and every architecture, and that we are currently throwing away an extra strcmp() in the event of strcoll() equality. > I'd also note that any comparisons you may have seen about this are certainly > not > accounting for the effects of data bloat from strxfrm (ie, possible > spill to disk, more merge passes, etc). What about the fact that strcoll() may be repeatedly allocating and freeing memory per comparison? The blobs really aren't that much larger than the strings to be sorted, which are typically quite short. > In any case, if you have to redefine the meaning of equality in order > to justify a performance patch, I'm prepared to walk away at the start. The advantage of my proposed implementation is precisely that I won't have to redefine the meaning of equality, and that only the text datatype will have to care about equivalency, so you can just skip over an explanation of equivalency for most audiences. If you feel that strongly about it, and I have no possible hope of getting this accepted, I'm glad that I know now rather than after completing a significant amount of work on this. I would like to hear other people's opinions before I drop it though. > The range of likely performance costs/benefits across different locales > and different implementations is so wide that if you can't show it to be > a win even with the strcmp tiebreaker, it's not likely to be a reliable > win without that. glibc is the implementation that really matters. My test-case used en_US.UTF-8 as its locale, which has to be one of the least stressful to strcoll() - I probably could have shown a larger improvement just by selecting a locale that was known to have to make more passes, like, say, hu_HU.UTF-8. I expect to have access to a 16 core server next week, which Bull have made available to me. Maybe I'll get some interesting performance numbers from it. The reason that I don't want to use the blob with original string hack is because it's ugly, space-inefficient, unnecessary and objectively incorrect, since it forces us to violate the conformance requirement C9 of Unicode 3.0, marginal though that may be. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers