On 28/05/2021 18.24, Tom Lane wrote: > Pavel Stehule <pavel.steh...@gmail.com> writes: >> pá 28. 5. 2021 v 16:12 odesílatel Johannes Graën <johan...@selfnet.de> >> napsal: >>> When trying to upgrade an existing database from version 10 to 13 I came >>> across a degression in some existing code used by clients. Further >>> investigations showed that performance measures are similar in versions >>> 11 to 13, while in the original database on version 10 it's around 100 >>> times faster. I could boil it down to perl functions used for sorting. > >> Are you sure, so all databases use the same encoding and same locale? > > Yeah ... I don't know too much about the performance of Perl regexps, > but it'd be plausible that it varies depending on locale setting.
It probably wasn't Perl at all. Thanks to the hint I checked the initial database again and, while encoding and ctype are set to UTF8, the collation is C, which makes a huge difference: ... order by tab(attr) => Execution Time: 51429.875 ms ... order by tab(attr collate "C") => Execution Time: 537.757 ms in the original database. Any other version yields similar times. On 28/05/2021 17.47, Tomas Vondra wrote: > That function is pretty much just a sequence of ~120 regular > expressions, doing something similar to unaccent(). I wonder if we're > calling the function much more often, perhaps due to some changes in the > sort code (the function is immutable, but that does not guarantee it's > called just once). > Also, maybe try materializing the function results before doing the > sort, perhaps like this: > > SELECT * FROM (select attr, func(attr) as fattr from tab offset 0) foo > ORDER BY fattr; I was expecting it to be called once in the process of sorting, and it seems that this is actually true for all version and different collations, but sorting for a collation that is not C requires considerable more resources (that still needs to be shown for other collations, but I see the overhead of having more or less complex definitions vs. just comparing numbers). That being said, I would have used unaccent or, if that wasn't an option, maybe have those values calculated by a trigger function when the corresponding rows are changed. But I don't control the code. Now what keeps me wondering is how the sorting works internally and if we could conclude that using the C collation in order expressions and indexes is a general way to speed up queries - if the actual order is of less importance. Best Johannes