Re: Degression (PG10 > 11, 12 or 13)

Johannes Graën Fri, 28 May 2021 16:20:15 -0700

On 28/05/2021 18.24, Tom Lane wrote:
> Pavel Stehule <pavel.steh...@gmail.com> writes:
>> pá 28. 5. 2021 v 16:12 odesílatel Johannes Graën <johan...@selfnet.de>
>> napsal:
>>> When trying to upgrade an existing database from version 10 to 13 I came
>>> across a degression in some existing code used by clients. Further
>>> investigations showed that performance measures are similar in versions
>>> 11 to 13, while in the original database on version 10 it's around 100
>>> times faster. I could boil it down to perl functions used for sorting.
> 
>> Are you sure, so all databases use the same encoding and same locale?
> 
> Yeah ... I don't know too much about the performance of Perl regexps,
> but it'd be plausible that it varies depending on locale setting.

It probably wasn't Perl at all. Thanks to the hint I checked the initial
database again and, while encoding and ctype are set to UTF8, the
collation is C, which makes a huge difference:

... order by tab(attr) => Execution Time: 51429.875 ms
... order by tab(attr collate "C") => Execution Time: 537.757 ms

in the original database. Any other version yields similar times.

On 28/05/2021 17.47, Tomas Vondra wrote:
> That function is pretty much just a sequence of ~120 regular
> expressions, doing something similar to unaccent(). I wonder if we're
> calling the function much more often, perhaps due to some changes in the
> sort code (the function is immutable, but that does not guarantee it's
> called just once).

> Also, maybe try materializing the function results before doing the
> sort, perhaps like this:
>
> SELECT * FROM (select attr, func(attr) as fattr from tab offset 0) foo
> ORDER BY fattr;

I was expecting it to be called once in the process of sorting, and it
seems that this is actually true for all version and different
collations, but sorting for a collation that is not C requires
considerable more resources (that still needs to be shown for other
collations, but I see the overhead of having more or less complex
definitions vs. just comparing numbers).

That being said, I would have used unaccent or, if that wasn't an
option, maybe have those values calculated by a trigger function when
the corresponding rows are changed. But I don't control the code.

Now what keeps me wondering is how the sorting works internally and if
we could conclude that using the C collation in order expressions and
indexes is a general way to speed up queries - if the actual order is of
less importance.

Best
  Johannes

Re: Degression (PG10 > 11, 12 or 13)

Reply via email to