05.05.2025 03:22, Jelte Fennema-Nio wrote:

[...]


I think there are a few things at play here that explain why that did
not happen in Bruce's initial draft:
1. I personally think the requirement that Bruce uses for perf
improvements to make it into the changelog is too strict (see my
previous email for details)
2. Bruce is only a single person, and as such cannot read all emails
on pgsql-hackers, so he relies only on commit messages to determine
impact for release notes. The commit message for your change did not
include any details on the perf improvements that could be expected.
3. After skimming the email thread[1], it's hard for me to understand
where these perf numbers came from. And the first few results only
mention casefold performance i.e. they call the results: "casefold()
test." So, it's unclear what perf gains are expected for the other
functions mentioned in the email subject.

I totally agree with you; it's hard to keep track of everything. It's
also a lot of work to read every commit and understand its essence.

I have no complaints; I'm just trying to understand the rules for getting
into the Release Notes.
The rules, as it turns out, are not simple. But they are the rules, and
even though I don't agree with them, I accept them.


As for how to improve these:
1 is discussed/complained about basically every year whenever release
notes are created. I don't think we can do any better than having
those discussions. Unless someone else wants to start owning writing
the release notes, or we somehow share the burden, e.g. by having the
person that commits also write a release note entry.
2 can be improved by people including perf numbers in their commit
messages. The second way to improve is by sending feedback on the
release notes if things are missed, like you did.
3 is something you could help with I think. It would have been helpful
if you had shared the script/commands you used to get these
performance numbers. That way I could reproduce them myself. Also if
you had included some perf numbers for lower() and upper() that would
have been great too, as those are (currently) much more commonly used
than casefold(). NOTE: I might have missed the script or be wrong
about this some other way, since Jeff did not require this for
committing it. If so, please disregard.

[1]: https://www.postgresql.org/message-id/flat/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com

A bit about what those numbers are: in the discussion of the patch I
described how I got them.

The point is that lower(), upper(), and casefold() share one common
algorithm; the only difference is which mapping table we pass to it.
Therefore, there is no point in measuring the performance of each
function separately. Any of these functions will show, in the same way,
the performance of the algorithm that looks up codepoints in the tables.
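
To illustrate the "one algorithm, different tables" idea, here is a minimal
Python sketch. It is not the PostgreSQL code: the table names, the map_case()
helper, and the tiny dictionaries are hypothetical, and real case mapping also
has to handle special cases (such as one-to-many mappings) that are ignored here.

```python
# Hypothetical, tiny mapping tables for illustration only; the real tables
# are generated from the Unicode data files and cover far more code points.
LOWER_TABLE = {0x0041: 0x0061, 0x0410: 0x0430}     # 'A' -> 'a', Cyrillic A
UPPER_TABLE = {0x0061: 0x0041, 0x0430: 0x0410}     # 'a' -> 'A', Cyrillic a
CASEFOLD_TABLE = {0x0041: 0x0061, 0x0410: 0x0430}  # simple folding only


def map_case(text, table):
    """Shared routine: look each code point up in the given table and
    keep the original code point when the table has no entry for it."""
    return "".join(chr(table.get(ord(ch), ord(ch))) for ch in text)


# The three functions differ only in the table they pass in.
def lower(text):
    return map_case(text, LOWER_TABLE)


def upper(text):
    return map_case(text, UPPER_TABLE)


def casefold(text):
    return map_case(text, CASEFOLD_TABLE)
```

Because every call goes through the same lookup routine, a faster lookup
would be expected to benefit all three functions in a similar way.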

So we can take lower(), upper(), or casefold() and get the result for
the Unicode table mapping algorithm (that is where I changed the code:
the algorithm itself).
I can measure everything, but there is no point in it.
Here are the measurements made at the time of the patch discussion:

For each test, an SQL file was created for pgbench. The data used in
each test is described alongside its results.
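
The original script is not included in this message, so the following is only
a sketch of how such pgbench input files could be produced from the data
descriptions given with each result. The helper names, the output file names,
and the choice to measure the 700kb and 1MB sizes in UTF-8 bytes are all
assumptions.

```python
import os


def repeated_range(first, last, target_bytes):
    """Repeat the code points first..last until the UTF-8 encoded size
    reaches at least target_bytes (assumption: sizes are in bytes)."""
    unit = "".join(chr(cp) for cp in range(first, last + 1))
    unit_bytes = len(unit.encode("utf-8"))
    repeats = -(-target_bytes // unit_bytes)  # ceiling division
    return unit * repeats


def all_codepoints(first, last):
    """Every code point in first..last, skipping surrogates 0xD800..0xDFFF."""
    return "".join(
        chr(cp) for cp in range(first, last + 1) if not 0xD800 <= cp <= 0xDFFF
    )


def pgbench_script(func, text):
    """A one-statement script; single quotes are doubled for the SQL literal."""
    return "SELECT {}('{}');\n".format(func, text.replace("'", "''"))


datasets = {
    "ascii": repeated_range(0x20, 0x7E, 700 * 1024),
    "cyrillic": repeated_range(0x0410, 0x042F, 1024 * 1024),
    "unicode": all_codepoints(0xA0, 0x2FA1D),
}


def write_scripts(directory, func="casefold"):
    """Write one hypothetical <func>_<dataset>.sql file per dataset."""
    paths = []
    for name, text in datasets.items():
        path = os.path.join(directory, "{}_{}.sql".format(func, name))
        with open(path, "w", encoding="utf-8") as f:
            f.write(pgbench_script(func, text))
        paths.append(path)
    return paths
```

A file written this way could then be run with something like
"pgbench -n -f casefold_cyrillic.sql -T 30 dbname"; the pgbench options
actually used for the numbers below are not stated in this message.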

casefold() test.

ASCII:
Repeated characters (700kb) in the range from 0x20 to 0x7E.
Patch: tps = 278.449809
Without: tps = 266.526168

Cyrillic:
Repeated characters (1MB) in the range from 0x0410 to 0x042F.
Patch: tps = 86.740680
Without: tps = 49.373695

Unicode:
A query consisting of all Unicode characters from 0xA0 to 0x2FA1D
(excluding 0xD800..0xDFFF).
Patch: tps = 102.221092
Without: tps = 92.477798

* Ubuntu 24.04.1 (Intel(R) Xeon(R) Gold 6140) (gcc version 13.3.0)

ASCII:
Repeated characters (700kb) in the range from 0x20 to 0x7E.
Patch: tps = 146.712371
Without: tps = 120.794307

Cyrillic:
Repeated characters (1MB) in the range from 0x0410 to 0x042F.
Patch: tps = 44.499567
Without: tps = 24.237999

Unicode:
A query consisting of all Unicode characters from 0xA0 to 0x2FA1D
(excluding 0xD800..0xDFFF).
Patch: tps = 54.354833
Without: tps = 46.556531
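
For readers who prefer ratios to raw tps, the figures above can be turned into
speedups with a few lines of Python. The numbers are copied from this message;
the label for the first set of results is a placeholder, because its platform
is not named here.

```python
# tps values copied from the results above: (with patch, without patch).
results = {
    "first run (platform not named)": {
        "ASCII": (278.449809, 266.526168),
        "Cyrillic": (86.740680, 49.373695),
        "Unicode": (102.221092, 92.477798),
    },
    "Ubuntu 24.04.1, Xeon Gold 6140": {
        "ASCII": (146.712371, 120.794307),
        "Cyrillic": (44.499567, 24.237999),
        "Unicode": (54.354833, 46.556531),
    },
}


def speedup(patched_tps, baseline_tps):
    """Throughput ratio: values above 1.0 mean the patched build is faster."""
    return patched_tps / baseline_tps


def report(all_results):
    """Return one formatted line per platform and data set."""
    lines = []
    for platform, tests in all_results.items():
        for name, (patched, baseline) in tests.items():
            ratio = speedup(patched, baseline)
            lines.append(
                "{}: {}: {:.2f}x ({:+.1f}%)".format(
                    platform, name, ratio, (ratio - 1.0) * 100.0
                )
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report(results)))
```

The Cyrillic case shows the largest gain on both machines, roughly 1.76x
and 1.84x, while the ASCII and full-range Unicode cases improve less.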


I will continue to improve Postgres.

Please do, your work is very much appreciated!

I thought it was worthy of a separate line in the Release Notes.
In my view, it is not so easy to improve performance for Unicode.
Many users use lower() and upper(), and it would be nice for them to
know that work is being done to improve performance in this area.
But again, I'm new to the Postgres community and I'm still getting to
know what's going on here and how it works.

Thank you for paying attention to it!

--
Regards,
Alexander Borisov

