On 10/26/24 12:42 AM, Jeff Davis wrote:
On Thu, 2024-10-24 at 10:05 +0200, Andreas Karlsson wrote:
Why is there no pg_locale_builtin.c?

Just that it would be a fairly small file, but I'm fine with doing
that.

I think adding such a small file would make life easier for people new to the collation part of the code base. It would be a nice symmetry between collation providers and where code for them can be found.

Why are casemap and ctype_methods not the same struct? They seem very
closely related.

The code impact was in fairly different places, so it seemed like a
nice way to break it out. I could combine them, but it would be a
fairly large patch.

For me combining them would make the intention of the code easier to understand since aren't the casemap functions just a set of "ctype_methods"?

This commit makes me tempted to handle the ctype_is_c logic for
character classes also in callbacks and remove the if in functions
like
pg_wc_ispunct(). But this si something that would need to be
benchmarked.

That's a good idea. The reason collate_is_c is important is because
there are quite a few caller-specific optimizations, but that doesn't
seem to be true of ctype_is_c.

Yeah, that was my though too but I have not confirmed it.

I wonder if the bitmask idea isn't terrible for the branch predictor
and
that me may want one function per character class, but this is yet
again
something we need to benchmark.

Agreed -- a lot of work has gone into optimizing the regex code, and we
don't want a perf regression there. But I'm also not sure exactly which
kinds of tests I should be running for that.

I think we should at least try to find the worst case to see how big the performance hit for that is. And then after that try to figure out a more typical case benchmark.

= v6-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch

Looks good but seems like a quite painful API to use.

How is it painful and can we make it better?

The painful part was mostly just a reference to that without a catalog table where new providers can be added we would need to add collations for our new custom provider on some already existing provider and then do for example some pattern matching on the name of the new collation. Really ugly but works.

I am thinking of implementing ICU4x as an external extension to try out the hook, but for the in-core contrib module we likely want to use something which does not require an external dependency. Or what do you think?

Andreas



Reply via email to