On 2/13/23 8:11 PM, Jeff Davis wrote:
On Thu, 2023-02-02 at 05:13 -0800, Jeff Davis wrote:
As a project, do we want to nudge users toward ICU as the collation provider as the best practice going forward?

One consideration here is security. Any vulnerability in ICU collation routines could easily become a vulnerability in Postgres.

Would it be any different than a vulnerability in OpenSSL et al? I know that's a general, nuanced question but it would be good to understand if we are exposing ourselves to any more vulnerabilities. And would it be any different than today, given people can build PG with libicu as is?
Continuing on $SUBJECT, I wanted to understand performance comparisons. I saw your comments[1] in response to Robert's question, and looked at your benchmarks[2] and one that ICU ran on older versions[3]. It seems that in general, users would see performance gains switching to ICU. The only result in [3] that stood out to me was that the "ko_KR" collation underperformed on a list of Korean names, but maybe that is better in newer versions.
I agree with most of your points in [1]. The platform-consistent behavior is a good point, especially with more PG deployments running on different systems. While taking on a new dependency is a concern, ICU was released in 1999[4], has an active community, and seems to follow standards (i.e. the Unicode Consortium).
I do wonder about upgrades, beyond the ongoing work with pg_upgrade. I think the logical methods (pg_dumpall, logical replication) should generally be OK, but we should ensure we think of things that could go wrong and how we'd answer them.
Based on the available data, I think it's OK to move towards ICU as the default, or preferred, collation provider. I agree (for now) with not taking a hard dependency on ICU.
Thanks,

Jonathan

[1] https://www.postgresql.org/message-id/b676252eeb57ab8da9dbb411d0ccace95caeda0a.camel%40j-davis.com
[2] https://www.postgresql.org/message-id/64039a2dbcba6f42ed2f32bb5f0371870a70afda.ca...@j-davis.com
[3] https://icu.unicode.org/charts/collation-icu4c48-glibc
[4] https://en.wikipedia.org/wiki/International_Components_for_Unicode