On Sun, Feb 4, 2024 at 10:42 PM Jeff Davis <pg...@j-davis.com> wrote:
> I'm hesitant to put much more work into it (e.g. new patches, etc.)
> without more feedback. Your opinion would certainly be valuable -- for
> instance, when reading the docs, can you imagine yourself actually
> using this if you ran into a collation versioning/migration problem?

I'm having some difficulty understanding what the docs are trying to tell me. I think there are some issues with ordering and pacing.

"The icu_multilib module provides control over the version (or versions) of the ICU provider library used by PostgreSQL, which can be different from the version of ICU with which it was built. Collations are a product of natural language, and natural language evolves over time; but PostgreSQL depends on stable ordering for structures such as indexes. Newer versions of ICU update the provided collators to adapt to changes in natural language, so it's important to control when and how those new versions of ICU are used to prevent problems such as index corruption."

Check. So far, so good.

"This module assumes that the necessary versions of ICU are already available, such as through the operating system's package manager; and already properly installed in a single location accessible to PostgreSQL. The configuration variable icu_multilib.library_path should be set to the location where these ICU library versions are installed."

Here I feel we've skipped a few steps. I suggest postponing all discussion of specific GUCs to a later point -- specifically the configuration parameters section, which I think should actually be F.19.1, with the use cases following that rather than preceding it.

In this introductory section, I suggest elaborating a bit more on what problem we're trying to solve at a conceptual level. It feels like we've gone straight from the very general issue (collation definitions need to be stable but language isn't) to very specific (here's a GUC that you can set to a pathname). I feel like the need for this module should be more specifically motivated. Maybe something like:

1. Here's what we think your OS package manager is probably going to do.
2. That's going to interact with PostgreSQL in this way that I will now describe.
3. See, that sucks, because of the stuff I said above about needing stable collations!
4. But if you installed this module instead, then you could prevent the things I said under #2 from happening.
5. Instead, you'd get this other behavior, which would make you happy.

I feel like I can almost piece together in my head how this is supposed to work -- I think it's like "we expect the OS package manager to drop all the ICU versions in the same directory via side by side installs, and that works well for other programs because ... for some mysterious reason they can latch onto the specific version they were linked against ... but we can't or don't do that because ... I guess we're dumber than those other pieces of software or something???? ... so this module lets you ask for more sensible behavior." But I think that could be spelled out a bit more clearly and directly than this document seems to me to do.

I also wonder if we should be explaining why we don't get this right out of the box. Like, if the normal behavior categorically sucks, why do you have to install icu_multilib to get something else? Why not make the multilib treatment the default? And if the normal behavior is better for some cases and the icu_multilib behavior is better for other cases, then maybe we ought to explain which one to use in which scenario.

"icu_multilib must be loaded via shared_preload_libraries. icu_multilib ignores any ICU library with a major version greater than that with which PostgreSQL was built."

It's not clear from reading this whether the second sentence here is a regrettable implementation restriction or design behavior. If it's design behavior, what's the point of it?

--
Robert Haas
EDB: http://www.enterprisedb.com