On Fri, Aug 30, 2024 at 8:10 PM Noah Misch <n...@leadboat.com> wrote:
>
> On Thu, Aug 29, 2024 at 03:48:53PM -0500, Masahiko Sawada wrote:
> > On Sun, May 19, 2024 at 6:46 AM Noah Misch <n...@leadboat.com> wrote:
> > > If I were standardizing pg_trgm on one or the other notion of "char", I would
> > > choose signed char, since I think it's still the majority. More broadly, I
> > > see these options to fix pg_trgm:
> > >
> > > 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes.
> > > 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes.
> >
> > Even though it's true that signed char systems are the majority, it
> > would not be acceptable to force the need to scan pg_trgm indexes on
> > unsigned char systems.
> >
> > > 3. Offer both, as an upgrade path. For example, pg_trgm could have separate
> > >    operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running
> > >    pg_upgrade on an unsigned-char system would automatically map v17
> > >    gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any
> > >    architecture with upgrade-time scans.
> >
> > Very interesting idea. How can new v18 users use the correct operator
> > class? I don't want to require users to specify the correct signed or
> > unsigned operator classes when creating a GIN index. Maybe we need to
>
> In brief, it wouldn't matter which operator class new v18 indexes use. The
> documentation would focus on gin_trgm_ops and also say something like:
>
>   There's an additional operator class, gin_trgm_ops_unsigned. It behaves
>   exactly like gin_trgm_ops, but it uses a deprecated on-disk representation.
>   Use gin_trgm_ops in new indexes, but there's no disadvantage from continuing
>   to use gin_trgm_ops_unsigned. Before PostgreSQL 18, gin_trgm_ops used a
>   platform-dependent representation. pg_upgrade automatically uses
>   gin_trgm_ops_unsigned when upgrading from source data that used the
>   deprecated representation.
>
> What concerns might users have, then? (Neither operator class would use plain
> "char" in a context that affects on-disk state. They'll use "signed char" and
> "unsigned char".)

I think I understand your idea now. Since gin_trgm_ops will use "signed
char", there is no impact on v18 users; they can continue using
gin_trgm_ops.

But how would pg_upgrade use gin_trgm_ops_unsigned? This opclass would be
created by executing the pg_trgm script for v18, but that script isn't
executed during pg_upgrade. Another way would be to do this opclass
replacement when executing pg_trgm's update script (i.e., 1.6 to 1.7).

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com