On Fri, Jun 20, 2025 at 10:15:47AM -0700, Jeff Davis wrote:
> On Fri, 2025-06-20 at 11:31 -0500, Nico Williams wrote:
> > In the slow path you only normalize the _current character_, so you
> > only need enough buffer space for that.
>
> That's a clear win for UTF8 data. Also, if there are no changes, then
> you can just return the input buffer and not bother allocating an
> output buffer.

The latter is not relevant to string comparison or hashing, but, yeah,
if you have to produce a normalized string you can optimistically assume
it is already normalized and defer allocation until you know it isn't
normalized.

> Postgres is already form-preserving; it does not auto-normalize. (I
> have suggested that we might want to offer something like that, but
> that would be a user choice.)

Excellent, then I would advise looking into adding form-insensitive
string comparison and hashing to get f-i/f-p behavior.

> Currently, the non-deterministic collations (which offer
> form-insensitivity) are not available at the database level, so you
> have to explicitly specify the COLLATE clause on a column or query. In
> other words, Postgres is not form-insensitive by default, though there
> is work to make that possible.

TIL. Thanks.

> Databases have similar concerns as a filesystem in this respect.

I figured :)
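
To make the f-i/f-p idea concrete, here is a rough Python sketch (not
Postgres code). The function names are made up for illustration, and I
assume NFC as the comparison form. It also simplifies: it normalizes
the whole string when needed, rather than one character at a time as in
the slow path discussed above. Stored strings are never rewritten
(form-preserving); only comparison and hashing look through the form.

```python
import unicodedata

FORM = "NFC"  # assumption: any one normalization form works if used consistently


def fi_key(s: str) -> str:
    """Return a normalized view of s for comparison/hashing.

    Optimistic path: if s is already normalized, hand back the input
    object itself and allocate nothing new.
    """
    if unicodedata.is_normalized(FORM, s):  # available since Python 3.8
        return s
    return unicodedata.normalize(FORM, s)


def fi_equal(a: str, b: str) -> bool:
    """Form-insensitive equality: canonically equivalent strings match."""
    if a == b:  # fast path: identical code point sequences
        return True
    return fi_key(a) == fi_key(b)


def fi_hash(s: str) -> int:
    """Form-insensitive hash, consistent with fi_equal."""
    return hash(fi_key(s))


# The stored values keep whatever form they arrived in (form-preserving).
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
```

With that, `fi_equal(precomposed, decomposed)` is true and both hash
the same, even though a plain `==` on the two strings is false.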