On Tue, Dec 5, 2023 at 1:57 AM Jeff Davis <pg...@j-davis.com> wrote: > > On Mon, 2023-12-04 at 12:12 +0700, John Naylor wrote:
> There's already a patch to use simplehash, and the API is a bit > cleaner, and there's a minor performance improvement. It seems fairly > non-controversial -- should I just proceed with that patch? I won't object if you want to commit that piece now, but I hesitate to call it a performance improvement on its own. - The runtime measurements I saw reported were well within the noise level. - The memory usage starts out better, but with more entries is worse. > > From my point of view, it would at least be useful for C-strings, > > where we don't have the length available up front. > > That's good news. > > By the way, is there any reason that we would need hash_bytes(s, > strlen(s)) == cstring_hash(s)? "git grep cstring_hash" found nothing, so not sure what you're asking. > Each approach has its own optimization techniques. In (a), we can use > the |= 0x20 trick, and for equality do a memcmp() check first. I will assume you are referring to semantics, but on the odd chance readers take this to mean the actual C library call, that wouldn't be an optimization, that'd be a pessimization. > As a tangential point, we may eventually want to provide a more > internationalized definition of "case insensitive" for GUC names. That > would be slightly easier with (b) than with (a), but we can cross that > bridge if and when we come to it. The risk/reward ratio seems pretty bad. > It seems you are moving toward (a) whereas my patches moved toward (b). > I am fine with either approach but I wanted to clarify which approach > we are using. I will make my case: > In the abstract, I kind of like approach (b) because we don't need to > be as special/clever with the hash functions. In the abstract, I consider (b) to be a layering violation. As a consequence, the cleverness in (b) is not confined to one or two places, but is smeared over a whole bunch of places. I find it hard to follow. Concretely, it also adds another pointer to the element struct. That's not good for a linear open-addressing array, which simplehash has. Further, remember the equality function is important as well. In v3, it was "strcmp(a,b)==0", which is a holdover from the dynahash API. One of the advantages of the simplehash API is that we can 1) use an equality function, which should be slightly cheaper than a full comparison function, and 2) we have the option to inline it. (It doesn't make sense in turn, to jump to a shared lib page and invoke an indirect function call.) Once we've done that, it's already "special", so it's not a stretch to make it do what we want to begin with. If a nicer API is important, why not use it?