On Thu, May 12, 2022 at 9:57 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > Andrew Dunstan <and...@dunslane.net> writes: > > On 2022-05-12 Th 07:02, Valeriy Meleshkin wrote: > >> AFAICT it happens because when iterating over a jsonb the hash function > >> makes no distinction between raw scalars and arrays (it doesn't inspect > >> v.val.array.rawScalar) > > > It does look rather like a bug, but I'm unclear about the implications > > of fixing it. > > Changing this hash algorithm would break existing hash indexes on jsonb > columns. Maybe there aren't any, but even if so I can't get very excited > about changing this. Hash algorithms always have collisions, and we have > never made any promise that ours are cryptographically strong.
I might be missing something, but I don't know why "cryptographically strong" is the relevant concept here. It seems like the question is how likely it is that this would lead to queries having crappy performance. In general we want hash functions to deliver different values for different inputs so that we give ourselves the best possible chance of spreading keys evenly across buckets. If for example we hashed every possible JSON object to the same constant value, everything we tried to do with this hash function would suck, and the problem would be so severe that I think we'd have to fix it, even though it would mean a compatibility break. Or even if we hashed every JSON integer to the same value, that would be horrible, because it's quite likely that you could have a column full of json objects where ignoring the difference between one integer and another results in a ton of duplicate hash values. Here, that doesn't seem too likely. You could have a column that contains 'tom' and ['tom'] and [['tom']] and [[['tom']]] and so forth and they all get mapped onto the same bucket and you're sad. But probably not. So I'd judge that while it was probably a mistake to make the hash function work this way, it's not likely to cause serious problems, and therefore we ought to maybe leave it alone for now, but add a comment so that if we ever break backward-compatibility for any other reason, we remember to fix this too. IOW, I think I mostly agree with your conclusion, but perhaps not entirely with the reasoning. -- Robert Haas EDB: http://www.enterprisedb.com