> On Mar 2, 2021, at 12:04 PM, Joel Jacobson <j...@compiler.org> wrote:
>
> On Tue, Mar 2, 2021, at 20:57, Mark Dilger wrote:
>> I didn't phrase that clearly enough. I'm thinking about whether you include
>> the bounds information in the hash function. The current implementation of
>> hash_range(PG_FUNCTION_ARGS) is going to hash the lower and upper bounds,
>> since you didn't change it to do otherwise, so "equal" values won't always
>> hash the same. I haven't tested this out, but it seems you could get a
>> different set of rows depending on whether the planner selects a hash join.
>
> I think this issue is solved by the
> empty-ranges-with-bounds-information-v2.patch I just sent,
> since with it, there are no semantic changes at all, lower() and upper()
> works like before.
There are semantic differences, because hash_range() doesn't call lower() and
upper(), it uses RANGE_HAS_LBOUND and RANGE_HAS_UBOUND, the behavior of which
you have changed. I created a regression test and expected results and checked
after applying your patch, and your patch breaks the hash function behavior.
Notice that before your patch, all three ranges hashed to the same value, but
not so after:
@@ -1,18 +1,18 @@
select hash_range('[a,a)'::textrange);
hash_range
------------
- 484847245
+ -590102690
(1 row)
select hash_range('[b,b)'::textrange);
hash_range
------------
- 484847245
+ 281562732
(1 row)
select hash_range('[c,c)'::textrange);
- hash_range
-------------
- 484847245
+ hash_range
+-------------
+ -1887445565
(1 row)
You might change how hash_range() works to get all "equal" values to hash the
same value, but that just gets back to the problem that non-equal things appear
to be equal. I guess that's your two-warty-feet preference, but not everyone
is going to be in agreement on that.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company