Re: So I'm actually using the columns during merge join, basically I'm building 
a bloom filter on the outer relation and filtering out data on the inner 
relation of the join. I'm building the filter on the join keys

We had a whole implementation of Bloom filtering for hash inner joins, complete
with costing and pushdown of the Bloom filter from the build side to the
execution tree on the probe side.  That is, we built a Bloom filter on the
inner side of the join at the conclusion of the hash join's build phase, then
pushed it down as a semi-join filter to the probe side of the join, where it
could potentially be applied to multiple scans.  After the community made a
large change to that same area of the code, our implementation got commented
out and has been in that state ever since.  It's a good example of the sort of
change that really ought to be made with the community, because there's too
much merge burden otherwise.
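
For anyone who hasn't seen the technique, here is a rough Python sketch of the
idea described above.  It is only an illustration, not our implementation: the
names (BloomFilter, bloom_filtered_scan, hash_join_with_bloom), the filter
size, and the hashing scheme are all made up for this example, and it ignores
costing entirely.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; the size and hash count are arbitrary assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.blake2b(repr(key).encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))


def bloom_filtered_scan(rows, key_of, bloom):
    """Probe-side scan with the pushed-down semi-join filter applied."""
    for row in rows:
        if bloom.might_contain(key_of(row)):
            yield row


def hash_join_with_bloom(build_rows, probe_rows, build_key, probe_key):
    # Build phase: hash the inner (build) side on the join key.
    hash_table = {}
    for row in build_rows:
        hash_table.setdefault(build_key(row), []).append(row)

    # At the conclusion of the build phase, build the Bloom filter on the keys.
    bloom = BloomFilter()
    for key in hash_table:
        bloom.add(key)

    # Probe phase: rows rejected by the filter never reach the hash table.
    # False positives are harmless because the hash lookup still decides.
    result = []
    for row in bloom_filtered_scan(probe_rows, probe_key, bloom):
        for match in hash_table.get(probe_key(row), ()):
            result.append((match, row))
    return result
```

In a real executor the filter would be handed to the scan nodes under the probe
side rather than applied right next to the join, which is where the savings
come from; the sketch keeps both in one function just to stay short.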

It was a pretty effective optimization in some cases, though.  Most commercial
systems have an optimization like this, sometimes with special handling for
the case where the number of distinct join keys is very small.  If there is
interest in reviving this functionality, we could probably extract some
patches and work with the community to try to get it running again.

   /Jim

