Re: So I'm actually using the columns during merge join. Basically, I'm building a Bloom filter on the outer relation and using it to filter out data on the inner relation of the join. I'm building the filter on the join keys.
We had a whole implementation of Bloom filtering for inner hash joins, complete with costing and pushdown of the Bloom filter from the build side to the execution tree on the probe side. That is, we built a Bloom filter on the inner side of the join at the conclusion of the build phase of the hash join, then pushed it down as a semi-join filter to the probe side of the join, where it could potentially be applied to multiple scans.

After the community made a large change to that same area of the code, our implementation got commented out and has been in that state ever since. It's a good example of the sort of change that really ought to be made with the community, because there's too much merge burden otherwise.

It was a pretty effective optimization in some cases, though. Most commercial systems have an optimization like this, sometimes with special handling for the case where the number of distinct join keys is very small.

If there is interest in reviving this functionality, we could probably extract some patches and work with the community to try to get it running again.

/Jim
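
For anyone who wants to picture the idea, here is a rough sketch in Python of a hash join that builds a Bloom filter over the join keys once the build phase is done and then applies it while scanning the probe side. It is only an illustration under my own assumptions, not the commented-out implementation: the names BloomFilter and hash_join_with_bloom, the filter size, and the number of hash functions are all made up for the example, and it ignores costing and real pushdown into the plan tree.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (sizes are arbitrary assumptions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def hash_join_with_bloom(build_rows, probe_rows, build_key, probe_key):
    """Inner hash join; returns (joined pairs, probe rows passing the filter)."""
    # Build phase: hash the build (inner) side on its join key.
    hash_table = {}
    for row in build_rows:
        hash_table.setdefault(row[build_key], []).append(row)

    # At the conclusion of the build phase, populate the Bloom filter
    # from the distinct join keys.
    bloom = BloomFilter()
    for key in hash_table:
        bloom.add(key)

    # Stand-in for the pushed-down semi-join filter: drop probe rows
    # whose key cannot match before they reach the join.
    def filtered_probe_scan():
        for row in probe_rows:
            if bloom.might_contain(row[probe_key]):
                yield row

    # Probe phase: false positives from the filter are harmless because
    # the hash table lookup still decides the actual matches.
    results = []
    passed = 0
    for row in filtered_probe_scan():
        passed += 1
        for match in hash_table.get(row[probe_key], []):
            results.append((match, row))
    return results, passed
```

The point of the design is that the filter can only give false positives, never false negatives, so the join result is unchanged; the win comes from discarding non-matching probe rows early, ideally down in the scans.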