my-vegetable-has-exploded commented on PR #12754: URL: https://github.com/apache/datafusion/pull/12754#issuecomment-2429057021
> It's still _input_size^2_ complexity, however the _N^2_ step (step 3) is just looping through a bit array, it's way more efficient than do _N^2_ join condition evaluation if it's using NLJ, to make this implementation efficient. If use btreemap to maintain it, the complexity will be $N*logN + OutputSize$. And the bitmap[i..n] maybe sparse in many scenes, use bitmap with bloom filter(bloomfilter[i]=0 means bitmap[k*i..k*(i+1)] are all 0) can skip lots of useless value, and bitmap is very cache-friendly. > Is it possible to extract the second sorting outside `IEJoin` executor, and let the existing sort executor to do this? This step then can be more easily parallelized, or does `IEJoin` executor have a better way to parallelize its job? I don't know how to use extract the second sorting outside `IEJoin` executor though. And currently parallelize mechanism is split the left table into n blocks and right table into m blocks (left table and right table already sort by condition1). Then we have n*m part data to compute, and we can check whether l[i] block and r[j] block can produce any result pair satisfying condition1 in O(1) complexity, if not result pair can be produced, we can just skip the block pair l[i] and r[j]. If you have a better idea, I'd love to hear about it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org