Kontinuation opened a new pull request, #525: URL: https://github.com/apache/sedona-db/pull/525
This is a follow-up to https://github.com/apache/sedona-db/pull/523. When executing queries with large windows on dense datasets, each probe row may match millions of indexed rows. If we don't break up the large result batches generated by such index probing, we can easily overshoot the memory limit when assembling join result batches.

This patch splits large sets of joined build-side and probe-side indices into smaller pieces and assembles the result batches gradually. This greatly reduces the amount of memory required to produce join results for "cover all" probe rows.

The code for properly slicing join result indices for the various join types is a bit complicated. We have added fuzz tests to verify that it works correctly.
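
To illustrate the general idea of splitting matched index pairs, here is a minimal sketch in plain Rust. It is not the code from this PR: the function name `split_index_pairs`, the index types, and the `max_rows` parameter are all hypothetical, and it only covers the simple inner-join case where every pair produces exactly one output row. The extra bookkeeping that outer, semi, and anti joins need is left out.

```rust
/// Hypothetical helper (not taken from the PR): splits parallel arrays of
/// matched build-side and probe-side row indices into pieces of at most
/// `max_rows` pairs, so that each piece can be materialized as its own
/// result batch instead of one huge batch.
fn split_index_pairs(
    build_indices: &[u64],
    probe_indices: &[u32],
    max_rows: usize,
) -> Vec<(Vec<u64>, Vec<u32>)> {
    // The two arrays describe pairs, so they must have the same length.
    assert_eq!(build_indices.len(), probe_indices.len());
    // `chunks` panics on a chunk size of zero, so reject it up front.
    assert!(max_rows > 0, "max_rows must be positive");

    build_indices
        .chunks(max_rows)
        .zip(probe_indices.chunks(max_rows))
        // Copy each aligned pair of slices into an owned piece.
        .map(|(build, probe)| (build.to_vec(), probe.to_vec()))
        .collect()
}
```

For example, a single "cover all" probe row that matches 10 build rows, split with `max_rows = 4`, yields three pieces of 4, 4, and 2 pairs. A real implementation would more likely hand out the pieces lazily (for instance as borrowed slices) rather than copy them all at once, so that only one piece needs to be resident while its result batch is assembled.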
