UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-2994063902
> * `apply_join_filter_to_indices` Showed a reduction in execution time
> (sample count reduced from 528 million to 241 million).
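The benchmark results indicate that restricting the row count of
`intermediate_batch` in `apply_join_filter_to_indices` does improve
performance: the point estimate for `nlj/filter/batch_size==8192` is
488.41 ms, versus 659.90 ms for `nlj/filter/batch_size==8192*8192`
(about 26% less time).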
```
Benchmarking nlj/filter/batch_size==8192: Collecting 100 samples in estimated 156.94 s (30
nlj/filter/batch_size==8192
                        time:   [487.31 ms 488.41 ms 489.68 ms]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
Benchmarking nlj/filter/batch_size==8192*8192: Collecting 100 samples in estimated 126.81
nlj/filter/batch_size==8192*8192
                        time:   [658.42 ms 659.90 ms 661.49 ms]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
```
https://gist.github.com/UBarney/23fdb597f43bfcffe4f781fb6b99e579
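
For readers who want the idea in miniature: below is a rough sketch of what "restricting the row count of `intermediate_batch`" means, not the actual DataFusion code. It assumes plain slices in place of Arrow index arrays, and `filter_indices_chunked`, `max_rows`, and `pred` are hypothetical names invented for illustration. The point is only that the filter is evaluated over at most `max_rows` candidate pairs at a time, so the intermediate buffer stays bounded regardless of how many pairs the join produces.

```rust
/// Hypothetical stand-in for applying a join filter to (build, probe) index
/// pairs. Instead of materializing one intermediate buffer for all pairs,
/// it processes at most `max_rows` pairs per step.
fn filter_indices_chunked<F>(
    build_idx: &[u64],
    probe_idx: &[u32],
    max_rows: usize,
    pred: F,
) -> (Vec<u64>, Vec<u32>)
where
    F: Fn(u64, u32) -> bool,
{
    assert_eq!(build_idx.len(), probe_idx.len(), "index slices must align");
    assert!(max_rows > 0, "max_rows must be positive");

    let mut out_build = Vec::new();
    let mut out_probe = Vec::new();

    for (b_chunk, p_chunk) in build_idx.chunks(max_rows).zip(probe_idx.chunks(max_rows)) {
        // Stand-in for the intermediate batch: a filter mask over at most
        // `max_rows` rows (assumption: the real code builds a RecordBatch here).
        let mask: Vec<bool> = b_chunk
            .iter()
            .zip(p_chunk)
            .map(|(&b, &p)| pred(b, p))
            .collect();

        // Keep only the index pairs that passed the filter.
        for ((&b, &p), keep) in b_chunk.iter().zip(p_chunk).zip(mask) {
            if keep {
                out_build.push(b);
                out_probe.push(p);
            }
        }
    }

    (out_build, out_probe)
}
```

The surviving pairs are the same for any positive `max_rows`; only the size of the per-step intermediate buffer changes, which is the quantity the two benchmark cases above vary.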
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]