jonathanc-n commented on PR #16450: URL: https://github.com/apache/datafusion/pull/16450#issuecomment-2988126572
@xudong963 These were tests run with one of the sides having 5 rows: <details> <summary>Click to expand</summary> ``` joins/HashJoin/l=16_r=5 time: [9.5541 µs 9.6068 µs 9.6640 µs] change: [-1.2426% -0.2737% +0.6768%] (p = 0.57 > 0.05) No change in performance detected. Found 15 outliers among 100 measurements (15.00%) 5 (5.00%) low mild 8 (8.00%) high mild 2 (2.00%) high severe joins/NestedLoopJoin/l=16_r=5 time: [8.3347 µs 8.4427 µs 8.5472 µs] change: [+16.951% +17.961% +19.019%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild joins/HashJoin/l=64_r=5 time: [9.7575 µs 9.8982 µs 10.029 µs] change: [-6.8033% -2.8109% -0.1691%] (p = 0.11 > 0.05) No change in performance detected. Found 30 outliers among 100 measurements (30.00%) 8 (8.00%) low severe 5 (5.00%) low mild 2 (2.00%) high mild 15 (15.00%) high severe joins/NestedLoopJoin/l=64_r=5 time: [10.104 µs 10.157 µs 10.228 µs] change: [+12.067% +12.951% +13.830%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild joins/HashJoin/l=256_r=5 time: [10.351 µs 10.460 µs 10.576 µs] change: [-0.9628% +0.0307% +1.0802%] (p = 0.95 > 0.05) No change in performance detected. joins/NestedLoopJoin/l=256_r=5 time: [20.469 µs 20.519 µs 20.577 µs] change: [+8.3494% +9.3946% +10.285%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) low mild 3 (3.00%) high severe joins/HashJoin/l=1024_r=5 time: [24.460 µs 24.713 µs 24.901 µs] change: [+21.363% +31.250% +41.778%] (p = 0.00 < 0.05) Performance has regressed. Found 23 outliers among 100 measurements (23.00%) 16 (16.00%) low severe 3 (3.00%) low mild 4 (4.00%) high mild joins/NestedLoopJoin/l=1024_r=5 time: [67.322 µs 67.751 µs 68.268 µs] change: [+6.0403% +8.0079% +9.5998%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe joins/HashJoin/l=4096_r=5 time: [22.298 µs 22.814 µs 23.411 µs] change: [-4.7502% -0.0368% +4.6566%] (p = 0.99 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe joins/NestedLoopJoin/l=4096_r=5 time: [258.64 µs 259.56 µs 260.40 µs] change: [+5.1991% +8.4282% +10.949%] (p = 0.00 < 0.05) Performance has regressed. Found 22 outliers among 100 measurements (22.00%) 13 (13.00%) low severe 3 (3.00%) low mild 5 (5.00%) high mild 1 (1.00%) high severe joins/HashJoin/l=32768_r=5 time: [118.18 µs 120.61 µs 124.07 µs] change: [-0.5396% +0.8988% +2.6146%] (p = 0.26 > 0.05) No change in performance detected. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe joins/NestedLoopJoin/l=32768_r=5 time: [2.3731 ms 2.4175 ms 2.4715 ms] change: [-2.4822% +2.7216% +7.0548%] (p = 0.29 > 0.05) No change in performance detected. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe joins/HashJoin/l=5_r=16 time: [9.3953 µs 9.4891 µs 9.5818 µs] change: [+0.2660% +2.3151% +3.8663%] (p = 0.01 < 0.05) Change within noise threshold. joins/NestedLoopJoin/l=5_r=16 time: [8.4094 µs 8.4620 µs 8.5189 µs] change: [+7.3597% +13.271% +16.947%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) low mild joins/HashJoin/l=5_r=64 time: [10.281 µs 10.304 µs 10.322 µs] change: [+5.5270% +7.6174% +9.6222%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild joins/NestedLoopJoin/l=5_r=64 time: [10.511 µs 10.644 µs 10.845 µs] change: [+7.7402% +11.531% +14.867%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe joins/HashJoin/l=5_r=256 time: [9.8597 µs 9.9384 µs 10.007 µs] change: [-1.8462% -0.6659% +0.4999%] (p = 0.29 > 0.05) No change in performance detected. joins/NestedLoopJoin/l=5_r=256 time: [21.108 µs 21.195 µs 21.280 µs] change: [+1.5413% +6.7273% +10.204%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild joins/HashJoin/l=5_r=1024 time: [11.088 µs 11.182 µs 11.279 µs] change: [-0.8211% +0.3052% +1.3850%] (p = 0.59 > 0.05) No change in performance detected. joins/NestedLoopJoin/l=5_r=1024 time: [69.937 µs 76.254 µs 83.638 µs] change: [+5.7682% +9.4840% +14.122%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe joins/HashJoin/l=5_r=4096 time: [16.550 µs 16.633 µs 16.713 µs] change: [-1.4548% -0.7755% -0.0668%] (p = 0.03 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild joins/NestedLoopJoin/l=5_r=4096 time: [271.97 µs 272.96 µs 273.65 µs] change: [+13.243% +14.024% +14.795%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe joins/HashJoin/l=5_r=32768 time: [79.925 µs 81.041 µs 82.978 µs] change: [+2.3887% +3.6331% +5.0589%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe joins/NestedLoopJoin/l=5_r=32768 time: [2.6531 ms 2.7439 ms 2.8609 ms] change: [+9.0545% +13.275% +18.591%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe ``` </details> Interesting I noticed I wasn't using a filter for the `NestedLoopJoin` and it was hardly performing faster. The benchmarks I just sent were the benchmarks using the filter for NLJ, which is why there are some regressions. I'll see if I can take some time to see how we can speed up `NestedLoopJoin`. The reason why `NestedLoopJoin` is so slow is due to the cartesian product which is being calculated + running the filter through all the indices (which is also memory expensive). It would probably be faster to just keep one side in memory if possible. and have the other side run a block nested loop join on it. In that case, equijoins tend to be much faster on a smaller table. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org