2010YOUY01 commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2798787828
Benchmark results: (I think there is no significant regression for an extra round of re-spill, if it's running on a machine with fast SSDs)

### Environment
MacBook Pro with M4 Pro chip (disk bandwidth is around 8000 MB/s)

### Sorting 'thin' table
1. Run datafusion-cli with `cargo run --profile release-nonlto -- --mem-pool-type fair -m 100M`
2. Execute `explain analyze select * from generate_series(1, 1000000000) as t1(v1) order by v1;`

- Main: 37s (merge ~170 spill files at once)
- PR (with `sort_max_spill_merge_degree = 16`, and there is one round of re-spill): 43s
- PR (with `sort_max_spill_merge_degree = 10`, two rounds of re-spill): 49s

### Sorting 'fat' table
Run `sort_tpch` benchmark q7
```
// Q7: 3 sort keys {(INTEGER, 7), (BIGINT, 10k), (BIGINT, 1.5M)} + 12 all other columns
r#"
SELECT l_linenumber, l_suppkey, l_orderkey, l_partkey,
       l_quantity, l_extendedprice, l_discount, l_tax,
       l_returnflag, l_linestatus, l_shipdate, l_commitdate,
       l_receiptdate, l_shipinstruct, l_shipmode
FROM lineitem
ORDER BY l_linenumber, l_suppkey, l_orderkey
"#,
```
Benchmark command
```sh
cargo run --profile release-nonlto --bin dfbench -- sort-tpch -p /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -q 7 --memory-limit 1.2G
```
Notes:
- `target_partitions` config set to 14, and later configurations and results depend on this setting.
- For the PR's benchmark runs, `sort_max_spill_merge_degree` is manually changed to 6. As a result:
    - under the 1.2G memory limit, 1 round of re-spill is triggered
    - under the 500M memory limit, 2 rounds of re-spill are triggered

#### Result
Main (1.2G):
```
Q7 iteration 0 took 9374.7 ms and returned 59986052 rows
Q7 iteration 1 took 8117.6 ms and returned 59986052 rows
Q7 iteration 2 took 8549.1 ms and returned 59986052 rows
Q7 avg time: 8680.47 ms
```
Main (500M):
```
Fail with OOM
```
PR (1.2G):
```
ata/tpch_sf10 -q 7 --memory-limit 1G`
Q7 iteration 0 took 10723.6 ms and returned 59986052 rows
Q7 iteration 1 took 12962.8 ms and returned 59986052 rows
Q7 iteration 2 took 11739.7 ms and returned 59986052 rows
Q7 avg time: 11808.71 ms
```
PR (500M):
```
Q7 iteration 0 took 16233.1 ms and returned 59986052 rows
Q7 iteration 1 took 18568.4 ms and returned 59986052 rows
Q7 iteration 2 took 19173.4 ms and returned 59986052 rows
Q7 avg time: 17991.67 ms
```
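For readers who want to relate the merge degree to the number of re-spill rounds reported for the 'thin' table run, here is a minimal sketch. It assumes a simplified model that is not taken from the PR's code: each round merges groups of up to `max_merge_degree` spill files into one new spill file, and the final merge runs as soon as at most `max_merge_degree` files remain. The function name `respill_rounds` is hypothetical, and the input of 170 files is the approximate count quoted above.

```rust
/// Number of re-spill rounds needed before the final merge can run,
/// under a simplified model (an assumption, not the PR's implementation):
/// every round merges groups of up to `max_merge_degree` spill files into
/// one new spill file each, until at most `max_merge_degree` files remain.
fn respill_rounds(mut spill_files: usize, max_merge_degree: usize) -> usize {
    assert!(
        max_merge_degree >= 2,
        "a merge degree below 2 never reduces the number of files"
    );
    let mut rounds = 0;
    while spill_files > max_merge_degree {
        // Ceiling division: a partially filled last group still yields one file.
        spill_files = (spill_files + max_merge_degree - 1) / max_merge_degree;
        rounds += 1;
    }
    rounds
}

fn main() {
    // ~170 is the approximate spill file count reported for the 'thin' table run.
    for degree in [16, 10] {
        println!(
            "merge degree {}: {} re-spill round(s)",
            degree,
            respill_rounds(170, degree)
        );
    }
}
```

Under this model, 170 files with a merge degree of 16 shrink to 11 files after one round, and with a merge degree of 10 they shrink to 17 and then 2 files after two rounds. That is consistent with the one and two rounds of re-spill reported for the 'thin' table. The spill file counts for the 'fat' table runs are not given, so the same check cannot be applied there.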