zhuqi-lucas commented on PR #14902:
URL: https://github.com/apache/datafusion/pull/14902#issuecomment-2685544998

   I tried to reproduce
   https://github.com/apache/datafusion/issues/13765
   
   But on the current main branch, our join passed! The largest join (Q5) takes
   about 50s per iteration, which is a good result! cc @alamb @2010YOUY01
   
   
   ```shell
   ./bench.sh data h2o_big_join
   ```
   
   ```shell
   cargo run --release --bin dfbench -- h2o --mem-pool-type fair --memory-limit 16G --iterations 3 \
     --join-paths /Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_NA_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e3_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e6_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e9_NA.csv \
     --queries-path /Users/zhuqi/arrow-datafusion/benchmarks/queries/h2o/join.sql \
     -o /Users/zhuqi/arrow-datafusion/benchmarks/results/issue_14867/h2o_join.json
       Finished `release` profile [optimized] target(s) in 0.19s
        Running `/Users/zhuqi/arrow-datafusion/target/release/dfbench h2o --mem-pool-type fair --memory-limit 16G --iterations 3 --join-paths /Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_NA_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e3_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e6_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e9_NA.csv --queries-path /Users/zhuqi/arrow-datafusion/benchmarks/queries/h2o/join.sql -o /Users/zhuqi/arrow-datafusion/benchmarks/results/issue_14867/h2o_join.json`
   Running benchmarks with the following options: RunOpt { query: None, common: CommonOpt { iterations: 3, partitions: None, batch_size: 8192, mem_pool_type: "fair", memory_limit: Some(17179869184), sort_spill_reservation_bytes: None, debug: false }, queries_path: "/Users/zhuqi/arrow-datafusion/benchmarks/queries/h2o/join.sql", path: "benchmarks/data/h2o/G1_1e7_1e7_100_0.csv", join_paths: "/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_NA_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e3_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e6_0.csv,/Users/zhuqi/arrow-datafusion/benchmarks/data/h2o/J1_1e9_1e9_NA.csv", output_path: Some("/Users/zhuqi/arrow-datafusion/benchmarks/results/issue_14867/h2o_join.json") }
   Q1: SELECT x.id1, x.id2, x.id3, x.id4 as xid4, small.id4 as smallid4, x.id5, x.id6, x.v1, small.v2 FROM x INNER JOIN small ON x.id1 = small.id1;
   Query 1 iteration 1 took 38.1 ms and returned 900 rows
   Query 1 iteration 2 took 3.3 ms and returned 900 rows
   Query 1 iteration 3 took 2.1 ms and returned 900 rows
   Q2: SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, medium.v2 FROM x INNER JOIN medium ON x.id2 = medium.id2;
   Query 2 iteration 1 took 46.1 ms and returned 912 rows
   Query 2 iteration 2 took 18.4 ms and returned 912 rows
   Query 2 iteration 3 took 18.6 ms and returned 912 rows
   Q3: SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, medium.v2 FROM x LEFT JOIN medium ON x.id2 = medium.id2;
   Query 3 iteration 1 took 18.2 ms and returned 1000 rows
   Query 3 iteration 2 took 18.5 ms and returned 1000 rows
   Query 3 iteration 3 took 17.7 ms and returned 1000 rows
   Q4: SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, x.v1, medium.v2 FROM x JOIN medium ON x.id5 = medium.id5;
   Query 4 iteration 1 took 17.8 ms and returned 912 rows
   Query 4 iteration 2 took 18.1 ms and returned 912 rows
   Query 4 iteration 3 took 17.6 ms and returned 912 rows
   Q5: SELECT x.id1 as xid1, large.id1 as largeid1, x.id2 as xid2, large.id2 as largeid2, x.id3, x.id4 as xid4, large.id4 as largeid4, x.id5 as xid5, large.id5 as largeid5, x.id6 as xid6, large.id6 as largeid6, x.v1, large.v2 FROM x JOIN large ON x.id3 = large.id3;
   Query 5 iteration 1 took 49496.6 ms and returned 906 rows
   Query 5 iteration 2 took 49838.1 ms and returned 906 rows
   Query 5 iteration 3 took 49552.0 ms and returned 906 rows
   ```
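
To put a number on "about 50s": the three Q5 iterations above average out to roughly 49.6 s. Below is a minimal sketch of how the per-query averages could be pulled out of the log, assuming the `Query N iteration M took X ms` line format shown in the output above. The helper name `average_ms` and the regular expression are my own for illustration and are not part of `dfbench`; only the Q1 and Q5 timing lines are copied in to keep it short.

```python
import re
from collections import defaultdict
from statistics import mean

# Timing lines copied verbatim from the benchmark output above (Q1 and Q5 only).
LOG = """\
Query 1 iteration 1 took 38.1 ms and returned 900 rows
Query 1 iteration 2 took 3.3 ms and returned 900 rows
Query 1 iteration 3 took 2.1 ms and returned 900 rows
Query 5 iteration 1 took 49496.6 ms and returned 906 rows
Query 5 iteration 2 took 49838.1 ms and returned 906 rows
Query 5 iteration 3 took 49552.0 ms and returned 906 rows
"""

# Assumed line format: "Query <id> iteration <n> took <ms> ms ..."
PATTERN = re.compile(r"Query (\d+) iteration \d+ took ([\d.]+) ms")


def average_ms(log: str) -> dict[int, float]:
    """Return the mean elapsed time in milliseconds for each query id."""
    times = defaultdict(list)
    for match in PATTERN.finditer(log):
        times[int(match.group(1))].append(float(match.group(2)))
    return {query: mean(values) for query, values in times.items()}


if __name__ == "__main__":
    for query, avg in sorted(average_ms(LOG).items()):
        print(f"Q{query}: {avg:.1f} ms average")
```

Note that the first Q1 iteration (38.1 ms) is much slower than the following two, so a plain mean over all iterations overstates the steady-state time for the small queries; for Q5 the three iterations are within about 1% of each other.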

