UBarney commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082362965

   > > > [@UBarney](https://github.com/UBarney) - here are the 1e7 join results 
on my M3 Macbook with 16GB of RAM:
   > > 
   > > 
   > > [@MrPowers](https://github.com/MrPowers) I am using the **1e8** dataset.
   > > ```
   > > target/release/dfbench h2o --iterations 5 --join-paths 
/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_NA_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e2_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e5_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e8_NA.csv
 --queries-path /home/lv/code/datafusion/benchmarks/queries/h2o/join.sql -o 
/home/lv/code/datafusion/benchmarks/results/join/h2o_join_exch_join_order.json
   > > 
   > > 
   > >  devhomeinsp  ~/c/datafusion > wc benchmarks/data/h2o/J1_1e8_*           
 
   > >        101        101       2841 benchmarks/data/h2o/J1_1e8_1e2_0.csv
   > >     100001     100001    4420917 benchmarks/data/h2o/J1_1e8_1e5_0.csv
   > >  100000001  100000001 6619091495 benchmarks/data/h2o/J1_1e8_1e8_NA.csv
   > >        101        101       6661 benchmarks/data/h2o/J1_1e8_NA_0.csv
   > >  100100204  100100204 6623521914 total
   > > ```
   > 
   > I got similar result for my Mac M4 48GB, i believe we need to limit 16GB 
to mock the 16GB result.
   
   @zhuqi-lucas Even after manually limiting my VM's memory to 16G, the 
execution time didn't change much (the longer execution time for q1 on the 
first run might be because the CSV file wasn't in the cache ?).
   
   
   <details>
   <summary>Output</summary>
   
   
   ```
   
   √ devhomeinsp  ~/c/datafusion > /usr/bin/time -v target/release/dfbench h2o 
--iterations 5 --join-paths 
/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_NA_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e2_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e5_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e8_NA.csv
 --queries-path /home/lv/code/datafusion/benchmarks/queries/h2o/join.sql -o 
/dev/null                                                      Running 
benchmarks with the following options: RunOpt { query: None, common: CommonOpt 
{ iterations: 5, partitions: None, batch_size: None, mem_pool_type: "fair", 
memory_limit: None, sort_spill_reservation_bytes: None, debug: false }, 
queries_path: "/home/lv/code/datafusion/benchmarks/queries/h2o/join.sql", path: 
"benchmarks/data/h2o/G1_1e7_1e7_100_0.csv", join_paths: 
"/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_NA_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e2_0.csv,/home/lv
 
/code/datafusion/benchmarks/data/h2o/J1_1e8_1e5_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e8_NA.csv",
 output_path: Some("/dev/null") }
   Q1: SELECT x.id1, x.id2, x.id3, x.id4 as xid4, small.id4 as smallid4, x.id5, 
x.id6, x.v1, small.v2 FROM x INNER JOIN small ON x.id1 = small.id1;
   Query 1 iteration 1 took 34.4 ms and returned 90 rows
   Query 1 iteration 2 took 1.9 ms and returned 90 rows
   Query 1 iteration 3 took 1.8 ms and returned 90 rows
   Query 1 iteration 4 took 1.6 ms and returned 90 rows
   Query 1 iteration 5 took 1.3 ms and returned 90 rows
   Query 1 avg time: 8.23 ms
   Q2: SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as 
xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, 
x.v1, medium.v2 FROM x INNER JOIN medium ON x.id2 = medium.id2;
   Query 2 iteration 1 took 15.1 ms and returned 89 rows
   Query 2 iteration 2 took 12.8 ms and returned 89 rows
   Query 2 iteration 3 took 12.7 ms and returned 89 rows
   Query 2 iteration 4 took 12.4 ms and returned 89 rows
   Query 2 iteration 5 took 12.5 ms and returned 89 rows
   Query 2 avg time: 13.08 ms
   Q3: SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as 
xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, 
x.v1, medium.v2 FROM x LEFT JOIN medium ON x.id2 = medium.id2;
   Query 3 iteration 1 took 13.5 ms and returned 100 rows
   Query 3 iteration 2 took 13.2 ms and returned 100 rows
   Query 3 iteration 3 took 12.1 ms and returned 100 rows
   Query 3 iteration 4 took 14.2 ms and returned 100 rows
   Query 3 iteration 5 took 13.1 ms and returned 100 rows
   Query 3 avg time: 13.19 ms
   Q4: SELECT x.id1 as xid1, medium.id1 as mediumid1, x.id2, x.id3, x.id4 as 
xid4, medium.id4 as mediumid4, x.id5 as xid5, medium.id5 as mediumid5, x.id6, 
x.v1, medium.v2 FROM x JOIN medium ON x.id5 = medium.id5;
   Query 4 iteration 1 took 12.1 ms and returned 89 rows
   Query 4 iteration 2 took 11.6 ms and returned 89 rows
   Query 4 iteration 3 took 12.0 ms and returned 89 rows
   Query 4 iteration 4 took 11.5 ms and returned 89 rows
   Query 4 iteration 5 took 11.8 ms and returned 89 rows
   Query 4 avg time: 11.80 ms
   Q5: SELECT x.id1 as xid1, large.id1 as largeid1, x.id2 as xid2, large.id2 as 
largeid2, x.id3, x.id4 as xid4, large.id4 as largeid4, x.id5 as xid5, large.id5 
as largeid5, x.id6 as xid6, large.id6 as largeid6, x.v1, large.v2 FROM x JOIN 
large ON x.id3 = large.id3;
   Query 5 iteration 1 took 3181.6 ms and returned 92 rows
   Query 5 iteration 2 took 2234.5 ms and returned 92 rows
   Query 5 iteration 3 took 2320.8 ms and returned 92 rows
   Query 5 iteration 4 took 2246.6 ms and returned 92 rows
   Query 5 iteration 5 took 2243.5 ms and returned 92 rows
   Query 5 avg time: 2445.39 ms
           Command being timed: "target/release/dfbench h2o --iterations 5 
--join-paths 
/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_NA_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e2_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e5_0.csv,/home/lv/code/datafusion/benchmarks/data/h2o/J1_1e8_1e8_NA.csv
 --queries-path /home/lv/code/datafusion/benchmarks/queries/h2o/join.sql -o 
/dev/null"
           User time (seconds): 247.85
           System time (seconds): 21.28
           Percent of CPU this job got: 2151%
           Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.51
           Average shared text size (kbytes): 0
           Average unshared data size (kbytes): 0
           Average stack size (kbytes): 0
           Average total size (kbytes): 0
           Maximum resident set size (kbytes): 322980
           Average resident set size (kbytes): 0
           Major (requiring I/O) page faults: 260
           Minor (reclaiming a frame) page faults: 33992
           Voluntary context switches: 58558
           Involuntary context switches: 7478
           Swaps: 0
           File system inputs: 13006232
           File system outputs: 0
           Socket messages sent: 0
           Socket messages received: 0
           Signals delivered: 0
           Page size (bytes): 4096
           Exit status: 0
   
   √ devhomeinsp  ~/c/datafusion > free -h
                  total        used        free      shared  buff/cache   
available
   Mem:            11Gi       1.1Gi       3.2Gi        13Mi       7.1Gi        
10Gi
   Swap:             0B          0B          0B
   
   
   ```
   
   </details>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to