2010YOUY01 commented on code in PR #16804: URL: https://github.com/apache/datafusion/pull/16804#discussion_r2212651111
##########
benchmarks/bench.sh:
##########
@@ -100,15 +100,24 @@ clickbench_pushdown: ClickBench queries against partitioned (100 files) parqu
 clickbench_extended: ClickBench \"inspired\" queries against a single parquet (DataFusion specific)
 
 # H2O.ai Benchmarks (Group By, Join, Window)
-h2o_small: h2oai benchmark with small dataset (1e7 rows) for groupby, default file format is csv
-h2o_medium: h2oai benchmark with medium dataset (1e8 rows) for groupby, default file format is csv
-h2o_big: h2oai benchmark with large dataset (1e9 rows) for groupby, default file format is csv
-h2o_small_join: h2oai benchmark with small dataset (1e7 rows) for join, default file format is csv
-h2o_medium_join: h2oai benchmark with medium dataset (1e8 rows) for join, default file format is csv
-h2o_big_join: h2oai benchmark with large dataset (1e9 rows) for join, default file format is csv
-h2o_small_window: Extended h2oai benchmark with small dataset (1e7 rows) for window, default file format is csv
-h2o_medium_window: Extended h2oai benchmark with medium dataset (1e8 rows) for window, default file format is csv
-h2o_big_window: Extended h2oai benchmark with large dataset (1e9 rows) for window, default file format is csv
+h2o_small: h2oai benchmark with small dataset (1e7 rows) for groupby, default file format is csv

Review Comment:
Later, we can clean it up with additional size/format options like `./bench.sh run h2o_join medium parquet`

##########
benchmarks/bench.sh:
##########
@@ -775,6 +840,7 @@ data_h2o() {
 # Set virtual environment directory
 VIRTUAL_ENV="${PWD}/venv"
 
+ rm -rf "$VIRTUAL_ENV"

Review Comment:
Could you add a comment for this line?
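On the first review comment's suggestion of `./bench.sh run h2o_join medium parquet`: a minimal sketch of how such positional size/format arguments might be mapped back onto the existing `h2o_<size>[_join|_window]` names. Everything here is an assumption for illustration and not part of `bench.sh`: the function name `resolve_h2o_benchmark`, the bare `h2o` name for the groupby variant, and the default size of `small`. The `csv` default follows the usage text in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in bench.sh): turn "<benchmark> [size] [format]"
# into one of the existing benchmark names plus a file format.
resolve_h2o_benchmark() {
    local name="$1"
    local size="${2:-small}"   # assumed default size
    local format="${3:-csv}"   # csv is the default format in the usage text

    # Reject sizes other than the three listed in the usage text.
    case "$size" in
        small|medium|big) ;;
        *) echo "unknown size: $size" >&2; return 1 ;;
    esac

    # Only csv and parquet are considered in this sketch.
    case "$format" in
        csv|parquet) ;;
        *) echo "unknown format: $format" >&2; return 1 ;;
    esac

    # Map the short name onto the existing h2o_<size>[_<kind>] convention.
    case "$name" in
        h2o)        echo "h2o_${size} ${format}" ;;
        h2o_join)   echo "h2o_${size}_join ${format}" ;;
        h2o_window) echo "h2o_${size}_window ${format}" ;;
        *) echo "unknown benchmark: $name" >&2; return 1 ;;
    esac
}
```

With this sketch, `resolve_h2o_benchmark h2o_join medium parquet` prints `h2o_medium_join parquet`, so the nine per-size entries in the usage text could collapse to three names with optional arguments.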