Re: [I] Benchmark: Concurrent queries in constrainted environment [datafusion]

via GitHub Thu, 25 Apr 2024 08:26:17 -0700


alamb commented on issue #10229:
URL: https://github.com/apache/datafusion/issues/10229#issuecomment-2077570355


   Here is a strawman proposal that I think would be fairly straightforward 
to implement:
   1. Use criterion and add a new `bench` benchmark (criterion takes care of 
the warmup, measurements, etc)
   2. Use the tpch / clickbench dataset (created via `./bench.sh data 
clickbench`, etc)
   3. Pick 2-3 queries (e.g. a TOP 10 query with predicates, and an aggregate 
with low cardinality)
   4. Use the default target-partitions / threadpool (e.g. num cores on machine)
   5. The measured quantity is how long it takes to complete some fixed number 
of queries (e.g. 100)
   
   Then we could run 
   ```
   # Run q1 100 times, starting the next query as soon as the previous one completes
   concurrency_q1_concurrency1
   # Run q1 100 times, running 2 queries at any time
   concurrency_q1_concurrency2
   # Run q1 100 times, running 4 queries at any time
   concurrency_q1_concurrency4
   ```
   
   And the same for one or two other queries
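
   The "fixed number of queries at a given concurrency" measurement could be sketched roughly as follows. This is only an illustration using the standard library: `run_concurrent` is a hypothetical helper name, the "query" is a stand-in closure that sleeps rather than a real DataFusion query, and an actual benchmark would presumably wrap something like this in criterion as proposed above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `query` exactly `total` times while keeping at
/// most `concurrency` invocations in flight, and returns the wall-clock time.
fn run_concurrent<F>(total: usize, concurrency: usize, query: F) -> Duration
where
    F: Fn(usize) + Sync,
{
    assert!(concurrency > 0, "need at least one worker");
    // Shared counter handing out the index of the next query to run.
    let next = AtomicUsize::new(0);
    let start = Instant::now();
    thread::scope(|s| {
        for _ in 0..concurrency {
            // Each worker starts its next query as soon as its previous one
            // finishes, so `concurrency` queries run at any time.
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= total {
                    break;
                }
                query(i);
            });
        }
    });
    start.elapsed()
}

fn main() {
    for concurrency in [1, 2, 4] {
        // Stand-in for executing q1 (assumption: not a real query).
        let elapsed = run_concurrent(100, concurrency, |_i| {
            thread::sleep(Duration::from_millis(1));
        });
        println!("concurrency_q1_concurrency{concurrency}: {elapsed:?}");
    }
}
```

   With a real query in place of the sleep, comparing the elapsed time across the three concurrency levels would show how throughput changes as more queries compete for the same cores.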


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

