Re: [Rust][DataFusion] profiling TPC-H benchmarks with flamegraphs

2022-04-06 Thread Andrew Lamb
I am going to try to make profiling in DataFusion a bit easier. Feel free to comment on the ticket [1]. Andrew [1] https://github.com/apache/arrow-datafusion/issues/2174 On Mon, Mar 21, 2022 at 7:39 PM Bob Tinsman wrote: > Wow, hotspot looks very cool! However, I was only able to download v1.

Re: [Rust][DataFusion] profiling TPC-H benchmarks with flamegraphs

2022-03-21 Thread Bob Tinsman
Wow, hotspot looks very cool! However, I was only able to download v1.1, which is very slow at processing the perf.data; it took something like 30 minutes for a perf.data file of about 0.5G. I am resorting to building Qt from source, so I can build hotspot from source...😖 On Mon, Mar 21, 2022 at 2:

Re: [Rust][DataFusion] profiling TPC-H benchmarks with flamegraphs

2022-03-21 Thread Daniël Heres
Hi Bob, One command I've been using successfully for some time for profiling is as follows (with different flags, a CSV file works just as well): perf record --call-graph=dwarf ../target/release/tpch benchmark datafusion --path [path] --format parquet --query 6 --iterations 1 -n 16 And using hotspot (ht
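The command quoted above can be wrapped in a small shell function so that the query number and data path are easy to vary between runs. This is only a minimal sketch: the names `profile_query`, `DRY_RUN`, and `TPCH_BIN` are illustrative assumptions and not part of DataFusion or its benchmark tooling. Only the perf and tpch flags are taken from the message itself.

```shell
#!/bin/sh
# Hypothetical helper around the perf invocation quoted in the message.
# profile_query, DRY_RUN and TPCH_BIN are illustrative names (assumptions).
TPCH_BIN="${TPCH_BIN:-../target/release/tpch}"

profile_query() {
    query="$1"   # TPC-H query number, e.g. 6
    path="$2"    # directory holding the TPC-H data files

    # Rebuild the argument list as the full command, flags as in the message.
    set -- perf record --call-graph=dwarf "$TPCH_BIN" benchmark datafusion \
        --path "$path" --format parquet --query "$query" --iterations 1 -n 16

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}
```

With `DRY_RUN=1` set, calling `profile_query 6 [path]` only prints the assembled command. Without it, perf runs the benchmark and by default writes `perf.data` to the current directory.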

Re: [Rust][DataFusion] profiling TPC-H benchmarks with flamegraphs

2022-03-21 Thread Bob Tinsman
Andrew, thanks for your feedback! I started looking at IOx and pprof, and I'm slowly getting a better picture of DataFusion performance work. In particular, I can see that IOx is driving some of this (notably [1]). I'm still in sponge mode, but I can think of a few useful things to do around

Re: [Rust][DataFusion] profiling TPC-H benchmarks with flamegraphs

2022-03-21 Thread Andrew Lamb
Thank you for writing up your findings. If you use the `--mem-table` / `-m` option, the CSV file is read once and the query is then executed against the in-memory table. As for better ways of profiling Rust, we have had good luck using `pprof` [1] in InfluxDB IOx (which also uses DataFusion), so I have mostly ne