I am going to try to make profiling in DataFusion a bit easier. Feel free
to comment on the ticket [1].
Andrew
[1] https://github.com/apache/arrow-datafusion/issues/2174
On Mon, Mar 21, 2022 at 7:39 PM Bob Tinsman wrote:
Wow, hotspot looks very cool! However, I was only able to download v1.1
which is very slow at processing the perf.data; it took something like 30
minutes for a perf.data file of about 0.5G. I am resorting to building Qt
from source, so I can build hotspot from source...😖
On Mon, Mar 21, 2022 at 2:
Hi Bob,
One command I've been using successfully for some time for profiling is as
follows (with different flags, csv file works just as well):
perf record --call-graph=dwarf ../target/release/tpch benchmark datafusion
--path [path] --format parquet --query 6 --iterations 1 -n 16
And using hotspot (ht
Andrew, thanks for your feedback! I started looking at IOx and pprof, and
I'm slowly getting a better picture of DataFusion performance work. I can
see that IOx is driving some of this (in particular [1]).
I'm still in sponge mode, but I can think of a few useful things to do
around
Thank you for writing up your findings
If you use the `--mem-table` / `-m` option, the CSV file is read once and
then the query is executed subsequently
As for better ways of profiling rust, we have had good luck using `pprof`
[1] in InfluxDB IOx (which also uses DataFusion), so I have mostly ne