alamb commented on issue #18909:
URL: https://github.com/apache/datafusion/issues/18909#issuecomment-3577686002
I did some profiling on q0:
```shell
ubuntu@ip-172-31-45-69:~/ClickBench/datafusion-partitioned$ cat create.sql
CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'partitioned'
OPTIONS ('binary_as_string' 'true');
ubuntu@ip-172-31-45-69:~/ClickBench/datafusion-partitioned$ cat q0.sql
SELECT COUNT(*) FROM hits;
```

Then run:

```shell
datafusion-cli -f create.sql -f q0.sql
```
I then profiled it with samply. The queries run once first, and only the second run is recorded, to mimic a "warm" run:
```shell
ubuntu@ip-172-31-45-69:~/ClickBench/datafusion-partitioned$ datafusion-cli -f create.sql -f q0.sql && samply record datafusion-cli -f create.sql -f q0.sql
```
My analysis is that 75% of the time is spent calculating statistics:
<img width="1750" height="1164" alt="Image" src="https://github.com/user-attachments/assets/2bc9dfd9-6128-4063-a14a-561c3f64d2a3" />