swaingotnochill commented on issue #16365:
URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3045422144

   @alamb You are right, the improvement is due to collect_statistics. 
   
   My PoC reduces the original time, but it still doesn't work very well with
   `collect_statistics`, which I am still working on.
   
   Here are the results: 
   
   ```
   ➜  datafusion git:(rswain/parquet_metadata_caching) ✗ ./target/debug/datafusion-cli
   DataFusion CLI v48.0.0
   > set datafusion.execution.collect_statistics = false;
   0 row(s) fetched. 
   Elapsed 0.008 seconds.
   
   > CREATE EXTERNAL TABLE nyc_taxi_rides
   STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
   
   0 row(s) fetched. 
   Elapsed 8.987 seconds.
   
   > select count(*) from nyc_taxi_rides;
   +------------+
   | count(*)   |
   +------------+
   | 1310903963 |
   +------------+
   1 row(s) fetched. 
   Elapsed 1.422 seconds.
   
   > select count(*) from nyc_taxi_rides;
   +------------+
   | count(*)   |
   +------------+
   | 1310903963 |
   +------------+
   1 row(s) fetched. 
   Elapsed 1.343 seconds.
   
   > set datafusion.execution.collect_statistics = true;
   0 row(s) fetched. 
   Elapsed 0.002 seconds.
   
   > select count(*) from nyc_taxi_rides;
   +------------+
   | count(*)   |
   +------------+
   | 1310903963 |
   +------------+
   1 row(s) fetched. 
   Elapsed 1.818 seconds.
   
   > select count(*) from nyc_taxi_rides;
   +------------+
   | count(*)   |
   +------------+
   | 1310903963 |
   +------------+
   1 row(s) fetched. 
   Elapsed 1.157 seconds.
   ```
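
To make the numbers above easier to compare, here is a small Python sketch (not part of the PoC) that computes how much the repeated `count(*)` improves over the first run under each setting. The timings are copied from the session above; the dictionary keys and the `relative_drop` helper are illustrative names only.

```python
# Timings (seconds) copied from the datafusion-cli session above.
# Keys and names here are illustrative, not part of DataFusion or the PoC.
timings = {
    "collect_statistics=false": (1.422, 1.343),
    "collect_statistics=true": (1.818, 1.157),
}


def relative_drop(first: float, second: float) -> float:
    """Fraction of elapsed time saved by the second run relative to the first."""
    return (first - second) / first


for setting, (first, second) in timings.items():
    print(
        f"{setting}: {first:.3f}s -> {second:.3f}s "
        f"({relative_drop(first, second):.1%} less elapsed time on the repeat)"
    )
```

Each query was run only once per position and on a debug build (`./target/debug/datafusion-cli`), so these ratios are indicative at best.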


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

