Dandandan commented on PR #20160:
URL: https://github.com/apache/datafusion/pull/20160#issuecomment-3905053370

   > [#20160 (comment)](https://github.com/apache/datafusion/pull/20160#issuecomment-3902329306)
   > 
   > This is the main improvement.
   
   Ok - yes, I see some improvements here and there, but it is still a large 
regression relative to main, by ~30s (TPCDS runs in ~50s without and ~80s with 
filter pushdown). See e.g. this run 
https://github.com/apache/datafusion/pull/20318#issuecomment-3902690761 against 
main without dynamic filter pushdown.
   
   ```
   │ QQuery 64 │  1194.15 ms │                 31181.42 ms │ 26.11x slower │
   ```
   
   This ~26x regression (and many others) is still unchanged in this PR:
   
   ```
   │ QQuery 64 │ 28583.66 ms │             28523.14 ms │     no change │
   ```
   
   
   As we're running with `DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true` 
and `DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS=true`, the main branch also 
shows the regressions - so we're comparing two "slow" versions.
   
   I think I now understand why the current approach's adaptiveness isn't 
helping _that much_ yet.
   As we're only checking the filters on `open`, a filter is only sorted / 
considered / discarded when the query scans many files, i.e. more files than 
threads. In other cases, it will evaluate / scan the columns regardless of the 
tracking (as it will open the files directly at the start of the query / query 
phase, when the selectivity is not yet known).
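
To make the argument above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not DataFusion code: `SelectivityTracker` and `decisions_at_open` are hypothetical names, files are assumed to be opened in waves of one per thread, and statistics are assumed to become visible only after a wave has finished scanning.

```python
from dataclasses import dataclass


@dataclass
class SelectivityTracker:
    """Hypothetical shared tracker for one filter (not a DataFusion type)."""

    rows_seen: int = 0
    rows_kept: int = 0

    def record(self, seen: int, kept: int) -> None:
        self.rows_seen += seen
        self.rows_kept += kept

    def keep_ratio(self):
        # None means "no statistics yet", i.e. the selectivity is unknown.
        if self.rows_seen == 0:
            return None
        return self.rows_kept / self.rows_seen


def decisions_at_open(num_files, num_threads, rows_per_file, kept_per_file,
                      max_keep_ratio=0.5):
    """Return the decision taken for each file when it is opened.

    "blind"    -> filter evaluated without any statistics
    "pushdown" -> statistics say the filter is selective enough to keep
    "skip"     -> statistics say the filter keeps too many rows to be worth it
    """
    tracker = SelectivityTracker()
    decisions = []
    for start in range(0, num_files, num_threads):
        wave_size = min(num_threads, num_files - start)
        # Assumption: every file in a wave is opened at the same time,
        # so all of them see the same tracker state.
        ratio = tracker.keep_ratio()
        for _ in range(wave_size):
            if ratio is None:
                decisions.append("blind")
            elif ratio > max_keep_ratio:
                decisions.append("skip")
            else:
                decisions.append("pushdown")
        # Assumption: statistics only arrive once the wave has been scanned.
        for _ in range(wave_size):
            tracker.record(rows_per_file, kept_per_file)
    return decisions
```

In this model, `decisions_at_open(8, 16, 1000, 900)` returns eight `"blind"` entries: with fewer files than threads, every file is opened before any selectivity is known, so the tracking never influences the scan. Only with more files than threads, e.g. `decisions_at_open(32, 16, 1000, 900)`, does the second wave get to skip the unselective filter.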
   
   I think for it to work effectively, it needs to integrate more closely with 
the parquet reader, so that a filter can be removed or added based on the 
tracked selectivity _during_ the scan.
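
As a rough illustration of what adapting _during_ the scan could look like, the Python sketch below tracks selectivity per batch inside a single file and stops evaluating the filter once it turns out to keep most rows. This is a toy model, not the parquet reader's API: `scan_with_adaptive_filter`, `min_rows` and `max_keep_ratio` are hypothetical names, batches are plain lists, and it is assumed that a downstream operator still applies the predicate to batches passed through unfiltered. It only shows the "remove" direction; re-adding a filter would need a similar check.

```python
def scan_with_adaptive_filter(batches, predicate, min_rows=1000,
                              max_keep_ratio=0.8):
    """Scan `batches`, dropping the pushed-down filter if it is unselective.

    Returns (output_batches, evaluated_batches), where `evaluated_batches`
    counts the batches on which the predicate was actually evaluated.
    """
    seen = 0
    kept = 0
    filter_enabled = True
    evaluated_batches = 0
    output = []
    for batch in batches:
        if filter_enabled:
            evaluated_batches += 1
            passed = [row for row in batch if predicate(row)]
            seen += len(batch)
            kept += len(passed)
            output.append(passed)
            # Re-check after every batch, once enough rows have been observed.
            if seen >= min_rows and kept / seen > max_keep_ratio:
                filter_enabled = False
        else:
            # Filter dropped mid-scan: pass the batch through unfiltered.
            # Assumption: a later operator applies the predicate instead.
            output.append(list(batch))
    return output, evaluated_batches
```

With ten batches of 500 rows and a predicate that keeps 98% of them, this model evaluates the filter on only the first two batches, even though there is just one file and its selectivity was unknown when it was opened.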


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

