theirix opened a new issue, #16545:
URL: https://github.com/apache/datafusion/issues/16545

   ### Describe the bug
   
   This is a follow-up to a discussion in 
https://github.com/apache/datafusion/pull/16325#issuecomment-2985522134, which 
is not directly related to table sampling but could affect it.
   
   I'd like to double-check whether pushing a volatile filter down to the
   Parquet executor is expected. I implemented disabling of volatile filter
   pushdown for the logical plan in #13268, but the physical optimiser still
   seems to push this predicate down to the executor. Should we implement a
   similar mechanism that marks volatile predicates as unsupported filters?
   The current physical plan implementation already has a concept of
   "unsupported" filters, which could easily be reused for this.
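
   To make the proposal concrete, here is a minimal sketch of the idea, assuming a toy expression tree. All types and names below (`Expr`, `Volatility`, `Pushdown`, `is_volatile`, `classify`) are hypothetical and are not DataFusion's actual API; the point is only that a filter containing any volatile call is reported as unsupported, so it would stay in a filter node above the scan.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.

/// How stable a function's result is across calls with the same input.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Volatility {
    Immutable,
    Volatile,
}

/// A minimal expression tree, just enough to express `random() < 0.1`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(f64),
    Call {
        name: String,
        volatility: Volatility,
        args: Vec<Expr>,
    },
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Returns true if any node in the tree is a volatile function call.
fn is_volatile(expr: &Expr) -> bool {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => false,
        Expr::Call { volatility, args, .. } => {
            *volatility == Volatility::Volatile || args.iter().any(is_volatile)
        }
        Expr::Lt(l, r) | Expr::And(l, r) => is_volatile(l) || is_volatile(r),
    }
}

/// Outcome of offering one filter to a data source.
#[derive(Debug, PartialEq)]
enum Pushdown {
    Supported,
    Unsupported,
}

/// Classifies each filter independently: volatile filters are reported as
/// unsupported, everything else as supported.
fn classify(filters: &[Expr]) -> Vec<Pushdown> {
    filters
        .iter()
        .map(|f| {
            if is_volatile(f) {
                Pushdown::Unsupported
            } else {
                Pushdown::Supported
            }
        })
        .collect()
}
```

   In this sketch the check is recursive, so a volatile call nested inside a larger conjunction also makes the whole filter unsupported. Whether the real implementation should split conjunctions first is a separate design question.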
   
   Current behaviour:
   
   Before the `FilterPushdown` rule runs:
   ```
   [2025-06-18T18:20:07Z TRACE datafusion::physical_planner] Optimized physical plan by LimitedDistinctAggregation:
       OutputRequirementExec
         ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
           AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
             AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
               FilterExec: random() < 0.1
                 DataSourceExec: file_groups={1 group: [[sample.parquet]]}, file_type=parquet
   ```
   
   After the `FilterPushdown` rule runs:
   ```
   [2025-06-18T18:20:07Z TRACE datafusion::physical_planner] Optimized physical plan by FilterPushdown:
       OutputRequirementExec
         ProjectionExec: expr=[count(Int64(1))@0 as count(*)]
           AggregateExec: mode=Final, gby=[], aggr=[count(Int64(1))]
             AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]
               DataSourceExec: file_groups={1 group: [[sample.parquet]]}, file_type=parquet, predicate=random() < 0.1
   ```
   
   
   
   ### To Reproduce
   
   ```sql
   set datafusion.execution.parquet.pushdown_filters=true;
   create external table data stored as parquet location 'sample.parquet';
   SELECT count(*) FROM data WHERE random() < 0.1;
   ```
   
   ### Expected behavior
   
   I expect the physical plan optimiser not to push down volatile predicates.
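
   The expected plan shape can be illustrated with a toy rewrite rule. This is only a sketch under stated assumptions: `Plan`, `Predicate`, and `push_down_filters` are hypothetical stand-ins, not DataFusion's physical plan types or its `FilterPushdown` rule. A stable predicate is absorbed into the scan, while a volatile one leaves the filter node in place.

```rust
// Toy model only: hypothetical types, not DataFusion's physical plan API.

/// A predicate with a precomputed volatility flag.
#[derive(Clone, Debug, PartialEq)]
struct Predicate {
    text: String,
    volatile: bool,
}

/// A two-node plan language: a file scan and a filter above some input.
#[derive(Clone, Debug, PartialEq)]
enum Plan {
    Scan {
        file: String,
        predicate: Option<Predicate>,
    },
    Filter {
        predicate: Predicate,
        input: Box<Plan>,
    },
}

/// Pushes a filter into the scan directly below it, unless the predicate is
/// volatile or the scan already carries a predicate.
fn push_down_filters(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match push_down_filters(*input) {
            // Stable predicate over a bare scan: absorb it into the scan.
            Plan::Scan { file, predicate: None } if !predicate.volatile => Plan::Scan {
                file,
                predicate: Some(predicate),
            },
            // Otherwise keep the filter node above its (rewritten) input.
            other => Plan::Filter {
                predicate,
                input: Box::new(other),
            },
        },
        scan => scan,
    }
}
```

   With this rule, a plan equivalent to `FilterExec: random() < 0.1` over the Parquet scan comes back unchanged, which is the behaviour requested here.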
   
   ### Additional context
   
   _No response_


