[I] Why does `PruningPredicate` reference a `row_count` for each column? [datafusion]

via GitHub Wed, 18 Dec 2024 21:11:45 -0800


adriangb opened a new issue, #13836:
URL: https://github.com/apache/datafusion/issues/13836


   ### Is your feature request related to a problem or challenge?
   
   Are there any scenarios where it makes sense for each column in a container 
to have a different row count? I think they should always be the same. Even if 
the columns are stored separately in Parquet, we should be able to pick any 
non-missing row count and have it be correct. If this is true, we can simplify 
the pruning predicate a little. That would make it (possibly insignificantly) 
faster to evaluate for everyone using DataFusion and, selfishly, would let me 
remove a couple of lines of hacky code in our codebase.
   
   
https://github.com/apache/datafusion/blob/46101f3d195d1f8b483e13f2d19485e04070e0b0/datafusion/physical-optimizer/src/pruning.rs#L843
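
To make the "pick any non-missing row count" idea concrete, here is a minimal sketch. Both helpers are hypothetical and are not part of DataFusion; the slice of `Option<u64>` stands in for whatever per-column row counts a statistics provider reports for one container, with `None` meaning the statistic is missing.

```rust
/// Hypothetical helper, not part of DataFusion: given the row count reported
/// for each column of one container (`None` when the statistic is missing),
/// return a single row count for the whole container.
fn container_row_count(per_column: &[Option<u64>]) -> Option<u64> {
    // Assumption from the issue: all present values agree, so the first
    // non-missing one is as good as any other.
    per_column.iter().flatten().copied().next()
}

/// Checks the assumption itself: every non-missing row count is identical.
fn row_counts_agree(per_column: &[Option<u64>]) -> bool {
    let mut present = per_column.iter().flatten();
    match present.next() {
        Some(first) => present.all(|n| n == first),
        None => true, // no statistics at all, so nothing can disagree
    }
}
```

If `row_counts_agree` can ever return `false` for a real container, the simplification does not hold and the per-column row counts are needed.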
   
   ### Describe the solution you'd like
   
   Give `PruningPredicate` an option to reference only a single column called 
`row_count`, instead of one row count per column.
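
A rough sketch of what such an option could mean for the statistics fields the predicate asks for. `RowCountMode` and both functions are hypothetical, and the `{column}_row_count` naming is an assumption made for illustration rather than a statement about the current implementation.

```rust
/// Hypothetical switch, not an existing DataFusion option.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RowCountMode {
    /// One row-count field per referenced column.
    PerColumn,
    /// A single row-count field shared by all columns in the container.
    Shared,
}

/// Name of the row-count field needed for `column` under the given mode.
fn row_count_field(column: &str, mode: RowCountMode) -> String {
    match mode {
        RowCountMode::PerColumn => format!("{column}_row_count"),
        RowCountMode::Shared => "row_count".to_string(),
    }
}

/// Sorted, de-duplicated row-count fields for a set of referenced columns.
fn required_row_count_fields(columns: &[&str], mode: RowCountMode) -> Vec<String> {
    let mut fields: Vec<String> = columns.iter().map(|c| row_count_field(c, mode)).collect();
    fields.sort();
    fields.dedup();
    fields
}
```

In shared mode the number of row-count fields stays at one no matter how many columns the predicate references.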
   
   ### Describe alternatives you've considered
   
   Do nothing.
   
   ### Additional context
   
   _No response_




---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] Why does `PruningPredicate` reference a `row_count` for each column? [datafusion]

Reply via email to