adriangb opened a new pull request, #14297:
URL: https://github.com/apache/datafusion/pull/14297

   Currently pruning predicates may return `NULL` to indicate "this container 
should be included", thus using `NULL` as a *truthy* value. That is quite 
confusing, as explained in the various comments addressing it.
   
   Additionally this is a big inconvenience for anything using 
`PredicateRewriter` because you have to handle nulls yourself, i.e. if you pipe 
the result into a `WHERE` clause you get the wrong result (silently!!). The 
workaround is to wrap the expression returned by `PredicateRewriter` with 
`(<expr>) IS NOT FALSE` which makes `NULL` truthy. This has the unfortunate 
consequence of breaking down a simple binary expression into a [non-sargable 
one](https://en.wikipedia.org/wiki/Sargable). This poses a problem for systems 
that may want to store statistics in a DBMS with indexes. For example, if I add 
an index on `col1_min` it can't be used because the `(...) IS NOT FALSE` 
prevents anything from being pushed down into indexes.
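
   To make the failure mode concrete, here is a minimal sketch that models SQL 
three-valued logic with `Option<bool>` (`None` stands for `NULL`). The 
predicate shape `col1_min <= v AND v <= col1_max` and all function names are 
assumptions chosen for illustration; they are not the exact output of 
`PredicateRewriter`.

```rust
/// SQL three-valued boolean: `None` models NULL.
type Tri = Option<bool>;

/// Kleene AND: FALSE dominates, otherwise NULL is contagious.
fn and3(a: Tri, b: Tri) -> Tri {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// `a <= b`, which is NULL when either side is NULL (a missing statistic).
fn le(a: Option<i64>, b: Option<i64>) -> Tri {
    match (a, b) {
        (Some(x), Some(y)) => Some(x <= y),
        _ => None,
    }
}

/// Hypothetical pruning predicate for `col1 = v`:
/// `col1_min <= v AND v <= col1_max`.
fn prune_eq(min: Option<i64>, max: Option<i64>, v: i64) -> Tri {
    and3(le(min, Some(v)), le(Some(v), max))
}

/// A `WHERE` clause keeps a row only when the predicate is TRUE.
fn where_keeps(p: Tri) -> bool {
    p == Some(true)
}

/// `(p) IS NOT FALSE`: treats NULL as truthy.
fn is_not_false(p: Tri) -> bool {
    p != Some(false)
}
```

With missing statistics, `prune_eq(None, None, 5)` evaluates to `None`, so 
`where_keeps` drops a container that should have been kept, while 
`is_not_false` keeps it at the cost of wrapping the whole comparison.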
   
   This PR addresses both problems by introducing checks for nulls in the stats 
columns in the right places such that we can now promise that the predicates 
always return `true` or `false`, never `NULL`.
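
   As a rough sketch of the idea, the null checks can be thought of as guarding 
each statistics comparison individually. The form 
`(col1_min IS NULL OR col1_min <= v) AND (col1_max IS NULL OR v <= col1_max)` 
and the function names below are assumptions for illustration only; the exact 
expressions generated by this PR may differ.

```rust
/// SQL three-valued boolean: `None` models NULL.
type Tri = Option<bool>;

/// Kleene AND: FALSE dominates, otherwise NULL is contagious.
fn and3(a: Tri, b: Tri) -> Tri {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Kleene OR: TRUE dominates, otherwise NULL is contagious.
fn or3(a: Tri, b: Tri) -> Tri {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// `a <= b`, which is NULL when either side is NULL.
fn le(a: Option<i64>, b: Option<i64>) -> Tri {
    match (a, b) {
        (Some(x), Some(y)) => Some(x <= y),
        _ => None,
    }
}

/// `a IS NULL`: always TRUE or FALSE, never NULL.
fn is_null(a: Option<i64>) -> Tri {
    Some(a.is_none())
}

/// Hypothetical null-safe pruning predicate for `col1 = v`:
/// `(min IS NULL OR min <= v) AND (max IS NULL OR v <= max)`.
fn prune_eq_null_safe(min: Option<i64>, max: Option<i64>, v: i64) -> Tri {
    and3(
        or3(is_null(min), le(min, Some(v))),
        or3(is_null(max), le(Some(v), max)),
    )
}
```

In this sketch a missing statistic makes its guard `true`, so the result is 
never `None` and can be piped into a `WHERE` clause directly. Each leaf is 
still a plain comparison or null check on a single statistics column.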
   
   Since we make no promises about the produced predicate this should not be a 
breaking change.
   
   This does not read any extra columns and the null checks should be very 
cheap, so I do not expect this to have any performance impact on systems 
evaluating statistics in memory (like DataFusion does internally for parquet 
row group and page statistics).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

