adriangb opened a new pull request, #12606:
URL: https://github.com/apache/datafusion/pull/12606
I have a secondary index with min/max stats columns that is compatible with
PruningPredicate's rewrites.
I now want to add an index for point lookups (I plan on implementing it as a
column with distinct array values, but that's a bit of an implementation
detail).
The problem is how `PruningPredicate` handles this column. There are no stats
for it, and `PruningPredicate` doesn't recognize it because I only pass in
Fields for which there are stats, so it currently rewrites any predicate on
that column to `true`. As a result,
`a_column_with_stats = 123 and a_point_lookup_column = 'abc'` becomes
`a_column_with_stats_min <= 123 and a_column_with_stats_max >= 123 and true`
(ignoring nulls, and possibly simplifying other bits). What I want instead is
`a_column_with_stats_min <= 123 and a_column_with_stats_max >= 123 and
a_point_lookup_column @> '{abc}'::text[]` or something like that.
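To make the intended rewrite concrete, here is a toy Rust sketch of the idea. It deliberately does not use DataFusion's types: `Expr`, `rewrite`, `render`, and the `hook` parameter are hypothetical names made up for illustration, and the real `PruningPredicate` rewrite handles far more cases (nulls, other operators, and so on).

```rust
// Toy model only: none of these types or functions are DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `column = value`
    Eq { column: String, value: String },
    /// `left and right`
    And(Box<Expr>, Box<Expr>),
    /// Already-rewritten predicate text over the index columns.
    Raw(String),
    /// Literal `true`: "cannot prune on this, keep the container".
    True,
}

/// Rewrites a filter into a predicate over the index columns.
/// `stats_columns` lists the columns that have min/max stats.
/// `hook` (hypothetical) is called for any predicate on a column without stats.
fn rewrite(expr: &Expr, stats_columns: &[&str], hook: &dyn Fn(&Expr) -> Expr) -> Expr {
    match expr {
        Expr::And(left, right) => Expr::And(
            Box::new(rewrite(left, stats_columns, hook)),
            Box::new(rewrite(right, stats_columns, hook)),
        ),
        // Known column: turn equality into a min/max range check.
        Expr::Eq { column, value } if stats_columns.contains(&column.as_str()) => Expr::Raw(
            format!("{c}_min <= {v} and {c}_max >= {v}", c = column, v = value),
        ),
        // Unknown column: let the caller decide what to emit.
        Expr::Eq { .. } => hook(expr),
        // Nothing left to rewrite.
        Expr::Raw(_) | Expr::True => expr.clone(),
    }
}

/// Renders an expression as SQL-like text.
fn render(expr: &Expr) -> String {
    match expr {
        Expr::Eq { column, value } => format!("{} = {}", column, value),
        Expr::And(left, right) => format!("{} and {}", render(left), render(right)),
        Expr::Raw(text) => text.clone(),
        Expr::True => "true".to_string(),
    }
}
```

In this sketch, a hook that always returns `Expr::True` reproduces the current behavior, while a hook that returns a `Raw` containment check on `a_point_lookup_column` produces the rewrite I'm after.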
I don't think it's reasonable to add APIs to DataFusion for this specific
case since it depends on implementation details outside of DataFusion's
control, but I also can't easily work around it on my end (I'd have to
re-implement all of PruningPredicate). So I'm hoping that adding this hook is
acceptable 😄