zhuqi-lucas commented on PR #22450:
URL: https://github.com/apache/datafusion/pull/22450#issuecomment-4702240019

   > Did some quick review but need to find more time to look in details.
   > 
   > I want to double check:
   > 
   > 1. Logic for deciding when we need to go down a different path. I see this 
is using `DynamicFilterTracker::classify` but I'm not sure if we should always 
or optionally create a RowGroupPruner`, etc.
   > 2. Logic for splitting runs. I thought the APIs we added in arrow-rs are 
flexible enough to not require splitting runs, they allow pausing a run at row 
group boundaries and changing it (e.g. skipping subsequent row groups) but 
maybe it doesn't have the right apis...
   
    Thanks @adriangb for the review!
   
     (2) splitting runs: Already gone — 7e1aae26a collapses to a single decoder 
paused at RG boundaries, rebuilt via 
decoder.into_builder().with_row_groups(remaining).build() to skip pruned RGs. 
RowGroupRun /
     split_runs are deleted. The arrow-rs API is exactly enough.
   
     (1) when to create RowGroupPruner: Agreed, current gate is too loose. 
Static predicates were already fully consumed by prune_by_statistics at open 
time, so a runtime pruner over them is wasted work. Will tighten
     to classify(&predicate) == Watching, matching the fmt_extra marker.
   
   Inline on snapshot_generation: Stale — the pruner already uses the 
DynamicFilterTracker watcher from #22460. Will fix the comment in the same 
follow-up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to