adriangb commented on PR #21641:
URL: https://github.com/apache/datafusion/pull/21641#issuecomment-4253861394

   > I feel intead of making it non-blocking, there should be a timeout 
something like 
[this](https://trino.io/docs/current/admin/dynamic-filtering.html).
   
   I agree a timeout could make sense. I don't see that Trino has one (they 
have other thresholds, some of which we've already copied).
   
   IIRC we introduced the barrier for 2 reasons:
   1. Correctness. Without synchronization (leader selection) the filters that 
get pushed down are incorrect.
   2. Performance. The hope was that as you say we avoid a situation where we 
miss the ability to apply filters by a couple ms.
   
   This PR keeps (1) but drops (2).
   
   Two reasons I think this is a good tradeoff:
   1. Performance was never absolute. If the filters end up being not 
effective, the query is large enough that it doesn't matter or the system is 
heavily IO constrainted it may make the query slower. See e.g. 
https://github.com/apache/datafusion/pull/19760, cc @gabotechs.
   2. We are working on making scans more adaptive so that they can e.g. start 
applying a newly generated dynamic filter mid stream. In particular 
https://github.com/apache/datafusion/pull/21351. I think this mitigates the 
issue because if a dynamic filter is updated while we are reading the first 
file on the probe side we don't have to wait a full file to apply it.
   
   But yes this is ultimately a tradeoff. Especially because of the deadlocks 
we are currently experiencing I think trading some performance for correctness 
is justified.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to