chenkovsky commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3003924958

   > @2010YOUY01 thank you for pointing this out.
   > 
   > @chenkovsky, it looks like both our PRs solve the same sampling problem 
from different approaches. The direction of my PR is to continue improving 
random filtering (as in #13268) by enhancing a predicate-based sampling, as 
previously discussed with @alamb 
[here](https://github.com/apache/datafusion/issues/13563#issuecomment-2498989436).
   > 
   > The sampling logic differs between databases, and during my PR's 
implementation and review we have already begun addressing some subtle 
semantic differences among Postgres, DuckDB, Hive, etc.
   
   I considered random filtering before, but I found it hard to implement 
Poisson sampling and seeding that way, so I brought Spark's design here.
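
To illustrate the point about Poisson sampling, here is a minimal Python sketch. It is not the code from either PR or from Spark, and every function name in it is hypothetical. It assumes "random filtering" means keeping each row when a uniform draw falls below the sampling fraction, and that "Poisson sample" means sampling with replacement, where each row is emitted a Poisson-distributed number of times. Under those assumptions a filter predicate can only keep or drop a row, so it cannot produce the repeated rows that Poisson sampling needs, and a reproducible result requires a seeded generator owned by the sampling operator.

```python
import math
import random


def bernoulli_sample(rows, fraction, seed):
    """Without replacement: keep each row with probability `fraction`.

    This is what a predicate such as `random() < fraction` can express.
    """
    rng = random.Random(seed)  # a private, seeded generator makes the result repeatable
    return [row for row in rows if rng.random() < fraction]


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_sample(rows, fraction, seed):
    """With replacement: emit each row Poisson(fraction) times (0, 1, 2, ...)."""
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        sampled.extend([row] * poisson_draw(rng, fraction))
    return sampled
```

With `rows = list(range(1000))`, `bernoulli_sample(rows, 0.1, seed=42)` never returns a row twice, while `poisson_sample(rows, 3.0, seed=42)` returns most rows several times. Calling either function again with the same seed reproduces the same sample.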


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

