berkaysynnada closed issue #15529: Extend TopK early termination to partially
sorted inputs
URL: https://github.com/apache/datafusion/issues/15529
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
alamb commented on issue #15529:
URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2781385443
@NGA-TRAN and @gabotechs can you please help review this PR?
geoffreyclaude commented on issue #15529:
URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2773087938
> FWIW my view is that https://github.com/apache/datafusion/pull/15301 tries
> to implement data skipping for partially sorted / globally roughly clustered
> inputs where the
NGA-TRAN commented on issue #15529:
URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2780700162
Thanks for a nice real-life use case and benchmarking numbers,
@geoffreyclaude
geoffreyclaude commented on issue #15529:
URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2779122390
PR should be ready for review. I've included some pretty nice benchmark
results from https://github.com/apache/datafusion/pull/15560:
```
> ./bench.sh compare ma
```
geoffreyclaude commented on issue #15529:
URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2772276984
@alamb:
> There may be some overlap with this work from @adriangb (though I realize
> you are talking about a different optimization)
The two are complementary. @ad
adriangb commented on issue #15529:
URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2772802775
FWIW my view is that #15301 tries to implement data skipping for partially
sorted / globally roughly clustered inputs where there is an ORDER BY LIMIT on
the sortedish dimensio
alamb commented on issue #15529:
URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2770458720
There may be some overlap with this work from @adriangb (though I realize you
are talking about a different optimization)
- https://github.com/apache/datafusion/issues/15037
geoffreyclaude commented on issue #15529:
URL: https://github.com/apache/datafusion/issues/15529#issuecomment-2769593513
I ran some quick [experiments on my
fork](https://github.com/geoffreyclaude/datafusion/pull/3) by checking for
early termination after each batch processed in the "topK"
geoffreyclaude opened a new issue, #15529:
URL: https://github.com/apache/datafusion/issues/15529
### Is your feature request related to a problem or challenge?
DataFusion currently has a "TopK early termination" optimization, which
speeds up queries that involve `ORDER BY` and `LIMIT