I created a new issue [1] to track the refactoring. Could you clarify the request (here or in the issue)?
My understanding is that the Skyhook file format code [2] should be refactored to use a higher-level interface rather than using dataset::FileFormat and dataset::FragmentScanOptions directly [3]. I am assuming the reference to Acero and Substrait to be only for context and not necessarily a preferred direction. If that is the preferred direction, there is something much more general in progress that we can perhaps specialize as a replacement for the Skyhook file format, but I'm not sure that's what's actually being requested.

Thank you!

[1]: https://github.com/apache/arrow/issues/40583
[2]: https://github.com/apache/arrow/tree/main/cpp/src/skyhook
[3]: https://github.com/apache/arrow/blob/main/cpp/src/skyhook/cls/cls_skyhook.cc#L153-L156

# ------------------------------
# Aldrin

https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene

On Thursday, March 14th, 2024 at 09:10, Jayjeet Chakraborty <jayjeetchakrabort...@gmail.com> wrote:

> Hi Ben, I am willing to help out with the refactor too !
>
> On Wed, Mar 13, 2024 at 9:25 PM Aldrin octalene....@pm.me.invalid wrote:
>
> > I am interested in helping to refactor!
> >
> > -Aldrin
> >
> > On Wed, Mar 13, 2024 at 08:54, Benjamin Kietzman <bengil...@gmail.com> wrote:
> >
> > Skyhook [1] enables efficient predicate and projection pushdown from
> > Arrow Dataset to a Ceph storage cluster. This is very cool
> > functionality, but it's tightly coupled to the Arrow C++ Dataset
> > implementation in a way which blocks refactoring. In the Arrow C++
> > codebase today, Acero is designed specifically to handle projection
> > and filtration in a more modular fashion, and to accept configuration
> > from standardized plan/expression formats like Substrait. In light of
> > improvements to Dataset which are not possible while maintaining
> > Skyhook in its current form, we need volunteers to update Skyhook.
> >
> > Please reply to let us know if you are actively using Skyhook or if
> > you are interested in helping to refactor Skyhook.
> >
> > Sincerely,
> > Ben Kietzman
> >
> > [1] https://arrow.apache.org/blog/2022/01/31/skyhook-bringing-computation-to-storage-with-apache-arrow/
>
> --
> Jayjeet Chakraborty
> CS PhD student
> UC Santa Cruz
> California, USA