I created a new issue [1] to track the refactoring. Could you clarify the 
request (here or in the issue)?

My understanding is that the Skyhook file format code [2] should be refactored 
to use a higher-level interface rather than using dataset::FileFormat and 
dataset::FragmentScanOptions directly [3].

I am assuming the reference to Acero and Substrait is only for context and 
not necessarily a preferred direction. If that is the preferred direction, 
there is something much more general in progress that we can perhaps specialize 
as a replacement for the Skyhook file format, but I'm not sure that's what's 
actually being requested.

Thank you!


[1]: https://github.com/apache/arrow/issues/40583
[2]: https://github.com/apache/arrow/tree/main/cpp/src/skyhook
[3]: https://github.com/apache/arrow/blob/main/cpp/src/skyhook/cls/cls_skyhook.cc#L153-L156



# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene

https://keybase.io/octalene


On Thursday, March 14th, 2024 at 09:10, Jayjeet Chakraborty 
<jayjeetchakrabort...@gmail.com> wrote:

> Hi Ben, I am willing to help out with the refactor too!
> 

> On Wed, Mar 13, 2024 at 9:25 PM Aldrin octalene....@pm.me.invalid wrote:
> 

> > I am interested in helping to refactor!
> > 

> > -Aldrin
> > 

> > On Wed, Mar 13, 2024 at 08:54, Benjamin Kietzman <bengil...@gmail.com> wrote:
> > 

> > Skyhook [1] enables efficient predicate and projection pushdown from
> > Arrow Dataset to a Ceph storage cluster. This is very cool
> > functionality, but it's tightly coupled to the Arrow C++ Dataset
> > implementation in a way which blocks refactoring. In the Arrow C++
> > codebase today, Acero is designed specifically to handle projection
> > and filtration in a more modular fashion, and to accept configuration
> > from standardized plan/expression formats like Substrait. In light of
> > improvements to Dataset which are not possible while maintaining
> > Skyhook in its current form, we need volunteers to update Skyhook.
> > Please reply to let us know if you are actively using Skyhook or if
> > you are interested in helping to refactor Skyhook.
> > 

> > Sincerely,
> > Ben Kietzman
> > 

> > [1]
> > 

> > https://arrow.apache.org/blog/2022/01/31/skyhook-bringing-computation-to-storage-with-apache-arrow/
> 

> 

> --
> Jayjeet Chakraborty
> CS PhD student
> UC Santa Cruz
> California, USA
