Krisztián,

This is great research. I totally agree on using a Vec abstraction and
using traits over enums.
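
To make those two points concrete, here is a minimal sketch of how a Vec-like
mutable array, the freeze step, and a trait with an associated type might fit
together. Every name in it (Array, PrimitiveArrayMut, PrimitiveArray, sum_i32)
is hypothetical and is not the API of any existing crate, and it is backed by
a plain Vec on stable Rust rather than the nightly allocator_api or RawVec:

```rust
// Hypothetical sketch only: these names are illustrative assumptions,
// not the API of iron-arrow or of any Arrow implementation.

/// Trait-based abstraction: each array kind declares its element type
/// through an associated type instead of being a variant of one big enum.
pub trait Array {
    type Item;

    /// Number of elements in the array.
    fn len(&self) -> usize;

    /// Element at `index`, or `None` when out of bounds.
    fn get(&self, index: usize) -> Option<&Self::Item>;
}

/// Mutable, growable array with a Vec-like API
/// (the role ArrayBuilder plays in the C++ library).
pub struct PrimitiveArrayMut<T> {
    values: Vec<T>,
}

/// Immutable array produced by `PrimitiveArrayMut::freeze`.
pub struct PrimitiveArray<T> {
    values: Box<[T]>,
}

impl<T> PrimitiveArrayMut<T> {
    /// Creates an empty mutable array with room for `capacity` elements.
    pub fn with_capacity(capacity: usize) -> Self {
        PrimitiveArrayMut {
            values: Vec::with_capacity(capacity),
        }
    }

    /// Appends one value, growing the backing storage if needed.
    pub fn push(&mut self, value: T) {
        self.values.push(value);
    }

    /// Consumes the mutable array and returns an immutable one.
    /// Taking `self` by value means no mutable handle survives the call.
    pub fn freeze(self) -> PrimitiveArray<T> {
        PrimitiveArray {
            values: self.values.into_boxed_slice(),
        }
    }
}

impl<T> Array for PrimitiveArray<T> {
    type Item = T;

    fn len(&self) -> usize {
        self.values.len()
    }

    fn get(&self, index: usize) -> Option<&T> {
        self.values.get(index)
    }
}

/// Generic code is written against the trait, so any array type whose
/// `Item` is `i32` can be summed without matching on an enum.
pub fn sum_i32<A: Array<Item = i32>>(array: &A) -> i64 {
    (0..array.len())
        .filter_map(|i| array.get(i))
        .map(|v| i64::from(*v))
        .sum()
}
```

In use, a caller would push values into a PrimitiveArrayMut, call freeze()
once, and hand the resulting PrimitiveArray to functions such as sum_i32.
Because freeze() takes the builder by value, the compiler rejects any later
push on it.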

I know you have some working code already (albeit mostly just API) and I
would suggest you create a PR to get that submitted as a starting point for
us all to start contributing.

I'm excited to start contributing to this and using DataFusion as a use
case to drive requirements.

Thanks,

Andy.





On Fri, Mar 23, 2018 at 1:09 PM, Krisztián Szűcs <szucs.kriszt...@gmail.com>
wrote:

>
> Hey!
> I've done a little research about implementing Arrow in Rust and I'd like
> to share my thoughts. Andy, please correct me if I'm wrong; I'm still
> hiking Rust's learning curve.
>
> My first plan was to re-implement iron-arrow and mirror the C++ API as
> closely as possible, but I realized that Rust can provide better
> ergonomics, somewhere between C++ and Python. Cargo also makes it easier
> to reuse other libraries.
>
> A couple of my findings:
> We should provide a Vec-like API for arrow::Array; a high-quality example
> is servo/smallvec
> (https://github.com/servo/rust-smallvec/blob/master/lib.rs#L80).
>
> A freeze method which turns a mutable array (ArrayBuilder in C++'s
> notation) into an immutable one: ArrayMut.freeze() -> Array. The idea is
> taken from the bytes crate
> (https://carllerche.github.io/bytes/bytes/struct.BytesMut.html#examples).
> The bytes crate would be great to use as a buffer, but sadly it doesn't
> support custom memory layouts.
>
> Use the nightly allocator_api and raw_vec instead of a handcrafted one.
> The only disadvantage is that it's not stabilized yet; however, it's on
> the roadmap, see language improvements
> (https://blog.rust-lang.org/2018/03/12/roadmap.html). See the RFC
> (https://github.com/rust-lang/rfcs/blob/master/text/1398-kinds-of-allocators.md).
> Pros briefly:
> Pluggable allocators, like https://github.com/alexcrichton/jemallocator
>
> Layout (https://doc.rust-lang.org/alloc/allocator/struct.Layout.html)
> abstraction
>
> Easy to start with the Heap
> (https://doc.rust-lang.org/std/heap/struct.Heap.html) implementation
>
> A low-level RawVec
> (https://doc.rust-lang.org/nightly/alloc/raw_vec/struct.RawVec.html) which
> is pretty close to Arrow's buffer
>
> IMHO we should prefer trait-based abstractions over enums, because that
> would provide more flexibility and extensibility (with associated types).
>
> If possible, reuse bitvec implementations: bit-vec
> (https://github.com/contain-rs/bit-vec), bitvec
> (https://github.com/marcianx/bitvec-rs)
>
> I think we should specify the desired user-facing API; then we might be
> able to plan the development. We can also get some help from Alex Crichton
> (https://github.com/alexcrichton), a core rust-lang and ecosystem
> developer. He was really helpful and has already given me a couple of
> hints.
>
> What do you think?
> Krisztian
>
> On Fri, Mar 23, 2018 at 5:20 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > Just "rust" would be fine for the top-level directory, I think.
> >
> > On Fri, Mar 23, 2018 at 12:09 PM, Andy Grove <andygrov...@gmail.com> wrote:
> > > OK I would be happy with that. How should I get started? Should I just
> > > create a PR to add a `rust` or `rust-native` root level directory with
> > > some starting code? I could do that this weekend.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > On Fri, Mar 23, 2018 at 10:04 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > >> > Wes - if we continue developing in a separate repo for now to prove
> > > >> > commitment levels and get this further along, does that actually
> > > >> > make the IP clearance procedure harder with more individual
> > > >> > contributors involved?
> > >>
> > >> Yes, this will make things harder (since we will have to chase down
> > > >> ICLAs from each contributor). If you are going to work on a native
> > >> implementation, I strongly recommend doing the work in the Apache
> > >> community. The code does not need to be API-stable nor
> > >> production-ready to go into the master branch.
> > >>
> > >> Thanks
> > >>
> > > >> On Fri, Mar 23, 2018 at 11:51 AM, Andy Grove <andygrov...@gmail.com>
> > > >> wrote:
> > > >> > I probably shouldn't have used the term binding. I am primarily
> > > >> > interested in a native Rust implementation but it should be possible
> > > >> > to have traits defining the interface and two implementations - one
> > > >> > native and one using FFI to call C. Rust typically has zero overhead
> > > >> > when calling C code. I need to know more about Arrow before I can
> > > >> > say for sure.
> > >> >
> > > >> > Wes - if we continue developing in a separate repo for now to prove
> > > >> > commitment levels and get this further along, does that actually
> > > >> > make the IP clearance procedure harder with more individual
> > > >> > contributors involved?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Andy.
> > >> >
> > >> >
> > >> >
> > > >> > On Fri, Mar 23, 2018 at 9:11 AM, Wes McKinney <wesmck...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> >> Not knowing the Rust ecosystem very well, I'm interested in the
> > > >> >> pros/cons of building and maintaining Rust bindings vs. a native
> > > >> >> Rust implementation, or some hybrid of the two. Seems like both
> > > >> >> bindings and native implementation could be part of the same
> > > >> >> codebase potentially.
> > > >> >>
> > > >> >> If we decide to import https://github.com/jihoonson/iron-arrow into
> > > >> >> the Apache Arrow project, it will take 1-2 weeks to conduct the IP
> > > >> >> clearance procedure as we recently did for the Go implementation.
> > > >> >> This is a lot of legwork for the PMC, so I want to make sure before
> > > >> >> we do this that it is worth it, and that there's a plan to continue
> > > >> >> actively developing this code.
> > >> >>
> > >> >> Thanks
> > >> >> Wes
> > >> >>
> > > >> >> On Fri, Mar 23, 2018 at 11:02 AM, Andy Grove <andygrov...@gmail.com>
> > > >> >> wrote:
> > > >> >> > My personal view (and I think I've seen others state this already
> > > >> >> > here) is that we should bring it into the repo sooner rather than
> > > >> >> > later and work on it there. The version is 0.1.0 so I think that
> > > >> >> > sets people's expectations about how complete it is.
> > > >> >> >
> > > >> >> > I think it is better for people to see it in the arrow repo being
> > > >> >> > actively developed. I'm very interested in getting compatibility
> > > >> >> > unit tests set up soon too so we can be sure it really is
> > > >> >> > compatible with the other implementations.
> > >> >> >
> > >> >> > Andy.
> > >> >> >
> > > >> >> > On Fri, Mar 23, 2018 at 8:44 AM, paddy horan <paddyho...@hotmail.com>
> > > >> >> > wrote:
> > > >> >> >
> > > >> >> >> Hi Andy,
> > > >> >> >>
> > > >> >> >> I’m looking to get involved in contributing to the Rust
> > > >> >> >> implementation also, and would love to see it in the arrow repo
> > > >> >> >> sooner rather than later.
> > > >> >> >>
> > > >> >> >> Should we identify what needs to be added to iron-arrow before
> > > >> >> >> it’s ready to be donated to the Apache repo?
> > >> >> >>
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >> Paddy
> > >> >> >>
> > >> >> >> _____________________________
> > > >> >> >> From: Andy Grove <andygrov...@gmail.com>
> > > >> >> >> Sent: Friday, March 23, 2018 9:08 AM
> > > >> >> >> Subject: Rust bindings
> > > >> >> >> To: <dev@arrow.apache.org>
> > >> >> >>
> > >> >> >>
> > >> >> >> Hi,
> > >> >> >>
> > > >> >> >> Congratulations on the release of the Go bindings for Arrow. I
> > > >> >> >> think Rust should be next ;-)
> > > >> >> >>
> > > >> >> >> I've been a bit distracted getting a release out in the day job
> > > >> >> >> but am now working on iron-arrow and getting it ready to
> > > >> >> >> integrate with my project. I hope to be able to put some time in
> > > >> >> >> this weekend on this. I don't think it will be very hard to get
> > > >> >> >> to a point where I am at least using the Array type.
> > > >> >> >>
> > > >> >> >> I can commit to working on the Rust bindings moving forward
> > > >> >> >> (weekends mostly) so I think we should go ahead and do this
> > > >> >> >> under the arrow repo if everyone is in agreement.
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >>
> > >> >> >> Andy,
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >>
> > >>
> >
> >
>
>
