Hey!
I've done a little research on implementing Arrow in Rust and I'd like to
share my thoughts. Andy, please correct me if I'm wrong; I'm still climbing
Rust's learning curve.

My first plan was to re-implement iron-arrow and mirror the C++ API as closely
as possible, but I realized that Rust can provide better ergonomics, somewhere
between C++ and Python. Cargo also makes it easier to reuse other libraries.

A couple of my findings:
We should provide a Vec-like API for arrow::Array; a high-quality example is
servo/smallvec
(https://github.com/servo/rust-smallvec/blob/master/lib.rs#L80).

A freeze method that turns a mutable array (ArrayBuilder in C++ terms)
into an immutable one: ArrayMut.freeze() -> Array. The idea is taken from the
bytes crate
(https://carllerche.github.io/bytes/bytes/struct.BytesMut.html#examples).
The bytes crate would be great to use as a buffer, but sadly it doesn't
support custom memory layouts.
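
To make the two points above concrete, here is a minimal sketch of a Vec-like
builder with a freeze step. ArrayMut and Array are hypothetical names taken
from the notation above, not an existing API, and backing the frozen array
with Arc<[T]> is only an assumption made to keep the example short (a real
implementation would use an Arrow-compatible buffer).

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical mutable, growable array (the "builder" side).
pub struct ArrayMut<T> {
    values: Vec<T>,
}

/// Hypothetical immutable array that can be shared cheaply.
#[derive(Clone)]
pub struct Array<T> {
    values: Arc<[T]>,
}

impl<T> ArrayMut<T> {
    pub fn new() -> Self {
        ArrayMut { values: Vec::new() }
    }

    pub fn with_capacity(capacity: usize) -> Self {
        ArrayMut { values: Vec::with_capacity(capacity) }
    }

    /// Vec-like append.
    pub fn push(&mut self, value: T) {
        self.values.push(value);
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Consumes the builder, so no mutable handle survives the call.
    /// Note: converting a Vec into an Arc<[T]> copies the elements into
    /// the shared allocation.
    pub fn freeze(self) -> Array<T> {
        Array { values: Arc::from(self.values) }
    }
}

/// Read-only slice access: indexing, iteration, len() and so on.
impl<T> Deref for Array<T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        &self.values
    }
}
```

Because freeze takes self by value, the compiler rejects any later push on
the builder, which is the guarantee the C++ ArrayBuilder/Array split provides
by convention.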

Use the nightly allocator_api and raw_vec instead of a handcrafted
implementation. The only disadvantage is that it's not stabilized yet; however,
it's on the roadmap, see language improvements
(https://blog.rust-lang.org/2018/03/12/roadmap.html) and the RFC
(https://github.com/rust-lang/rfcs/blob/master/text/1398-kinds-of-allocators.md).
Pros, briefly:

- Pluggable allocators, like jemallocator
  (https://github.com/alexcrichton/jemallocator)
- Layout abstraction
  (https://doc.rust-lang.org/alloc/allocator/struct.Layout.html)
- Easy to start with the Heap implementation
  (https://doc.rust-lang.org/std/heap/struct.Heap.html)
- A low-level RawVec
  (https://doc.rust-lang.org/nightly/alloc/raw_vec/struct.RawVec.html),
  which is pretty close to Arrow's buffer
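
As a rough illustration of what the Layout abstraction buys us, here is a
sketch of a fixed-size, aligned byte buffer. It is written against the stable
std::alloc module rather than the nightly allocator_api or RawVec discussed
above. AlignedBuffer is a hypothetical name, and the 64-byte alignment is an
assumption based on the Arrow format's alignment and padding recommendation.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Alignment assumed for this sketch.
const ALIGNMENT: usize = 64;

/// Hypothetical fixed-size byte buffer with a custom memory layout.
pub struct AlignedBuffer {
    ptr: NonNull<u8>,
    layout: Layout,
    len: usize,
}

impl AlignedBuffer {
    /// Allocates `len` zeroed bytes, padded up to a multiple of ALIGNMENT.
    pub fn zeroed(len: usize) -> Self {
        // Round up and allocate at least one block, because allocating a
        // zero-sized layout is undefined behaviour.
        let padded = ((len + ALIGNMENT - 1) / ALIGNMENT).max(1) * ALIGNMENT;
        let layout = Layout::from_size_align(padded, ALIGNMENT).expect("invalid layout");
        // Safety: the layout has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = match NonNull::new(raw) {
            Some(p) => p,
            None => handle_alloc_error(layout),
        };
        AlignedBuffer { ptr, layout, len }
    }

    /// Number of bytes actually allocated, including padding.
    pub fn capacity(&self) -> usize {
        self.layout.size()
    }

    pub fn as_slice(&self) -> &[u8] {
        // Safety: `ptr` points to at least `len` initialized (zeroed) bytes.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // Safety: same as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for AlignedBuffer {
    fn drop(&mut self) {
        // Safety: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) };
    }
}
```

The same Layout value is used for allocation and deallocation, which is the
contract a pluggable allocator would rely on as well.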

IMHO we should prefer trait-based abstractions over enums, because traits
(with associated types) would provide more flexibility and extensibility.
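
A small sketch of what I mean by a trait with an associated type. ArrowArray,
Int32Array, Float64Array and sum are all hypothetical names used only for
illustration; the point is that a new array type can be added without
touching a central enum.

```rust
/// Hypothetical trait describing a primitive array.
pub trait ArrowArray {
    /// Element type, chosen by each implementation.
    type Item;

    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Option<&Self::Item>;
}

pub struct Int32Array {
    pub values: Vec<i32>,
}

pub struct Float64Array {
    pub values: Vec<f64>,
}

impl ArrowArray for Int32Array {
    type Item = i32;

    fn len(&self) -> usize {
        self.values.len()
    }

    fn get(&self, index: usize) -> Option<&i32> {
        self.values.get(index)
    }
}

impl ArrowArray for Float64Array {
    type Item = f64;

    fn len(&self) -> usize {
        self.values.len()
    }

    fn get(&self, index: usize) -> Option<&f64> {
        self.values.get(index)
    }
}

/// Generic code is written once against the trait; there is no `match`
/// over an enum of array kinds.
pub fn sum<A>(array: &A) -> A::Item
where
    A: ArrowArray,
    A::Item: Copy + Default + std::ops::Add<Output = A::Item>,
{
    let mut total = <A::Item as Default>::default();
    for i in 0..array.len() {
        if let Some(v) = array.get(i) {
            total = total + *v;
        }
    }
    total
}
```

With an enum, every function like sum would need a match arm per variant, and
downstream crates could not add their own array types.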

If possible, reuse existing bitvec implementations: bit-vec
(https://github.com/contain-rs/bit-vec), bitvec
(https://github.com/marcianx/bitvec-rs).
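
For reference, this is roughly the functionality we would need from such a
crate for validity (null) bitmaps. The sketch is hand-rolled on purpose so
that it does not assume anything about the APIs of the crates above; Bitmap
is a hypothetical name, and bits are numbered from the least significant bit
of each byte, as in the Arrow format.

```rust
/// Hypothetical validity bitmap: bit i set means slot i is non-null.
pub struct Bitmap {
    bytes: Vec<u8>,
    len: usize,
}

impl Bitmap {
    /// Creates a bitmap of `len` bits, all cleared (every slot null).
    pub fn new(len: usize) -> Self {
        Bitmap { bytes: vec![0u8; (len + 7) / 8], len }
    }

    /// Number of bits tracked by the bitmap.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Marks slot `index` as valid or null.
    pub fn set(&mut self, index: usize, valid: bool) {
        assert!(index < self.len, "bit index out of bounds");
        let mask = 1u8 << (index % 8);
        if valid {
            self.bytes[index / 8] |= mask;
        } else {
            self.bytes[index / 8] &= !mask;
        }
    }

    /// Returns true if slot `index` is valid.
    pub fn get(&self, index: usize) -> bool {
        assert!(index < self.len, "bit index out of bounds");
        (self.bytes[index / 8] & (1u8 << (index % 8))) != 0
    }

    /// Number of valid slots.
    pub fn count_set(&self) -> usize {
        self.bytes.iter().map(|b| b.count_ones() as usize).sum()
    }

    /// Raw bytes, e.g. for copying into a buffer.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```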

I think we should specify the desired user-facing API first; then we would be
able to plan the development. We can also get some help from Alex Crichton
(https://github.com/alexcrichton), a core rust-lang and ecosystem developer.
He was really helpful and has already given me a couple of hints.

What do you think?
Krisztian
On Fri, Mar 23, 2018 at 5:20 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> Just "rust" would be fine for the top-level directory, I think.
>
> On Fri, Mar 23, 2018 at 12:09 PM, Andy Grove <andygrov...@gmail.com> wrote:
> > OK I would be happy with that. How should I get started? Should I just
> > create a PR to add a `rust` or `rust-native` root level directory with some
> > starting code? I could do that this weekend.
> >
> > Thanks,
> >
> > Andy.
> >
> > On Fri, Mar 23, 2018 at 10:04 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >
> >> > Wes - if we continue developing in a separate repo for now to prove
> >> > commitment levels and get this further along does that actually make the
> >> > IP clearance procedure harder with more individual contributors involved?
> >>
> >> Yes, this will make things harder (since we will have to chase down
> >> ICLA's from each contributor). If you are going to work on a native
> >> implementation, I strongly recommend doing the work in the Apache
> >> community. The code does not need to be API-stable nor
> >> production-ready to go into the master branch.
> >>
> >> Thanks
> >>
> >> On Fri, Mar 23, 2018 at 11:51 AM, Andy Grove <andygrov...@gmail.com> wrote:
> >> > I probably shouldn't have used the term binding. I am primarily interested
> >> > in a native Rust implementation but it should be possible to have traits
> >> > defining the interface and two implementations - one native and one using
> >> > FFI to call C. Rust typically has zero overhead when calling C code. I need
> >> > to know more about Arrow before I can say for sure.
> >> >
> >> > Wes - if we continue developing in a separate repo for now to prove
> >> > commitment levels and get this further along does that actually make the
> >> > IP clearance procedure harder with more individual contributors involved?
> >> >
> >> > Thanks,
> >> >
> >> > Andy.
> >> >
> >> >
> >> >
> >> > On Fri, Mar 23, 2018 at 9:11 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >> >
> >> >> Not knowing the Rust ecosystem very well, I'm interested in the
> >> >> pros/cons of building and maintaining Rust bindings vs. a native Rust
> >> >> implementation, or some hybrid of the two. Seems like both bindings
> >> >> and native implementation could be part of the same codebase
> >> >> potentially.
> >> >>
> >> >> If we decide to import https://github.com/jihoonson/iron-arrow into
> >> >> the Apache Arrow project, it will take 1-2 weeks to conduct the IP
> >> >> clearance procedure as we recently did for the Go implementation. This
> >> >> is a lot of legwork for the PMC, so I want to make sure before we do
> >> >> this that it is worth it, and that there's a plan to continue actively
> >> >> developing this code.
> >> >>
> >> >> Thanks
> >> >> Wes
> >> >>
> >> >> On Fri, Mar 23, 2018 at 11:02 AM, Andy Grove <andygrov...@gmail.com> wrote:
> >> >> > My personal view (and I think I've seen others state this already here) is
> >> >> > that we should bring it into the repo sooner rather than later and work on
> >> >> > it there. The version is 0.1.0 so I think that sets people's expectations
> >> >> > about how complete it is.
> >> >> >
> >> >> > I think it is better for people to see it in the arrow repo being actively
> >> >> > developed. I'm very interested in getting compatibility unit tests set up
> >> >> > soon too so we can be sure it really is compatible with the other
> >> >> > implementations.
> >> >> >
> >> >> > Andy.
> >> >> >
> >> >> > On Fri, Mar 23, 2018 at 8:44 AM, paddy horan <paddyho...@hotmail.com> wrote:
> >> >> >
> >> >> >> Hi Andy,
> >> >> >>
> >> >> >> I’m looking to get involved in contributing to the Rust implementation
> >> >> >> also, would love to see it in the arrow repo sooner rather than later.
> >> >> >>
> >> >> >> Should we identify what needs to be added to iron-arrow before it’s ready
> >> >> >> to be donated to the Apache repo?
> >> >> >>
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Paddy
> >> >> >>
> >> >> >> _____________________________
> >> >> >> From: Andy Grove <andygrov...@gmail.com>
> >> >> >> Sent: Friday, March 23, 2018 9:08 AM
> >> >> >> Subject: Rust bindings
> >> >> >> To: <dev@arrow.apache.org>
> >> >> >>
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> Congratulations on the release of the Go bindings for Arrow. I think Rust
> >> >> >> should be next ;-)
> >> >> >>
> >> >> >> I've been a bit distracted getting a release out in the day job but am now
> >> >> >> working on iron-arrow and getting it ready to integrate with my project. I
> >> >> >> hope to be able to put some time in this weekend on this. I don't think it
> >> >> >> will be very hard to get to a point where I am at least using the Array
> >> >> >> type.
> >> >> >>
> >> >> >> I can commit to working on the Rust bindings moving forward (weekends
> >> >> >> mostly) so I think we should go ahead and do this under the arrow repo if
> >> >> >> everyone is in agreement.
> >> >> >>
> >> >> >> Thanks,
> >> >> >>
> >> >> >> Andy,