Krisztián,

This is great research. I totally agree on using a Vec abstraction and using traits over enums.
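
To make the two points above concrete, here is a minimal sketch, not working Arrow code. It assumes a Vec-like mutable array that freezes into an immutable one, plus a trait with an associated type in place of an enum. All names (`Array`, `PrimitiveArrayMut`, `PrimitiveArray`, `sum_i64`) are hypothetical and not taken from iron-arrow; `Vec` and `Arc` merely stand in for a real Arrow buffer.

```rust
use std::sync::Arc;

/// Hypothetical read-only array interface. The associated type lets each
/// implementation choose its element type without a central enum.
pub trait Array {
    type Item;

    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Option<&Self::Item>;

    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

/// Mutable, Vec-like array (the role ArrayBuilder plays in the C++ API).
pub struct PrimitiveArrayMut<T> {
    values: Vec<T>,
}

/// Immutable array produced by `freeze`.
pub struct PrimitiveArray<T> {
    values: Arc<[T]>,
}

impl<T> PrimitiveArrayMut<T> {
    pub fn new() -> Self {
        PrimitiveArrayMut { values: Vec::new() }
    }

    pub fn push(&mut self, value: T) {
        self.values.push(value);
    }

    /// Consumes the mutable array, so no mutable handle survives the freeze.
    pub fn freeze(self) -> PrimitiveArray<T> {
        PrimitiveArray { values: self.values.into() }
    }
}

impl<T> Clone for PrimitiveArray<T> {
    /// Cloning shares the same allocation; only the refcount changes.
    fn clone(&self) -> Self {
        PrimitiveArray { values: Arc::clone(&self.values) }
    }
}

impl<T> Array for PrimitiveArray<T> {
    type Item = T;

    fn len(&self) -> usize {
        self.values.len()
    }

    fn get(&self, index: usize) -> Option<&T> {
        self.values.get(index)
    }
}

/// Generic code depends only on the trait, not on a closed set of variants.
pub fn sum_i64<A: Array<Item = i64>>(array: &A) -> i64 {
    (0..array.len()).filter_map(|i| array.get(i)).sum()
}
```

Because `freeze` takes `self` by value, the compiler rejects any further `push` after freezing, which is the same guarantee `BytesMut::freeze` gives in the bytes crate. A new array type only needs to implement `Array`; nothing like a central enum has to be edited.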

I know you have some working code already (albeit mostly just API), and I
would suggest you create a PR to get that submitted as a starting point for
us all to start contributing. I'm excited to start contributing to this and
using DataFusion as a use case to drive requirements.

Thanks,

Andy.

On Fri, Mar 23, 2018 at 1:09 PM, Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:

> Hey!
>
> I've done a little research about implementing arrow in rust and I'd like
> to share my thoughts. Please Andy correct me if I'm wrong, still hiking
> rust's learning curve.
>
> My first plan was to re-implement iron-arrow and mirror the cpp api as
> close as possible, but realized that rust can provide better ergonomics,
> somewhere between cpp and python. Also cargo makes it possible to reuse
> other libraries more easily.
>
> A couple of my findings:
>
> We should provide a Vec like API for arrow::Array, a high quality example
> is servo/smallvec
> (https://github.com/servo/rust-smallvec/blob/master/lib.rs#L80)
>
> freeze method which turns a mutable array (ArrayBuilder in cpp's notation)
> into an immutable one: ArrayMut.freeze() -> Array. Idea taken from bytes
> (https://carllerche.github.io/bytes/bytes/struct.BytesMut.html#examples)
> crate. Bytes crate would be great for using as a buffer, but sadly doesn't
> support custom memory layouts.
>
> Use the nightly allocator_api and raw_vec instead of a handcrafted one.
> The only disadvantage is it's not stabilized yet however it's on the
> roadmap, see language improvements
> (https://blog.rust-lang.org/2018/03/12/roadmap.html). See the RFC
> (https://github.com/rust-lang/rfcs/blob/master/text/1398-kinds-of-allocators.md).
> Pros briefly:
>
> Pluggable allocators, like https://github.com/alexcrichton/jemallocator
>
> Layout (https://doc.rust-lang.org/alloc/allocator/struct.Layout.html)
> abstraction
>
> Easy to start with Heap
> (https://doc.rust-lang.org/std/heap/struct.Heap.html) implementation
>
> A low level RawVec
> (https://doc.rust-lang.org/nightly/alloc/raw_vec/struct.RawVec.html)
> which is pretty close to Arrow's buffer
>
> IMHO we should prefer trait based abstractions instead of enums, because
> that would provide more flexibility and extensibility (with associated
> types).
>
> If possible reuse bitvec implementations: bit-vec
> (https://github.com/contain-rs/bit-vec), bitvec
> (https://github.com/marcianx/bitvec-rs)
>
> I think we should specify the desired user facing API, then we might be
> able to plan the development. We can also have some help from Alex
> Crichton (https://github.com/alexcrichton), core rust-lang and ecosystem
> developer. He was really helpful, gave me a couple of hints already.
>
> What do You think?
> Krisztian
>
> On Fri, Mar 23, 2018 at 5:20 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > Just "rust" would be fine for the top-level directory, I think.
> >
> > On Fri, Mar 23, 2018 at 12:09 PM, Andy Grove <andygrov...@gmail.com> wrote:
> > > OK I would be happy with that. How should I get started? Should I just
> > > create a PR to add a `rust` or `rust-native` root level directory with
> > > some starting code? I could do that this weekend.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > On Fri, Mar 23, 2018 at 10:04 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > >> > Wes - if we continue developing in a separate repo for now to prove
> > >> > commitment levels and get this further along does that actually make
> > >> > the IP clearance procedure harder with more individual contributors
> > >> > involved?
> > >>
> > >> Yes, this will make things harder (since we will have to chase down
> > >> ICLAs from each contributor). If you are going to work on a native
> > >> implementation, I strongly recommend doing the work in the Apache
> > >> community. The code does not need to be API-stable nor
> > >> production-ready to go into the master branch.
> > >>
> > >> Thanks
> > >>
> > >> On Fri, Mar 23, 2018 at 11:51 AM, Andy Grove <andygrov...@gmail.com> wrote:
> > >> > I probably shouldn't have used the term binding. I am primarily
> > >> > interested in a native Rust implementation but it should be possible
> > >> > to have traits defining the interface and two implementations - one
> > >> > native and one using FFI to call C. Rust typically has zero overhead
> > >> > when calling C code. I need to know more about Arrow before I can say
> > >> > for sure.
> > >> >
> > >> > Wes - if we continue developing in a separate repo for now to prove
> > >> > commitment levels and get this further along does that actually make
> > >> > the IP clearance procedure harder with more individual contributors
> > >> > involved?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Andy.
> > >> >
> > >> > On Fri, Mar 23, 2018 at 9:11 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> > >> >
> > >> >> Not knowing the Rust ecosystem very well, I'm interested in the
> > >> >> pros/cons of building and maintaining Rust bindings vs. a native Rust
> > >> >> implementation, or some hybrid of the two. Seems like both bindings
> > >> >> and native implementation could be part of the same codebase
> > >> >> potentially.
> > >> >>
> > >> >> If we decide to import https://github.com/jihoonson/iron-arrow into
> > >> >> the Apache Arrow project, it will take 1-2 weeks to conduct the IP
> > >> >> clearance procedure as we recently did for the Go implementation.
> > >> >> This is a lot of legwork for the PMC, so I want to make sure before we
> > >> >> do this that it is worth it, and that there's a plan to continue
> > >> >> actively developing this code.
> > >> >>
> > >> >> Thanks
> > >> >> Wes
> > >> >>
> > >> >> On Fri, Mar 23, 2018 at 11:02 AM, Andy Grove <andygrov...@gmail.com> wrote:
> > >> >> > My personal view (and I think I've seen others state this already
> > >> >> > here) is that we should bring it into the repo sooner rather than
> > >> >> > later and work on it there. The version is 0.1.0 so I think that
> > >> >> > sets people's expectations about how complete it is.
> > >> >> >
> > >> >> > I think it is better for people to see it in the arrow repo being
> > >> >> > actively developed. I'm very interested in getting compatibility
> > >> >> > unit tests set up soon too so we can be sure it really is
> > >> >> > compatible with the other implementations.
> > >> >> >
> > >> >> > Andy.
> > >> >> >
> > >> >> > On Fri, Mar 23, 2018 at 8:44 AM, paddy horan <paddyho...@hotmail.com> wrote:
> > >> >> >
> > >> >> >> Hi Andy,
> > >> >> >>
> > >> >> >> I’m looking to get involved in contributing to the Rust
> > >> >> >> implementation also, would love to see it in the arrow repo sooner
> > >> >> >> rather than later.
> > >> >> >>
> > >> >> >> Should we identify what needs to be added to iron-arrow before it’s
> > >> >> >> ready to be donated to the Apache repo?
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >> Paddy
> > >> >> >>
> > >> >> >> _____________________________
> > >> >> >> From: Andy Grove <andygrov...@gmail.com>
> > >> >> >> Sent: Friday, March 23, 2018 9:08 AM
> > >> >> >> Subject: Rust bindings
> > >> >> >> To: <dev@arrow.apache.org>
> > >> >> >>
> > >> >> >> Hi,
> > >> >> >>
> > >> >> >> Congratulations on the release of the Go bindings for Arrow. I think
> > >> >> >> Rust should be next ;-)
> > >> >> >>
> > >> >> >> I've been a bit distracted getting a release out in the day job but
> > >> >> >> am now working on iron-arrow and getting it ready to integrate with
> > >> >> >> my project. I hope to be able to put some time in this weekend on
> > >> >> >> this. I don't think it will be very hard to get to a point where I
> > >> >> >> am at least using the Array type.
> > >> >> >>
> > >> >> >> I can commit to working on the Rust bindings moving forward
> > >> >> >> (weekends mostly) so I think we should go ahead and do this under
> > >> >> >> the arrow repo if everyone is in agreement.
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >>
> > >> >> >> Andy.
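
On the bitvec suggestion earlier in this thread: the thread does not say what the bit vector would be used for, so this sketch assumes the main use is the validity (null) bitmap. Arrow numbers those bits least-significant-bit first within each byte. The `Bitmap` type and its methods below are hypothetical and hand-rolled for illustration only; a real implementation might instead wrap one of the crates mentioned, provided its bit order can be made to match.

```rust
/// Hypothetical minimal bitmap with Arrow-style LSB bit numbering.
pub struct Bitmap {
    bytes: Vec<u8>,
    len: usize,
}

impl Bitmap {
    /// Creates a bitmap of `len` bits, all cleared.
    pub fn new(len: usize) -> Self {
        Bitmap { bytes: vec![0u8; (len + 7) / 8], len }
    }

    /// Number of bits (not bytes) tracked by the bitmap.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Sets bit `i`: it lives in byte `i / 8` at bit position `i % 8`.
    pub fn set(&mut self, i: usize) {
        assert!(i < self.len, "bit index out of bounds");
        self.bytes[i / 8] |= 1u8 << (i % 8);
    }

    /// Returns whether bit `i` is set.
    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "bit index out of bounds");
        (self.bytes[i / 8] & (1u8 << (i % 8))) != 0
    }

    /// Counts the set bits, e.g. the number of non-null slots.
    pub fn count_set(&self) -> usize {
        self.bytes.iter().map(|b| b.count_ones() as usize).sum()
    }

    /// Raw bytes, in the layout another implementation would read.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```

Exposing the raw bytes matters for the compatibility tests discussed in this thread: another Arrow implementation reading the same buffer has to see the same bit positions.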