I’m also interested in the Parquet/Arrow integration and may help there. This is, however, a relatively large feature, and I’m not sure it can be done in 0.13.
Another area I’d like to work on is high-level Parquet writer support. This issue has been discussed several times in the past. People should not need to specify definition & repetition levels in order to write data in Parquet format.

Chao

On Wed, Feb 13, 2019 at 10:24 AM paddy horan <paddyho...@hotmail.com> wrote:

> Hi All,
>
> The focus for me for 0.13.0 is SIMD. I would like to port all the "ops"
> in "array_ops" to the new "compute" module and leverage SIMD for them all.
> I have most of this done in various forks.
>
> Past 0.13.0 I would really like to work toward getting Rust running in the
> integration tests. The thing I am most excited about regarding Arrow is
> the concept of defining computational libraries in, say, Rust and being
> able to use them from any implementation, pyarrow probably for me. This
> all starts and ends with the integration tests.
>
> Also, Gandiva is fascinating; I would love to have robust support for it
> in Rust (via bindings)...
>
> Regards,
> P
>
> ________________________________
> From: Neville Dipale <nevilled...@gmail.com>
> Sent: Tuesday, February 12, 2019 11:33 AM
> To: dev@arrow.apache.org
> Subject: Re: [Rust] Rust 0.13.0 release
>
> Thanks for bringing this up, Andy.
>
> I'm unemployed/on recovery leave, so I've had some surplus time to work on
> Rust.
>
> There are a lot of features that I've wanted to work on, some of which
> I've spent time attempting but struggled with. A few block additional
> work that I could contribute.
>
> In 0.13.0 and the release thereafter, I'd like to see:
>
> Date/time support. I've spent a lot of time trying to implement this, but
> I get the feeling that my Rust isn't good enough yet to pull it together.
>
> More IO support.
> I'm working on a JSON reader, and want to work on JSON and CSV writers
> (continuing where you left off) after this.
> With date/time support, I can also work on date/time parsing so we can
> have these in CSV and JSON.
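[Editor's note: the date/time parsing Neville mentions for the CSV and JSON readers boils down to converting a string like "2019-02-12" into days since the Unix epoch, which is how Arrow's Date32 type represents dates. A minimal std-only sketch of that conversion follows; the function name `parse_date32` and the hand-rolled parsing are illustrative only, not the arrow crate's API — a real reader would likely use a date/time crate and proper error handling.]

```rust
// Convert an ISO-8601 date string (YYYY-MM-DD) into days since the Unix
// epoch, the representation used by Arrow's Date32 type.
// Illustrative sketch only; not the arrow crate's API.
fn parse_date32(s: &str) -> Option<i32> {
    let mut parts = s.splitn(3, '-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    // "Days from civil" algorithm for the proleptic Gregorian calendar
    // (as popularized by Howard Hinnant's date library).
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    Some((era * 146097 + doe - 719468) as i32)
}

fn main() {
    // 1970-01-01 is day 0 of the epoch.
    assert_eq!(parse_date32("1970-01-01"), Some(0));
    assert_eq!(parse_date32("2019-02-12"), Some(17939));
    assert_eq!(parse_date32("not-a-date"), None);
    println!("ok");
}
```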
> Parquet support isn't on my radar at the moment. JSON and CSV are more
> commonly used, so I'm hoping that with concrete support for these, more
> people using Rust can choose to integrate Arrow. That could bring us more
> hands to help.
>
> Array slicing (https://issues.apache.org/jira/browse/ARROW-3954). I tried
> working on it but failed. Related to this would be array chunking.
> I need these in order to be able to operate on "Tables" like CPP, Python
> and others do. I've got ChunkedArray, Column and Table roughly implemented
> in my fork, but without zero-copy slicing, I can't upstream them.
>
> I've made good progress on scalar and array operations. I have trig
> functions, some string operators and other functions that one can run on
> a Spark-esque dataframe.
> These will fit in well with DataFusion's SQL operations, but from a
> decision perspective, I think it would help if we join heads and think
> about the direction we want to take on compute.
>
> SIMD is great, and once Paddy has hashed out how it works, more of us
> will be able to contribute SIMD-compatible compute operators.
>
> Thanks,
> Neville
>
> On Tue, 12 Feb 2019 at 18:12, Andy Grove <andygrov...@gmail.com> wrote:
>
> > I was curious what our Rust committers and contributors are excited
> > about for 0.13.0.
> >
> > The feature I would most like to see is the ability for DataFusion to
> > run SQL against Parquet files again, as that would give me an excuse
> > for a PoC in my day job using Arrow.
> >
> > I know there were some efforts underway to build Arrow array readers
> > for Parquet and it would make sense for me to help there.
> >
> > I would also like to start building out some benchmarks.
> >
> > I think the SIMD work is exciting too.
> >
> > I'd like to hear thoughts from everyone else, though, since we're all
> > coming at this from different perspectives.
> >
> > Thanks,
> >
> > Andy.
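[Editor's note: the zero-copy slicing Neville needs for ChunkedArray/Column/Table (ARROW-3954) amounts to views that share one immutable, reference-counted buffer: a slice stores only an offset and a length, so no values are copied. A minimal std-only sketch of the idea follows; the `Int32Array` type and its methods here are illustrative, not the arrow crate's actual API.]

```rust
use std::sync::Arc;

// Illustrative sketch of zero-copy slicing: a slice shares the parent's
// buffer via Arc and only adjusts (offset, len). Not the arrow crate's API.
#[derive(Clone)]
struct Int32Array {
    buffer: Arc<Vec<i32>>, // shared, immutable backing storage
    offset: usize,
    len: usize,
}

impl Int32Array {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Int32Array { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: no values are moved or cloned, only the view changes.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32Array {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    fn value(&self, i: usize) -> i32 {
        assert!(i < self.len, "index out of bounds");
        self.buffer[self.offset + i]
    }

    fn len(&self) -> usize {
        self.len
    }
}

fn main() {
    let array = Int32Array::new(vec![10, 20, 30, 40, 50]);
    let tail = array.slice(2, 3); // views [30, 40, 50] without copying
    assert_eq!(tail.len(), 3);
    assert_eq!(tail.value(0), 30);
    // Both views point at the same allocation.
    assert_eq!(Arc::strong_count(&array.buffer), 2);
    println!("ok");
}
```

A ChunkedArray is then just a list of such views, and a Table a list of ChunkedArrays, with no data duplicated between them.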