Hi All,

The focus for me for 0.13.0 is SIMD. I would like to port all the "ops" in "array_ops" to the new "compute" module and leverage SIMD for them all. I have most of this done in various forks.
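Roughly, this is the kind of kernel I mean, sketched with the packed_simd crate (nightly-only at the time of writing); the function name and signature here are illustrative, not the final compute-module API:

    // Sketch of a SIMD add kernel over f32 buffers, using packed_simd.
    // The name and signature are illustrative, not the final API.
    use packed_simd::f32x8;

    fn add_f32(left: &[f32], right: &[f32], out: &mut [f32]) {
        assert_eq!(left.len(), right.len());
        assert_eq!(left.len(), out.len());

        let lanes = f32x8::lanes(); // 8 f32 values per vector
        let chunks = left.len() / lanes;

        // Process 8 lanes at a time.
        for i in 0..chunks {
            let offset = i * lanes;
            let l = f32x8::from_slice_unaligned(&left[offset..]);
            let r = f32x8::from_slice_unaligned(&right[offset..]);
            (l + r).write_to_slice_unaligned(&mut out[offset..]);
        }

        // Scalar tail for lengths that are not a multiple of the lane count.
        for i in (chunks * lanes)..left.len() {
            out[i] = left[i] + right[i];
        }
    }

A real kernel in the compute module would also have to handle Arrow's null bitmaps and generalize over the primitive types, but the vector-chunks-plus-scalar-tail shape stays the same.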
Past 0.13.0, I would really like to work toward getting Rust running in the integration tests. The thing I am most excited about regarding Arrow is the concept of defining computational libraries in, say, Rust and being able to use them from any implementation (pyarrow, probably, for me). This all starts and ends with the integration tests. Also, Gandiva is fascinating; I would love to have robust support for it in Rust (via bindings).

Regards,
P

________________________________
From: Neville Dipale <nevilled...@gmail.com>
Sent: Tuesday, February 12, 2019 11:33 AM
To: dev@arrow.apache.org
Subject: Re: [Rust] Rust 0.13.0 release

Thanks for bringing this up, Andy.

I'm unemployed/on recovery leave, so I've had some surplus time to work on Rust. There are a lot of features I've wanted to work on, some of which I've spent time attempting but struggled with. A few block additional work that I could contribute.

In 0.13.0 and the release thereafter, I'd like to see:

* Date/time support. I've spent a lot of time trying to implement this, but I get the feeling that my Rust isn't good enough yet to pull it together.

* More IO support. I'm working on a JSON reader, and after that I want to work on JSON and CSV writers (continuing where you left off). With date/time support, I can also work on date/time parsing so we can have those types in CSV and JSON. Parquet support isn't on my radar at the moment. JSON and CSV are more commonly used, so I'm hoping that with concrete support for these, more people using Rust can choose to integrate Arrow. That could bring us more hands to help.

* Array slicing (https://issues.apache.org/jira/browse/ARROW-3954). I tried working on it but failed. Related to this would be array chunking. I need these in order to be able to operate on "Tables" the way C++, Python and others do. I've got ChunkedArray, Column and Table roughly implemented in my fork, but without zero-copy slicing I can't upstream them. (A rough sketch of the offset-view idea appears at the bottom of this message.)

* Compute operations. I've made good progress on scalar and array operations: trig functions, some string operators, and other functions that one can run on a Spark-esque dataframe. These will fit in well with DataFusion's SQL operations, but direction-wise I think it would help if we join heads and think about where we want to take compute. SIMD is great, and once Paddy has hashed out how it works, more of us will be able to contribute SIMD-compatible compute operators.

Thanks,
Neville

On Tue, 12 Feb 2019 at 18:12, Andy Grove <andygrov...@gmail.com> wrote:

> I was curious what our Rust committers and contributors are excited about
> for 0.13.0.
>
> The feature I would most like to see is the ability for DataFusion to run
> SQL against Parquet files again, as that would give me an excuse for a PoC
> in my day job using Arrow.
>
> I know there were some efforts underway to build Arrow array readers for
> Parquet, and it would make sense for me to help there.
>
> I would also like to start building out some benchmarks.
>
> I think the SIMD work is exciting too.
>
> I'd like to hear thoughts from everyone else, though, since we're all
> coming at this from different perspectives.
>
> Thanks,
>
> Andy.
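For reference, the zero-copy slicing mentioned above (ARROW-3954) boils down to making a slice a new (offset, length) view over the same reference-counted data, so no buffers are copied. A minimal sketch of that idea; the type names are illustrative stand-ins, not the actual arrow crate API:

    use std::sync::Arc;

    // Stand-in for Arrow's immutable data buffers.
    struct ArrayData {
        values: Vec<i32>,
    }

    // A "slice" is just shared data plus an offset and a length.
    #[derive(Clone)]
    struct ArraySlice {
        data: Arc<ArrayData>,
        offset: usize,
        len: usize,
    }

    impl ArraySlice {
        fn new(data: Arc<ArrayData>) -> Self {
            let len = data.values.len();
            ArraySlice { data, offset: 0, len }
        }

        // O(1): only the refcount and two integers change; no buffer copy.
        fn slice(&self, offset: usize, len: usize) -> Self {
            assert!(offset + len <= self.len, "slice out of bounds");
            ArraySlice {
                data: self.data.clone(),
                offset: self.offset + offset,
                len,
            }
        }

        fn values(&self) -> &[i32] {
            &self.data.values[self.offset..self.offset + self.len]
        }
    }

With views like this, ChunkedArray and Column can become cheap wrappers over a vector of slices, which is what makes Table-style operations practical without copying.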