Hi Andy,

Thanks for starting this thread. I'm working on the Arrow reader for Parquet and expect to make progress on it soon. BTW, what's the timeline for 0.13.0?
Chao Sun <sunc...@apache.org> wrote on Wed, Feb 13, 2019 at 10:34 AM:

> I'm also interested in the Parquet/Arrow integration and may help there.
> This is, however, a relatively large feature and I'm not sure if it can be
> done in 0.13.
>
> Another area I'd like to work on is high-level Parquet writer support. This
> issue has been discussed several times in the past. People should not need
> to specify definition & repetition levels in order to write data in Parquet
> format.
>
> Chao
>
> On Wed, Feb 13, 2019 at 10:24 AM paddy horan <paddyho...@hotmail.com> wrote:
>
> > Hi All,
> >
> > The focus for me for 0.13.0 is SIMD. I would like to port all the "ops"
> > in "array_ops" to the new "compute" module and leverage SIMD for them all.
> > I have most of this done in various forks.
> >
> > Past 0.13.0, I would really like to work toward getting Rust running in
> > the integration tests. The thing I am most excited about regarding Arrow
> > is the concept of defining computational libraries in, say, Rust and
> > being able to use them from any implementation, pyarrow probably for me.
> > This all starts and ends with the integration tests.
> >
> > Also, Gandiva is fascinating; I would love to have robust support for it
> > in Rust (via bindings)...
> >
> > Regards,
> > P
> >
> > ________________________________
> > From: Neville Dipale <nevilled...@gmail.com>
> > Sent: Tuesday, February 12, 2019 11:33 AM
> > To: dev@arrow.apache.org
> > Subject: Re: [Rust] Rust 0.13.0 release
> >
> > Thanks for bringing this up, Andy.
> >
> > I'm unemployed/on recovery leave, so I've had some surplus time to work
> > on Rust.
> >
> > There are a lot of features I've wanted to work on, some of which I've
> > spent time attempting but struggled with. A few block additional work
> > that I could contribute.
> >
> > In 0.13.0 and the release thereafter, I'd like to see:
> >
> > Date/time support. I've spent a lot of time trying to implement this, but
> > I get the feeling that my Rust isn't good enough yet to pull it together.
> >
> > More IO support. I'm working on a JSON reader, and want to work on JSON
> > and CSV writers (continuing where you left off) after this. With
> > date/time support, I can also work on date/time parsing so we can have
> > these in CSV and JSON. Parquet support isn't on my radar at the moment.
> > JSON and CSV are more commonly used, so I'm hoping that with concrete
> > support for these, more people using Rust can choose to integrate Arrow.
> > That could bring us more hands to help.
> >
> > Array slicing (https://issues.apache.org/jira/browse/ARROW-3954). I tried
> > working on it but failed. Related to this would be array chunking. I need
> > these in order to be able to operate on "Tables" like C++, Python and
> > others do. I've got ChunkedArray, Column and Table roughly implemented in
> > my fork, but without zero-copy slicing, I can't upstream them.
> >
> > I've made good progress on scalar and array operations. I have trig
> > functions, some string operators and other functions that one can run on
> > a Spark-esque dataframe. These will fit in well with DataFusion's SQL
> > operations, but from a decision perspective, I think it would help if we
> > put our heads together and think about the direction we want to take on
> > compute.
> >
> > SIMD is great, and once Paddy has hashed out how it works, more of us
> > will be able to contribute SIMD-compatible compute operators.
> >
> > Thanks,
> > Neville
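As a rough illustration of the SIMD-compatible compute kernels discussed above, here is a minimal sketch in plain Rust. It is an assumption-laden example, not code from the Arrow crate: it works on plain slices rather than the real Buffer/PrimitiveArray types, the function name add_i32 is hypothetical, and null-bitmap handling is omitted. The point is only that a kernel written as a simple element-wise loop can be auto-vectorized today and later swapped for an explicit-SIMD implementation.

    // Hypothetical element-wise addition kernel, sketched on plain slices.
    // A real "compute" kernel would operate on Arrow arrays and handle the
    // null bitmap; an explicit-SIMD version would process fixed-width
    // lanes instead of relying on auto-vectorization.
    fn add_i32(left: &[i32], right: &[i32], out: &mut [i32]) {
        assert_eq!(left.len(), right.len());
        assert_eq!(left.len(), out.len());
        for i in 0..left.len() {
            out[i] = left[i].wrapping_add(right[i]);
        }
    }

    fn main() {
        let a = [1, 2, 3, 4];
        let b = [10, 20, 30, 40];
        let mut c = [0; 4];
        add_i32(&a, &b, &mut c);
        assert_eq!(c, [11, 22, 33, 44]);
    }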
> > On Tue, 12 Feb 2019 at 18:12, Andy Grove <andygrov...@gmail.com> wrote:
> >
> > > I was curious what our Rust committers and contributors are excited
> > > about for 0.13.0.
> > >
> > > The feature I would most like to see is the ability for DataFusion to
> > > run SQL against Parquet files again, as that would give me an excuse
> > > for a PoC in my day job using Arrow.
> > >
> > > I know there were some efforts underway to build Arrow array readers
> > > for Parquet, and it would make sense for me to help there.
> > >
> > > I would also like to start building out some benchmarks.
> > >
> > > I think the SIMD work is exciting too.
> > >
> > > I'd like to hear thoughts from everyone else, though, since we're all
> > > coming at this from different perspectives.
> > >
> > > Thanks,
> > >
> > > Andy.
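On the benchmarks point above, here is a minimal sketch of what a micro-benchmark could look like. The choice of the third-party criterion crate and the sum_i32 stand-in kernel are assumptions made for illustration, not an agreed approach; a real benchmark would exercise actual Arrow compute kernels.

    // Minimal criterion micro-benchmark sketch. Assumes criterion is added
    // as a dev-dependency and this file lives under benches/ with
    // harness = false configured in Cargo.toml.
    use criterion::{black_box, criterion_group, criterion_main, Criterion};

    // Stand-in for a real Arrow compute kernel.
    fn sum_i32(values: &[i32]) -> i64 {
        values.iter().map(|&v| v as i64).sum()
    }

    fn bench_sum(c: &mut Criterion) {
        let data: Vec<i32> = (0..4096).collect();
        c.bench_function("sum_i32 4096", |b| {
            b.iter(|| sum_i32(black_box(&data)))
        });
    }

    criterion_group!(benches, bench_sum);
    criterion_main!(benches);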