Thanks for all the feedback. I forgot about one major thing I want to see in DataFusion in 0.13.0, and that is some basic query optimizations, such as projection push-down. Without this, DataFusion is just not usable for most real-world use cases, so I am going to focus on this for 0.13.0 and then on Parquet support in 0.14.0.
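To make the idea concrete, here is a rough sketch of what projection push-down does, using made-up plan types rather than DataFusion's actual API: the optimizer rewrites the plan so the scan reads only the columns the query actually refers to.

    // Minimal sketch of projection push-down over a toy logical plan.
    // These types are hypothetical and only illustrate the optimization.
    use std::collections::HashSet;

    #[derive(Debug)]
    enum LogicalPlan {
        // Read `columns` from a CSV file; `None` means "all columns".
        CsvScan { path: String, columns: Option<Vec<String>> },
        // Keep only the named columns of the input.
        Projection { input: Box<LogicalPlan>, columns: Vec<String> },
    }

    fn push_down_projection(plan: LogicalPlan) -> LogicalPlan {
        match plan {
            LogicalPlan::Projection { input, columns } => {
                let needed: HashSet<String> = columns.iter().cloned().collect();
                let new_input = match *input {
                    // Narrow the scan to just the columns the projection needs.
                    LogicalPlan::CsvScan { path, .. } => {
                        let mut cols: Vec<String> = needed.into_iter().collect();
                        cols.sort();
                        LogicalPlan::CsvScan { path, columns: Some(cols) }
                    }
                    other => push_down_projection(other),
                };
                LogicalPlan::Projection { input: Box::new(new_input), columns }
            }
            other => other,
        }
    }

    fn main() {
        // SELECT a, b FROM 'data.csv' -- only columns a and b need to be parsed.
        let plan = LogicalPlan::Projection {
            input: Box::new(LogicalPlan::CsvScan { path: "data.csv".into(), columns: None }),
            columns: vec!["a".into(), "b".into()],
        };
        println!("{:?}", push_down_projection(plan));
    }

With the scan narrowed like this, the CSV reader only parses the columns the query needs, which is what makes CSV-backed queries practical.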
Being able to run SQL queries efficiently against CSV files (or against data already loaded into arrays) is enough for me to be able to start using Arrow/DataFusion for some real use cases at work.

Thanks,

Andy.

On Thu, Feb 14, 2019 at 7:12 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Then I'm expecting to finish it in 0.14
>
> Wes McKinney <wesmck...@gmail.com> wrote on Wed, Feb 13, 2019 at 11:08 PM:
>
> > > BTW, what's the timeline of 0.13.0?
> >
> > See
> > https://lists.apache.org/thread.html/7890bd7aebd2d2018fa68a78630280581a544346ce80e4002cd9e548@%3Cdev.arrow.apache.org%3E
> >
> > Since 0.12 was ~January 20 I think it would be good to release again by the end of March.
> >
> > On Wed, Feb 13, 2019 at 7:29 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
> >
> > > Hi, Andy:
> > > Thanks for bringing up this thread. I'm working on the Arrow reader for Parquet and expect to make progress on it soon. BTW, what's the timeline of 0.13.0?
> > >
> > > Chao Sun <sunc...@apache.org> wrote on Wed, Feb 13, 2019 at 10:34 AM:
> > >
> > > > I'm also interested in the Parquet/Arrow integration and may help there. This is however a relatively large feature and I'm not sure if it can be done in 0.13.
> > > >
> > > > Another area I'd like to work on is high-level Parquet writer support. This issue has been discussed several times in the past. People should not need to specify definition & repetition levels in order to write data in Parquet format.
> > > >
> > > > Chao
> > > >
> > > > On Wed, Feb 13, 2019 at 10:24 AM paddy horan <paddyho...@hotmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > The focus for me for 0.13.0 is SIMD. I would like to port all the "ops" in "array_ops" to the new "compute" module and leverage SIMD for them all. I have most of this done in various forks.
> > > > >
> > > > > Past 0.13.0 I would really like to work toward getting Rust running in the integration tests. The thing I am most excited about regarding Arrow is the concept of defining computational libraries in, say, Rust and being able to use them from any implementation, pyarrow probably for me. This all starts and ends with the integration tests.
> > > > >
> > > > > Also, Gandiva is fascinating; I would love to have robust support for this in Rust (via bindings)...
> > > > >
> > > > > Regards,
> > > > > P
> > > > >
> > > > > ________________________________
> > > > > From: Neville Dipale <nevilled...@gmail.com>
> > > > > Sent: Tuesday, February 12, 2019 11:33 AM
> > > > > To: dev@arrow.apache.org
> > > > > Subject: Re: [Rust] Rust 0.13.0 release
> > > > >
> > > > > Thanks for bringing this up Andy.
> > > > >
> > > > > I'm unemployed/on recovery leave, so I've had some surplus time to work on Rust.
> > > > >
> > > > > There are a lot of features that I've wanted to work on, some of which I've spent time attempting but struggled with. A few block additional work that I could contribute.
> > > > >
> > > > > In 0.13.0 and the release thereafter, I'd like to see:
> > > > >
> > > > > Date/time support. I've spent a lot of time trying to implement this, but I get the feeling that my Rust isn't good enough yet to pull it together.
> > > > >
> > > > > More IO support. I'm working on a JSON reader, and after that I want to work on JSON and CSV writers (continuing where you left off). With date/time support, I can also work on date/time parsing so we can have these in CSV and JSON. Parquet support isn't on my radar at the moment. JSON and CSV are more commonly used, so I'm hoping that with concrete support for these, more people using Rust will choose to integrate Arrow. That could bring us more hands to help.
> > > > >
> > > > > Array slicing (https://issues.apache.org/jira/browse/ARROW-3954). I tried working on it but failed. Related to this would be array chunking. I need these in order to be able to operate on "Tables" like CPP, Python and others do. I've got ChunkedArray, Column and Table roughly implemented in my fork, but without zero-copy slicing, I can't upstream them.
> > > > >
> > > > > I've made good progress on scalar and array operations. I have trig functions, some string operators and other functions that one can run on a Spark-esque dataframe. These will fit in well with DataFusion's SQL operations, but from a decision perspective, I think it would help if we put our heads together and think about the direction we want to take on compute.
> > > > >
> > > > > SIMD is great, and once Paddy has hashed out how it works, more of us will be able to contribute SIMD-compatible compute operators.
> > > > >
> > > > > Thanks,
> > > > > Neville
> > > > >
> > > > > On Tue, 12 Feb 2019 at 18:12, Andy Grove <andygrov...@gmail.com> wrote:
> > > > >
> > > > > > I was curious what our Rust committers and contributors are excited about for 0.13.0.
> > > > > >
> > > > > > The feature I would most like to see is the ability for DataFusion to run SQL against Parquet files again, as that would give me an excuse for a PoC in my day job using Arrow.
> > > > > >
> > > > > > I know there were some efforts underway to build Arrow array readers for Parquet and it would make sense for me to help there.
> > > > > >
> > > > > > I would also like to start building out some benchmarks.
> > > > > >
> > > > > > I think the SIMD work is exciting too.
> > > > > >
> > > > > > I'd like to hear thoughts from everyone else though, since we're all coming at this from different perspectives.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Andy.
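As a footnote on the zero-copy slicing Neville mentions (ARROW-3954): the idea amounts to something like the sketch below, written with made-up types rather than the actual Arrow Rust API. A slice shares the parent array's buffer and only adjusts an offset and a length, so no values are copied.

    // Hypothetical array type illustrating zero-copy slicing:
    // the backing buffer is shared via Arc, and a slice is just
    // a new (offset, len) view over the same data.
    use std::sync::Arc;

    struct Int64Array {
        data: Arc<Vec<i64>>, // shared, immutable backing buffer
        offset: usize,
        len: usize,
    }

    impl Int64Array {
        fn new(values: Vec<i64>) -> Self {
            let len = values.len();
            Int64Array { data: Arc::new(values), offset: 0, len }
        }

        // Return a view over `len` elements starting at `offset`,
        // sharing the same backing buffer (no copy).
        fn slice(&self, offset: usize, len: usize) -> Int64Array {
            assert!(offset + len <= self.len, "slice out of bounds");
            Int64Array {
                data: Arc::clone(&self.data),
                offset: self.offset + offset,
                len,
            }
        }

        fn value(&self, i: usize) -> i64 {
            self.data[self.offset + i]
        }

        fn len(&self) -> usize {
            self.len
        }
    }

    fn main() {
        let array = Int64Array::new(vec![1, 2, 3, 4, 5]);
        let chunk = array.slice(1, 3); // view over [2, 3, 4], no copy
        assert_eq!(chunk.len(), 3);
        assert_eq!(chunk.value(0), 2);
    }

Chunked arrays and table-level slicing can then be layered on top of this kind of view without duplicating the underlying buffers.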