Hi All,

The focus for me for 0.13.0 is SIMD. I would like to port all the "ops" in "array_ops" to the new "compute" module and leverage SIMD for them all. I have most of this done in various forks.
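Roughly, this is the kind of kernel I mean, sketched with the packed_simd crate (nightly-only at the time of writing); the function name and signature here are illustrative, not the final compute-module API:

    // Sketch of a SIMD add kernel over f32 buffers, using packed_simd.
    // The name and signature are illustrative, not the final API.
    use packed_simd::f32x8;

    fn add_f32(left: &[f32], right: &[f32], out: &mut [f32]) {
        assert_eq!(left.len(), right.len());
        assert_eq!(left.len(), out.len());

        let lanes = f32x8::lanes(); // 8 f32 values per vector
        let chunks = left.len() / lanes;

        // Process 8 lanes at a time.
        for i in 0..chunks {
            let offset = i * lanes;
            let l = f32x8::from_slice_unaligned(&left[offset..]);
            let r = f32x8::from_slice_unaligned(&right[offset..]);
            (l + r).write_to_slice_unaligned(&mut out[offset..]);
        }

        // Scalar tail for lengths that are not a multiple of the lane count.
        for i in (chunks * lanes)..left.len() {
            out[i] = left[i] + right[i];
        }
    }

A real kernel in the compute module would also have to handle Arrow's null bitmaps and generalize over the primitive types, but the vector-chunks-plus-scalar-tail shape stays the same.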
Past 0.13.0, I would really like to work toward getting Rust running in the integration tests. The thing I am most excited about regarding Arrow is the concept of defining computational libraries in, say, Rust and being able to use them from any implementation (pyarrow, probably, for me). This all starts and ends with the integration tests. Also, Gandiva is fascinating; I would love to have robust support for it in Rust (via bindings).

Regards,
P

________________________________
From: Neville Dipale <nevilled...@gmail.com>
Sent: Tuesday, February 12, 2019 11:33 AM
To: dev@arrow.apache.org
Subject: Re: [Rust] Rust 0.13.0 release

Thanks for bringing this up, Andy.

I'm unemployed/on recovery leave, so I've had some surplus time to work on Rust. There are a lot of features I've wanted to work on, some of which I've spent time attempting but struggled with. A few block additional work that I could contribute.

In 0.13.0 and the release thereafter, I'd like to see:

* Date/time support. I've spent a lot of time trying to implement this, but I get the feeling that my Rust isn't good enough yet to pull it together.

* More IO support. I'm working on a JSON reader, and after that I want to work on JSON and CSV writers (continuing where you left off). With date/time support, I can also work on date/time parsing so we can have those types in CSV and JSON. Parquet support isn't on my radar at the moment. JSON and CSV are more commonly used, so I'm hoping that with concrete support for these, more people using Rust can choose to integrate Arrow. That could bring us more hands to help.

* Array slicing (https://issues.apache.org/jira/browse/ARROW-3954). I tried working on it but failed. Related to this would be array chunking. I need these in order to be able to operate on "Tables" the way C++, Python and others do. I've got ChunkedArray, Column and Table roughly implemented in my fork, but without zero-copy slicing I can't upstream them. (A rough sketch of the offset-view idea appears at the bottom of this message.)

* Compute operations. I've made good progress on scalar and array operations: trig functions, some string operators, and other functions that one can run on a Spark-esque dataframe. These will fit in well with DataFusion's SQL operations, but direction-wise I think it would help if we join heads and think about where we want to take compute. SIMD is great, and once Paddy has hashed out how it works, more of us will be able to contribute SIMD-compatible compute operators.

Thanks,
Neville

On Tue, 12 Feb 2019 at 18:12, Andy Grove <andygrov...@gmail.com> wrote:

> I was curious what our Rust committers and contributors are excited about
> for 0.13.0.
>
> The feature I would most like to see is the ability for DataFusion to run
> SQL against Parquet files again, as that would give me an excuse for a PoC
> in my day job using Arrow.
>
> I know there were some efforts underway to build Arrow array readers for
> Parquet, and it would make sense for me to help there.
>
> I would also like to start building out some benchmarks.
>
> I think the SIMD work is exciting too.
>
> I'd like to hear thoughts from everyone else, though, since we're all
> coming at this from different perspectives.
>
> Thanks,
>
> Andy.
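For reference, the zero-copy slicing mentioned above (ARROW-3954) boils down to making a slice a new (offset, length) view over the same reference-counted data, so no buffers are copied. A minimal sketch of that idea; the type names are illustrative stand-ins, not the actual arrow crate API:

    use std::sync::Arc;

    // Stand-in for Arrow's immutable data buffers.
    struct ArrayData {
        values: Vec<i32>,
    }

    // A "slice" is just shared data plus an offset and a length.
    #[derive(Clone)]
    struct ArraySlice {
        data: Arc<ArrayData>,
        offset: usize,
        len: usize,
    }

    impl ArraySlice {
        fn new(data: Arc<ArrayData>) -> Self {
            let len = data.values.len();
            ArraySlice { data, offset: 0, len }
        }

        // O(1): only the refcount and two integers change; no buffer copy.
        fn slice(&self, offset: usize, len: usize) -> Self {
            assert!(offset + len <= self.len, "slice out of bounds");
            ArraySlice {
                data: self.data.clone(),
                offset: self.offset + offset,
                len,
            }
        }

        fn values(&self) -> &[i32] {
            &self.data.values[self.offset..self.offset + self.len]
        }
    }

With views like this, ChunkedArray and Column can become cheap wrappers over a vector of slices, which is what makes Table-style operations practical without copying.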