Load Spark dataframes in Arrow buffer using Scala (to be used by Gandiva)

2018-07-24 Thread Richard Siebeling
Hi, how can I load a Spark dataframe into Arrow using Scala? I've seen some older posts regarding this subject, but am hoping that there has been some development around this... I'd like to load a Spark dataframe into Arrow and then use the Gandiva project to do some analytics on that, thanks in

Re: [DISCUSS] Contribution of Gandiva to Apache Arrow

2018-07-24 Thread Antoine Pitrou
On 23/07/2018 at 21:22, Wes McKinney wrote: >> With regards to (2), there are two main questions. >> a) How do we make sure the Gandiva team (who are not Arrow committers) get >> ample support from other committers to make good continued progress. Since >> they will no longer be able to commit

[jira] [Created] (ARROW-2903) Setting -DARROW_HDFS=OFF breaks arrow build when linking against boost libraries

2018-07-24 Thread Pearu Peterson (JIRA)
Pearu Peterson created ARROW-2903: Summary: Setting -DARROW_HDFS=OFF breaks arrow build when linking against boost libraries Key: ARROW-2903 URL: https://issues.apache.org/jira/browse/ARROW-2903 Proj

Re: Load Spark dataframes in Arrow buffer using Scala (to be used by Gandiva)

2018-07-24 Thread Li Jin
Hi, Do you want to collect a Spark DataFrame into Arrow format on a single machine or do you still want to keep the data distributed?

Re: [DISCUSS] Contribution of Gandiva to Apache Arrow

2018-07-24 Thread Uwe L. Korn
Small update: > * CentOS 6 and later (we use the RedHat devtoolset-2 compiler toolchain) We still build the Python wheels on CentOS 5. This should be no problem as llvmlite can also be built on that OS. It looks like there will be a new Linux Python wheel standard soon where we will upgrade to

Re: Working towards 0.10.0 release candidate

2018-07-24 Thread Wes McKinney
hi folks, Tuesday's update: it looks like we're going to be able to close out the C++ and Python dev work without any issues before the end of the week. I'm going to make some tweaks to the website per the JIRAs that are available but that won't take long. The two items where there's some uncerta

Re: Working towards 0.10.0 release candidate

2018-07-24 Thread Phillip Cloud
That sounds great to me. I'll make sure I'm able to build and sign artifacts this week and surface any issues I find along the way. Looking forward to a smooth release! On Tue, Jul 24, 2018 at 11:24 AM Wes McKinney wrote: > hi folks, > > Tuesday's update: it looks like we're going to be able to

Re: [DISCUSS] Contribution of Gandiva to Apache Arrow

2018-07-24 Thread Phillip Cloud
While I'm not a level 83 LLVM wizard like Antoine :) I have a small amount of experience with it and would also be happy to review/merge patches. Having Gandiva in arrow will simplify packaging and building the library, which IME has always been annoying with large cross-platform and cross-languag

[jira] [Created] (ARROW-2904) [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc

2018-07-24 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2904: Summary: [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc Key: ARROW-2904 URL: https://issues.apache.org/jira/browse/ARROW-2904 Project: Apac

[jira] [Created] (ARROW-2905) [C++] Investigate if the *_data_ pointers used in Builder classes improve performance on hot paths

2018-07-24 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2905: Summary: [C++] Investigate if the *_data_ pointers used in Builder classes improve performance on hot paths Key: ARROW-2905 URL: https://issues.apache.org/jira/browse/ARROW-2905

[jira] [Created] (ARROW-2906) [Website] Remove the link to slack channel

2018-07-24 Thread okkez (JIRA)
okkez created ARROW-2906: Summary: [Website] Remove the link to slack channel Key: ARROW-2906 URL: https://issues.apache.org/jira/browse/ARROW-2906 Project: Apache Arrow Issue Type: Improvement

Re: Load Spark dataframes in Arrow buffer using Scala (to be used by Gandiva)

2018-07-24 Thread Wes McKinney
hi Richard, I might start here in the Spark codebase to see how Spark SQL tables are converted to Arrow record batches: https://github.com/apache/spark/blob/d8aaa771e249b3f54b57ce24763e53fd65a0dbf7/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala The code has be

[jira] [Created] (ARROW-2907) [GitHub] Improve "How to contribute patches"

2018-07-24 Thread okkez (JIRA)
okkez created ARROW-2907: Summary: [GitHub] Improve "How to contribute patches" Key: ARROW-2907 URL: https://issues.apache.org/jira/browse/ARROW-2907 Project: Apache Arrow Issue Type: Improvement

Re: Proof of concept work and conferences

2018-07-24 Thread Wes McKinney
hi Ryan, Welcome! To your questions: * The only thing we ask is that you follow ASF trademark policy when referring to the project: https://www.apache.org/foundation/marks/#books. If you are looking for help with phrasing or descriptions feel free to ask us here for help. * Feel free to ask them

Re: Map Type Metadata Representation

2018-07-24 Thread Wes McKinney
Thanks Bryan. Let's have a discussion about this and the other format-1.0 issues after 0.10.0 goes out On Wed, Jul 11, 2018 at 12:01 PM, Bryan Cutler wrote: > Thanks Wes, sure I will add a section in the wiki. > > On Tue, Jul 10, 2018 at 3:07 PM, Wes McKinney wrote: > >> hi Bryan, >> >> Thanks f

Rust tasks for 0.10.0?

2018-07-24 Thread Andy Grove
Hi, I'm wondering what we should do with the Rust implementation for the 0.10.0 release. I would like to have an official release pushed to crates.io for sure. Since the Rust implementation is so new there isn't much demand for new features yet so I think it is more important to focus on changes

Re: Rust tasks for 0.10.0?

2018-07-24 Thread Renjie Liu
+1 for pushing to crates.io. On Wed, Jul 25, 2018 at 10:52 AM Andy Grove wrote: > Hi, > > I'm wondering what we should do with the Rust implementation for the 0.10.0 > release. I would like to have an official release pushed to crates.io for > sure. > > Since the Rust implementation is so new th

Re: Rust tasks for 0.10.0?

2018-07-24 Thread Wes McKinney
Seems fine to cut a release to crates.io with the 0.10.0 release. Do you want to use the same version numbers? On Tue, Jul 24, 2018 at 11:00 PM, Renjie Liu wrote: > +1 for pushing to crates.io. > > On Wed, Jul 25, 2018 at 10:52 AM Andy Grove wrote: > >> Hi, >> >> I'm wondering what we should do

Re: Rust tasks for 0.10.0?

2018-07-24 Thread Andy Grove
Yes, I think that makes sense. So should I go ahead and do that whenever now or is there more involved? Also, I'm the only person that can release to crates.io right now so that is something we should fix. I can grant release privileges to others. On Tue, Jul 24, 2018 at 9:04 PM Wes McKinney wro

Re: Rust tasks for 0.10.0?

2018-07-24 Thread Wes McKinney
If you could wait until the 0.10.0 release vote has been conducted that would be best. Ideally the crates.io release should be performed using the source release tarball. If it cannot, then open a bug report so that it can be fixed for next time. On Tue, Jul 24, 2018 at 11:09 PM, Andy Grove wrote

Re: Rust tasks for 0.10.0?

2018-07-24 Thread Andy Grove
That's fine. I'll create a PR for updating the version number to 0.10.0 and I think we should be good to release from the source tarball. On Tue, Jul 24, 2018 at 9:14 PM Wes McKinney wrote: > If you could wait until the 0.10.0 release vote has been conducted > that would be best. Ideally the cra

[jira] [Created] (ARROW-2908) [Rust] Update version to 0.10.0

2018-07-24 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2908: Summary: [Rust] Update version to 0.10.0 Key: ARROW-2908 URL: https://issues.apache.org/jira/browse/ARROW-2908 Project: Apache Arrow Issue Type: Task Com