Re: Load Spark dataframes in Arrow buffer using Scala (to be used by Gandiva)

2018-07-25 Thread Richard Siebeling
Hi, @Li, same as Jieun, I'd like to start with a single machine, but I can imagine that there are use cases for a distributed approach. @Wes, thanks, I'll look into it. Richard On Wed, 25 Jul 2018 at 03:59, Wes McKinney wrote: > hi Richard, > > I might start here in the Spark codebase to see how

Stuck in building Arrow C++

2018-07-25 Thread Xu,Wenjian
Hello, I want to build Arrow C++ from source. I followed the steps in the instructions at: https://github.com/apache/arrow/tree/master/cpp But I have been stuck for a few hours after entering the command *make unittest*, as follows: = $make u

arrow:io:S3ReadableFile

2018-07-25 Thread Renato Marroquín Mogrovejo
Hi Arrow experts, I am in the middle of implementing an S3ReadableFile class, and I am wondering if this can be accomplished by using the hdfs client? Or is it just that it isn't a feature that users have needed so far? Any pointers/ideas are highly appreciated! Thanks! Renato M.

Re: [DISCUSS] Contribution of Gandiva to Apache Arrow

2018-07-25 Thread Uwe L. Korn
Having it in Arrow will also enable us to better promote it on the Python side with the pyarrow package. It will be a great addition, but we will then need to figure out how to deal with in-memory LLVM versions so that we don't conflict with packages like Numba. This is a problem on th

Re: arrow:io:S3ReadableFile

2018-07-25 Thread Wes McKinney
hey Renato, I would recommend following whatever TensorFlow has done. We can even reuse their code (Apache 2.0): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/platform/s3 - Wes On Wed, Jul 25, 2018 at 6:49 AM, Renato Marroquín Mogrovejo wrote: > Hi Arrow experts, > > I a

Re: arrow:io:S3ReadableFile

2018-07-25 Thread Uwe L. Korn
Hello Renato, I don't think that the hdfs client will give you the necessary interface to use Hadoop's S3 implementation. If it does, this might be a simple way to support more filesystems just by using their Hadoop implementation. In general, it would be preferred to have native (C/C++) implem

Re: Rust tasks for 0.10.0?

2018-07-25 Thread Chao Sun
I'm working on ARROW-2583 (very slowly! see WIP here), which will be an API-breaking change. It is fine to delay this until after the 0.10.0 release, right? On Tue, Jul 24, 2018 at 8:16 PM, Andy Grove wrote: > That's fine. I'l

Notes from Today's Sync

2018-07-25 Thread Jacques Nadeau
*Attendees, Topics to discuss* Wes: 0.10 Release, Gandiva discussion Uwe: No additional topics Jacques: No additional topics Sidd: No additional topics Bryan: No additional topics *Release Discussion* - 9 issues in the backlog - Two issues around ARROW-2826 that are array builder cleanup i

[jira] [Created] (ARROW-2910) [Packaging] Build from official apache archive

2018-07-25 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-2910: -- Summary: [Packaging] Build from official apache archive Key: ARROW-2910 URL: https://issues.apache.org/jira/browse/ARROW-2910 Project: Apache Arrow Issue

Re: Stuck in building Arrow C++

2018-07-25 Thread Wes McKinney
hi Wenjian, Are you able to download https://github.com/google/brotli/archive/v0.6.0.tar.gz ? - Wes On Wed, Jul 25, 2018 at 5:39 AM, Xu,Wenjian wrote: > Hello, > > I want to build Arrow C++ from source. I follow the steps according to the > instructions in: > https://github.com/apache/arrow/t

Re: Stuck in building Arrow C++

2018-07-25 Thread Uwe L. Korn
Hello, the output is quite sparse. Can you run just `make VERBOSE=1` and have a look at the last lines of that? This should give you a better indication of where it hangs. Uwe On Wed, Jul 25, 2018, at 11:39 AM, Xu,Wenjian wrote: > Hello, > > I want to build Arrow C++ from source. I follow the st

Arrow sync, 12p Eastern today

2018-07-25 Thread Wes McKinney
https://meet.google.com/vtm-teks-phx

[jira] [Created] (ARROW-2909) [JS] Add convenience function for creating a table from a list of vectors

2018-07-25 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2909: Summary: [JS] Add convenience function for creating a table from a list of vectors Key: ARROW-2909 URL: https://issues.apache.org/jira/browse/ARROW-2909 Project: Apac

Sync today?

2018-07-25 Thread Li Jin
Hi All, Do we have a sync today? (Notified by my cal) Li

[jira] [Created] (ARROW-2911) [Python] Parquet binary statistics that end in '\0' truncate last byte

2018-07-25 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2911: -- Summary: [Python] Parquet binary statistics that end in '\0' truncate last byte Key: ARROW-2911 URL: https://issues.apache.org/jira/browse/ARROW-2911 Project: Apache Arro

[jira] [Created] (ARROW-2912) [Website] Build more detailed Community landing patch a la Apache Spark

2018-07-25 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2912: --- Summary: [Website] Build more detailed Community landing patch a la Apache Spark Key: ARROW-2912 URL: https://issues.apache.org/jira/browse/ARROW-2912 Project: Apache A

[jira] [Created] (ARROW-2913) [Python] Exported buffers don't expose type information

2018-07-25 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2913: - Summary: [Python] Exported buffers don't expose type information Key: ARROW-2913 URL: https://issues.apache.org/jira/browse/ARROW-2913 Project: Apache Arrow

RC Cutting

2018-07-25 Thread Phillip Cloud
I think I'm still a bit confused about the order in which things need to happen to cut a release candidate. My understanding is that the ordering is: 1. create the source release 2. build packages from the source release (wheels, conda packages, etc) 3. commit source release + binary packages a

Re: RC Cutting

2018-07-25 Thread Wes McKinney
hi Phillip, I think you have it right. The basic idea is that the release artifacts being voted on must be placed in the SVN dist system with their checksums and signatures. The source release script already commits the artifacts and sigs/checksums to SVN [1]. Any released code artifact (source or

Re: RC Cutting

2018-07-25 Thread Wes McKinney
Here is our KEYS file: https://dist.apache.org/repos/dist/dev/arrow/KEYS Kou is the key-signing winner in our group, it seems. On Wed, Jul 25, 2018 at 3:32 PM, Wes McKinney wrote: > hi Phillip, > > I think you have it right. The basic idea is that the release > artifacts being voted on must be pla
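
The thread above says that release artifacts must be placed in SVN together with their checksums and signatures. As a rough illustration of the checksum half only, here is a minimal Python sketch. It is not the Arrow release script: the function names, the `.sha256` sidecar naming, and the `<hexdigest>  <filename>` line layout are assumptions made for the example, and GPG signing is not covered.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(path, algorithm="sha256"):
    """Write '<hexdigest>  <filename>' next to the artifact; return the new path.

    The sidecar name (artifact name + '.' + algorithm) is an assumed convention.
    """
    artifact = Path(path)
    checksum_path = artifact.with_name(artifact.name + "." + algorithm)
    line = f"{file_digest(artifact, algorithm)}  {artifact.name}\n"
    checksum_path.write_text(line)
    return checksum_path


def verify_checksum_file(path, algorithm="sha256"):
    """Recompute the digest and compare it with the one recorded in the sidecar."""
    artifact = Path(path)
    checksum_path = artifact.with_name(artifact.name + "." + algorithm)
    recorded = checksum_path.read_text().split()[0]
    return recorded == file_digest(artifact, algorithm)
```

Calling `write_checksum_file("some-artifact.tar.gz")` (a hypothetical file name) would create `some-artifact.tar.gz.sha256` beside it, and `verify_checksum_file` on the same path lets a voter confirm that a downloaded artifact matches the recorded digest. Reading in chunks keeps memory use flat for large tarballs.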

Re: Load Spark dataframes in Arrow buffer using Scala (to be used by Gandiva)

2018-07-25 Thread Li Jin
Another pointer to look at: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L3369 The function Dataset.toArrowPayload there turns a Spark Dataset into an RDD[ArrowPayload], where ArrowPayload is basically deserialized bytes in Arrow file format.

[jira] [Created] (ARROW-2914) [Integration] Add WindowPandasUDFTests to Spark Integration

2018-07-25 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-2914: --- Summary: [Integration] Add WindowPandasUDFTests to Spark Integration Key: ARROW-2914 URL: https://issues.apache.org/jira/browse/ARROW-2914 Project: Apache Arrow

Re: [DISCUSS] Contribution of Gandiva to Apache Arrow

2018-07-25 Thread Philipp Moritz
+1 on merging it and also agreed with Uwe that we will need to deal with LLVM version conflicts. In addition it would be good to come up with a plan on how it can be useful for other DataFrame open source projects. Having end-to-end applications that let people profit from this code will help adopt

Re: [DISCUSS] Contribution of Gandiva to Apache Arrow

2018-07-25 Thread Li Jin
Although I am not familiar with the LLVM details, I think that at a high level such a component fits in the Arrow project. Hopefully I can learn more and help as well. +1 On Wed, Jul 25, 2018 at 8:45 PM, Philipp Moritz wrote: > +1 on merging it and also agreed with Uwe that we will need to deal with > LL

[jira] [Created] (ARROW-2915) [Packaging] Remove artifact form ubuntu-trusty build

2018-07-25 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-2915: -- Summary: [Packaging] Remove artifact form ubuntu-trusty build Key: ARROW-2915 URL: https://issues.apache.org/jira/browse/ARROW-2915 Project: Apache Arrow

[jira] [Created] (ARROW-2916) [C++] Plasma Seal is slow due to hashing

2018-07-25 Thread Simon Mo (JIRA)
Simon Mo created ARROW-2916: --- Summary: [C++] Plasma Seal is slow due to hashing Key: ARROW-2916 URL: https://issues.apache.org/jira/browse/ARROW-2916 Project: Apache Arrow Issue Type: New Feature

Re: Stuck in building Arrow C++

2018-07-25 Thread Xu,Wenjian
Hi Wes, I could download that package through the link. Thanks, Wenjian On Thu, Jul 26, 2018 at 12:55 AM Wes McKinney wrote: > hi Wenjian, > > Are you able to download > > https://github.com/google/brotli/archive/v0.6.0.tar.gz > > ? > > - Wes > > On Wed, Jul 25, 2018 at 5:39 AM, Xu,Wenjian wr

Re: Stuck in building Arrow C++

2018-07-25 Thread Xu,Wenjian
Hi Uwe, The result of *make VERBOSE=1* is as follows: == [ 29%] Built target zlib_ep make -f CMakeFiles/zstd_ep.dir/build.make CMakeFiles/zstd_ep.dir/depend make[2]: Entering directory `/home/felix.xwj/workspace/arrow/cpp/release

[jira] [Created] (ARROW-2917) Fix pytorch gradient errors on serialization

2018-07-25 Thread Alok Singh (JIRA)
Alok Singh created ARROW-2917: - Summary: Fix pytorch gradient errors on serialization Key: ARROW-2917 URL: https://issues.apache.org/jira/browse/ARROW-2917 Project: Apache Arrow Issue Type: Bug