[jira] [Created] (ARROW-2866) [Plasma] TensorFlow op: Investigate outputting multiple output Tensors for the reading op

2018-07-16 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2866: - Summary: [Plasma] TensorFlow op: Investigate outputting multiple output Tensors for the reading op Key: ARROW-2866 URL: https://issues.apache.org/jira/browse/ARROW-2866

Re: Re: Passing Arrow object across language

2018-07-16 Thread 周宇睿(闻拙)
Hi Wes: Thank you for the response. Yes, the examples you provided are very helpful. But I still have a question regarding memory management. Let’s say I passed memory addresses from C++ to the JVM and constructed the data structure in Java. Since this is off-heap memory, how could I make sure the

[jira] [Created] (ARROW-2865) [C++/Python] Reduce some duplicated code in python/builtin_convert.cc

2018-07-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2865: --- Summary: [C++/Python] Reduce some duplicated code in python/builtin_convert.cc Key: ARROW-2865 URL: https://issues.apache.org/jira/browse/ARROW-2865 Project: Apache Arr

[jira] [Created] (ARROW-2863) [Python] Add context manager APIs to RecordBatch*Writer/Reader classes

2018-07-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2863: --- Summary: [Python] Add context manager APIs to RecordBatch*Writer/Reader classes Key: ARROW-2863 URL: https://issues.apache.org/jira/browse/ARROW-2863 Project: Apache Ar

[jira] [Created] (ARROW-2864) Add deletion cache to delete objects later

2018-07-16 Thread Yuhong Guo (JIRA)
Yuhong Guo created ARROW-2864: - Summary: Add deletion cache to delete objects later Key: ARROW-2864 URL: https://issues.apache.org/jira/browse/ARROW-2864 Project: Apache Arrow Issue Type: Improve

[jira] [Created] (ARROW-2862) [C++] `wget -c` doesn't work when using thirdparty/download_thirdparty.sh for the first time

2018-07-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2862: --- Summary: [C++] `wget -c` doesn't work when using thirdparty/download_thirdparty.sh for the first time Key: ARROW-2862 URL: https://issues.apache.org/jira/browse/ARROW-2862

[jira] [Created] (ARROW-2861) [Python] Add extra tips about using Parquet to store index-less pandas data

2018-07-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2861: --- Summary: [Python] Add extra tips about using Parquet to store index-less pandas data Key: ARROW-2861 URL: https://issues.apache.org/jira/browse/ARROW-2861 Project: Apac

Re: Pyarrow Plasma client.release() fault

2018-07-16 Thread Wes McKinney
Seems like we might want to write down some best practices for this level of large-scale usage, essentially a supercomputer-like rig. I wouldn't even know where to come by a machine with > 2TB memory for scalability / concurrency load testing. On Mon, Jul 16, 2018 at 2:59 PM, Robert

[jira] [Created] (ARROW-2860) Null values in a single partition of dataset, results in invalid schema on read

2018-07-16 Thread Sam Oluwalana (JIRA)
Sam Oluwalana created ARROW-2860: Summary: Null values in a single partition of dataset, results in invalid schema on read Key: ARROW-2860 URL: https://issues.apache.org/jira/browse/ARROW-2860 Project

[jira] [Created] (ARROW-2859) [Python] Handle objects exporting the buffer protocol in open_stream, open_file, and RecordBatch*Reader APIs

2018-07-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2859: --- Summary: [Python] Handle objects exporting the buffer protocol in open_stream, open_file, and RecordBatch*Reader APIs Key: ARROW-2859 URL: https://issues.apache.org/jira/browse/ARRO

[jira] [Created] (ARROW-2858) Add unit tests for crossbow

2018-07-16 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-2858: Summary: Add unit tests for crossbow Key: ARROW-2858 URL: https://issues.apache.org/jira/browse/ARROW-2858 Project: Apache Arrow Issue Type: Task

Arrow meetup in Hyderabad July 24

2018-07-16 Thread Kelly Stirman
We're organizing a meetup in Hyderabad next week. Would anyone like to give a talk? Apologies, I know it's a long shot due to location and short notice (some of our Mountain View team will be visiting our team there who is working on Gandiva). https://www.meetup.com/Apache-Arrow-Meetup/events/2527

Re: Pyarrow Plasma client.release() fault

2018-07-16 Thread Robert Nishihara
Are you using the same plasma client from all of the different threads? If so, that could cause race conditions as the client is not thread safe. Alternatively, if you have a separate plasma client for each thread, then you may be running out of file descriptors somewhere (either the client proces
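The "separate client per thread" option mentioned above can be sketched in plain Python. This is only an illustration of the pattern: `StubClient`, `get_client`, `worker`, and `run` are hypothetical names, and `StubClient` is a stand-in that does not reproduce the pyarrow Plasma API.

```python
import threading


class StubClient:
    """Hypothetical stand-in for a client that is not thread safe.

    This is NOT the pyarrow Plasma client; it only records which thread
    created it so that cross-thread use can be detected.
    """

    def __init__(self):
        self.owner = threading.get_ident()

    def get(self, key):
        # A real non-thread-safe client could race here; the stub just
        # refuses to be called from a thread other than its creator.
        assert threading.get_ident() == self.owner, "client shared across threads"
        return f"value-for-{key}"


# Each thread sees its own attributes on this object.
_local = threading.local()


def get_client():
    """Return the calling thread's client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = StubClient()
        _local.client = client
    return client


def worker(key, results):
    # Every worker goes through get_client(), so no client is shared.
    results[key] = get_client().get(key)


def run(n_threads=4):
    results = {}
    threads = [
        threading.Thread(target=worker, args=(i, results))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run(4)` starts four threads, each of which builds and uses its own client. As the message notes, one client per thread means more open connections, so the file descriptor limit becomes the thing to watch.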

Re: pyarrow read/write schema as json?

2018-07-16 Thread Wes McKinney
hi Patrick, The JSON representation of schemas wasn't intended as a public API. Can you use the pyarrow Schema directly? I'm not sure I would advise using the JSON for building any kind of production software. Although, I'm not opposed to exposing this functionality in Python with the clear cavea

[jira] [Created] (ARROW-2857) [Python] Expose integration test JSON read/write in Python API

2018-07-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2857: --- Summary: [Python] Expose integration test JSON read/write in Python API Key: ARROW-2857 URL: https://issues.apache.org/jira/browse/ARROW-2857 Project: Apache Arrow

Re: Passing Arrow object across language

2018-07-16 Thread Wes McKinney
I discussed some of these things at a high level in my talk at SciPy 2018 last week https://www.slideshare.net/wesm/apache-arrow-crosslanguage-development-platform-for-inmemory-data-105427919 On Mon, Jul 16, 2018 at 2:08 PM, Wes McKinney wrote: > hi Yurui, > > You can also share data structures

Re: Passing Arrow object across language

2018-07-16 Thread Wes McKinney
hi Yurui, You can also share data structures through JNI without using the IPC tools at all, which could require memory copying to produce the IPC messages. What you can do is obtain the memory addresses for the component buffers of an array (or vector, as called in Java) and construct the data s
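The idea of handing over a buffer's memory address and rebuilding a view on the other side, with no copy, can be illustrated inside a single Python process using `ctypes`. This is a minimal sketch of the general technique only; it does not use JNI or any Arrow API, and `view_from_address` is a hypothetical helper name.

```python
import ctypes


def view_from_address(address, n_values):
    """Wrap n_values int32 values located at `address` without copying.

    The returned object does not own the memory: whoever allocated the
    buffer must keep it alive for as long as the view is in use.
    """
    array_type = ctypes.c_int32 * n_values
    return array_type.from_address(address)


# "Producer" side: allocate a buffer and expose only its raw address.
producer = (ctypes.c_int32 * 4)(10, 20, 30, 40)
address = ctypes.addressof(producer)

# "Consumer" side: rebuild a typed view from the address alone.
consumer = view_from_address(address, 4)
consumer_values = list(consumer)

# A write through the producer is visible through the consumer,
# which shows that both names refer to the same memory.
producer[0] = 99
shared_after_write = consumer[0]
```

Because the view is non-owning, the lifetime of the underlying buffer has to be managed by the producer, which is the same ownership concern raised in the follow-up question about off-heap memory.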

Re: Passing Arrow object across language

2018-07-16 Thread Philipp Moritz
Hey Yuri, you can use the Arrow IPC mechanism to do this: - https://github.com/apache/arrow/blob/master/format/IPC.md - Python: https://arrow.apache.org/docs/python/ipc.html - C++: https://arrow.apache.org/docs/cpp/namespacearrow_1_1ipc.html - For Java, see the org.apache.arrow.vector.ipc namespa

Re: Proposed Java ArrowStreamReader/MessageReader API Changes

2018-07-16 Thread Bryan Cutler
Thanks for the comments Li. For your concerns about memory ownership, I don't think anything is really changed here, but we can discuss further in the PR. I'm not sure I quite understand your concern when you say "complexity of maintaining both style APIs"? The proposed changes are for 1 coheren

[jira] [Created] (ARROW-2856) [Python/C++] Array constructor should not truncate floats when casting to int

2018-07-16 Thread Florian Jetter (JIRA)
Florian Jetter created ARROW-2856: - Summary: [Python/C++] Array constructor should not truncate floats when casting to int Key: ARROW-2856 URL: https://issues.apache.org/jira/browse/ARROW-2856 Project

Passing Arrow object across language

2018-07-16 Thread 周宇睿(闻拙)
Hi guys: I might be missing something quite obvious, but how does Arrow pass objects across languages? Let’s say I have a Java program that invokes a C++ function via JNI; how does the C++ function pass an Arrow RecordBatch object back to the Java runtime without a memory copy? Any advice would be appre