Re: Help with zero-copy conversion of pyarrow table to pandas dataframe.

2018-09-28 Thread Bipin Mathew
Good Evening Abdul, Wes, @Abdul, I agree I could probably use plasma, but I just wanted to get something up and running quickly for prototyping purposes. As @Wes mentioned, I will probably run into the same thing using plasma. I managed to get a little more debugging output. Here is the scrip

Re: Help with zero-copy conversion of pyarrow table to pandas dataframe.

2018-09-28 Thread Wes McKinney
hi Abdul -- Plasma vs. a memory map on /dev/shm should have the same semantics re: memory copying, so I don't believe using Plasma will change the outcome - Wes On Fri, Sep 28, 2018 at 5:38 PM Abdul Rahman wrote: > > Have you tried using plasma which is effectively what you are trying to do ? > >
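
A minimal sketch of the memory-copying semantics Wes refers to, using only the Python standard library rather than Arrow or Plasma. It assumes an ordinary temporary file as a stand-in for a file on /dev/shm; the function name `demo_mmap_views` is hypothetical. A `memoryview` over the mapping shares the mapped pages, while `bytes(...)` makes an explicit copy.

```python
import mmap
import os
import tempfile


def demo_mmap_views():
    """Return (value seen through the view, value seen through the copy)."""
    # Temporary file standing in for a shared-memory file (assumption).
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\x00" * 16)
        with mmap.mmap(fd, 16) as mm:
            view = memoryview(mm)      # zero-copy: shares the mapped pages
            copied = bytes(view[:4])   # explicit copy into a new bytes object
            mm[0] = 255                # mutate the mapping after both were taken
            result = (view[0], copied[0])
            view.release()             # release before the mapping is closed
        return result
    finally:
        os.close(fd)
        os.unlink(path)


print(demo_mmap_views())
```

The view observes the later write and the copy does not, which is the distinction that matters when asking whether a conversion was zero-copy.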

Re: Help with zero-copy conversion of pyarrow table to pandas dataframe.

2018-09-28 Thread Wes McKinney
hi Bipin, There are narrow circumstances where zero-copy pandas deserialization is possible. Firstly, I noted that we are short of documentation for Table.to_pandas, so I opened https://issues.apache.org/jira/browse/ARROW-3356 It's possible there's a bug when zero_copy_only=True -- it is suppose

Re: Help with zero-copy conversion of pyarrow table to pandas dataframe.

2018-09-28 Thread Abdul Rahman
Have you tried using Plasma, which is effectively what you are trying to do? https://arrow.apache.org/docs/python/plasma.html#using-arrow-and-pandas-with-plasma From: Bipin Mathew Sent: Friday, September 28, 2018 2:28:54 PM To: dev@arrow.apache.org Subject: Help

[jira] [Created] (ARROW-3356) [Python] Document parameters of Table.to_pandas method

2018-09-28 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3356: --- Summary: [Python] Document parameters of Table.to_pandas method Key: ARROW-3356 URL: https://issues.apache.org/jira/browse/ARROW-3356 Project: Apache Arrow Iss

Help with zero-copy conversion of pyarrow table to pandas dataframe.

2018-09-28 Thread Bipin Mathew
Hello Everyone, I am just getting my feet wet with Apache Arrow and I am running into a bug or, more likely, simply misunderstanding the pyarrow API. I wrote out a four-column, million-row Apache Arrow table to shared memory and I am attempting to read it into a Python dataframe. It is advert

Re: Some interesting VLDB reading on vectorized query evaluation relevant to Gandiva, other items

2018-09-28 Thread Julian Hyde
An excellent paper, thanks for sharing. (It’s worth reading every single one of the references.) I wonder whether Timo Kersten is related to Martin. > On Sep 27, 2018, at 9:44 AM, Wes McKinney wrote: > > http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf

[jira] [Created] (ARROW-3355) [R] Support for factors

2018-09-28 Thread Romain François (JIRA)
Romain François created ARROW-3355: -- Summary: [R] Support for factors Key: ARROW-3355 URL: https://issues.apache.org/jira/browse/ARROW-3355 Project: Apache Arrow Issue Type: New Feature

Re: Using CUDA enabled pyarrow

2018-09-28 Thread Wes McKinney
Seems like there is a fair bit of work to do to specify APIs and semantics. I suggest we create a Google document or something collaborative where we can enumerate and discuss the issues we want to resolve, and then make a list of the concrete development. The underlying problem IMHO in ARROW-2446

[jira] [Created] (ARROW-3354) read_record_batch interfaces differ in pyarrow and pyarrow.cuda

2018-09-28 Thread Pearu Peterson (JIRA)
Pearu Peterson created ARROW-3354: - Summary: read_record_batch interfaces differ in pyarrow and pyarrow.cuda Key: ARROW-3354 URL: https://issues.apache.org/jira/browse/ARROW-3354 Project: Apache Arrow

[RESULT] [VOTE] Accept donation of C GLib bindings to Parquet C++ libraries

2018-09-28 Thread Wes McKinney
With 5 binding +1 votes and 1 non-binding +1, the vote carries. I'll proceed with the IP Clearance process so that this can be merged early next week. On Tue, Sep 25, 2018 at 10:54 AM Phillip Cloud wrote: > > +1, nice work. > > On Tue, Sep 25, 2018 at 10:53 AM Krisztián Szűcs > wrote: > > > +1 >

Re: Using CUDA enabled pyarrow

2018-09-28 Thread Pearu Peterson
Hi Wes, Yes, it makes sense. If I understand you correctly, then defining a device abstraction would also bring Buffer and CudaBuffer under the same umbrella (that would be the opposite approach to ARROW-2446, btw). This issue is also related to https://github.com/dmlc/dlpack/blob/master/include/dl

[jira] [Created] (ARROW-3353) [Packaging] Build python 3.7 wheels

2018-09-28 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-3353: -- Summary: [Packaging] Build python 3.7 wheels Key: ARROW-3353 URL: https://issues.apache.org/jira/browse/ARROW-3353 Project: Apache Arrow Issue Type: Impr

[jira] [Created] (ARROW-3352) [Packaging] Fix recently failing wheel builds

2018-09-28 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-3352: -- Summary: [Packaging] Fix recently failing wheel builds Key: ARROW-3352 URL: https://issues.apache.org/jira/browse/ARROW-3352 Project: Apache Arrow Issue

Re: Using CUDA enabled pyarrow

2018-09-28 Thread Wes McKinney
hi Pearu, Yes, I think it would be a good idea to develop some tools to make interacting with device memory using the existing data structures work seamlessly. This is all closely related to https://issues.apache.org/jira/browse/ARROW-2447 I would say step 1 would be defining the device abstrac

Using CUDA enabled pyarrow

2018-09-28 Thread Pearu Peterson
Hi, Consider the following use case: schema = cbuf = cbatch = pa.cuda.read_record_batch(schema, cbuf) Note that cbatch is a pa.RecordBatch instance whose data pointers are device pointers. for col in cbatch.columns: # here col is, say, FloatArray, whose data pointer is a device pointer #

[jira] [Created] (ARROW-3351) [Python] Build failure on macOS

2018-09-28 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3351: --- Summary: [Python] Build failure on macOS Key: ARROW-3351 URL: https://issues.apache.org/jira/browse/ARROW-3351 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-3350) [Website] Fix powered by links

2018-09-28 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-3350: -- Summary: [Website] Fix powered by links Key: ARROW-3350 URL: https://issues.apache.org/jira/browse/ARROW-3350 Project: Apache Arrow Issue Type: Improveme

Re: Failures bc clang-format

2018-09-28 Thread Romain Francois
Turns out I have: romain@purrplex ~/git/apache/arrow/r $ clang-format --version clang-format version 8.0.0 (tags/google/stable/2018-08-24) So I just made a symlink in my ~/bin/. Travis is happy about it. PR sent. Romain > On 27 Sep 2018, at 22:43, Wes McKinney wrote: > > I found it weird