[jira] [Created] (ARROW-2512) [Python] Enable direct interaction of GPU Objects in Python

2018-04-25 Thread William Paul (JIRA)
William Paul created ARROW-2512. Summary: [Python] Enable direct interaction of GPU Objects in Python Key: ARROW-2512 URL: https://issues.apache.org/jira/browse/ARROW-2512 Project: Apache Arrow

Re: Peak memory usage for pyarrow.parquet.read_table

2018-04-25 Thread Bryant Menn
Uwe,

I'll try pinpointing things further with `columns=` and try to reproduce what I find with data I can share. Thanks for the pointer.

-Bryant

On Wed, Apr 25, 2018 at 2:10 PM Uwe L. Korn wrote:
> No, there is no need to pass any options on reading. Sometimes they are
> beneficial depending
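A minimal sketch of the column-by-column approach mentioned in this message, assuming the goal is to find which column accounts for most of the decoded size. The helper name `column_sizes` and its injectable `read_table` argument are hypothetical and not from the thread. The only pyarrow features used are the `columns=` argument of `pyarrow.parquet.read_table` and the `nbytes` attribute of the resulting table. It reports the decoded in-memory size of each column, which is not the same as the transient peak seen during reading, so treat it as a first diagnostic only.

```python
def column_sizes(path, column_names, read_table=None):
    """Read one column at a time and return {column name: decoded size in bytes}.

    `read_table` is a hypothetical injection point; by default it is
    pyarrow.parquet.read_table, imported lazily.
    """
    if read_table is None:
        import pyarrow.parquet as pq  # requires pyarrow to be installed
        read_table = pq.read_table

    sizes = {}
    for name in column_names:
        # Restrict the read to a single column via `columns=`.
        table = read_table(path, columns=[name])
        sizes[name] = table.nbytes
        # Drop the reference before reading the next column.
        del table
    return sizes
```

With a real file this could be called as `column_sizes("data.parquet", ["col_a", "col_b"])`, where the path and column names are placeholders. Sorting the result by size shows which columns to look at first.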

Re: Peak memory usage for pyarrow.parquet.read_table

2018-04-25 Thread Uwe L. Korn
No, there is no need to pass any options on reading. Sometimes they are beneficial depending on what you want to achieve but defaults are ok, too. I'm not sure if you're able to post an example but it would be nice if you could post the resulting Arrow schema from the table. It might be related

Re: Peak memory usage for pyarrow.parquet.read_table

2018-04-25 Thread Bryant Menn
Uwe,

I am not. Should I be? I forgot to mention earlier that the Parquet file came from Spark/PySpark.

On Wed, Apr 25, 2018 at 1:32 PM Uwe L. Korn wrote:
> Hello Bryant,
>
> are you using any options on `pyarrow.parquet.read_table` or a possible
> `to_pandas` afterwards?
>
> Uwe
>
> On Wed, Apr

[jira] [Created] (ARROW-2511) Base{Variable|Fixed}WidthVector.allocateNew is not throwing OOM when it can't allocate memory

2018-04-25 Thread Venki Korukanti (JIRA)
Venki Korukanti created ARROW-2511. Summary: Base{Variable|Fixed}WidthVector.allocateNew is not throwing OOM when it can't allocate memory Key: ARROW-2511 URL: https://issues.apache.org/jira/browse/ARROW-2511

Re: Peak memory usage for pyarrow.parquet.read_table

2018-04-25 Thread Uwe L. Korn
Hello Bryant,

are you using any options on `pyarrow.parquet.read_table` or a possible `to_pandas` afterwards?

Uwe

On Wed, Apr 25, 2018, at 7:27 PM, Bryant Menn wrote:
> I tried reading a Parquet file (<200MB, lots of text with snappy) using
> read_table and saw the memory usage peak over 8GB be

Peak memory usage for pyarrow.parquet.read_table

2018-04-25 Thread Bryant Menn
I tried reading a Parquet file (<200MB, lots of text with snappy) using read_table and saw the memory usage peak over 8GB before settling back down to ~200MB. This surprised me as I was expecting to be able to handle a Parquet file of this size with much less RAM (doing some processing with smaller

[jira] [Created] (ARROW-2510) [Python] Segmentation fault when converting empty column as categorical

2018-04-25 Thread Florian Jetter (JIRA)
Florian Jetter created ARROW-2510. Summary: [Python] Segmentation fault when converting empty column as categorical Key: ARROW-2510 URL: https://issues.apache.org/jira/browse/ARROW-2510 Project: Apac

[jira] [Created] (ARROW-2509) [CI] Intermittent npm failures

2018-04-25 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2509. Summary: [CI] Intermittent npm failures Key: ARROW-2509 URL: https://issues.apache.org/jira/browse/ARROW-2509 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-2508) [Python] pytest API changes make tests fail

2018-04-25 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2508. Summary: [Python] pytest API changes make tests fail Key: ARROW-2508 URL: https://issues.apache.org/jira/browse/ARROW-2508 Project: Apache Arrow Issue Type