[jira] [Created] (ARROW-1371) [Website] Add "Powered By" page to the website

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1371: --- Summary: [Website] Add "Powered By" page to the website Key: ARROW-1371 URL: https://issues.apache.org/jira/browse/ARROW-1371 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-1370) wrong signed to unsigned conversion in js

2017-08-16 Thread Saman Amraii (JIRA)
Saman Amraii created ARROW-1370: --- Summary: wrong signed to unsigned conversion in js Key: ARROW-1370 URL: https://issues.apache.org/jira/browse/ARROW-1370 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-1369) Support boolean types in the javascript arrow reader library

2017-08-16 Thread Saman Amraii (JIRA)
Saman Amraii created ARROW-1369: --- Summary: Support boolean types in the javascript arrow reader library Key: ARROW-1369 URL: https://issues.apache.org/jira/browse/ARROW-1369 Project: Apache Arrow

[jira] [Created] (ARROW-1368) libarrow.a is not linked against boost libraries when compiled with -DARROW_BOOST_USE_SHARED=off

2017-08-16 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-1368: --- Summary: libarrow.a is not linked against boost libraries when compiled with -DARROW_BOOST_USE_SHARED=off Key: ARROW-1368 URL: https://issues.apache.org/jira/browse/ARROW-13

[jira] [Created] (ARROW-1367) [Website] Divide CHANGELOG issues by component and add subheaders

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1367: --- Summary: [Website] Divide CHANGELOG issues by component and add subheaders Key: ARROW-1367 URL: https://issues.apache.org/jira/browse/ARROW-1367 Project: Apache Arrow

[jira] [Created] (ARROW-1366) [Python] Add instructions for starting the Plasma store when installing pyarrow from wheels

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1366: --- Summary: [Python] Add instructions for starting the Plasma store when installing pyarrow from wheels Key: ARROW-1366 URL: https://issues.apache.org/jira/browse/ARROW-1366

[jira] [Created] (ARROW-1365) [Python] Remove usage of removed jemalloc_memory_pool in Python API docs

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1365: --- Summary: [Python] Remove usage of removed jemalloc_memory_pool in Python API docs Key: ARROW-1365 URL: https://issues.apache.org/jira/browse/ARROW-1365 Project: Apache

Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

2017-08-16 Thread Wes McKinney
To motivate the use case, the folks in GOAI are building applications with multiple components which interact with the GPU. For example: MapD (GPU database) allocates GPU memory, hands off to Python. Python can then decref and cudaFree on the device. Perhaps Python then uses cudaMalloc and wishes

Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

2017-08-16 Thread Robert Nishihara
That makes a lot of sense. In some contexts it could make sense to run multiple Plasma stores per machine (possibly for different devices or different NUMA zones). Though that could make it slightly harder to take advantage of faster GPU to GPU communication. On Wed, Aug 16, 2017 at 2:01 PM Philip

Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

2017-08-16 Thread Philipp Moritz
One observation here is that, as far as I know, shared memory is not typically used between multiple GPUs, and on a single GPU there is already a unified shared address space that each CUDA thread can access. One reasonable extension of the APIs and facilities given these limitations would be the fol

[jira] [Created] (ARROW-1364) [C++] IPC reader and writer specialized for GPU device memory

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1364: --- Summary: [C++] IPC reader and writer specialized for GPU device memory Key: ARROW-1364 URL: https://issues.apache.org/jira/browse/ARROW-1364 Project: Apache Arrow

Re: [DISCUSS] Apache Arrow and the GPU Open Analytics Initiative

2017-08-16 Thread Wes McKinney
One idea is whether the Plasma object store could be extended to support devices other than POSIX shared memory, like GPU device memory (or multiple GPUs on a single host). Philipp or Robert or any of the people who know the Plasma code best, any idea how this might be approached? It would have to

[jira] [Created] (ARROW-1363) [C++] IPC writer sends buffer layout for dictionary rather than indices

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1363: --- Summary: [C++] IPC writer sends buffer layout for dictionary rather than indices Key: ARROW-1363 URL: https://issues.apache.org/jira/browse/ARROW-1363 Project: Apache A

[jira] [Created] (ARROW-1362) [Integration] Validate vector type layout in IPC messages

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1362: --- Summary: [Integration] Validate vector type layout in IPC messages Key: ARROW-1362 URL: https://issues.apache.org/jira/browse/ARROW-1362 Project: Apache Arrow

[jira] [Created] (ARROW-1361) [Java] Add minor type param accessors to NullableValueVectors

2017-08-16 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-1361: --- Summary: [Java] Add minor type param accessors to NullableValueVectors Key: ARROW-1361 URL: https://issues.apache.org/jira/browse/ARROW-1361 Project: Apache Arrow

RE: Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Erin Sobkow
Thanks. Erin Sobkow, BA Kin, RMT Community Consultant Parkland Valley Sport, Culture & Recreation District Box 263, Yorkton, SK S3N 2V7 Phone: (306) 786-6585 Fax: (306) 782-0474 Email: esob...@parklandvalley.ca Website: www.parklandvalley.ca If you no longer wish to receive electronic messa

Re: Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Wes McKinney
hi Erin -- please send a separate e-mail to dev-unsubscr...@arrow.apache.org Thanks On Wed, Aug 16, 2017 at 1:06 PM, Erin Sobkow wrote: > Hi Wes: > > Somehow I have been inadvertently added to your list and am getting all these > emails that make no sense to me at all. I'm in on some conversat

RE: Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Erin Sobkow
Hi Wes: Somehow I have been inadvertently added to your list and am getting all these emails that make no sense to me at all. I'm in on some conversation I know nothing about and am getting up to 20 emails a day from different people. Can I ask you to remove me from your list and can you get

[jira] [Created] (ARROW-1360) [C++] Add Copy virtual method to arrow::Buffer

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1360: --- Summary: [C++] Add Copy virtual method to arrow::Buffer Key: ARROW-1360 URL: https://issues.apache.org/jira/browse/ARROW-1360 Project: Apache Arrow Issue Type:

Re: Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Wes McKinney
hi Lucas, My understanding is that the Parquet format by itself does not place any such restrictions on the names of fields, and so this is a Spark SQL-specific issue (anyone please correct me if I'm mistaken about this). I would be happy to help add a schema cleaning option to normalize field nam
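
As an illustration of the kind of field-name normalization discussed in this thread, here is a minimal sketch in plain Python. It is not the option proposed in ARROW-1359 and not part of the pyarrow API: the helper name `normalize_field_names`, the underscore replacement, and the exact set of rejected characters (taken from the wording of Spark SQL's error message) are all assumptions made for the example.

```python
import re

# Assumption: the characters Spark SQL's Parquet writer rejects in
# attribute names are space , ; { } ( ) newline tab and =
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def normalize_field_names(names, replacement="_"):
    """Return a list of names with each rejected character replaced.

    Hypothetical helper for illustration only. Raises ValueError if two
    different input names would become identical after normalization,
    since silently merging columns would lose data.
    """
    result = []
    seen = {}
    for name in names:
        clean = _INVALID_CHARS.sub(replacement, name)
        if clean in seen:
            raise ValueError(
                f"{name!r} and {seen[clean]!r} both normalize to {clean!r}"
            )
        seen[clean] = name
        result.append(clean)
    return result


if __name__ == "__main__":
    print(normalize_field_names(["X Coordinate", "Y Coordinate", "id"]))
```

With pandas, one could assign the result back to `df.columns` before converting the frame to an Arrow table and writing it out, so that the resulting file uses names Spark accepts.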

[jira] [Created] (ARROW-1359) [Python] Add Parquet writer option to normalize field names for use in Spark

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1359: --- Summary: [Python] Add Parquet writer option to normalize field names for use in Spark Key: ARROW-1359 URL: https://issues.apache.org/jira/browse/ARROW-1359 Project: Apa

Major difference between Spark and Arrow Parquet Implementations

2017-08-16 Thread Lucas Pickup
Hello, I have been using pyarrow and PySpark to write Parquet files. I have used pyarrow to successfully write out a Parquet file with spaces in column names, e.g. 'X Coordinate'. When I try to write out the same dataset using Spark's Parquet writer, it fails, claiming: "Attribute name "X Coordina

[ANNOUNCE] Apache Arrow 0.6.0 released

2017-08-16 Thread Wes McKinney
The Apache Arrow community is pleased to announce the 0.6.0 release. It includes 90 resolved issues ([1]) since the 0.5.0 release. The release is available now from our website and [2]: http://arrow.apache.org/install/ Read about what's new in the release http://arrow.apache.org/blog/2017/08/

[jira] [Created] (ARROW-1358) Update source release scripts to account for new SHA checksum policy

2017-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1358: --- Summary: Update source release scripts to account for new SHA checksum policy Key: ARROW-1358 URL: https://issues.apache.org/jira/browse/ARROW-1358 Project: Apache Arro

[jira] [Created] (ARROW-1357) Data corruption in reading multi-file parquet dataset

2017-08-16 Thread Jarno Seppanen (JIRA)
Jarno Seppanen created ARROW-1357: - Summary: Data corruption in reading multi-file parquet dataset Key: ARROW-1357 URL: https://issues.apache.org/jira/browse/ARROW-1357 Project: Apache Arrow