[jira] [Created] (ARROW-1829) [Plasma] Clean up eviction policy bookkeeping

2017-11-16 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-1829: --- Summary: [Plasma] Clean up eviction policy bookkeeping Key: ARROW-1829 URL: https://issues.apache.org/jira/browse/ARROW-1829 Project: Apache Arrow Issu

Re: Modeling N-Dim Arrays in Arrow

2017-11-16 Thread Robert Nishihara
Great! On Thu, Nov 16, 2017 at 3:04 PM Lewis John McGibbney wrote: > Fantastic Robert, thank you for the pointers. > The documentation and graphics on the Ray GitHub pages are very helpful. > Lewis > > On 2017-11-16 11:20, Robert Nishihara wrote: > > Yes definitely! You can do this through high leve

Re: Modeling N-Dim Arrays in Arrow

2017-11-16 Thread Lewis John McGibbney
Fantastic Robert, thank you for the pointers. The documentation and graphics on the Ray GitHub pages are very helpful. Lewis On 2017-11-16 11:20, Robert Nishihara wrote: > Yes definitely! You can do this through high level Python APIs, e.g., > something like > https://github.com/apache/arrow/blob/ca3

Re: Migrating from Avro to Arrow

2017-11-16 Thread Jacques Nadeau
For Java, you can start by looking at this entry point: https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/store/parquet/columnreaders/DeprecatedParquetVectorizedReader.java Something that might actually be easier as an initial understanding (simpler) is l

Re: Migrating from Avro to Arrow

2017-11-16 Thread Lewis John McGibbney
Hi Jacques, Can you point me to where I can get started, e.g. with the converter? Where does the Parquet --> Arrow one currently exist? Thank you On 2017-11-16 10:42, Jacques Nadeau wrote: > Welcome Lewis! > > The use case you outline makes a lot of sense for Arrow to help out > with. We don't yet hav

Re: Modeling N-Dim Arrays in Arrow

2017-11-16 Thread Robert Nishihara
Yes definitely! You can do this through high level Python APIs, e.g., something like https://github.com/apache/arrow/blob/ca3acdc138b1ac27c9111b236d33593988689a20/python/pyarrow/tests/test_serialization.py#L214-L216 . You can also share the numpy arrays using shared memory, e.g., https://issues.ap
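
The shared-memory idea mentioned in this reply can be illustrated without Arrow at all. The following is a minimal sketch that uses only the Python standard library, not pyarrow, Plasma, or numpy; the names `shape`, `values`, `rows`, and `corner` are hypothetical and chosen for illustration. It shows an N-dimensional array as a flat buffer in shared memory plus shape metadata, which a reader reinterprets without copying.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical 2 x 3 grid of float64 values, stored row-major (C order).
shape = (2, 3)
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
nbytes = len(values) * struct.calcsize("d")

shm = shared_memory.SharedMemory(create=True, size=nbytes)
try:
    # Write the flat payload into the shared segment.
    struct.pack_into(f"{len(values)}d", shm.buf, 0, *values)

    # Reinterpret the same bytes as a 2-D view: no data is copied here.
    # The segment may be rounded up to a page size, so slice to nbytes first.
    with shm.buf[:nbytes] as flat, flat.cast("d", shape=shape) as grid:
        rows = grid.tolist()      # nested lists, one per row
        corner = grid[1, 2]       # element in the last row, last column
finally:
    # All views are released above, so the segment can be closed and removed.
    shm.close()
    shm.unlink()
```

Another process could attach to the same segment by name and apply the same cast, provided it also knows the shape and element type, which is the metadata a self-describing format would carry.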

Modeling N-Dim Arrays in Arrow

2017-11-16 Thread Lewis John McGibbney
Hi Folks, Array-oriented scientific data (such as satellite remote sensing data) is commonly encoded using the NetCDF [0] and HDF [1] data formats, as these formats have been designed and developed to offer, amongst other things, some or all of the following features: * Self-Describing. A netCDF file in

Re: Migrating from Avro to Arrow

2017-11-16 Thread Jacques Nadeau
Welcome Lewis! The use case you outline makes a lot of sense for Arrow to help out with. We don't yet have an AVRO <> Arrow converter written but it is something that would be great to have. We'd all be happy to help if you're interested in taking this on. The new improvements to the Arrow Java AP

Re: General questions about Arrow & Plasma

2017-11-16 Thread Philipp Moritz
Here are some more examples on how to interact between Plasma and Arrow: http://arrow.apache.org/docs/python/plasma.html, see also the C++ documentation: http://arrow.apache.org/docs/cpp/md_tutorials_plasma.html On Thu, Nov 16, 2017 at 10:31 AM, Philipp Moritz wrote: > Hey Matthias, > > 1. The w

Migrating from Avro to Arrow

2017-11-16 Thread Lewis John McGibbney
Hi Folks, We've been working on GORA (Generic Object Representation using Avro) for some years now. https://gora.apache.org The framework provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key value stores, document stores, distributed in-

Re: General questions about Arrow & Plasma

2017-11-16 Thread Philipp Moritz
Hey Matthias, 1. The way it is done is as in https://github.com/apache/arrow/blob/c6295f3b74bcc2fa9ea1b9442f922bf564669b8e/python/pyarrow/plasma.pyx#L394: You first create the arrow object (using the builder from C++ or the python functions), get its size, create a plasma object of the required s
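
The order of operations described in this reply (build the object, measure it, then allocate a store buffer for it) can be sketched with the Python standard library alone. This is not the Plasma API: `multiprocessing.shared_memory` stands in for the object store, packed int64 values stand in for a finished Arrow object, and the helper names `encode_int64_column`, `put_in_shared_memory`, and `read_int64_column` are hypothetical. Writing the payload into the buffer after allocation is an assumption about the step that follows.

```python
import struct
from multiprocessing import shared_memory


def encode_int64_column(values):
    """Pack integers as little-endian int64 (stand-in for a finished object)."""
    return struct.pack(f"<{len(values)}q", *values)


def put_in_shared_memory(payload):
    """Measure the payload first, then allocate a segment and write it once."""
    size = len(payload)                                        # get the size
    shm = shared_memory.SharedMemory(create=True, size=size)   # allocate
    shm.buf[:size] = payload                                   # single copy
    return shm


def read_int64_column(name, count):
    """Attach to the segment by name and decode `count` int64 values."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return list(struct.unpack_from(f"<{count}q", shm.buf, 0))
    finally:
        shm.close()


payload = encode_int64_column([1, 2, 3, 4])
segment = put_in_shared_memory(payload)
try:
    result = read_int64_column(segment.name, 4)
finally:
    segment.close()
    segment.unlink()
```

Knowing the size before allocating means the segment is created once at the right size and the data is copied into it a single time; a reader then attaches by name and decodes in place.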

[jira] [Created] (ARROW-1828) [C++] Implement hash kernel specialization for BooleanType

2017-11-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1828: --- Summary: [C++] Implement hash kernel specialization for BooleanType Key: ARROW-1828 URL: https://issues.apache.org/jira/browse/ARROW-1828 Project: Apache Arrow

General questions about Arrow & Plasma

2017-11-16 Thread Matthias Vallentin
Two questions about Plasma; my use case is sharing Arrow data between a C++ and Python application (eventually also R). 1. What's the typical memory allocation procedure when using Plasma and Arrow? Do I first construct a builder, populate it, finish it, and *then* copy it into mmapped buffe

[jira] [Created] (ARROW-1827) [Java] Add checkstyle config file and header file

2017-11-16 Thread Li Jin (JIRA)
Li Jin created ARROW-1827: - Summary: [Java] Add checkstyle config file and header file Key: ARROW-1827 URL: https://issues.apache.org/jira/browse/ARROW-1827 Project: Apache Arrow Issue Type: Task

[jira] [Created] (ARROW-1826) [JAVA] Avoid branching at cell level (copyFrom)

2017-11-16 Thread Siddharth Teotia (JIRA)
Siddharth Teotia created ARROW-1826: --- Summary: [JAVA] Avoid branching at cell level (copyFrom) Key: ARROW-1826 URL: https://issues.apache.org/jira/browse/ARROW-1826 Project: Apache Arrow Is