Re: unsubscribe please

2016-02-24 Thread Corey J Nolet
Please send a message to dev-unsubscr...@arrow.apache.org in order to take yourself off of this list. Thanks.

Re: unsubscribe please

2016-02-24 Thread Chester
Unsubscribing doesn't work for me either. I switched from my personal email to my work email by unsubscribing the personal address. I got confirmation that the email was removed, but I am still getting emails at my personal address.

Re: Question about mutability

2016-02-24 Thread Leif Walsh
Here's the image that popped into my mind when I heard about this project. At best it's a motivating example; at worst it's a distraction: 1. Spark reads Parquet from wherever into an Arrow structure in shared memory. 2. A Spark executor calls into the Python half of PySpark with a handle to this me

Re: SIMD support in Java

2016-02-24 Thread Leif Walsh
The JVM may be able to do popcount optimization, but it's categorically bad at other vectorization instructions.
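As a minimal sketch of the kind of popcount the JVM does handle: HotSpot treats `Long.bitCount` as an intrinsic and can compile it to a hardware POPCNT instruction on CPUs that support it. The example counts the non-null slots of a validity bitmap packed into `long` words. The class name `BitmapPopcount` and the bit layout (slot i lives in word i / 64, bit i % 64, least significant bit first) are illustrative assumptions, not Arrow's Java API.

```java
/** Hypothetical helper: counts set bits in a validity bitmap stored as 64-bit words. */
final class BitmapPopcount {
    private BitmapPopcount() {}

    /** Returns the number of set bits among the first {@code length} slots. */
    static int countSet(long[] words, int length) {
        int fullWords = length >>> 6;          // complete 64-bit words
        int count = 0;
        for (int i = 0; i < fullWords; i++) {
            count += Long.bitCount(words[i]);  // candidate for the POPCNT intrinsic
        }
        int tailBits = length & 63;            // slots left in the final partial word
        if (tailBits != 0) {
            long mask = (1L << tailBits) - 1;  // keep only the low tailBits bits
            count += Long.bitCount(words[fullWords] & mask);
        }
        return count;
    }

    public static void main(String[] args) {
        long[] bitmap = {0xFFL, 0b101L};       // slots 0-7 valid, then slots 64 and 66
        System.out.println(countSet(bitmap, 67));  // prints 10
    }
}
```

Masking the last word matters because bits beyond `length` are padding and may hold arbitrary values.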

Re: Should Nullable be a nested type?

2016-02-24 Thread Wes McKinney
Hi Dan, Thanks for these thoughts! There are a few separate problems: physical memory representation; metadata; and implementation container type layering / user API. We aren't expecting any system that uses Arrow to necessarily use one of the reference Arrow implementations, but I expect an ecosyst

Re: SIMD support in Java

2016-02-24 Thread Taro L. Saito
Thanks for letting me know. If we need to embed C++ binaries (.so files) inside Java, snappy-java's approach (https://github.com/xerial/snappy-java) would be useful: it bundles .so files built for several OS/CPU architectures and loads the matching one at run time. Btw, the JVM is smart enough to replac
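A minimal sketch of the bundling approach described above: choose a library by OS and CPU architecture, copy it out of the JAR into a temporary file, and `System.load` it. The resource layout `/native/<os>/<arch>/<file>` and the class `NativeLoader` are hypothetical; this is not snappy-java's actual code or directory structure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

/** Hypothetical loader that picks one bundled native library per OS/CPU at run time. */
final class NativeLoader {
    private NativeLoader() {}

    /** Normalizes the os.name property into a folder name (hypothetical naming scheme). */
    static String osFolder(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) return "windows";
        if (os.startsWith("mac")) return "mac";
        if (os.startsWith("linux")) return "linux";
        return os.replaceAll("\\W", "");
    }

    /** Folds common os.arch aliases into one folder name per architecture. */
    static String archFolder(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) return "x86_64";
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        return arch.replaceAll("\\W", "");
    }

    /** Builds e.g. /native/linux/x86_64/libfoo.so for library "foo" on 64-bit Linux. */
    static String resourcePath(String osName, String osArch, String libName) {
        String os = osFolder(osName);
        String file;
        switch (os) {
            case "windows": file = libName + ".dll"; break;
            case "mac": file = "lib" + libName + ".dylib"; break;
            default: file = "lib" + libName + ".so"; break;
        }
        return "/native/" + os + "/" + archFolder(osArch) + "/" + file;
    }

    /** Copies the matching library out of the classpath to a temp file and loads it. */
    static void load(String libName) throws IOException {
        String path = resourcePath(System.getProperty("os.name"),
                System.getProperty("os.arch"), libName);
        try (InputStream in = NativeLoader.class.getResourceAsStream(path)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("no bundled library at " + path);
            }
            Path tmp = Files.createTempFile("native-", "-" + path.substring(path.lastIndexOf('/') + 1));
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());  // System.load needs an absolute path
        }
    }
}
```

The extraction step exists because the OS loader can only map a library from a real file, not from inside a JAR.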

Should Nullable be a nested type?

2016-02-24 Thread Daniel Robinson
(The following mostly uses terminology and examples from the draft spec and the C++ implementation.) Under the current spec, if I understand it correctly, there are two versions of every type:
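To make the "Nullable as a nested type" question concrete, here is a minimal sketch assuming nullability is a validity bitmap (one bit per slot, 1 = valid) layered over an ordinary non-nullable values container. The classes `IntValues` and `NullableInt` are hypothetical and only illustrate the wrapping idea; they are not the draft spec's types or the C++ implementation's API.

```java
/** Non-nullable fixed-width values: just a dense buffer. */
final class IntValues {
    final int[] data;

    IntValues(int[] data) { this.data = data; }

    int get(int i) { return data[i]; }

    int length() { return data.length; }
}

/** Nullable modeled as a nested type: a validity bitmap wrapping a child IntValues. */
final class NullableInt {
    final IntValues child;
    final byte[] validity;  // bit i set => slot i holds a value (LSB-first within each byte)

    NullableInt(IntValues child, byte[] validity) {
        this.child = child;
        this.validity = validity;
    }

    boolean isNull(int i) {
        return (validity[i >>> 3] & (1 << (i & 7))) == 0;
    }

    /** Returns the boxed value, or null when the validity bit is clear. */
    Integer get(int i) {
        return isNull(i) ? null : child.get(i);
    }

    /** Builds a nullable column from boxed values; null slots store 0 in the child. */
    static NullableInt of(Integer... values) {
        int[] data = new int[values.length];
        byte[] bits = new byte[(values.length + 7) / 8];
        for (int i = 0; i < values.length; i++) {
            if (values[i] != null) {
                data[i] = values[i];
                bits[i >>> 3] |= (byte) (1 << (i & 7));
            }
        }
        return new NullableInt(new IntValues(data), bits);
    }
}
```

In this framing the child type knows nothing about nulls; whether that separation is preferable to giving every type a nullable flag is exactly the design question the thread raises.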

Re: SIMD support in Java

2016-02-24 Thread Wes McKinney
I will soon need some SIMD-enabled algorithms for hashing and bitmap-related operations like popcount in the C++ implementation; we might prioritize a batch-oriented JNI interface to Arrow C++ for cases where the JNI overhead is worth paying from the Java side.

Arrow examples

2016-02-24 Thread Dmitriy Morozov
Hello everyone, I'm just starting with Arrow. I'd like to see how good Arrow is at caching when used in conjunction with Alluxio (Tachyon). The use case I'm going to validate involves reading data from a Spark DataFrame, storing it in Tachyon in Arrow format, and then reading it back into a DataFrame. I checked

Re: unsubscribe please

2016-02-24 Thread Corey Nolet
Russell, please send a message to dev-unsubscr...@arrow.apache.org.

unsubscribe please

2016-02-24 Thread Russell Simmons
thx

Re: Question about mutability

2016-02-24 Thread Corey Nolet
So far, how does the integration with the Spark project look? Do you envision cached Spark partitions allocating Arrows? I could imagine this would be absolutely huge for being able to ask questions of real-time data sets across applications.

Re: Question about mutability

2016-02-24 Thread Zhe Zhang
Thanks for the insights, Jacques. It's interesting to learn your thoughts on zero-copy sharing. mmap allows processes to share memory regions via a filesystem interface. That raises some security concerns, but with the immutability restrictions (as you clarified here), it sounds feasible. What other options do you have i
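A minimal sketch of sharing data through mmap, assuming a plain file as the rendezvous point: a writer maps the file read-write and fills it with fixed-width values, and a reader maps the same file read-only and reads the values in place, with no deserialization step. Both sides run in one JVM here for brevity; in practice they would be separate processes. The class `MmapShare` and the file layout (a 4-byte count followed by little-endian ints) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical example of two parties sharing fixed-width ints through a memory-mapped file. */
final class MmapShare {
    private MmapShare() {}

    /** Writer: maps the file read-write, stores a count, then the values (little-endian). */
    static void write(Path file, int[] values) throws IOException {
        long size = 4L + 4L * values.length;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping past the current end of file grows the file to `size` bytes.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.order(ByteOrder.LITTLE_ENDIAN);
            buf.putInt(values.length);
            for (int v : values) {
                buf.putInt(v);
            }
            buf.force();  // push the changes to the underlying file
        }
    }

    /** Reader: maps the same file read-only and sums the values in place, without decoding. */
    static long sum(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);
            int n = buf.getInt(0);
            long total = 0;
            for (int i = 0; i < n; i++) {
                total += buf.getInt(4 + 4 * i);
            }
            return total;
        }
    }

    /** Writes then reads through a temp file; both sides share one JVM here for brevity. */
    static long roundTrip(int[] values) {
        try {
            Path file = Files.createTempFile("shared-", ".bin");
            file.toFile().deleteOnExit();
            write(file, values);
            return sum(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new int[] {1, 2, 3, 4}));  // prints 10
    }
}
```

The read-only mapping on the reader side is where the immutability restriction pays off: the reader can trust the bytes without copying them first.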

Re: Question about mutability

2016-02-24 Thread Jacques Nadeau
Absolutely. One of the design points of record batches is that they include no more than 1 << 16 records (and may hold substantially fewer). A typical set of operations will create large numbers of these batch structures, and many of them may be ephemeral. The goal of the data headers structure (which is
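The batch-size cap mentioned above can be illustrated with a small sketch that splits N records into batches of at most 1 << 16 (65,536) records. Only the limit comes from the message; the class `RecordBatches` and the choice to fill every batch except the last are illustrative assumptions (real producers may emit much smaller batches, as the message notes).

```java
import java.util.ArrayList;
import java.util.List;

final class RecordBatches {
    /** Upper bound on records per batch, as stated in the thread. */
    static final int MAX_BATCH = 1 << 16;  // 65,536

    private RecordBatches() {}

    /** Returns the size of each batch when totalRecords are split greedily. */
    static List<Integer> batchSizes(long totalRecords) {
        if (totalRecords < 0) {
            throw new IllegalArgumentException("negative record count");
        }
        List<Integer> sizes = new ArrayList<>();
        long remaining = totalRecords;
        while (remaining > 0) {
            int size = (int) Math.min(remaining, MAX_BATCH);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(150_000));  // prints [65536, 65536, 18928]
    }
}
```

Capping the count at 2^16 also means a record index within a batch always fits in 16 bits plus one, which keeps per-batch offsets small.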

RE: Question about mutability

2016-02-24 Thread Andrew Brust
Here's the relevant bit from my article, based on my discussion with Jacques: Partying on the data. Once multiple projects adopt Arrow, they will be able to share data with little overhead, since the data won't need

Re: Question about mutability

2016-02-24 Thread Corey Nolet
Thanks for the fast response, Jacques. But that's not to say it would be impossible to continue to create new data structures and de-allocate others in order to add and remove data, correct?

Re: SIMD support in Java

2016-02-24 Thread Jacques Nadeau
The short answer is that the JVM is horrible at SIMD. It does a few optimizations when working with primitive arrays, but beyond that, you're basically stuck working outside the JVM. The key for Arrow is that the overhead of stepping out of the JVM can be amortized across all records in a batch. I hope t
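A back-of-the-envelope way to see the amortization argument, assuming a fixed cost $c_{\text{call}}$ per JVM-to-native transition and a per-record cost $c_{\text{rec}}$ on the native side (both symbols are introduced here for illustration; they are not measured values). Processing a batch of $n$ records in one native call costs

$$
T(n) = c_{\text{call}} + n\,c_{\text{rec}}, \qquad \frac{T(n)}{n} = \frac{c_{\text{call}}}{n} + c_{\text{rec}}.
$$

Compared with one native call per record ($n = 1$, so $T(1) = c_{\text{call}} + c_{\text{rec}}$), the transition overhead attributed to each record shrinks by a factor of $n$; at the $2^{16} = 65{,}536$-record batch limit discussed in this thread, it becomes negligible next to $c_{\text{rec}}$ unless the call cost is enormous.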

Re: Question about mutability

2016-02-24 Thread Corey Nolet
Agreed. I thought the whole purpose was to share the memory space (possibly using unsafe operations such as ByteBuffers) so that it could be shared directly without copying. My interest in this is for it to enable fully in-memory computation: not just "processing" as in Spark, but as a fully in-memory

Re: Question about mutability

2016-02-24 Thread Jacques Nadeau
Both are possible. Arrow doesn't force one or the other. You can copy between memory spaces and you still benefit from the same representation (avoiding serialization/deserialization). However, the target is definitely leveraging shared memory. The way memory spaces are shared still needs more definition. M
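A minimal sketch of the "copy, but keep the representation" case, assuming a column of fixed-width little-endian ints in a `ByteBuffer`: moving the column into another buffer (standing in for another memory space) is one bulk byte copy, and the receiver reads values by offset with no decode step. `ColumnCopy` is a hypothetical name; real Arrow buffers carry more metadata than this.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: a fixed-width column copied as raw bytes and read in place. */
final class ColumnCopy {
    private ColumnCopy() {}

    /** Lays ints out contiguously in little-endian order (the shared representation). */
    static ByteBuffer encode(int[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4 * values.length).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < values.length; i++) {
            buf.putInt(4 * i, values[i]);  // absolute put leaves position at 0
        }
        return buf;
    }

    /** "Transfers" the column with a single bulk copy; no per-value serialization. */
    static ByteBuffer copy(ByteBuffer src) {
        ByteBuffer dst = ByteBuffer.allocateDirect(src.capacity()).order(ByteOrder.LITTLE_ENDIAN);
        dst.put(src.duplicate());  // duplicate() so the source's position is left untouched
        dst.flip();                // limit = bytes copied, position back to 0
        return dst;
    }

    /** Reads value i directly from the received bytes. */
    static int get(ByteBuffer buf, int i) {
        return buf.getInt(4 * i);
    }
}
```

The receiver only needs to agree on the layout (width and byte order); that shared agreement is what replaces a serialization format.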

Re: Question about mutability

2016-02-24 Thread Zhe Zhang
Well, I could be wrong. I'm new to the codebase myself. Let's hope someone from the core dev team can help clarify.

Re: Question about mutability

2016-02-24 Thread Michael D. Coon
I had the same understanding as Corey and thought that apps shared an allocated memory space for the sole purpose of eliminating the need to copy data between the apps. If it's just a replacement for protobuf SERDE, that makes it a whole lot less exciting :(

RE: Question about mutability

2016-02-24 Thread Andrew Brust
Hmm... that's not exactly how Jacques described things to me when he briefed me on Arrow ahead of the announcement.

Re: Question about mutability

2016-02-24 Thread Zhe Zhang
I don't think one application/process's memory space will be made available to other applications/processes. It's fundamentally hard for processes to share their address spaces. IIUC, with Arrow, when application A shares data with application B, the data is still duplicated in the memory spaces o

Question about mutability

2016-02-24 Thread Corey Nolet
Forgive me if this question seems ill-informed. I just started looking at Arrow yesterday, and I looked around the GitHub repo a bit. Are you expecting the memory space held by one application to be mutable by that application and made available to all applications trying to read the memory space?

SIMD support in Java

2016-02-24 Thread Taro L. Saito
Hi, I have just started looking at the Java code of Arrow. So far, here is what I have found: code templates are used to generate efficient code for reading/writing fixed bit-length value vectors; the Unsafe class will be used to accelerate raw memory access within ByteBuffer; the ValueHolder class is used
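A hedged sketch of what per-width code templates buy: a single generic fixed-width reader must branch on the value width at every access, whereas a generated per-width class hard-codes the offset arithmetic and accessor. `GenericFixedWidth` and `BigIntReader` are hypothetical stand-ins, not Arrow's generated classes, and they use plain `ByteBuffer` accessors rather than `Unsafe`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Generic reader: one class for every width, with a branch on each access. */
final class GenericFixedWidth {
    final ByteBuffer buf;
    final int width;  // bytes per value: 1, 2, 4 or 8

    GenericFixedWidth(ByteBuffer buf, int width) {
        this.buf = buf.order(ByteOrder.LITTLE_ENDIAN);
        this.width = width;
    }

    /** Returns value {@code index}, sign-extended to a long. */
    long get(int index) {
        int offset = index * width;
        switch (width) {
            case 1: return buf.get(offset);
            case 2: return buf.getShort(offset);
            case 4: return buf.getInt(offset);
            case 8: return buf.getLong(offset);
            default: throw new IllegalStateException("unsupported width " + width);
        }
    }
}

/** What a template might emit for 8-byte values: offset math and accessor fixed in the code. */
final class BigIntReader {
    final ByteBuffer buf;

    BigIntReader(ByteBuffer buf) {
        this.buf = buf.order(ByteOrder.LITTLE_ENDIAN);
    }

    long get(int index) {
        return buf.getLong(index << 3);  // index * 8, no width branch
    }
}
```

The JIT can often hoist the branch out of a hot loop in the generic version, but generating specialized classes makes the fast path explicit instead of relying on that.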