Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-05-25 Thread Micah Kornfield
"hello world" makes sense as a good place to start for general IPC integration. I thought there was still some disconnect on how strings were going to be represented. That was the basis for my suggestion above. But the integer use-case bypasses these concerns for now. On Wed, May 25, 2016 at 2:

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-05-25 Thread Jacques Nadeau
By usecase, I really meant "hello world" On Wed, May 25, 2016 at 2:09 PM, Jacques Nadeau wrote: > Let's start by creating a simple usecase. For example, I would start with > nullable 4 byte integer, maybe and use the example of java > (col1) > > python (or c++) > (newcol) > java that is one what

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-05-25 Thread Jacques Nadeau
Let's start by creating a simple usecase. For example, I would start with a nullable 4-byte integer, and maybe use the example of java > (col1) > python (or c++) > (newcol) > java. That is what I'd call a single batch algorithm (e.g. one batch of values in, one out). A simple way to sidestep the
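
To make the proposed "hello world" concrete, here is a minimal sketch of a nullable 4-byte integer column going through one batch in, one batch out. It assumes the usual Arrow layout of a validity bitmap (least-significant-bit numbering, 1 = valid) plus a little-endian values buffer. The function names (encode_nullable_int32, decode_nullable_int32, add_one) are hypothetical and not part of any Arrow API, and buffer padding and the IPC metadata are left out.

```python
import struct


def encode_nullable_int32(values):
    """Pack a list of int/None into (validity_bitmap, data_buffer, null_count).

    Hypothetical helper for illustration; padding/alignment is omitted.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray()
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            null_count += 1
            # The slot still occupies 4 bytes; its content is arbitrary (zero here).
            data += struct.pack("<i", 0)
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # set the validity bit for slot i
            data += struct.pack("<i", v)
    return bytes(bitmap), bytes(data), null_count


def decode_nullable_int32(bitmap, data, length):
    """Inverse of encode_nullable_int32: rebuild the list of int/None."""
    out = []
    for i in range(length):
        valid = (bitmap[i // 8] >> (i % 8)) & 1
        (v,) = struct.unpack_from("<i", data, i * 4)
        out.append(v if valid else None)
    return out


def add_one(col):
    """Stand-in for the python (or c++) step: one batch in, one batch out."""
    return [None if v is None else v + 1 for v in col]


# Producer side (java in the proposal): col1 is serialized into buffers.
col1 = [1, None, 3, 4]
bitmap, data, null_count = encode_nullable_int32(col1)

# Consumer side: decode, compute newcol, and encode it again for the way back.
newcol = add_one(decode_nullable_int32(bitmap, data, len(col1)))
new_bitmap, new_data, new_null_count = encode_nullable_int32(newcol)
```

In a real integration test the two sides would be separate processes exchanging the buffers over IPC; here both run in one script only to show what each side would have to agree on.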

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-05-25 Thread Micah Kornfield
Just to follow up on this. I got distracted by a few other items on the C++ implementation side, but my next task is to get String types working for the C++ IPC unit test. Once I send a PR for that, it might help clarify the concerns on both sides and we can hammer out the details from there.

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-05-13 Thread Wes McKinney
Nudging this issue. We need to sketch out a plan to get IPC integration tests working between the Java and C++ implementations -- what's the most expedient way we can work toward making that happen? On Sun, May 1, 2016 at 1:02 AM, Micah Kornfield wrote: > s/spark/slack/g > > On Sun, May 1, 2016 a

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-05-01 Thread Micah Kornfield
s/spark/slack/g On Sun, May 1, 2016 at 12:58 AM, Micah Kornfield wrote: > I'm not exactly sure of my availability if I am available on spark, I > can likely make the hangout. > > On Fri, Apr 29, 2016 at 4:40 PM, Wes McKinney wrote: >> I was traveling today but I can do a hangout about this next

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-05-01 Thread Micah Kornfield
I'm not exactly sure of my availability; if I am available on spark, I can likely make the hangout. On Fri, Apr 29, 2016 at 4:40 PM, Wes McKinney wrote: > I was traveling today but I can do a hangout about this next week. > > On Thu, Apr 28, 2016 at 7:53 PM, Jacques Nadeau wrote: >> Let's do a qu

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-29 Thread Wes McKinney
I was traveling today but I can do a hangout about this next week. On Thu, Apr 28, 2016 at 7:53 PM, Jacques Nadeau wrote: > Let's do a quick hangout on this. I'd like to better understand as I'm not > sure we're all talking about the same thing. > > On Thu, Apr 28, 2016 at 5:30 PM, Micah Kornfiel

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-28 Thread Jacques Nadeau
Let's do a quick hangout on this. I'd like to better understand as I'm not sure we're all talking about the same thing. On Thu, Apr 28, 2016 at 5:30 PM, Micah Kornfield wrote: > I'm -1 on making a new primitive type in the memory layout spec [1]. > > +1 on clarifying [2], to indicate it is expec

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-28 Thread Micah Kornfield
I'm -1 on making a new primitive type in the memory layout spec [1]. +1 on clarifying [2], to indicate it is expected that the "Values array" for Utf8 and Binary types should never contain null elements. [1] https://github.com/apache/arrow/blob/master/format/Layout.md [2] https://github.com/apach
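
As an illustration of the point about the "Values array", here is a rough sketch of a Utf8 column built from existing pieces: a validity bitmap for the strings, an offsets array, and a values array of bytes. Nullness lives only at the string level, so the values array itself never needs null slots. The helper names (encode_utf8_column, decode_utf8_column) are hypothetical, the offsets are kept as a plain Python list rather than a packed int32 buffer, and padding is omitted.

```python
def encode_utf8_column(strings):
    """Encode a list of str/None as (validity_bitmap, offsets, values).

    Hypothetical helper for illustration only.
    """
    validity = bytearray((len(strings) + 7) // 8)
    offsets = [0]
    values = bytearray()
    for i, s in enumerate(strings):
        if s is not None:
            validity[i // 8] |= 1 << (i % 8)  # mark string i as valid
            values += s.encode("utf-8")
        # A null string contributes no bytes, so its offsets are equal.
        offsets.append(len(values))
    return bytes(validity), offsets, bytes(values)


def decode_utf8_column(validity, offsets, values):
    """Rebuild the list of str/None from the three pieces."""
    out = []
    for i in range(len(offsets) - 1):
        if (validity[i // 8] >> (i % 8)) & 1:
            out.append(values[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            out.append(None)
    return out


column = ["joe", None, "", "mark"]
validity, offsets, values = encode_utf8_column(column)
roundtrip = decode_utf8_column(validity, offsets, values)
```

Note that a null string and an empty string both take zero bytes in the values array; only the validity bit tells them apart.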

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-28 Thread Wes McKinney
Bumping this conversation. I'm +0 on making VARBINARY and String (identical to VARBINARY but with a UTF8 guarantee) primitive types in the spec. Let me know what others think. Thanks On Fri, Apr 22, 2016 at 6:30 PM, Wes McKinney wrote: > On Fri, Apr 22, 2016 at 6:06 PM, Jacques Nadeau wrote: >> O

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-22 Thread Wes McKinney
On Fri, Apr 22, 2016 at 6:06 PM, Jacques Nadeau wrote: > On Fri, Apr 22, 2016 at 2:42 PM, Wes McKinney wrote: > >> On Fri, Apr 22, 2016 at 4:56 PM, Micah Kornfield >> wrote: >> > I like the current scheme of making String (UTF8) a primitive type in >> > regards to RPC but not modeling it as a sp

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-22 Thread Jacques Nadeau
On Fri, Apr 22, 2016 at 2:42 PM, Wes McKinney wrote: > On Fri, Apr 22, 2016 at 4:56 PM, Micah Kornfield > wrote: > > I like the current scheme of making String (UTF8) a primitive type in > > regards to RPC but not modeling it as a special Array type. I think > > the key is formally describing h

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-22 Thread Wes McKinney
On Fri, Apr 22, 2016 at 4:56 PM, Micah Kornfield wrote: > I like the current scheme of making String (UTF8) a primitive type in > regards to RPC but not modeling it as a special Array type. I think > the key is formally describing how logical types map to physical types > either is the Flatbuffer

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-22 Thread Micah Kornfield
I like the current scheme of making String (UTF8) a primitive type with regard to RPC but not modeling it as a special Array type. I think the key is formally describing how logical types map to physical types, either in the Flatbuffer schema or in a separate document. I think there are two use-cas

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-22 Thread Wes McKinney
hi Jacques, Let's definitely hammer out this bit on the string / binary types. In the Flatbuffers spec, they are already first-class types: https://github.com/apache/arrow/blob/master/format/Message.fbs#L68 I agree that having a special primitive type for this common case (to avoid extra structu

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-13 Thread Jacques Nadeau
I agree with everything Wes said. I'd also note that we need to address some issues with the definition of the varchar(string) and varbinary(binary) types. They need to be considered primitives rather than Arrow compositions. I think this will be clearer if I get up a schema spec based on our init

Re: Some minor points from ARROW-94 (https://github.com/apache/arrow/pull/58)

2016-04-13 Thread Wes McKinney
hi Micah, thank you for working through these details. On Sun, Apr 10, 2016 at 7:18 AM, Micah Kornfield wrote: > As part of the pull request, I converted example arrays to show their > full layout in gory detail. Part of this led to some clarifications > to questions that people have apparently