[jira] [Created] (ARROW-1792) [Plasma C++] continuous write tensor failed

2017-11-09 Lu Qi (JIRA)
Lu Qi created ARROW-1792:
Summary: [Plasma C++] continuous write tensor failed
Key: ARROW-1792
URL: https://issues.apache.org/jira/browse/ARROW-1792
Project: Apache Arrow
Issue Type: Bug

[jira] [Created] (ARROW-1791) Integration tests generate date[DAY] values outside of reasonable range

2017-11-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-1791:
Summary: Integration tests generate date[DAY] values outside of reasonable range
Key: ARROW-1791
URL: https://issues.apache.org/jira/browse/ARROW-1791
Project: Apache A
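Only the summary of ARROW-1791 appears here, so what counts as a "reasonable range" is not stated. As a minimal sketch under one assumed definition, the generator below draws date[DAY] values (days since 1970-01-01) only from the span Python's `datetime.date` can represent, years 1 through 9999, so every value round-trips to a calendar date. The bounds and the helper names are illustrative assumptions, not the fix adopted for the issue.

```python
import datetime
import random

EPOCH = datetime.date(1970, 1, 1)

# Assumed "reasonable" bounds: the days-since-epoch offsets that
# datetime.date can represent (0001-01-01 through 9999-12-31).
MIN_DAYS = (datetime.date.min - EPOCH).days  # -719162
MAX_DAYS = (datetime.date.max - EPOCH).days  # 2932896


def random_date_day_values(n, seed=0):
    """Generate n hypothetical date[DAY] test values within the bounds."""
    rng = random.Random(seed)
    return [rng.randint(MIN_DAYS, MAX_DAYS) for _ in range(n)]


def to_date(days):
    """Convert a days-since-epoch value back to a calendar date."""
    return EPOCH + datetime.timedelta(days=days)
```

Both bounds fit comfortably in an int32, so the constraint comes from the consumers of the data (calendar conversion), not from the storage type.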

[jira] [Created] (ARROW-1790) [Format] Define logical data type that represents a "packed C struct" composed from other fixed-size primitive types

2017-11-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-1790:
Summary: [Format] Define logical data type that represents a "packed C struct" composed from other fixed-size primitive types
Key: ARROW-1790
URL: https://issues.apache.org/jira/bro
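ARROW-1790 asks for a logical type describing a packed C struct built from fixed-size primitives. As a rough illustration only (the field list `int32, float64, int16` is a made-up example, and this is not a proposed Arrow API), Python's `struct` module with the `<` prefix lays such values out with no alignment padding, so a column of them is a single fixed-width buffer in which value i starts at byte `i * width`:

```python
import struct

# Hypothetical packed layout: int32 id, float64 price, int16 qty.
# "<" means little-endian with no alignment padding, so the width is
# exactly the sum of the field widths: 4 + 8 + 2 = 14 bytes.
PACKED = struct.Struct("<idh")


def pack_column(records):
    """Pack (id, price, qty) tuples into one contiguous fixed-width buffer."""
    buf = bytearray(PACKED.size * len(records))
    for i, rec in enumerate(records):
        PACKED.pack_into(buf, i * PACKED.size, *rec)
    return bytes(buf)


def value_at(buf, i):
    """Random access: value i begins at byte offset i * PACKED.size."""
    return PACKED.unpack_from(buf, i * PACKED.size)


rows = [(1, 9.5, 3), (2, 0.25, 7)]
data = pack_column(rows)
print(PACKED.size, len(data), value_at(data, 1))  # 14 28 (2, 0.25, 7)
```

With native alignment (`@` instead of `<`) most platforms would insert padding after the `int32`, which is the kind of ambiguity a packed type definition would need to pin down.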

[jira] [Created] (ARROW-1789) [Format] Consolidate specification documents and improve clarity for new implementation authors

2017-11-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-1789:
Summary: [Format] Consolidate specification documents and improve clarity for new implementation authors
Key: ARROW-1789
URL: https://issues.apache.org/jira/browse/ARROW-1789

[jira] [Created] (ARROW-1788) Plasma store crashes when trying to abort objects for disconnected client

2017-11-09 Stephanie Wang (JIRA)
Stephanie Wang created ARROW-1788:
Summary: Plasma store crashes when trying to abort objects for disconnected client
Key: ARROW-1788
URL: https://issues.apache.org/jira/browse/ARROW-1788
Project: Ap

[jira] [Created] (ARROW-1787) [Python] Support reading parquet files into DataFrames in a backward compatible way

2017-11-09 Phillip Cloud (JIRA)
Phillip Cloud created ARROW-1787:
Summary: [Python] Support reading parquet files into DataFrames in a backward compatible way
Key: ARROW-1787
URL: https://issues.apache.org/jira/browse/ARROW-1787
Pro

Re: [DISCUSS] Buffer Layouts and Dictionary Vectors

2017-11-09 Wes McKinney
I opened
https://issues.apache.org/jira/browse/ARROW-1785
https://issues.apache.org/jira/browse/ARROW-1786
I can take the liberty of removing the metadata per ARROW-1785 in the next few days if there are no objections. We will want to add documentation to indicate which buffers must accompany eac

[jira] [Created] (ARROW-1786) [Format] List expected on-wire buffer layouts for each kind of Arrow physical type in specification

2017-11-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-1786:
Summary: [Format] List expected on-wire buffer layouts for each kind of Arrow physical type in specification
Key: ARROW-1786
URL: https://issues.apache.org/jira/browse/ARROW-1786
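The idea behind ARROW-1785 and ARROW-1786 is that the buffers accompanying each physical type are fixed by the specification, so a reader can derive them from the type instead of reading per-field VectorLayout metadata. A minimal sketch covering only a subset of types; the type-kind labels and the `check_buffer_count` helper are hypothetical, not part of any Arrow library:

```python
# Expected on-wire buffers per physical type, in order, following the
# null-bitmap / offsets / data layout of format/Layout.md (subset only).
EXPECTED_BUFFERS = {
    "fixed_width_primitive": ("validity", "data"),
    "boolean": ("validity", "data"),  # data is bit-packed
    "variable_binary": ("validity", "offsets", "data"),  # Binary / Utf8
    "list": ("validity", "offsets"),  # values live in the child array
    "struct": ("validity",),  # field values live in the child arrays
}


def check_buffer_count(type_kind, buffers):
    """Reader-side check: the buffer count is implied by the type alone."""
    expected = EXPECTED_BUFFERS[type_kind]
    if len(buffers) != len(expected):
        raise ValueError(
            f"{type_kind}: expected {len(expected)} buffers "
            f"({', '.join(expected)}), got {len(buffers)}"
        )
    # Name each received buffer by its role in the layout.
    return dict(zip(expected, buffers))
```

Once this mapping lives in the specification, the per-field layout metadata only repeats what the type already determines.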

[jira] [Created] (ARROW-1785) [Format] Remove VectorLayout metadata from metadata

2017-11-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-1785:
Summary: [Format] Remove VectorLayout metadata from metadata
Key: ARROW-1785
URL: https://issues.apache.org/jira/browse/ARROW-1785
Project: Apache Arrow
Issue

Re: [DISCUSS] readerIndex/writerIndex in Java vector refactor

2017-11-09 Li Jin
Gotcha. Thanks for the clarification.
On Thu, Nov 9, 2017 at 2:27 PM, Jacques Nadeau wrote:
> Yes, we should only set reader/writer index on getBuffers()
>
> On Thu, Nov 9, 2017 at 11:13 AM, Li Jin wrote:
>
> > I see. Is this understanding correct?
> >
> > For ArrowBufs in vector classes, their

Re: Arrow sync today

2017-11-09 Wes McKinney
OK, in the future someone can send a new normal Hangout to the list as a fallback if you are unavailable to admit users on the Google Meet meeting.
On Wed, Nov 8, 2017 at 5:11 PM, Jacques Nadeau wrote:
> We spent a bunch of time trying to figure it out and as far as I can tell,
> there is no way o

Re: [DISCUSS] readerIndex/writerIndex in Java vector refactor

2017-11-09 Jacques Nadeau
Yes, we should only set reader/writer index on getBuffers()
On Thu, Nov 9, 2017 at 11:13 AM, Li Jin wrote:
> I see. Is this understanding correct?
>
> For ArrowBufs in vector classes, their reader/writerIndex are 0. Only when
> writing out a record batch, the writerIndex in ArrowBufs is then set
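Jacques's rule, read together with Li Jin's summary, can be sketched with hypothetical stand-ins for `ArrowBuf` and a fixed-width vector (these are not the Java classes, and the validity buffer is left out for brevity): while values are written, both buffer indexes stay at 0, and only `get_buffers()` sets `writer_index` to the number of bytes actually used.

```python
class HypotheticalBuf:
    """Stand-in for ArrowBuf: raw bytes plus reader/writer indexes."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.reader_index = 0
        self.writer_index = 0


class HypotheticalIntVector:
    """Stand-in for a fixed-width vector of little-endian int32 values.

    set() writes straight into the buffer and leaves both indexes at 0.
    """

    WIDTH = 4

    def __init__(self, capacity):
        self.buf = HypotheticalBuf(capacity * self.WIDTH)
        self.value_count = 0

    def set(self, i, value):
        start = i * self.WIDTH
        self.buf.data[start:start + self.WIDTH] = value.to_bytes(
            self.WIDTH, "little", signed=True)
        self.value_count = max(self.value_count, i + 1)

    def get_buffers(self):
        """The one place the indexes are set: writer_index marks the end of
        the written data so a serializer knows how many bytes to send."""
        self.buf.reader_index = 0
        self.buf.writer_index = self.value_count * self.WIDTH
        return [self.buf]
```

Keeping the indexes out of the write path means vector code never has to keep them in sync; they are computed once, at the hand-off to serialization.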

Re: [DISCUSS] readerIndex/writerIndex in Java vector refactor

2017-11-09 Li Jin
I see. Is this understanding correct?
For ArrowBufs in vector classes, their reader/writerIndex are 0. Only when writing out a record batch, the writerIndex in ArrowBufs is then set correctly.
On Thu, Nov 9, 2017 at 1:28 PM, Siddharth Teotia wrote:
> ReaderIndex and WriterIndex are important

Re: [DISCUSS] readerIndex/writerIndex in Java vector refactor

2017-11-09 Siddharth Teotia
ReaderIndex and WriterIndex are important when we get the buffers (for sending over the wire). We get the buffers from one or more vectors, build a compound buffer and slice it on the other end when reconstructing the vectors. Writer index helps in demarcating the exact end point of last written da
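Siddharth's description of the wire path amounts to three steps: use each buffer's writer index to decide how many bytes to ship, concatenate them into one compound buffer, and slice that buffer apart again on the receiving side. A minimal sketch, assuming each buffer is given as a `(bytes, writer_index)` pair and omitting any alignment padding a real IPC writer would add; both helper names are hypothetical:

```python
def build_compound(buffers):
    """Concatenate each buffer's written bytes [0, writer_index) into one
    payload, recording (offset, length) so the receiver can slice it."""
    payload = bytearray()
    layout = []
    for data, writer_index in buffers:
        layout.append((len(payload), writer_index))
        payload += data[:writer_index]
    return bytes(payload), layout


def slice_compound(payload, layout):
    """Receiver side: recover each buffer as a zero-copy memoryview slice."""
    view = memoryview(payload)
    return [view[off:off + length] for off, length in layout]
```

Bytes past the writer index (unused capacity) never reach the wire, which is why a writer index left at 0 would make every buffer look empty.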

[jira] [Created] (ARROW-1784) [Python] Read and write pandas.DataFrame in pyarrow.serialize by decomposing the BlockManager rather than coercing to Arrow format

2017-11-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-1784:
Summary: [Python] Read and write pandas.DataFrame in pyarrow.serialize by decomposing the BlockManager rather than coercing to Arrow format
Key: ARROW-1784
URL: https://issues.apac

Re: [DISCUSS] Buffer Layouts and Dictionary Vectors

2017-11-09 Li Jin
However, this is currently broken in the Java refactor branch. I am fixing this in https://issues.apache.org/jira/browse/ARROW-1779
On Thu, Nov 9, 2017 at 12:32 PM, Li Jin wrote:
> If null count is 0, the java library sets the validity vectors to all 1s.
>
> https://github.com/apache/arrow/blob/mast

[jira] [Created] (ARROW-1783) [Python] Convert SerializedPyObject to/from sequence of component buffers with minimal memory allocation / copying

2017-11-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-1783:
Summary: [Python] Convert SerializedPyObject to/from sequence of component buffers with minimal memory allocation / copying
Key: ARROW-1783
URL: https://issues.apache.org/jira/brows

Re: [DISCUSS] Buffer Layouts and Dictionary Vectors

2017-11-09 Li Jin
If null count is 0, the java library sets the validity vectors to all 1s.
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BitVector.java#L61
On Thu, Nov 9, 2017 at 12:23 PM, Wes McKinney wrote:
> Yep, see https://github.com/apache/arrow/blob/master/
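To make "all 1s" concrete: an all-valid bitmap for `length` slots has the first `length` bits set. A minimal sketch, assuming the least-significant-bit numbering described in Layout.md and ignoring buffer padding; `all_valid_bitmap` is a hypothetical helper, not the Java code linked above:

```python
def all_valid_bitmap(length):
    """Validity bitmap with the first `length` bits set (LSB bit order).

    Bit i lives in byte i // 8 at bit position i % 8; the unused trailing
    bits of the last byte are left as 0.
    """
    full_bytes, remainder = divmod(length, 8)
    bitmap = bytearray(b"\xff" * full_bytes)
    if remainder:
        bitmap.append((1 << remainder) - 1)
    return bytes(bitmap)


print(all_valid_bitmap(10))  # b'\xff\x03'
```

For 10 slots this gives one full byte plus a byte with only its two low bits set; every slot costs a bit even though none of them is null.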

[DISCUSS] readerIndex/writerIndex in Java vector refactor

2017-11-09 Li Jin
Hi All,
I am reading the Java vector refactor code and came across readerIndex/writerIndex on ArrowBuf. This issue was brought up by Siddharth Teotia earlier, but I might have missed the discussion, so I want to clarify. My understanding is that the current implementation in the Java refactor branch igno

[jira] [Created] (ARROW-1782) [Python] Expose compressors as pyarrow.compress, pyarrow.decompress

2017-11-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-1782:
Summary: [Python] Expose compressors as pyarrow.compress, pyarrow.decompress
Key: ARROW-1782
URL: https://issues.apache.org/jira/browse/ARROW-1782
Project: Apache Arrow

Re: [DISCUSS] Buffer Layouts and Dictionary Vectors

2017-11-09 Wes McKinney
Yep, see https://github.com/apache/arrow/blob/master/format/Layout.md#null-bitmaps
"Arrays having a 0 null count may choose to not allocate the null bitmap."
I do not know what the Java library will do in the event of 0 null count and 0-length validity bitmap; in theory this should be accounte
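The reader-side rule quoted from Layout.md can be sketched as follows, assuming LSB bit numbering: with a null count of 0 the bitmap is never consulted, so a producer may send a length-0 buffer. `is_valid` is a hypothetical helper, not an Arrow API. Raising on a missing bitmap when the null count is nonzero is this sketch's own choice.

```python
def is_valid(bitmap, null_count, i):
    """Return True if slot i is non-null."""
    if null_count == 0:
        # Layout.md: an array with 0 null count may omit the bitmap, so a
        # length-0 bitmap must be read as "every slot is valid".
        return True
    if len(bitmap) == 0:
        raise ValueError("nonzero null_count but no validity bitmap")
    # LSB numbering: slot i is bit (i % 8) of byte (i // 8).
    return ((bitmap[i // 8] >> (i % 8)) & 1) == 1


# 0b00000101: slots 0 and 2 valid, slot 1 null.
print([is_valid(b"\x05", 1, i) for i in range(3)])  # [True, False, True]
```

Checking `null_count` first means the bitmap is never touched for an all-valid array, whether the producer sent an all-ones bitmap or an empty buffer.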

Re: [DISCUSS] Buffer Layouts and Dictionary Vectors

2017-11-09 Brian Hulette
Ah! It didn't occur to me that a producer could just send a length-0 buffer, since the reader implementations should ignore it anyway. I don't mind the 16-byte cost of the metadata; I was referring to the bloat of a 100% valid vector, which could be substantial. Part of me wants to argue that
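The trade-off in this exchange is easy to put in numbers. A back-of-envelope sketch, ignoring the padding Arrow applies to buffers: an all-valid bitmap costs one bit per slot, while omitting it (sending a length-0 buffer) leaves only the fixed per-buffer metadata, taken here as the 16 bytes cited in the thread. `validity_bitmap_bytes` is a hypothetical helper.

```python
BUFFER_METADATA_BYTES = 16  # per-buffer metadata cost cited in the thread


def validity_bitmap_bytes(length):
    """Bytes needed for a validity bitmap of `length` slots (no padding)."""
    return (length + 7) // 8


for n in (1_000, 1_000_000):
    print(n, validity_bitmap_bytes(n), BUFFER_METADATA_BYTES)
# 1000 125 16
# 1000000 125000 16
```

The metadata cost is constant, while the bitmap grows linearly with the column, so for large all-valid columns the bitmap dominates.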

Re: [DISCUSS] Buffer Layouts and Dictionary Vectors

2017-11-09 Wes McKinney
> So I'll go after the other validity vector - maybe producers should be
> allowed to omit the validity vector in the index? I just think if the goal is
> to reduce bloat then redundant validity vectors seems like a logical place to
> trim.
Well, the cost of the additional buffer metadata is on

Re: [DISCUSS] Buffer Layouts and Dictionary Vectors

2017-11-09 Brian Hulette
Good point. It's a nice feature of the format that a dictionary batch and a record batch with a single column look exactly the same when they represent the same logical type. So I'll go after the other validity vector - maybe producers should be allowed to omit the validity vector in the index
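The redundancy Brian describes arises because, in a dictionary-encoded column, nulls can be expressed either in the index vector or in the dictionary's own values vector. A minimal sketch of one convention, where nulls live only in the indices and the dictionary holds only non-null values. Python `None` stands in for an Arrow validity bit (an assumption for readability), and the helper names are hypothetical.

```python
def dictionary_encode(values):
    """Split values into (dictionary, indices).

    Nulls (None) are recorded only in the indices, so the dictionary never
    needs a validity bitmap of its own.
    """
    dictionary, positions, indices = [], {}, []
    for v in values:
        if v is None:
            indices.append(None)  # null slot: carried by index validity
            continue
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the logical values by looking each index up."""
    return [None if i is None else dictionary[i] for i in indices]


d, idx = dictionary_encode(["a", None, "b", "a"])
print(d, idx)  # ['a', 'b'] [0, None, 1, 0]
```

Under this convention the dictionary's validity vector would always be all-valid, which is exactly the kind of buffer a producer could omit or send with length 0.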