Re: Any standard way for min/max values per record-batch?

2021-07-19 Thread Kohei KaiGai
and physically close rows tend to have similar values. These min/max statistics values can be appended to any Apache Arrow file, not only those generated by pg2arrow, by rewriting the Footer portion, without relocating the record batches. So, we plan to provide a standalone command to attach the min/max

Pcap2Arrow - Packet capture and data conversion tool to Apache Arrow on the fly

2021-02-15 Thread Kohei KaiGai
Hello, Let me share my recent work below: https://github.com/heterodb/pg-strom/wiki/804:-Pcap2Arrow This standalone command-line tool captures network packets from network interface devices and converts them into the Apache Arrow data format according to the pre-defined data schema for each

Any standard way for min/max values per record-batch?

2021-02-17 Thread Kohei KaiGai
Hello, Does Apache Arrow have any standard way to embed min/max values of the fields on a per-record-batch basis? It looks like FieldNode supports neither a dedicated min/max attribute nor custom metadata. https://github.com/apache/arrow/blob/master/format/Message.fbs#L28 If we embed an array of min/max valu
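To illustrate what the question is after, here is a minimal sketch of per-record-batch min/max statistics and how a reader could use them to skip batches. It is not an Arrow API: BatchStats, collect_stats, batches_to_scan and encode_stats are hypothetical names, record batches are modelled as plain lists of integers, and the comma-separated string encoding is only one assumed way to carry the values in a string key-value pair.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BatchStats:
    """Hypothetical min/max statistics of one field in one record batch."""
    min_value: int
    max_value: int


def collect_stats(batches: Sequence[Sequence[int]]) -> List[BatchStats]:
    # Assumes every batch is non-empty; min()/max() raise on empty input.
    return [BatchStats(min(b), max(b)) for b in batches]


def batches_to_scan(stats: Sequence[BatchStats], lo: int, hi: int) -> List[int]:
    """Indices of batches whose [min, max] range overlaps the query range [lo, hi]."""
    return [i for i, s in enumerate(stats)
            if s.max_value >= lo and s.min_value <= hi]


def encode_stats(stats: Sequence[BatchStats]) -> Tuple[str, str]:
    """Encode the statistics as two comma-separated strings (an assumed layout)."""
    mins = ",".join(str(s.min_value) for s in stats)
    maxs = ",".join(str(s.max_value) for s in stats)
    return mins, maxs


# Three toy "record batches" of a single integer field.
batches = [[1, 5, 3], [10, 12, 11], [20, 25, 22]]
stats = collect_stats(batches)
```

With these toy batches, a range query for values between 11 and 21 only needs batches_to_scan(stats, 11, 21), so the first batch can be skipped without reading its rows.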

Re: Any standard way for min/max values per record-batch?

2021-02-17 Thread Kohei KaiGai
structure as a > parallel list on RecordBatch itself. > > If we do add a new structure or arbitrary key-value pair we should not use > KeyValue but should have something where the values can be bytes. > > On Wed, Feb 17, 2021 at 7:17 PM Kohei KaiGai wrote: > > > Hell

Re: Human-readable version of Arrow Schema?

2020-01-08 Thread Kohei KaiGai
Hello, pg2arrow [*1] has a '--dump' mode that prints out the schema definition of the given Apache Arrow file. Does that work for you?
$ ./pg2arrow --dump ~/hoge.arrow
[Footer] {Footer: version=V4, schema={Schema: endianness=little, fields=[{Field: name="id", nullable=true, type={Int32}, children=[],

Format specification document?

2019-01-03 Thread Kohei KaiGai
Hello, I'm now trying to understand the Apache Arrow format for my application. Is there a format specification document that includes the metadata layout? I checked the descriptions at: https://github.com/apache/arrow/tree/master/docs/source/format https://github.com/apache/arrow/tree/master/format

Re: Format specification document?

2019-01-05 Thread Kohei KaiGai
y. ("pandas\0"). Value is at 0x0058 + 0x0010. Here is an int value: 0x03b4 (= 948 bytes), then the next byte (0x006c) begins the cstring body. ("{pandas_version ... ). I didn't follow the entire data file; however, it makes things clearer to me. Best regards, On Sunday, January 6, 2019 at 8:50, Wes McKinney wrote: >
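The layout described in this message (a 4-byte length followed by the string body) matches how FlatBuffers stores strings: a little-endian 32-bit length, then the bytes, then a terminating NUL. The following is a minimal sketch of reading such a string with Python's struct module. The buffer is synthetic and only mimics the offsets quoted above; read_prefixed_string is a hypothetical helper, not part of any Arrow or FlatBuffers library.

```python
import struct


def read_prefixed_string(buf: bytes, offset: int) -> bytes:
    """Read a string stored as a little-endian uint32 length followed by its body."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4  # the body begins right after the 4-byte length
    return buf[start:start + length]


# Synthetic buffer: padding up to 0x0058 + 0x0010, then length 0x03b4 and a body.
value_offset = 0x0058 + 0x0010
body = b"{" + b"x" * 947  # 948 bytes of stand-in data, not real pandas metadata
buf = bytes(value_offset) + struct.pack("<I", 0x03B4) + body + b"\x00"

value = read_prefixed_string(buf, value_offset)
```

Here the length field sits at 0x0068 and the body starts at 0x006c, which is consistent with the offsets given in the message.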

How about inet4/inet6/macaddr data types?

2019-04-29 Thread Kohei KaiGai
Hello folks, What are your opinions on supporting network address types in the Apache Arrow data format? Network addresses always appear in the network logs massively generated by network facilities, and they are significant information when people analyze their past logs. I'm working on Apache Ar

Re: How about inet4/inet6/macaddr data types?

2019-04-29 Thread Kohei KaiGai
row specification? > > Thanks, > Micah > > On Monday, April 29, 2019, Kohei KaiGai wrote: > > > Hello folks, > > > > How about your opinions about network address types support in Apache > > Arrow data format? > > Network address always appears at netwo

Re: How about inet4/inet6/macaddr data types?

2019-04-30 Thread Kohei KaiGai
an see a UUID type I have defined and > serialized through Arrow's binary protocol machinery > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/extension_type-test.cc > > Thanks > Wes > > [1]: > https://github.com/apache/arrow/commit/a79cc80988319