Hi Kohei,

I'm awaiting community feedback on the approach to implementing
extension types, in particular whether the approach I've used (the
following keys in custom_metadata [1]) is the one we want to use
longer-term. This seems like a good time to have that discussion. If
there is consensus, we can document it formally in the specification
documents, and we will probably want to hold a vote to ensure that we
are in agreement.

Thanks

[1]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/metadata-internal.cc#L63
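
To make the custom_metadata idea discussed in this thread concrete, here is a minimal Python sketch (standard library only, no Arrow dependency). It packs an IPv4 address plus netmask length into the 5-byte layout described in the quoted messages, and shows how a reader could use the two metadata keys to tell an annotated column from a plain binary one. The key names are the ones quoted in this thread and may change once the specification settles; the "inet4" extension name, the helper functions, and the fallback to bytea are hypothetical and for illustration only.

```python
import ipaddress

# Key names as quoted in this thread (assumption: the final spec may differ).
EXT_NAME_KEY = "arrow_extension_name"
EXT_DATA_KEY = "arrow_extension_data"


def pack_inet4(cidr: str) -> bytes:
    """Pack 'a.b.c.d/len' into 5 bytes: 4 address bytes + 1 netmask-length byte."""
    iface = ipaddress.ip_interface(cidr)
    if not isinstance(iface, ipaddress.IPv4Interface):
        raise ValueError("expected an IPv4 address")
    return iface.ip.packed + bytes([iface.network.prefixlen])


def unpack_inet4(raw: bytes) -> str:
    """Inverse of pack_inet4: turn the 5-byte value back into 'a.b.c.d/len'."""
    if len(raw) != 5:
        raise ValueError("expected exactly 5 bytes")
    return f"{ipaddress.IPv4Address(raw[:4])}/{raw[4]}"


def postgres_type(byte_width: int, custom_metadata: dict) -> str:
    """Hypothetical mapping a reader such as Arrow_Fdw might apply to a
    FixedSizeBinary field, based on its custom_metadata."""
    name = custom_metadata.get(EXT_NAME_KEY)
    if name == "inet4" and byte_width == 5:   # "inet4" is an assumed name
        return "inet"
    if name == "uuid" and byte_width == 16:   # the UUID example from the thread
        return "uuid"
    # Without metadata the reader cannot tell a network address from a blob.
    return "bytea"


if __name__ == "__main__":
    raw = pack_inet4("192.168.1.0/24")
    print(raw.hex(), unpack_inet4(raw))
    print(postgres_type(5, {EXT_NAME_KEY: "inet4", EXT_DATA_KEY: ""}))
    print(postgres_type(5, {}))
```

The point of the sketch is only that the same FixedSizeBinary (width=5) field maps to different PostgreSQL types depending on whether the metadata is present, which is why the key names need to be documented independently of any one library.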

On Tue, Apr 30, 2019 at 6:55 PM Kohei KaiGai <kai...@heterodb.com> wrote:
>
> Hello Wes,
>
> @ktou also introduced me to your work.
> As long as the custom_metadata format for declaring a custom datatype
> is well defined in the specification or some other document,
> independent of the library implementation, it looks sufficient to me.
> Your UUID example uses the FixedSizeBinary raw data type to wrap the UUID and puts
> "arrow_extension_name=uuid" and "arrow_extension_data=uuid-type-unique-code"
> in the custom_metadata of Field "f0", right?
> If it is documented somewhere, people can reproduce the custom datatype in
> their applications, and other folks can also read the custom datatype.
>
> Thanks,
>
> 2019年4月30日(火) 23:47 Wes McKinney <wesmck...@gmail.com>:
> >
> > hi Kohei,
> >
> > Since the introduction of arrow::ExtensionType in ARROW-585 [1] we
> > have a well-defined method of creating new data types without having
> > to manually interact with the custom_metadata Schema information. Can
> > you have a look at that and see if it meets your requirements? This
> > can be a useful way of extending the Arrow format for your use cases
> > while the community may discuss formally adding new logical types to
> > the format (or not).
> >
> > In the unit tests you can see a UUID type I have defined and
> > serialized through Arrow's binary protocol machinery
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/extension_type-test.cc
> >
> > Thanks
> > Wes
> >
> > [1]: https://github.com/apache/arrow/commit/a79cc809883192417920b501e41a0e8b63cd0ad1
> >
> > On Tue, Apr 30, 2019 at 1:34 AM Kohei KaiGai <kai...@heterodb.com> wrote:
> > >
> > > Hello,
> > >
> > > This is a proposal to add new logical types to the Apache Arrow data
> > > format.
> > >
> > > As Melik-Adamyan said, it is quite easy to convert a 5-byte
> > > FixedSizeBinary to PostgreSQL's inet data type in the Arrow_Fdw
> > > module (a PostgreSQL extension responsible for data conversion).
> > > However, it is not obvious to readers whether the value is a
> > > network address or just a small blob of binary data.
> > >
> > > https://www.postgresql.org/docs/11/sql-importforeignschema.html
> > > PostgreSQL has the IMPORT FOREIGN SCHEMA command, which defines a
> > > foreign table according to the schema information of an external
> > > data source. With Arrow_Fdw, we can define a foreign table without
> > > manually listing the columns and their data types, as follows:
> > >
> > >   IMPORT FOREIGN SCHEMA foo FROM arrow_fdw INTO public
> > >   OPTIONS (file '/opt/nvme/foo.arrow');
> > >
> > > In this case, the Schema definition in 'foo.arrow' tells PostgreSQL
> > > how many columns are defined, along with their names, data types and
> > > so on. However, PostgreSQL cannot tell how to convert a
> > > FixedSizeBinary (width=5) without any metadata support. It may be
> > > an 'inet4' data type, but it may also be 'char(5)'.
> > >
> > > One idea is to use the custom_metadata field in the Field node. We
> > > may be able to mark the column as a network address, not a blob.
> > > However, I could not find a specification of custom_metadata.
> > >
> > > I expect network addresses to be widely used in log-data processing,
> > > and a fair number of applications will support them. If so, this is
> > > not too niche a requirement for a new logical data type definition
> > > in the Apache Arrow data format.
> > >
> > > Best regards,
> > >
> > > 2019年4月30日(火) 15:13 Micah Kornfield <emkornfi...@gmail.com>:
> > > >
> > > > Hi KaiGai Kohei,
> > > > Can you clarify if you are looking for advice on modelling these
> > > > types or proposing to add new logical types to the Arrow
> > > > specification?
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > On Monday, April 29, 2019, Kohei KaiGai <kai...@heterodb.com> wrote:
> > > >
> > > > > Hello folks,
> > > > >
> > > > > What are your opinions on supporting network address types in the
> > > > > Apache Arrow data format?
> > > > > Network addresses appear throughout the network logs generated in
> > > > > bulk by all kinds of network equipment, and they are significant
> > > > > information when people analyze their past logs.
> > > > >
> > > > > I'm working on Apache Arrow format mapping on PostgreSQL.
> > > > > http://heterodb.github.io/pg-strom/arrow_fdw/
> > > > >
> > > > > This extension allows Arrow files to be read as if they were
> > > > > PostgreSQL tables, using foreign tables.
> > > > > Arrow data types are mapped to the relevant PostgreSQL data types
> > > > > according to the above documentation.
> > > > >
> > > > > https://www.postgresql.org/docs/current/datatype-net-types.html
> > > > > PostgreSQL supports several network address types and operators.
> > > > > For example, we can add a qualifier like:
> > > > >   WHERE addr <<= inet '192.168.1.0/24'
> > > > > to find all the records in the subnet '192.168.1.0/24'.
> > > > >
> > > > > These three data types are probably sufficient for most network
> > > > > logs: inet4, inet6 and macaddr.
> > > > > * inet4 is a 32bit + optional 8bit (for netmask) fixed-length array
> > > > > * inet6 is a 128bit + optional 8bit (for netmask) fixed-length array
> > > > > * macaddr is a 48bit fixed-length array
> > > > >
> > > > > I would rather not map the inetX types onto the variable-length
> > > > > Binary data type, because it needs a 32bit offset to locate each
> > > > > 32bit or 40bit value, which is very inefficient, even though
> > > > > PostgreSQL allows inet4 and inet6 values to be mixed in the same
> > > > > column.
> > > > >
> > > > > Thanks,
> > > > > --
> > > > > HeteroDB, Inc / The PG-Strom Project
> > > > > KaiGai Kohei <kai...@heterodb.com>
> > > > >
> > >
> > >
> > >
> > > --
> > > HeteroDB, Inc / The PG-Strom Project
> > > KaiGai Kohei <kai...@heterodb.com>
>
>
>
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kai...@heterodb.com>
