Hi Kohei,

I'm awaiting community feedback on the approach to implementing extension types, specifically whether the approach I've used (the following keys in custom_metadata [1]) is the one we want to use longer-term. This certainly seems like a good time to have that discussion. If there is consensus, we can document it formally in the specification documents, and we will probably want to hold a vote to ensure that we are in agreement.

Thanks

[1]: https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/metadata-internal.cc#L63

On Tue, Apr 30, 2019 at 6:55 PM Kohei KaiGai <kai...@heterodb.com> wrote:
>
> Hello Wes,
>
> @ktou also introduced me to your work.
> As long as the custom_metadata format used to declare the custom datatype
> is well defined in the specification or a document somewhere, independent
> of the library implementation, it looks sufficient to me.
> Does your UUID example use the FixedSizeBinary raw-data type to wrap UUID and put
> "arrow_extension_name=uuid" and "arrow_extension_data=uuid-type-unique-code"
> in the custom_metadata of Field "f0"? Is that right?
> If it is documented somewhere, people can reproduce the custom datatype in
> their applications, and other folks can also read the custom datatype.
>
> Thanks,
>
> On Tue, Apr 30, 2019 at 23:47, Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > hi Kohei,
> >
> > Since the introduction of arrow::ExtensionType in ARROW-585 [1] we
> > have a well-defined method of creating new data types without having
> > to manually interact with the custom_metadata Schema information. Can
> > you have a look at that and see if it meets your requirements? This
> > can be a useful way of extending the Arrow format for your use cases
> > while the community may discuss formally adding new logical types to
> > the format (or not).
> >
> > In the unit tests you can see a UUID type I have defined and
> > serialized through Arrow's binary protocol machinery
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/extension_type-test.cc
> >
> > Thanks
> > Wes
> >
> > [1]:
> > https://github.com/apache/arrow/commit/a79cc809883192417920b501e41a0e8b63cd0ad1
> >
> > On Tue, Apr 30, 2019 at 1:34 AM Kohei KaiGai <kai...@heterodb.com> wrote:
> > >
> > > Hello,
> > >
> > > It is a proposal to add new logical types to the Apache Arrow data
> > > format.
> > >
> > > As Melik-Adamyan said, it is quite easy to convert a 5-byte
> > > FixedSizeBinary to PostgreSQL's inet data type in the Arrow_Fdw module
> > > (a PostgreSQL extension responsible for data conversion). However, it
> > > is not obvious to readers whether it is a network address or just a
> > > small binary blob.
> > >
> > > https://www.postgresql.org/docs/11/sql-importforeignschema.html
> > > PostgreSQL has the IMPORT FOREIGN SCHEMA command, which allows defining
> > > a foreign table according to the schema information of the external
> > > data source.
> > > In the case of Arrow_Fdw, we can define a foreign table without manually
> > > listing columns and data types, as follows:
> > >
> > > IMPORT FOREIGN SCHEMA foo FROM arrow_fdw INTO public
> > > OPTIONS (file '/opt/nvme/foo.arrow');
> > >
> > > In this case, the Schema definition in 'foo.arrow' can tell PostgreSQL
> > > how many columns are defined, along with their names, data types and
> > > so on. However, PostgreSQL cannot tell how to convert the
> > > FixedSizeBinary (width=5) without any metadata support. It may be the
> > > 'inet4' data type, or it may be 'char(5)'.
> > >
> > > One idea is to use the custom_metadata field in the Field node. We
> > > may be able to mark that it is a network address, not a blob. However,
> > > I could not find a specification of custom_metadata.
> > >
> > > I expect network addresses are widely used in log-data processing,
> > > and no small number of applications will support them. If so, it is
> > > not too niche a requirement for a new logical data type definition
> > > in the Apache Arrow data format.
> > >
> > > Best regards,
> > >
> > > On Tue, Apr 30, 2019 at 15:13, Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > > Hi KaiGai Kohei,
> > > > Can you clarify if you are looking for advice on modelling these types
> > > > or proposing to add new logical types to the Arrow specification?
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > On Monday, April 29, 2019, Kohei KaiGai <kai...@heterodb.com> wrote:
> > > >
> > > > > Hello folks,
> > > > >
> > > > > What are your opinions on network address type support in the Apache
> > > > > Arrow data format?
> > > > > Network addresses always appear in the network logs generated in bulk
> > > > > by network equipment, and they are significant information when
> > > > > people analyze their past logs.
> > > > >
> > > > > I'm working on mapping the Apache Arrow format onto PostgreSQL.
> > > > > http://heterodb.github.io/pg-strom/arrow_fdw/
> > > > >
> > > > > This extension allows reading Arrow files as if they were PostgreSQL
> > > > > tables, using a foreign table.
> > > > > Arrow data types are mapped to the relevant PostgreSQL data types
> > > > > according to the above documentation.
> > > > >
> > > > > https://www.postgresql.org/docs/current/datatype-net-types.html
> > > > > PostgreSQL supports some network address types and operators.
> > > > > For example, we can put a qualifier like: WHERE addr <<= inet
> > > > > '192.168.1.0/24' , to find all the records in the subnet
> > > > > '192.168.1.0/24'.
> > > > >
> > > > > Probably, these three data types are sufficient for most network
> > > > > logs for now: inet4, inet6 and macaddr.
> > > > > * inet4 is a 32-bit + optional 8-bit (for netmask) fixed-length array
> > > > > * inet6 is a 128-bit + optional 8-bit (for netmask) fixed-length array
> > > > > * macaddr is a 48-bit fixed-length array.
> > > > >
> > > > > I don't favor mapping the inetX types onto the variable-length Binary
> > > > > data type, because it takes a 32-bit offset to indicate a 32- or
> > > > > 40-bit value, which is very inefficient, even though PostgreSQL
> > > > > allows mixing inet4/inet6 data types in the same column.
> > > > >
> > > > > Thanks,
> > > > > --
> > > > > HeteroDB, Inc / The PG-Strom Project
> > > > > KaiGai Kohei <kai...@heterodb.com>
> > > > >
> > >
> > >
> > > --
> > > HeteroDB, Inc / The PG-Strom Project
> > > KaiGai Kohei <kai...@heterodb.com>
>
>
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kai...@heterodb.com>
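
To make the storage idea in this thread concrete, here is a minimal Python sketch (standard library only) of packing an inet4 value into a 5-byte fixed-width cell and tagging the field so a reader can tell it apart from char(5). The byte layout (4 big-endian address bytes followed by 1 prefix-length byte, always present) is an assumption; the thread only gives the bit widths. The metadata key names are copied from the keys quoted above and the "inet4" value is hypothetical; none of this is taken from a published specification. `buffer_bytes` only illustrates the offset overhead mentioned for variable-length Binary, ignoring validity bitmaps and padding.

```python
import ipaddress
from typing import Tuple

# Hypothetical field metadata: key names as quoted in this thread,
# values invented for illustration only.
INET4_FIELD_METADATA = {
    "arrow_extension_name": "inet4",
    "arrow_extension_data": "",
}

INET4_WIDTH = 5  # assumed layout: 4 address bytes + 1 prefix-length byte


def encode_inet4(text: str) -> bytes:
    """Pack 'a.b.c.d/len' into 5 bytes (big-endian address, then prefix length)."""
    iface = ipaddress.IPv4Interface(text)
    return iface.ip.packed + bytes([iface.network.prefixlen])


def decode_inet4(raw: bytes) -> ipaddress.IPv4Interface:
    """Inverse of encode_inet4 for a single 5-byte cell."""
    if len(raw) != INET4_WIDTH:
        raise ValueError(f"expected {INET4_WIDTH} bytes, got {len(raw)}")
    address = ipaddress.IPv4Address(raw[:4])
    return ipaddress.IPv4Interface(f"{address}/{raw[4]}")


def buffer_bytes(num_values: int, width: int = INET4_WIDTH) -> Tuple[int, int]:
    """Return (fixed_size, variable_length) data sizes in bytes.

    Variable-length Binary adds an int32 offsets buffer with one entry
    per value plus one.
    """
    fixed = num_values * width
    variable = fixed + 4 * (num_values + 1)
    return fixed, variable


if __name__ == "__main__":
    cell = encode_inet4("192.168.1.0/24")
    print(cell.hex(), decode_inet4(cell))
    print(buffer_bytes(1000))
```

Under these assumptions, 1,000 inet4 values take 5,000 bytes as fixed-width cells versus 9,004 bytes as variable-length Binary, which is the overhead the original proposal objects to.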