Alan Gates updated HIVE-19207:
------------------------------
    Target Version/s: 3.1.0, 3.0.0  (was: 3.0.0, 3.1.0)
            Assignee: Alan Gates  (was: Prasanth Jayachandran)
              Status: Patch Available  (was: Open)

Here is an initial pass at adding Avro support in streaming. I modified the RecordWriter interface to be parameterized by the type of records the user is writing. Previously it assumed that all records could be turned into byte[] and handled as such, which is not convenient when you already have structured data like Avro. I tried to do this in a way that does not break backward compatibility with existing RecordWriter implementations (a sketch of the parameterized interface is at the end of this message).

I've added two RecordWriter implementations, StrictAvroWriter and MappingAvroWriter.

StrictAvroWriter assumes that the Hive table and the Avro records match in schema exactly (or at least closely enough that the type conversion can be done). It also assumes that the Avro schema passed to it exactly matches every Avro record in the stream.

MappingAvroWriter takes a map of Hive column names to Avro paths. An Avro path can be a simple column name, or a path through an Avro complex type. So the Hive column 'zipcode' could be mapped to an Avro column 'zipcode', to a field of an Avro record (address.zipcode), or to a key of an Avro map (address[zipcode]); a usage sketch follows at the end of this message. Again the system assumes the types are close enough that Hive can do type conversion if necessary. In this case the Avro schema passed to the writer does not have to exactly match every record in the stream, but it must be usable to decode the referenced Avro columns in every record in the stream.

Both writers support all Avro types except null as a top-level object. Avro unions created just to allow a null value are "read through" to the non-null type, and that type is used. For example, an Avro nullable string will become a string in Hive.

For both writers I did not use the existing AvroSerDe, because it assumes that every Avro record has a schema encoded with it, which is generally not how users stream their data. I did try to follow the same type conversions as the AvroSerDe.

> Support avro record writer for streaming ingest
> -----------------------------------------------
>
>                 Key: HIVE-19207
>                 URL: https://issues.apache.org/jira/browse/HIVE-19207
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Streaming
>    Affects Versions: 3.1.0, 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Alan Gates
>            Priority: Major
>         Attachments: HIVE-19207.patch
>
>
> Add support for Avro record writer in streaming ingest.
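To make the RecordWriter change described above concrete, here is a minimal sketch of what a type-parameterized interface could look like. This is an illustration only; the method names and the local StreamingException type are assumptions, not the actual API in HIVE-19207.patch.

{code:java}
// Sketch only: a RecordWriter parameterized by the record type T.
// Method names and the exception type below are hypothetical.
public interface RecordWriter<T> {
  /** Write one record of type T into the current transaction batch. */
  void write(long writeId, T record) throws StreamingException;

  /** Flush any buffered records to storage. */
  void flush() throws StreamingException;
}

// Existing writers that handle raw bytes can keep working as
// RecordWriter<byte[]>, which is one way to stay backward compatible
// with implementations written against the old interface.
class StreamingException extends Exception {
  StreamingException(String msg) { super(msg); }
}
{code}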
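Similarly, a hedged sketch of how the column mapping for MappingAvroWriter might be built. The path syntax comes from the description above; the writer construction in the trailing comment is hypothetical.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the Hive-column-to-Avro-path map described above.
public class MappingSketch {
  public static void main(String[] args) {
    Map<String, String> columnMapping = new HashMap<>();

    // Hive column 'zipcode' taken from a top-level Avro column:
    columnMapping.put("zipcode", "zipcode");

    // ...or from a field of an Avro record:
    // columnMapping.put("zipcode", "address.zipcode");

    // ...or from a key of an Avro map:
    // columnMapping.put("zipcode", "address[zipcode]");

    // The map, plus an Avro schema able to decode the referenced columns,
    // would then be handed to the writer, e.g. (hypothetical constructor):
    // MappingAvroWriter writer = new MappingAvroWriter(avroSchema, columnMapping);
    System.out.println(columnMapping);
  }
}
{code}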
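Finally, the union "read through" behavior can be pictured with the standard Avro API; this snippet uses only stock Avro classes and does not depend on the new writers.

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

// A nullable string in Avro is the union ["null", "string"]. As described
// above, the writers read through such unions to the non-null branch, so
// this field would map to a Hive string column.
public class NullableUnionSketch {
  public static void main(String[] args) {
    Schema nullableString = SchemaBuilder.unionOf()
        .nullType().and()
        .stringType()
        .endUnion();
    System.out.println(nullableString); // prints ["null","string"]
  }
}
{code}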