Hi! I'm planning to use Hive to query custom Avro logging records. I transfer the data via Flume to HDFS and pick it up from there.
The Flume event schema is

{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}

which means that my custom records appear to Hive only as the opaque binary "body" field. I see three ways to query my custom record fields:

(1) Store my custom record fields as Flume event headers as well. Hive can then query the headers map as-is, out of the box (rough sketch in the P.S. below).

(2) Use a different Flume event serializer that writes my schema to HDFS directly instead of the Flume event schema. This requires a custom Flume setup.

(3) Transform the data on import into Hive, or create a view on the data once it is in Hive.

Do you have any suggestions for how to go about this, or which route is preferable?

Best regards,
Manuel

--
Manuel Simoni, Engineering Consultant
msim...@gmail.com | Tel: +43 (0)664 346 5158 | Skype: manuelsimoni
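
P.S. To make option (1) concrete, here is roughly how I picture the Hive side. This is an untested sketch: the table name, the HDFS path, and the header keys (user_id, request_path, app) are made up for illustration.

-- External Hive table over the Avro files that Flume writes, described by the
-- Flume event schema above. Table name and LOCATION are hypothetical.
CREATE EXTERNAL TABLE flume_events
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/flume/events'
TBLPROPERTIES ('avro.schema.literal'='{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}');

-- If my record fields are copied into the event headers (option 1), querying
-- them is just map lookups; the header keys here are invented examples.
SELECT headers['user_id'], headers['request_path']
FROM flume_events
WHERE headers['app'] = 'checkout';

Option (3) would presumably end up looking similar, except that the interesting fields would first have to be pulled out of the binary body somehow (a custom SerDe or UDF?), which is the part I'm unsure about.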