I use MapReduce to generate the tables with Elephant-Bird's OutputFormat. Hive can then read them as EXTERNAL tables using the ProtobufHiveSerde and ProtobufBlockInputFormat classes generated by Elephant-Bird. The CREATE TABLE statement looks like the following:
CREATE EXTERNAL TABLE IF NOT EXISTS TABLE_NAME ( ... )
ROW FORMAT SERDE 'elephantbird.proto.hive.serde.LzoXXXProtobufHiveSerde'
STORED AS
  INPUTFORMAT 'elephantbird.proto.mapred.input.DeprecatedLzoXXXProtobufBlockInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
LOCATION '/PATH';

So the solution is to use external tables. Let me know if it helps.

On Thu, Sep 1, 2011 at 8:45 PM, Matias Silva <msi...@specificmedia.com> wrote:
> Hi Everyone, is there any documentation regarding importing
> GoogleProtocolBuffer files into Hive. I'm scouring over the internet
> and the closest thing I came
> across http://search-hadoop.com/m/9zF4MEW5Od1/v=plain
> I saw something from Elephant-Bird where I can load the GPB file using pig
> and then store it in a plain text format and then load
> into Hive. It would be great if I can just load from GPB directly into
> Hive.
> Any pointers?
> Thanks for your time and knowledge,
> Matt