Re: Deserializing into multiple records

2014-04-08 Thread David Quigley
The nested JSON was a hard requirement.

On Tue, Apr 8, 2014 at 5:52 AM, Edward Capriolo wrote:
> Use Avro or Protobuf support.

Re: Deserializing into multiple records

2014-04-08 Thread Edward Capriolo
Use Avro or Protobuf support.

On Tuesday, April 8, 2014, Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com> wrote:
> Good stuff! I am glad that I could help.
>
> Br,
> Petter

Re: Deserializing into multiple records

2014-04-08 Thread Petter von Dolwitz (Hem)
Good stuff! I am glad that I could help.

Br,
Petter

2014-04-04 6:02 GMT+02:00 David Quigley:
> Thanks again Petter, the custom input format was exactly what I needed.

Re: Deserializing into multiple records

2014-04-03 Thread David Quigley
Thanks again Petter, the custom input format was exactly what I needed.

Here is an example of my code in case anyone is interested: https://github.com/quicklyNotQuigley/nest

Basically gives you SQL access to arbitrary JSON data. I know there are solutions for dealing with JSON data in Hive fields, but…
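For readers skimming the archive: a minimal sketch of how an input format like this could be exercised locally, assuming a MultiRecordJsonInputFormat class shaped like the skeleton sketched after the 2014-04-02 reply below. The class name is hypothetical and is not taken from the nest repository.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class InputFormatSmokeTest {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        // Hypothetical local path holding one nested JSON document.
        FileInputFormat.setInputPaths(conf, new Path("file:///tmp/doc.json"));

        MultiRecordJsonInputFormat format = new MultiRecordJsonInputFormat();
        for (InputSplit split : format.getSplits(conf, 1)) {
            RecordReader<LongWritable, Text> reader =
                    format.getRecordReader(split, conf, Reporter.NULL);
            LongWritable key = reader.createKey();
            Text value = reader.createValue();
            while (reader.next(key, value)) {
                // One printed line per logical record the document fanned out into.
                System.out.println(key + "\t" + value);
            }
            reader.close();
        }
    }
}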

Re: Deserializing into multiple records

2014-04-02 Thread David Quigley
Makes perfect sense, thanks Petter!

On Wed, Apr 2, 2014 at 2:15 AM, Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com> wrote:
> Hi David, you can implement a custom InputFormat (extends org.apache.hadoop.mapred.FileInputFormat) accompanied by a custom RecordReader (implements org.apache.hadoop.mapred.RecordReader).

Re: Deserializing into multiple records

2014-04-02 Thread Petter von Dolwitz (Hem)
Hi David,

you can implement a custom InputFormat (extends org.apache.hadoop.mapred.FileInputFormat) accompanied by a custom RecordReader (implements org.apache.hadoop.mapred.RecordReader). The RecordReader will be used to read your documents, and from there you can decide which units you will return…
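What Petter describes might look roughly like the following. This is a minimal sketch assuming one JSON document per input file (files are therefore marked non-splittable), with Jackson used for parsing and a deliberately naive splitting rule; all class names and the splitDocument helper are illustrative, not code from the thread.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class MultiRecordJsonInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // a document must be parsed whole, so never split files
    }

    @Override
    public RecordReader<LongWritable, Text> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new MultiRecordJsonReader((FileSplit) split, job);
    }

    /** Reads one JSON document and hands Hive one row per logical record. */
    static class MultiRecordJsonReader implements RecordReader<LongWritable, Text> {
        private static final ObjectMapper MAPPER = new ObjectMapper();
        private final Iterator<String> records;
        private final int total;
        private long pos = 0;

        MultiRecordJsonReader(FileSplit split, JobConf job) throws IOException {
            FileSystem fs = split.getPath().getFileSystem(job);
            byte[] buf = new byte[(int) split.getLength()]; // assumes the document fits in memory
            try (FSDataInputStream in = fs.open(split.getPath())) {
                in.readFully(0, buf);
            }
            List<String> rows = splitDocument(new String(buf, StandardCharsets.UTF_8));
            this.total = rows.size();
            this.records = rows.iterator();
        }

        // Placeholder splitting rule: treat each element of a top-level JSON
        // array as one record; a real implementation would apply its own
        // document-specific flattening here instead.
        private static List<String> splitDocument(String json) throws IOException {
            JsonNode root = MAPPER.readTree(json);
            List<String> rows = new ArrayList<>();
            if (root.isArray()) {
                for (JsonNode element : root) {
                    rows.add(MAPPER.writeValueAsString(element));
                }
            } else {
                rows.add(MAPPER.writeValueAsString(root));
            }
            return rows;
        }

        @Override
        public boolean next(LongWritable key, Text value) {
            if (!records.hasNext()) {
                return false;
            }
            key.set(pos++);
            value.set(records.next());
            return true;
        }

        @Override public LongWritable createKey() { return new LongWritable(); }
        @Override public Text createValue() { return new Text(); }
        @Override public long getPos() { return pos; }
        @Override public void close() { }
        @Override public float getProgress() { return total == 0 ? 1f : (float) pos / total; }
    }
}

The key design point is that next() keeps emitting rows until the per-document buffer is drained, which is how one physical document fans out into many Hive rows.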

Deserializing into multiple records

2014-04-01 Thread David Quigley
We are currently streaming complex documents to HDFS with the hope of being able to query them. Each single document logically breaks down into a set of individual records. In order to use Hive, we preprocess each input document into a set of discrete records, which we save on HDFS and create an external table…
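To make the document-to-records breakdown concrete, here is a hedged illustration of that preprocessing step. It assumes, purely for illustration, documents whose nested "items" should each become one record carrying the parent's "doc_id"; Jackson and every field name here are assumptions, not details from the thread.

import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class DocumentFlattener {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Turns one nested document into one flat JSON line per record. */
    public static List<String> flatten(String document) throws Exception {
        JsonNode root = MAPPER.readTree(document);
        List<String> records = new ArrayList<>();
        for (JsonNode item : root.path("items")) {      // hypothetical nesting
            ObjectNode record = (ObjectNode) item.deepCopy();
            record.set("doc_id", root.path("doc_id")); // copy the parent key onto each child
            records.add(MAPPER.writeValueAsString(record));
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        String doc = "{\"doc_id\":\"d1\",\"items\":[{\"v\":1},{\"v\":2}]}";
        // Prints two discrete records:
        // {"v":1,"doc_id":"d1"} and {"v":2,"doc_id":"d1"}
        flatten(doc).forEach(System.out::println);
    }
}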