Re: get_json_object for nested field returning a String instead of an Array

2014-04-08 Thread David Quigley
Hi Narayanan, We have had some success with a similar use case using a custom input format / record reader to recursively split arbitrary JSON into a set of discrete records at runtime. No schema is needed. Doing something similar might give you the functionality you are looking for. https://githu
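
For anyone skimming the archive, here is a rough sketch of what "recursively split arbitrary JSON into discrete records, no schema needed" could look like. It is not the code from the thread: the function name `split_records`, the `_path` field, and the rule of emitting one record per JSON object are all assumptions made for illustration, and it is written in plain Python rather than as a Hadoop input format / record reader.

```python
import json
from typing import Any, Iterator


def split_records(node: Any, path: str = "$") -> Iterator[dict]:
    """Yield one flat record per JSON object found anywhere in `node`.

    Hypothetical splitting rule (an assumption, not the thread's code):
    the scalar fields of each object become one record, tagged with the
    path at which the object was found. Scalars that sit directly inside
    an array are not emitted.
    """
    if isinstance(node, dict):
        # Keep only scalar fields in this record; nested values are visited below.
        scalars = {k: v for k, v in node.items() if not isinstance(v, (dict, list))}
        if scalars:
            yield {"_path": path, **scalars}
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                yield from split_records(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from split_records(item, f"{path}[{index}]")


doc = json.loads('{"id": 1, "items": [{"sku": "a"}, {"sku": "b", "tags": ["x"]}]}')
records = list(split_records(doc))
for record in records:
    print(json.dumps(record))
```

Because the walk only looks at the shape of each value (object, array, or scalar), no schema has to be declared up front. A real record reader would apply the same idea per input document and hand each emitted record to the SerDe.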

Re: Deserializing into multiple records

2014-04-08 Thread David Quigley
I am glad that I could help. > > > > Br, > > Petter > > > > > > 2014-04-04 6:02 GMT+02:00 David Quigley : > >> > >> Thanks again Petter, the custom input format was exactly what I needed. > >> Here is example of my code in case anyone is

Re: Deserializing into multiple records

2014-04-03 Thread David Quigley
d.myserde' > STORED AS INPUTFORMAT 'quigley.david.myinputformat' OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION 'mylocation'; > > > Hope this helps. > > Br, > Petter > > > > > 2014-04-02 5:45 GMT+02:00 David Quigley : > > We are cur

Re: Deserializing into multiple records

2014-04-02 Thread David Quigley
hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION 'mylocation'; > > > Hope this helps. > > Br, > Petter > > > > > 2014-04-02 5:45 GMT+02:00 David Quigley : > > We are currently streaming complex documents to hdfs with the hope of >>

Deserializing into multiple records

2014-04-01 Thread David Quigley
We are currently streaming complex documents to HDFS with the hope of being able to query them. Each single document logically breaks down into a set of individual records. In order to use Hive, we preprocess each input document into a set of discrete records, which we save on HDFS and create an externa