Hi Jerome, thanks...I've already found "Brickhouse" and the Hive UDFs, but it didn't help.
Today I'll try again to process the json file after going through all my mails...maybe I'll find a solution. Best, Chris On Fri, Jun 20, 2014 at 7:16 PM, Jerome Banks <jba...@tagged.com> wrote: > Christian, > Sorry to spam this newsgroup, and this is not a commercial endorsement, > but check out the Hive UDFs in the Brickhouse project ( > http://github.com/klout/brickhouse ) ( > http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/ > ) > > You can convert arbitrary complex Hive structures to an from json with > it's to_json and from_json UDF's. See the blog posting for an explanation. > > -- jerome > > > On Fri, Jun 20, 2014 at 8:26 AM, Christian Link <christian.l...@mdmp.com> > wrote: > >> hi, >> >> I'm very, very new to Hadoop, Hive, etc. and I have to import data into >> hive tables. >> >> Environment: Amazon EMR, S3, etc. >> >> The input file is on S3 and I copied it into my HDFS. >> >> 1. flat table with one column and loaded data into it: >> >> CREATE TABLE mdmp_raw_data (json_record STRING); >> LOAD DATA INPATH 'hdfs:///input-api/1403181319.json' OVERWRITE INTO >> TABLE `mdmp_raw_data`; >> That worked, I can access some data, like this: >> >> SELECT d.carrier, d.language, d.country >> FROM mdmp_raw_data a LATERAL VIEW json_tuple(a.data, >> 'requestTimestamp', 'context') b AS requestTimestamp, context >> LATERAL VIEW json_tuple(b.context, 'locale') c AS locale >> LATERAL VIEW json_tuple(c.locale, 'carrier', 'language', 'country') d >> AS carrier, language, country >> LIMIT 1; >> >> Result: o2 - de Deutsch Deutschland >> >> I can also select the array at once: >> >> SELECT b.requestTimestamp, b.batch >> FROM mdmp_raw_data a >> LATERAL VIEW json_tuple(a.data, 'requestTimestamp', 'batch') b AS >> requestTimestamp, batch >> LIMIT 1; >> This will give me: >> >> >> [{"timestamp":"2014-06-19T14:25:18+02:00","requestId":"2ca08247-5542-4cb4-be7e-4a8574fb77a8","sessionId":"f29ec175ca6b7d10","event":"TEST >> Doge >> Comments","userId":"doge96514016ruffruff","action":"track","context":{"library":"analytics-android","libraryVersion":"0.6.13"},"properties":{"comment":"Much >> joy."}}, ...] >> >> This "batch" may contain n events will a structure like above. >> >> I want to put all events in a table where each "element" will be stored >> in a unique column: timestamp, requestId, sessionId, event, userId, action, >> context, properties >> >> 2. explode the "batch" I read a lot about SerDe, etc. - but I don't get >> it. >> >> - I tried to create a table with an array and load the data into it - >> several errors >> use explode in query but it doesn't accept "batch" as array >> - integrated several SerDes but get things like "unknown function jspilt" >> - I'm lost in too many documents, howtos, etc. and could need some >> advices... >> >> Thank you in advance! >> >> Best, Chris >> > >