Hi Chris,

I like the JSON SerDe solution better, but there's another alternative to achieve what you're trying to do: using Brickhouse's json_split UDF (https://github.com/klout/brickhouse) with the mdmp_raw_data table you already have. A short sketch of what json_split does is below, followed by the full query.
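First a minimal sketch, just to show the mechanics (the JSON literal is purely illustrative, and it assumes the UDF has already been registered as in the full statements that follow): json_split takes a string containing a JSON array and returns a Hive array<string>, one JSON string per element, which can then be exploded and parsed with json_tuple.

-- Illustrative only: the literal stands in for the 'batch' value that the
-- real query extracts from json_record via json_tuple.
SELECT json_split('[{"event":"login"},{"event":"logout"}]')
FROM mdmp_raw_data
LIMIT 1;
-- expected: one row holding an array of two JSON strings

The full approach, including registering the UDF, is: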
ADD JAR file:///.../brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION json_split AS 'brickhouse.udf.json.JsonSplitUDF';

-- Pull 'batch' out of json_record as a JSON-array string, split it into
-- elements, explode them, and parse each element's fields.
SELECT d.*
FROM (
  SELECT json_split(b.batch) AS events
  FROM mdmp_raw_data a
  LATERAL VIEW json_tuple(a.json_record, 'batch') b AS batch
) events_view
LATERAL VIEW explode(events_view.events) c AS event
LATERAL VIEW json_tuple(c.event,
  'action', 'context', 'event', 'properties', 'requestId',
  'sessionId', 'timestamp', 'traits', 'userId') d
  AS action, context, event, properties, requestId,
     sessionId, timestamp, traits, userId;

With the SerDe, we'd have:

ADD JAR file:///.../json-serde-1.1.9.3-SNAPSHOT.jar;

CREATE TABLE mdmp_raw_data (
  action string,
  batch array<struct<
    action:string,
    context:map<string,string>,
    event:string,
    properties:map<string,string>,
    requestId:string,
    sessionId:string,
    timestamp:string,
    traits:map<string,string>,
    userId:string
  >>,
  context struct<
    build:map<string,string>,
    device:struct<
      brand:string,
      manufacturer:string,
      model:string,
      release:string,
      sdk:int
    >,
    display:struct<
      density:double,
      height:int,
      width:int
    >,
    integrations:map<string,boolean>,
    library:string,
    libraryVersion:string,
    locale:map<string,string>,
    location:map<string,string>,
    telephony:map<string,string>,
    wifi:map<string,boolean>
  >,
  received_at string,
  requestTimestamp string,
  writeKey string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH 'test.json' OVERWRITE INTO TABLE mdmp_raw_data;

-- The SerDe maps 'batch' straight to an array of structs, so a single
-- explode is enough.
SELECT
  b.batch.action,
  b.batch.context,
  b.batch.event,
  b.batch.properties,
  b.batch.requestId,
  b.batch.sessionId,
  b.batch.timestamp,
  b.batch.traits,
  b.batch.userId
FROM mdmp_raw_data
LATERAL VIEW explode(mdmp_raw_data.batch) b AS batch;

On 21 June 2014 01:53, Christian Link <christian.l...@mdmp.com> wrote:

> aha!
> ok... let me try this
>
> hmmm... I just run a master and two core nodes... I guess the jar needs
> to be put somewhere on the core nodes?
>
> Thanks!
>
> On Fri, Jun 20, 2014 at 5:37 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> I've also added the jar to my hive session with ADD JAR...
>>
>> This is not correct. For input formats/SerDes the file must be in your
>> auxlib:
>>
>> mkdir $HIVE_HOME/auxlib
>> cp <whatever.jar> $HIVE_HOME/auxlib
>>
>> Really good JSON support:
>> https://github.com/rcongiu/Hive-JSON-Serde
>>
>> On Fri, Jun 20, 2014 at 11:35 AM, Christian Link <christian.l...@mdmp.com> wrote:
>>
>>> Hey Douglas,
>>>
>>> thanks a lot. That's what I've tried - but it didn't work for me.
>>>
>>> I've created a table with ROW FORMAT "one of the SerDes" and an array
>>> which looked like the array in the JSON file.
>>>
>>> I've also added the jar to my hive session with ADD JAR...
>>>
>>> Loading data into that table with the array failed.
>>>
>>> I'm totally new to this and I guess I'm doing something wrong right at
>>> the start (loading data into a hive table or so)...
>>>
>>> Best,
>>> Chris
>>>
>>> On Fri, Jun 20, 2014 at 5:29 PM, Douglas Moore
>>> <douglas.mo...@thinkbiganalytics.com> wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> Try this Serde <https://github.com/rcongiu/Hive-JSON-Serde>, I looked
>>>> at 6 of them a while back and this one was the best, including dealing
>>>> with nested JSON structures:
>>>> https://github.com/rcongiu/Hive-JSON-Serde
>>>>
>>>> - Douglas

--
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com

Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 8016 7000 x270 *OR* +1 613 565 8696 x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or ara...@pythian.com @ GTalk

“Success is not about standing at the top, it's the steps you leave behind.” — Iker Pou (rock climber)