Hi, Chris,

I like the Json serde solution better, but there's another alternative to
achieve what you're trying to do.
Using the Brickhouse's json_split UDF (https://github.com/klout/brickhouse)
and the mdmp_raw_data table you already have, one can do the following:

ADD JAR file:///.../brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION json_split AS 'brickhouse.udf.json.JsonSplitUDF';

SELECT d.*
FROM
  (
    SELECT json_split(b.batch) as events
    FROM mdmp_raw_data a
    LATERAL VIEW json_tuple(a.json_record, 'batch') b AS batch
  ) events_view
  LATERAL VIEW explode(events_view.events) c AS event
  LATERAL VIEW json_tuple(c.event, 'action', 'context', 'event',
'properties', 'requestId', 'sessionId', 'timestamp', 'traits', 'userId') d
AS action, context, event, properties, requestId, sessionId, timestamp,
traits, userId
;


With the SerDe, we'd have:

ADD JAR file:///.../json-serde-1.1.9.3-SNAPSHOT.jar;
CREATE TABLE mdmp_raw_data (
  action string,
  batch array<
          struct<
            action:string,
            context:map<string,string>,
            event:string,
            properties:map<string,string>,
            requestId:string,
            sessionId:string,
            timestamp:string,
            traits:map<string,string>,
            userId:string
          >
        >,
  context struct<
            build:map<string,string>,
            device:struct<
                     brand:string,
                     manufacturer:string,
                     model:string,
                     release:string,
                     sdk:int
                   >,
            display:struct<
                      density:double,
                      height:int,
                      width:int
                    >,
            integrations:map<string,boolean>,
            library:string,
            libraryVersion:string,
            locale:map<string,string>,
            location:map<string,string>,
            telephony:map<string,string>,
            wifi:map<string,boolean>
          >,
  received_at string,
  requestTimestamp string,
  writeKey string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH  'test.json' OVERWRITE INTO TABLE mdmp_raw_data;

SELECT
  b.batch.action,
  b.batch.context,
  b.batch.event,
  b.batch.properties,
  b.batch.requestId,
  b.batch.sessionId,
  b.batch.timestamp,
  b.batch.traits,
  b.batch.userId
FROM mdmp_raw_data
  LATERAL VIEW explode(mdmp_raw_data.batch) b AS batch
;



On 21 June 2014 01:53, Christian Link <christian.l...@mdmp.com> wrote:

> aha!
> ok...let me try this
>
> hmmm... I just run a master and two core nodes...I guess the jar needs to
> be put somewhere on the core nodes?
>
> Thanks!
>
>
> On Fri, Jun 20, 2014 at 5:37 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> I've also added the jar to my hive session with ADD JAR...
>>
>> This is not correct. For input formats/serde the file must be in your
>> auxlib
>>
>> mkdir $HIVE_HOME/auxlib
>> cp <whatever.jar> $HIVE_HOME/auxlib
>>
>> Really good json support
>> https://github.com/rcongiu/Hive-JSON-Serde
>>
>>
>> On Fri, Jun 20, 2014 at 11:35 AM, Christian Link <christian.l...@mdmp.com
>> > wrote:
>>
>>> Hey Douglas,
>>>
>>> thanks a lot.
>>> That's what I've tried - but it didn't work for me.
>>>
>>> I've created table with ROW FORMAT "one of the SerDe" and an array which
>>> looked like the array in the json file.
>>>
>>> I've also added the jar to my hive session with ADD JAR...
>>>
>>> Loading data into that table with the array failed.
>>>
>>> I'm totally new to this and I guess I'm doing wrong right a the start
>>> (loading data into a hive table or so)...
>>>
>>> Best,
>>> Chris
>>>
>>>
>>> On Fri, Jun 20, 2014 at 5:29 PM, Douglas Moore <
>>> douglas.mo...@thinkbiganalytics.com> wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> Try this Serde <https://github.com/rcongiu/Hive-JSON-Serde>, I looked
>>>> at 6 of them a while back and this one was the best, including dealing with
>>>> nested JSON structures:
>>>> https://github.com/rcongiu/Hive-JSON-Serde
>>>>
>>>> - Douglas
>>>>
>>>>
>>>
>>
>


-- 
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com

Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 8016 7000  x270 *OR* +1 613 565 8696   x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or ara...@pythian.com @ GTalk

“Success is not about standing at the top, it's the steps you leave behind.”
— Iker Pou (rock climber)

-- 


--



Reply via email to