I've been using Twitter's Elephant Bird and have been very happy with
it so far. Here's an example of parsing nested JSON with it:

json_eb = LOAD '$IN_DIRS'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map[]);

-- parse JSON with Twitter's library
parsed0 = FOREACH json_eb GENERATE
    STRSPLIT(json#'id', ':').$2 AS tweetId:chararray,
    STRSPLIT(json#'actor'#'id', ':').$2 AS userId:chararray,
    json#'postedTime' AS postedTime:chararray,
    json#'twitter_entities'#'urls' AS userPostedLinks:bag{T:(urlTypes:map[])};

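
For the product_id / description case you describe, something along
these lines might work. It's an untested sketch with a few assumptions:
the input is tab-separated with the JSON string in the second column,
the Elephant Bird jars are already registered, and your version ships
the JsonStringToMap UDF in its piggybank package. 'color' is a made-up
property name, only there for illustration.

```pig
-- Assumption: the Elephant Bird jars have been REGISTERed above
-- (jar paths omitted here).
DEFINE JsonStringToMap
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap();

-- Assumption: tab-separated input, JSON string in the second column.
products = LOAD '$IN_DIRS'
    AS (product_id:chararray, description:chararray);

-- Turn the JSON string into a Pig map keyed by property name.
with_props = FOREACH products GENERATE
    product_id,
    JsonStringToMap(description) AS props;

-- Look up one property by name ('color' is hypothetical).
colors = FOREACH with_props GENERATE
    product_id,
    'color' AS product_property,
    props#'color' AS product_property_value;
```

That gives you one map per product, so you can pull out properties by
name. Getting one row per property (product_id, product_property,
product_property_value) needs an extra step to turn the map into a bag,
which isn't shown here.
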
On Wed, May 22, 2013 at 10:01 AM, Thomas Edison wrote:
> Hi all,
>
> I have two fields in my Pig input file, say product_id and
> description. The description is a JSON object that describes the
> product.
>
> Is there anything in Pig, other than writing a custom UDF, to parse the JSON
> object so that I can get something like product_id, product_property,
> product_property_value? product_property and product_property_value are
> parsed from the description JSON object. Also, one product could have
> multiple product_property entries.
>
> Thanks.
>
> T.E.