I ran into the same dilemma. Here is something that I found easier and
that works for me. I compiled some of the sources from
http://www.json.org/java/ and wrote this UDF:
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

public class JsonParser extends EvalFunc<Tuple> {

    @Override
    public Tuple exec(Tuple input) throws IOException {
        TupleFactory tf = TupleFactory.getInstance();
        Tuple t = tf.newTuple();
        // The first field is expected to hold one JSON document as a string.
        if (input.get(0) != null) {
            String inString = (String) input.get(0);
            try {
                JSONObject jsn = new JSONObject(inString);
                t.append(getJsonArr(jsn));
            } catch (JSONException e) {
                // Malformed JSON: print the error and return the empty tuple.
                e.printStackTrace();
            }
        }
        return t;
    }

    // Concatenates the "text" field of every object in the "jsonKey" array.
    // Returns null when the key is absent.
    private String getJsonArr(JSONObject jsn) {
        String jsnArrVal = "";
        try {
            if (!jsn.has("jsonKey")) {
                return null;
            }
            JSONArray jTagArray = jsn.getJSONArray("jsonKey");
            for (int i = 0; i < jTagArray.length(); i++) {
                JSONObject hst = jTagArray.getJSONObject(i);
                // Assign to the existing variable (redeclaring it here does not
                // compile). Each text is prepended, so the result comes out in
                // reverse array order.
                jsnArrVal = hst.getString("text") + jsnArrVal;
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
        return jsnArrVal;
    }
}
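
One thing worth knowing about the loop in getJsonArr: it puts each "text"
value in front of what has been collected so far, so the values come out in
reverse array order. Below is a rough sketch of just that behaviour, using
only the JDK so it runs without org.json or Pig. A List of Maps stands in for
the JSONArray, and the names TextConcat, prependTexts, appendTexts and obj
are made up for illustration; they are not part of the UDF.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TextConcat {

    // Mirrors the loop in getJsonArr: every "text" value is placed in FRONT
    // of the accumulated string, so the output is in reverse list order.
    public static String prependTexts(List<Map<String, String>> items) {
        String result = "";
        for (Map<String, String> item : items) {
            result = item.get("text") + result;
        }
        return result;
    }

    // Alternative that keeps the original list order.
    public static String appendTexts(List<Map<String, String>> items) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> item : items) {
            sb.append(item.get("text"));
        }
        return sb.toString();
    }

    // Hypothetical helper: a one-entry map standing in for {"text": "..."}.
    public static Map<String, String> obj(String text) {
        Map<String, String> m = new HashMap<>();
        m.put("text", text);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> items = new ArrayList<>();
        items.add(obj("a"));
        items.add(obj("b"));
        items.add(obj("c"));
        System.out.println(prependTexts(items)); // prints "cba"
        System.out.println(appendTexts(items));  // prints "abc"
    }
}
```

If you need the values in their original order, swapping the operands in the
UDF's loop (accumulated string first, new text second) should be enough.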
On Mon, Nov 19, 2012 at 11:35 AM, Russell Jurney <[email protected]> wrote:
> OK, it's even worse. My data is a big array.
>
> Am I being negative in saying that JSON and Pig is like a nightmare?
>
>
> On Mon, Nov 19, 2012 at 2:33 PM, Russell Jurney <[email protected]> wrote:
>
> > Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the
> > schema from a record. This is what I was looking for. Looks like I have to
> > write that myself.
> >
> > And yes, I understand the tradeoffs in doing so. Assuming that a sample
> > reflects the overall schema is a big assumption.
> >
> >
> >
> > On Mon, Nov 19, 2012 at 2:30 PM, Russell Jurney <[email protected]> wrote:
> >
> >> Talking to myself... never mind, guava and json-simple are included with
> >> Pig.
> >>
> >>
> >> On Mon, Nov 19, 2012 at 2:27 PM, Russell Jurney <[email protected]> wrote:
> >>
> >>> Got it building. Are google collections and json-simple external deps?
> >>>
> >>>
> >>> On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney <
> >>> [email protected]> wrote:
> >>>
> >>>> It seems that everyone can build elephant-bird but me:
> >>>> https://github.com/kevinweil/elephant-bird/issues/272
> >>>>
> >>>>
> >>>> On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali <
> >>>> [email protected]> wrote:
> >>>>
> >>>>> I don't think you really need to build it.
> >>>>> You can find it in any Maven repository.
> >>>>>
> >>>>> Arian Rodrigo Pasquali
> >>>>> FEUP, SAPO Labs
> >>>>> http://www.arianpasquali.com
> >>>>> twitter @arianpasquali
> >>>>>
> >>>>>
> >>>>>
> >>>>> 2012/11/18 Arian Pasquali <[email protected]>
> >>>>>
> >>>>> > You don't need to build it either.
> >>>>> > Just download those two jars I used in my example.
> >>>>> >
> >>>>> > Arian
> >>>>> >
> >>>>> > On Sunday, November 18, 2012, Russell Jurney wrote:
> >>>>> >
> >>>>> >> Thanks - looks like I don't have to specify the schema, which is good.
> >>>>> >>
> >>>>> >> I'll try and build elephant-bird.
> >>>>> >>
> >>>>> >> Russell Jurney http://datasyndrome.com
> >>>>> >>
> >>>>> >> On Nov 17, 2012, at 9:30 PM, Arian Pasquali <[email protected]> wrote:
> >>>>> >>
> >>>>> >> > keep calm
> >>>>> >> > and use elephant-bird
> >>>>> >> > https://github.com/kevinweil/elephant-bird
> >>>>> >> > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java
> >>>>> >> >
> >>>>> >> >
> >>>>> >> > I posted an example here yesterday of how to load tweets in JSON.
> >>>>> >> > Here it is again. I hope it helps.
> >>>>> >> >
> >>>>> >> > register 'elephant-bird-core-3.0.0.jar'
> >>>>> >> > register 'elephant-bird-pig-3.0.0.jar'
> >>>>> >> > register 'google-collections-1.0.jar'
> >>>>> >> > register 'json-simple-1.1.jar'
> >>>>> >> >
> >>>>> >> > json_lines = LOAD
> >>>>> >> > '/twitter_data/tweets/stream/v1/json/2012_10_10/08' USING
> >>>>> >> > com.twitter.elephantbird.pig.load.JsonLoader();
> >>>>> >> >
> >>>>> >> > geo_tweets = FOREACH json_lines GENERATE (CHARARRAY) $0#'id' AS id,
> >>>>> >> > (CHARARRAY) $0#'geoLocation' AS geoLocation;
> >>>>> >> >
> >>>>> >> > only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
> >>>>> >> > store only_not_nulls into '/twitter_data/results/geo_tweets';
> >>>>> >> >
> >>>>> >> >
> >>>>> >> >
> >>>>> >> > 2012/11/18 Dan Young <[email protected]>
> >>>>> >> >
> >>>>> >> >> Not sure if this helps, but in 0.11 I've been using this on EMR for
> >>>>> >> >> some of our JSON data....
> >>>>> >> >>
> >>>>> >> >> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*' USING
> >>>>> >> >> JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >> >> Regards,
> >>>>> >> >>
> >>>>> >> >> Dano
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >> >>
> >>>>> >> >> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney <[email protected]> wrote:
> >>>>> >> >>
> >>>>> >> >>> I have some JSON data with a uniform schema. I want to load it in Pig.
> >>>>> >> >>> JsonStorage doesn't work, because the data has no schema.
> >>>>> >> >>>
> >>>>> >> >>> How can I load JSON data in Pig?
> >>>>> >> >>>
> >>>>> >> >>> --
> >>>>> >> >>> Russell Jurney twitter.com/rjurney [email protected]
> >>>>> >> >>> datasyndrome.com
> >>>>> >> >>>