Hi guys,

As for elephant-bird, it seems that it is not compatible with Pig 0.10 (CDH4). :( I am using this configuration:

pig -version
Apache Pig version 0.10.0-cdh4.1.1 (rexported)

hadoop version
Hadoop 2.0.0-cdh4.1.1

and I am getting the same error that Tim reported:

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
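This IncompatibleClassChangeError is the typical symptom of running code compiled against the Hadoop 1 API, where org.apache.hadoop.mapreduce.Counter is a class, on Hadoop 2 (CDH4), where it is an interface. Below is a minimal JDK-only sketch for checking which variant is on a given classpath. The class name HadoopApiCheck and its kindOf helper are hypothetical and are not part of Hadoop, Pig or elephant-bird:

```java
// Hypothetical JDK-only diagnostic (not part of Hadoop, Pig or elephant-bird):
// reports whether a type on the current classpath is an interface or a class.
final class HadoopApiCheck {

    static String kindOf(String className) {
        try {
            // initialize = false: load the type without running static initializers
            Class<?> type = Class.forName(className, false,
                    HadoopApiCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Hadoop 1 defines this type as a class; Hadoop 2 defines it as an interface.
        String counter = "org.apache.hadoop.mapreduce.Counter";
        System.out.println(counter + " -> " + kindOf(counter));
    }
}
```

Run it with the same jars Pig sees, for example java -cp "$(hadoop classpath):." HadoopApiCheck. On a setup that produces the error above, you would expect it to report "interface". A jar built against Hadoop 1 will then fail wherever it touches Counter.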
I am running it with the following commands:

REGISTER elephant-bird-pig-3.0.2.jar;
inputData = LOAD 'sample_simple.json' USING com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
DUMP inputData;

On Thu, Sep 27, 2012 at 8:48 AM, Dmitriy Ryaboy wrote:

> Yep. It's just JsonLoader.
> By default it works on top of whatever is returned by TextInputFormat, but
> you can override that: as long as the input format returns a string that's
> valid JSON, we are cool (so in theory you could write a
> TwitterAPIInputFormat or something and get the JSON into Pig, not that I
> would recommend that).
>
> D
>
> On Wed, Sep 26, 2012 at 9:34 PM, Russell Jurney wrote:
>
> > Does that work without LZO?
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On Sep 26, 2012, at 9:00 PM, Dmitriy Ryaboy wrote:
> >
> > > Try asking Michael May on GitHub? This seems to be an issue with his
> > > loader.
> > >
> > > The JsonLoader in ElephantBird should work in this case if you turn on
> > > nested parsing (
> > > https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java
> > > ).
> > >
> > > D
> > >
> > > On Wed, Sep 26, 2012 at 2:31 PM, Deepak Tiwari wrote:
> > >
> > >> My bad. I think I compiled it from
> > >> https://github.com/mmay/PigJsonLoader/blob/master/JsonLoader.java a long
> > >> time back in my piggybank area; it indeed didn't come with the original
> > >> jar.
> > >>
> > >> Regards,
> > >>
> > >> Deepak
> > >>
> > >> On Tue, Sep 25, 2012 at 8:14 AM, Bill Graham wrote:
> > >>
> > >>> I missed the part about Piggybank, but I'm confused because I don't see
> > >>> that class in SVN:
> > >>>
> > >>> http://svn.apache.org/viewvc/pig/branches/branch-0.10/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/
> > >>>
> > >>> Either way, your error seems to be an issue with parsing the doubles.
> > >>>
> > >>> On Mon, Sep 24, 2012 at 2:24 PM, Vivek Shrivastava wrote:
> > >>>
> > >>>> Thanks for responding, Bill. However, I am using the JsonLoader that is
> > >>>> in the Piggybank with Pig-0.10.0.
> > >>>>
> > >>>> It doesn't need any schema and converts JSON data into a map
> > >>>> ( org.apache.pig.piggybank.storage.JsonLoader() as (json:map[]) ), and I
> > >>>> extract data from there using keys. I have processed a huge amount of
> > >>>> data without any problem, and no schema was required.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Vivek
> > >>>>
> > >>>> On Mon, Sep 24, 2012 at 2:03 PM, Bill Graham wrote:
> > >>>>
> > >>>>> This loader only works for data stored using JsonStorage. From the
> > >>>>> javadocs:
> > >>>>>
> > >>>>> A loader for data stored using JsonStorage
> > >>>>> <http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonStorage.html>.
> > >>>>>
> > >>>>> This is not a generic JSON loader. It depends on the schema being stored
> > >>>>> with the data, when conceivably you could write a loader that determines
> > >>>>> the schema from the JSON.
> > >>>>>
> > >>>>> Was this data produced via JsonStorage? If not, you'll need to write a
> > >>>>> custom loader.
> > >>>>>
> > >>>>> On Mon, Sep 24, 2012 at 12:04 PM, Deepak Tiwari wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I am trying to parse this data using the Pig parser
> > >>>>>> org.apache.pig.piggybank.storage.JsonLoader:
> > >>>>>>
> > >>>>>> {"geo":{"type":"Polygon","coordinates":[[[-91.3061478,-30.2688069],[-91.012471,-60.2688069],[-91.012471,-69.9306357],[-91.3061478,-29.9306357]]]},
> > >>>>>>
> > >>>>>> I need to extract this array:
> > >>>>>>
> > >>>>>> [[[-91.3061478,-30.2688069],[-91.012471,-60.2688069],[-91.012471,-69.9306357],[-91.3061478,-29.9306357]]]
> > >>>>>>
> > >>>>>> I am getting this error while accessing flatten(geo#'coordinates'). I
> > >>>>>> think that's a limitation of the parser ("only standard Pig type is
> > >>>>>> supported"), but I am wondering if someone has a workaround:
> > >>>>>>
> > >>>>>> "java.lang.RuntimeException: Unexpected data type
> > >>>>>> org.codehaus.jackson.node.DoubleNode found in stream. Note only standard
> > >>>>>> Pig type is supported when you output from UDF/LoadFunc"
> > >>>>>>
> > >>>>>> Thanks very much,
> > >>>>>>
> > >>>>>> Deepak
> > >>>>>
> > >>>>> --
> > >>>>> *Note that I'm no longer using my Yahoo! email address. Please email me
> > >>>>> at [email protected] going forward.*
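The DoubleNode error means that a Jackson tree node, not a plain Java value, reached Pig as a field value. Pig only accepts its standard types. A common fix in a custom loader is to walk the parsed JSON tree and convert every node before building the tuple. Below is a minimal JDK-only sketch of that conversion. The JsonValue, JsonNum, JsonText, JsonArray and JsonObject classes and the PigTypeConverter.toPigValue helper are hypothetical stand-ins, not Jackson's or Pig's API. Arrays become java.util.List here, where a real loader would build Pig Tuple or DataBag objects:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for a parsed JSON tree. They are NOT Jackson's classes;
// they only play the role that nodes such as DoubleNode play in the error above.
abstract class JsonValue {}

final class JsonNum extends JsonValue {
    final double value;
    JsonNum(double value) { this.value = value; }
}

final class JsonText extends JsonValue {
    final String value;
    JsonText(String value) { this.value = value; }
}

final class JsonArray extends JsonValue {
    final List<JsonValue> elements;
    JsonArray(JsonValue... elements) { this.elements = Arrays.asList(elements); }
}

final class JsonObject extends JsonValue {
    final Map<String, JsonValue> fields = new LinkedHashMap<>();
    JsonObject put(String key, JsonValue value) {
        fields.put(key, value);
        return this; // allow chaining
    }
}

final class PigTypeConverter {

    // Recursively replaces tree nodes with plain Java values so that no
    // parser-specific node object reaches Pig. A real LoadFunc would turn the
    // List branch into a Pig Tuple or DataBag (e.g. via TupleFactory/BagFactory);
    // java.util.List is used here only to keep the sketch JDK-only.
    static Object toPigValue(JsonValue node) {
        if (node instanceof JsonNum) {
            return ((JsonNum) node).value; // boxed to java.lang.Double
        }
        if (node instanceof JsonText) {
            return ((JsonText) node).value;
        }
        if (node instanceof JsonArray) {
            List<Object> out = new ArrayList<>();
            for (JsonValue child : ((JsonArray) node).elements) {
                out.add(toPigValue(child));
            }
            return out;
        }
        if (node instanceof JsonObject) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<String, JsonValue> e : ((JsonObject) node).fields.entrySet()) {
                out.put(e.getKey(), toPigValue(e.getValue()));
            }
            return out;
        }
        throw new IllegalArgumentException("Unsupported JSON node: " + node);
    }

    public static void main(String[] args) {
        // The "geo" value from the question, truncated to its first two points.
        JsonValue geo = new JsonObject()
                .put("type", new JsonText("Polygon"))
                .put("coordinates", new JsonArray(new JsonArray(
                        new JsonArray(new JsonNum(-91.3061478), new JsonNum(-30.2688069)),
                        new JsonArray(new JsonNum(-91.012471), new JsonNum(-60.2688069)))));
        System.out.println(toPigValue(geo));
    }
}
```

The conversion is depth-first, so arbitrarily nested arrays such as the polygon's coordinates come out as nested containers of java.lang.Double. Pig accepts that type, whereas it rejects a node object.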
