Re: JsonLoader schema field order shouldn't matter

Ruslan Al-Fakikh Thu, 04 Apr 2013 17:51:50 -0700

Tim,

Have you resolved the issue of using elephant-bird with Pig 0.10?


Meghana,

I am using the same configuration:
pig -version
Apache Pig version 0.10.0-cdh4.1.1 (rexported)
hadoop version
Hadoop 2.0.0-cdh4.1.1
and I get the same error that Tim described:
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.Counter, but class was expected

Can you please give an example of your Pig script? I am running it with the
following commands:
REGISTER elephant-bird-pig-3.0.2.jar;
inputData = LOAD 'sample_simple.json' USING
com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
DUMP inputData;

Thanks in advance


On Fri, Jan 11, 2013 at 7:35 AM, Dmitriy Ryaboy <[email protected]> wrote:

> Tim, can you open a github issue with EB about compiling against 0.10?
> I think this is an easy fix.
>
>
> On Tue, Jan 8, 2013 at 9:38 AM, Alan Gates <[email protected]> wrote:
>
> > I would open a new JIRA, since 1914 is focussed on building an alternative
> > that discovers schema, while you are wanting to improve the existing one.
> >
> > Alan.
> >
> > On Jan 7, 2013, at 5:02 PM, Tim Sell wrote:
> >
> > > This seems like a bug to me. It makes it risky to work with JSON data
> > > generated by something other than Pig since the ordering might change.
> > > What do you think?
> > >
> > > I didn't see a bug for it in Jira, so would this (still open) one be
> > > the place to mention it? Or should I make a new one?
> > > https://issues.apache.org/jira/browse/PIG-1914
> > >
> > > ~T
> > >
> > >
> > > On 7 January 2013 20:24, Alan Gates <[email protected]> wrote:
> > >> Currently the JsonLoader does assume ordering of the fields.  It does
> > >> not do any name matching against the given schema to find the right field.
> > >>
> > >> Alan.
> > >>
> > >> On Jan 7, 2013, at 11:56 AM, Tim Sell wrote:
> > >>
> > >>> When using JsonLoader with Pig 0.10.0
> > >>>
> > >>> if I have an input.json file that looks like this:
> > >>>
> > >>> {"date": "2007-08-25", "id": 16}
> > >>> {"date": "2007-09-08", "id": 17}
> > >>> {"date": "2007-09-15", "id": 18}
> > >>>
> > >>> And I use
> > >>>
> > >>> a = LOAD 'input.json' USING JsonLoader('id:int,date:chararray');
> > >>> DUMP a;
> > >>>
> > >>> I get errors when it tries to force the date fields into an integer.
> > >>>
> > >>> Shouldn't this work independent of the ordering of the schema fields?
> > >>> Json writers generally don't make guarantees about the ordering.
> > >>>
> > >>> One alternative (though annoying) would be to use elephant bird
> > >>> instead, but I can't get that to compile against hadoop 2.0.0 and Pig
> > >>> 0.10.0.
> > >>>
> > >>> ~Tim
> > >>
> >
> >
>
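
For reference, the ordering problem Alan describes in the quoted thread can
be sidestepped in plain Pig when every record in the file lists its keys in
the same order. This is only a sketch under that assumption (here: date
first, then id, as in Tim's sample input.json); it does not help when the
key order varies between records. The schema is declared in file order and
the fields are reordered afterwards:

```
-- Declare fields in the order they appear in each JSON record.
a = LOAD 'input.json' USING JsonLoader('date:chararray,id:int');
-- Reorder to the layout the rest of the script expects.
b = FOREACH a GENERATE id, date;
DUMP b;
```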
