Re: JsonLoader schema field order shouldn't matter

Alan Gates Tue, 08 Jan 2013 09:38:49 -0800

I would open a new JIRA, since PIG-1914 is focused on building an alternative
that discovers the schema, whereas you want to improve the existing loader.


Alan.

On Jan 7, 2013, at 5:02 PM, Tim Sell wrote:

> This seems like a bug to me. It makes it risky to work with JSON data
> generated by something other than Pig since the ordering might change.
> What do you think?
> 
> I didn't see a bug for it in Jira, so would this (still open) one be
> the place to mention it? Or should I make a new one?
> https://issues.apache.org/jira/browse/PIG-1914
> 
> ~T
> 
> 
> On 7 January 2013 20:24, Alan Gates wrote:
>> Currently the JsonLoader does assume ordering of the fields.  It does not do 
>> any name matching against the given schema to find the right field.
>> 
>> Alan.
>> 
>> On Jan 7, 2013, at 11:56 AM, Tim Sell wrote:
>> 
>>> When using JsonLoader with Pig 0.10.0
>>> 
>>> if I have an input.json file that looks like this:
>>> 
>>> {"date": "2007-08-25", "id": 16}
>>> {"date": "2007-09-08", "id": 17}
>>> {"date": "2007-09-15", "id": 18}
>>> 
>>> And I use
>>> 
>>> a = LOAD 'input.json' USING JsonLoader('id:int,date:chararray');
>>> DUMP a;
>>> 
>>> I get errors when it tries to force the date fields into an integer.
>>> 
>>> Shouldn't this work independent of the ordering of the schema fields?
>>> JSON writers generally don't make guarantees about the ordering.
>>> 
>>> One alternative (though annoying) would be to use Elephant Bird
>>> instead, but I can't get that to compile against Hadoop 2.0.0 and Pig
>>> 0.10.0.
>>> 
>>> ~Tim
>>
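
The behaviour discussed in this thread (fields assigned by position rather
than by name) can be illustrated with a minimal sketch in Python. This is not
Pig's JsonLoader code; the helpers parse_schema, load_positional and
load_by_name are hypothetical names made up for this illustration, and the
two-entry type table covers only the types used in Tim's example.

```python
import json

# Hypothetical helpers for illustration only; this is not Pig's implementation.

# Minimal cast table covering the two Pig types used in the thread's example.
CASTS = {"int": int, "chararray": str}


def parse_schema(spec):
    """Turn 'id:int,date:chararray' into [('id', 'int'), ('date', 'chararray')]."""
    return [tuple(part.strip().split(":")) for part in spec.split(",")]


def load_positional(line, schema):
    """Assign the i-th JSON value to the i-th schema field, ignoring key names."""
    # json.loads keeps keys in document order (dicts are ordered in Python 3.7+).
    values = list(json.loads(line).values())
    return tuple(CASTS[ftype](value) for (_, ftype), value in zip(schema, values))


def load_by_name(line, schema):
    """Look each schema field up by its key, so the order in the file is irrelevant."""
    record = json.loads(line)
    return tuple(CASTS[ftype](record[name]) for name, ftype in schema)


if __name__ == "__main__":
    schema = parse_schema("id:int,date:chararray")
    line = '{"date": "2007-08-25", "id": 16}'

    print(load_by_name(line, schema))

    try:
        load_positional(line, schema)
    except ValueError as err:
        # The date string lands in the int slot, much like the error Tim reports.
        print("positional load failed:", err)
```

With positional matching, the load only succeeds when the schema lists the
fields in the same order as the file, for example 'date:chararray,id:int' for
the input above. Name-based matching removes that dependency, which is the
improvement being requested for the existing loader.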
