So no one knows about this? I was hoping to build on knowledge others have already acquired on this subject :(
On Tue, Apr 11, 2017 at 2:09 AM, S G <sg.online.em...@gmail.com> wrote:
> Hi,
>
> There is a concept of JsonSerDe where you need to specify a structure for
> your tables in order to query them.
>
> However, since the schema for an object is prone to change (once every few
> months is not unexpected), how do you handle that change in your Hive/Pig
> queries?
>
> Moreover, since JSON files are not demarcated according to schema, it is
> possible that a single JSON file has JSON data for multiple evolutions of a
> schema (like 10 objects of ClassAnimal1, 20 of ClassAnimal2, 100 of
> ClassAnimal3 etc., where ClassAnimal1, ClassAnimal2 and ClassAnimal3
> represent the schema for ClassAnimal at different times).
>
> For such a JSON file, what is the recommended way of querying?
>
> I know that Avro solves this problem by maintaining a single file for a
> single kind of schema. So it will have 3 files for the above case (1 each
> for ClassAnimal1, ClassAnimal2 and ClassAnimal3).
>
> But since Avro is binary, hard to debug and requires a schema repository
> (for non-Hive use cases), we were hoping to solve this problem in JSON.
>
> Related questions:
> 1) Is it even a problem worth solving?
> 2) How many people use AvroSerDe as compared to JsonSerDe?
>
> Thanks
> SG
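
To make the mixed-version case in the quoted message concrete, here is a minimal Python sketch of one possible approach: project every record onto a single "superset" schema, filling fields that older versions lack with defaults and dropping fields the superset does not know about. This is only an illustration under stated assumptions, not what JsonSerDe does internally. The field names (`name`, `legs`, `habitat`), the defaults, and the one-JSON-object-per-line layout are all hypothetical.

```python
import json

# Hypothetical superset schema: field name -> default used when a record
# written under an older schema version does not carry that field.
UNIFIED_DEFAULTS = {"name": None, "legs": None, "habitat": "unknown"}


def normalize(record):
    """Project a record of any schema version onto the superset schema."""
    return {
        field: record.get(field, default)
        for field, default in UNIFIED_DEFAULTS.items()
    }


def read_mixed(lines):
    """Yield normalized records from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield normalize(json.loads(line))


# Three records standing in for ClassAnimal1, ClassAnimal2 and ClassAnimal3
# living in the same file (assumed layout: one JSON object per line).
sample = [
    '{"name": "cat"}',
    '{"name": "dog", "legs": 4}',
    '{"name": "eel", "legs": 0, "habitat": "water", "extra": 1}',
]

rows = list(read_mixed(sample))
for row in rows:
    print(row)
```

The trade-off of this design is that the query side only ever sees one shape, at the cost of maintaining the superset schema by hand whenever a field is added or renamed.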