Hi Narayanan,

We have had some success with a similar use case, using a custom input format / record reader to recursively split arbitrary JSON into a set of discrete records at runtime. No schema is needed. Doing something similar might give you the functionality you are looking for.

https://github.com/quicklyNotQuigley/nest

* Nested objects are output as an individual row.
* Arrays of objects are output as a set of individual rows.
* Arrays of primitives are left as array fields, which you can explode, etc.

On Tue, Apr 8, 2014 at 7:25 AM, Narayanan K <knarayana...@gmail.com> wrote:
> Thanks Yong!
>
> On Mon, Apr 7, 2014 at 5:07 PM, java8964 <java8...@hotmail.com> wrote:
> > Hi, Narayanan:
> >
> > The current problem is that for a generic solution, there is no way that we
> > know that element in the Json is an array. Keep in mind that in any element
> > of Json, it could be any valid structure. So it could be array, another
> > structure, or map etc.
> >
> > You know your data, so you can say in this level, it is array. But computer
> > doesn't know, that is why you need to provide a schema.
> >
> > Think about it, in programming, we can cast that to array, but normally that
> > is NOT a good solution, so for a generic solution like any hadoop json UDF,
> > it will and should ask for a schema.
> >
> > For you case, if you know the data, it gets to be array, then write your own
> > UDF to cast it to an array, without any schema. But I don't think any good,
> > generic Json UDFs will support that for your case.
> >
> > Yong
> >
> >> Date: Mon, 7 Apr 2014 16:47:44 -0700
> >> Subject: Re: get_json_object for nested field returning a String instead
> >> of an Array
> >> From: knarayana...@gmail.com
> >> To: user@hive.apache.org
> >>
> >> Thanks Peyman.
> >>
> >> Actually the problem with Hive-Json-Serde is that we need to provide
> >> the entire schema upfront while creating the table.
> >>
> >> My requirement is that we just project/aggregate on the fields using
> >> get_json_object after creating the external table without schema. This
> >> way the external table is agnostic to any new schema changes.
> >>
> >> Would love to get a solution for converting get_json_object to return
> >> an Array instead of a string.. Can we use any Hive UDFs to convert
> >> string into an explodable Array object ?
> >>
> >> Thanks
> >> Narayanan
> >>
> >> On Mon, Apr 7, 2014 at 4:14 PM, Peyman Mohajerian <mohaj...@gmail.com>
> >> wrote:
> >> > perhaps: https://github.com/rcongiu/Hive-JSON-Serde
> >> >
> >> >
> >> > On Mon, Apr 7, 2014 at 6:52 PM, Narayanan K <knarayana...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi all
> >> >>
> >> >> I am using get_json_object to read a json text file. I have created
> >> >> the external table as below :
> >> >>
> >> >> CREATE EXTERNAL TABLE EXT_TABLE ( json string)
> >> >> PARTITIONED BY (dt string)
> >> >> LOCATION '/users/abc/';
> >> >>
> >> >>
> >> >> The json data has some fields that are not simple fields but fields
> >> >> which are nested fields like - "field" : [{"id":1},{"id":2}.. ].
> >> >>
> >> >> While using the get_json_object to retrieve that field, it is
> >> >> returning back a string instead of an Array. Hence I am not able to
> >> >> explode the array as it is a string.
> >> >>
> >> >> Is there some way we can get an array of get_json_object instead of a
> >> >> string so that we can perform explode on this nested field ? or Anyway
> >> >> we can convert the string into an array so that I can use explode ?
> >> >>
> >> >> Thanks in advance,
> >> >> Narayanan
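
P.S. For anyone who wants the idea without reading the repository, here is a minimal Python sketch of the three splitting rules listed at the top of this message. It is not the code from the nest project (which plugs into Hadoop as an input format / record reader) and may differ from it in detail. The names `split_records` and `is_object_array`, the path strings such as `$.field`, and the sample document are all made up for illustration. Treating empty or mixed arrays as plain array fields is my own assumption.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

# A row is (path of the object it came from, its flat fields).
Row = Tuple[str, Dict[str, Any]]


def is_object_array(value: Any) -> bool:
    """True for a non-empty list whose items are all JSON objects.

    Assumption: empty lists and mixed lists are treated as primitive arrays.
    """
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(item, dict) for item in value)
    )


def split_records(node: Dict[str, Any], path: str = "$") -> Iterator[Row]:
    """Recursively split one JSON object into flat rows, with no schema."""
    row: Dict[str, Any] = {}
    children: List[Tuple[str, Dict[str, Any]]] = []
    for key, value in node.items():
        child_path = f"{path}.{key}"
        if isinstance(value, dict):
            # Rule 1: a nested object becomes its own row.
            children.append((child_path, value))
        elif is_object_array(value):
            # Rule 2: an array of objects becomes one row per element.
            children.extend((child_path, item) for item in value)
        else:
            # Rule 3: primitives and arrays of primitives stay as fields.
            row[key] = value
    yield path, row
    for child_path, child in children:
        yield from split_records(child, child_path)


# Hypothetical sample record, shaped like the "field" example in the thread.
doc = json.loads(
    '{"dt": "2014-04-07", "field": [{"id": 1}, {"id": 2}],'
    ' "tags": ["a", "b"], "user": {"name": "n"}}'
)
rows = list(split_records(doc))
for record_path, record in rows:
    print(record_path, record)
```

Each element of "field" comes out as its own row, so nothing has to be cast from a string to an array before exploding, while "tags" stays an array field on the parent row.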