Hi Narayanan,

We have had some success with a similar use case using a custom input
format / record reader to recursively split arbitrary JSON into a set of
discrete records at runtime. No schema is needed. Doing something similar
might give you the functionality you are looking for.
https://github.com/quicklyNotQuigley/nest

* Nested objects are output as an individual row.
* Arrays of objects are output as a set of individual rows.
* Arrays of primitives are left as array fields, which you can explode, etc.
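
As a rough illustration of those three rules, here is a minimal Python
sketch. It is not the code from the repository above, which is a Hadoop
input format / record reader; the function name split_records and the
"root" path label are made up for this example, and the handling of empty
or mixed arrays (left in place as fields) is an assumption.

```python
import json


def split_records(node, path="root"):
    """Yield (path, row) pairs from parsed JSON.

    Each row keeps only primitive fields and arrays of primitives;
    nested objects and arrays of objects become rows of their own.
    """
    if isinstance(node, list):
        # An array of objects becomes one row per element.
        for item in node:
            yield from split_records(item, path)
        return
    if not isinstance(node, dict):
        # A bare primitive has no fields, so it produces no row.
        return

    row = {}
    children = []
    for key, value in node.items():
        child_path = f"{path}.{key}"
        if isinstance(value, dict):
            # Nested object: emitted later as its own row.
            children.append((child_path, value))
        elif isinstance(value, list) and value and all(
            isinstance(v, dict) for v in value
        ):
            # Array of objects: each element is emitted as its own row.
            children.extend((child_path, v) for v in value)
        else:
            # Primitive, or array left in place as an array field.
            row[key] = value

    yield path, row
    for child_path, child in children:
        yield from split_records(child, child_path)


if __name__ == "__main__":
    line = '{"dt": "2014-04-07", "tags": ["a", "b"], "field": [{"id": 1}, {"id": 2}]}'
    for record_path, record in split_records(json.loads(line)):
        print(record_path, record)
```

For the sample line, the parent row keeps dt and the tags array, and each
element of "field" comes out as a separate row, so nothing needs to be
declared up front.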


On Tue, Apr 8, 2014 at 7:25 AM, Narayanan K <knarayana...@gmail.com> wrote:

> Thanks Yong!
>
> On Mon, Apr 7, 2014 at 5:07 PM, java8964 <java8...@hotmail.com> wrote:
> > Hi, Narayanan:
> >
> > The current problem is that, for a generic solution, there is no way to
> > know that an element in the JSON is an array. Keep in mind that any
> > element of JSON could be any valid structure. So it could be an array,
> > another structure, a map, etc.
> >
> > You know your data, so you can say that at this level it is an array.
> > But the computer doesn't know, which is why you need to provide a schema.
> >
> > Think about it: in programming, we can cast that to an array, but
> > normally that is NOT a good solution, so a generic solution like any
> > Hadoop JSON UDF will, and should, ask for a schema.
> >
> > For your case, if you know from the data that it has to be an array,
> > then write your own UDF to cast it to an array, without any schema. But
> > I don't think any good, generic JSON UDF will support that for your case.
> >
> > Yong
> >
> >> Date: Mon, 7 Apr 2014 16:47:44 -0700
> >> Subject: Re: get_json_object for nested field returning a String instead
> >> of an Array
> >> From: knarayana...@gmail.com
> >> To: user@hive.apache.org
> >
> >>
> >> Thanks Peyman.
> >>
> >> Actually the problem with Hive-Json-Serde is that we need to provide
> >> the entire schema upfront while creating the table.
> >>
> >> My requirement is that we just project/aggregate on the fields using
> >> get_json_object after creating the external table without schema. This
> >> way the external table is agnostic to any new schema changes.
> >>
> >> Would love to get a solution for making get_json_object return
> >> an Array instead of a string. Can we use any Hive UDFs to convert a
> >> string into an explodable Array object?
> >>
> >> Thanks
> >> Narayanan
> >>
> >> On Mon, Apr 7, 2014 at 4:14 PM, Peyman Mohajerian <mohaj...@gmail.com>
> >> wrote:
> >> > perhaps: https://github.com/rcongiu/Hive-JSON-Serde
> >> >
> >> >
> >> > On Mon, Apr 7, 2014 at 6:52 PM, Narayanan K <knarayana...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi all
> >> >>
> >> >> I am using get_json_object to read a json text file. I have created
> >> >> the external table as below :
> >> >>
> >> >> CREATE EXTERNAL TABLE EXT_TABLE ( json string)
> >> >> PARTITIONED BY (dt string)
> >> >> LOCATION '/users/abc/';
> >> >>
> >> >>
> >> >> The JSON data has some fields that are not simple fields but nested
> >> >> fields, such as "field" : [{"id":1},{"id":2}.. ].
> >> >>
> >> >> When I use get_json_object to retrieve that field, it returns a
> >> >> string instead of an Array. Hence I am not able to explode the
> >> >> array, as it is a string.
> >> >>
> >> >> Is there some way we can get an array from get_json_object instead
> >> >> of a string so that we can perform explode on this nested field? Or
> >> >> is there any way we can convert the string into an array so that I
> >> >> can use explode?
> >> >>
> >> >> Thanks in advance,
> >> >> Narayanan
> >> >
> >> >
>
