Hi,
We have started working with Avro on CDH4 and querying the Avro files with
Hive.
This works fine for us, except for unions.
We do not understand how to query the data inside a union using Hive.

For example, let's look at the following schema:

{
  "type":"record",
  "name":"event",
  "namespace":"com.mysite",
  "fields":[

    {
        "name":"eventbody",
        "type":{
            "type":"record", "name":"eventbody",
            "fields":[
                {
                    "name":"body",
                    "type":[
                       "null",
                       {
                        "type":"record",
                        "name":"event1",
                        "fields":[
                            {
                                "name":"event1Header",
                                "type":["null", { "type":"array", "items":"string" }],
                                "default":null
                            },
                            {
                                "name":"event1Body",
                                "type":["null", { "type":"array", "items":"string" }],
                                "default":null
                            }
                        ]
                    },
                   {
                        "type":"record",
                        "name":"event2",
                        "fields":[
                            {
                                "name":"page",
                                "type":{
                                    "type":"record", "name":"URL",
                                    "fields":[{ "name":"url", "type":"string" }]
                                },
                                "default":null
                            },
                            {
                                "name":"referrer", "type":"string",
                                "default":null
                            }
                        ]
                    }
                ],
                    "default":null
                }
            ]
        },
        "default":null
    }
]}

Note that "body" is a union of three types:
null, "event1" and "event2"

If I run a query such as:

SELECT eventbody.body from SRC;

I get a line like this:
{2:{"page":{"url":"http://www.musite.com/index.jsp"},"referrer":{"url":";
www.search.com"}}}

The number "2" at the beginning of the JSON structure is the union tag. It
selects "event2", because tags are zero-based and "event2" is the third
element in the union.
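
To make my reading of the tag concrete, here is a small Python sketch (not
Hive code). The branch list is copied by hand from the schema above, and the
names BODY_UNION_BRANCHES and branch_for_tag are made up for illustration:

```python
# Branches of the "body" union, in the order they are declared in the schema.
# Copied by hand from the schema above; this is an illustration, not Hive output.
BODY_UNION_BRANCHES = ["null", "event1", "event2"]


def branch_for_tag(tag):
    """Return the name of the union branch selected by a zero-based tag."""
    return BODY_UNION_BRANCHES[tag]


# The row shown above starts with the tag 2.
print(branch_for_tag(2))
```

This only shows how the tag maps to a branch name. It does not show how to
reach the fields of that branch from a Hive query, which is what I am asking.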

My question is then: if I want to query fields inside event2, e.g. page.url
or referrer, how do I construct the SELECT statement?


Thank you,
Ran
