Hi Nishanth,

While what you suggest is indeed feasible, it is not something that I'd
recommend for the following reasons:

   1. Consumers of the data will need to write conditional code in their
   HQL which will likely be difficult to write and maintain (although this
   might be unavoidable regardless).
   2. Support for the union type in the Hive query engine is incomplete
   [1]: you can only obtain string representations of the union branch
   values, and these are difficult to interrogate. The code in HIVE-15434
   [2] can remedy this, but it has not been merged, so you would need to
   build and deploy it yourself.
   3. Should your consumers later wish to query the table using some other
   data processing framework, they'll struggle to find support for reading the
   union type. Spark [3] and Flink are lacking IIRC.

If you really are unable to make the joins more performant then I suggest
you try some alternative data modeling approaches that do not require the
union type. We can largely borrow the mapping strategies used to represent
class hierarchies in RDBMSes. In those terms, you are already using 'one
table per type'. To consolidate, you could instead use a single table with
a discriminator field, or a single table with a nullable field per type.
Either of these approaches will of course require that you modify your
schema.
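
As a rough sketch of what those two layouts could look like as Avro
schemas, here is a small Python snippet that derives both from existing
record schemas. The Order and Customer schemas, the Combined record name
and the record_type column are all made up for illustration; substitute
your own six schemas. It relies on the two-branch ["null", T] form, which
as far as I know Hive's AvroSerDe exposes as a plain nullable column
rather than as a union type.

```python
import json

# Hypothetical stand-ins for two of the six existing, unrelated record schemas.
ORDER = {"type": "record", "name": "Order",
         "fields": [{"name": "order_id", "type": "long"}]}
CUSTOMER = {"type": "record", "name": "Customer",
            "fields": [{"name": "email", "type": ["null", "string"]}]}


def optional(avro_type):
    """Return avro_type as a union whose first branch is "null".

    A field that is already a genuine multi-branch union stays a union;
    handling that case is outside the scope of this sketch.
    """
    branches = avro_type if isinstance(avro_type, list) else [avro_type]
    return ["null"] + [b for b in branches if b != "null"]


def nullable_field_per_type(schemas, name="Combined"):
    """Single table with one optional record-valued field per original type."""
    fields = [{"name": s["name"].lower(), "type": optional(s), "default": None}
              for s in schemas]
    return {"type": "record", "name": name, "fields": fields}


def discriminator_table(schemas, name="Combined"):
    """Single table with every original field flattened in, plus a type column."""
    fields = [{"name": "record_type", "type": "string"}]
    for s in schemas:
        for f in s["fields"]:
            # Prefix with the type name to avoid collisions between schemas.
            fields.append({"name": f"{s['name'].lower()}_{f['name']}",
                           "type": optional(f["type"]), "default": None})
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    print(json.dumps(nullable_field_per_type([ORDER, CUSTOMER]), indent=2))
    print(json.dumps(discriminator_table([ORDER, CUSTOMER]), indent=2))
```

With the first layout consumers test which field is non-null; with the
second they filter on record_type. Either way the conditional logic from
point 1 remains, but it works on ordinary columns.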

[1] see warning here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypesunionUnionTypes
[2] https://issues.apache.org/jira/browse/HIVE-15434
[3] https://issues.apache.org/jira/browse/SPARK-21529

Cheers - Elliot.

On 31 July 2017 at 20:21, Nishanth S <nishanth.2...@gmail.com> wrote:

> Hello All,
> I have a set of Avro schemas (6 of them) which do not have any relation
> between them. The data in them is relatively small and is stored as 6
> different Hive tables now. What I would like to do is convert them
> into a single Hive table using Avro unions. Is that something doable? Some
> of our queries have joins to these tables and it is affecting performance.
> I am guessing one Hive table will be a better approach. Can you chime in if
> you have done something similar? Any thoughts or pointers are highly
> appreciated.
>
> Thanks,
> Nishanth
>
