Hi Nishanth,

While what you suggest is indeed feasible, it is not something that I'd recommend, for the following reasons:
1. Consumers of the data will need to write conditional code in their HQL, which will likely be difficult to write and maintain (although this might be unavoidable regardless).
2. Support for the union type in the Hive query engine is incomplete [1] and only allows you to get string representations of the union branch values. These will be difficult to interrogate. The code in HIVE-15434 [2] can remedy this, but it has not been merged, so you'll need to build and deploy it yourself.
3. Should your consumers later wish to query the table using some other data processing framework, they'll struggle to find support for reading the union type. Spark [3] and Flink are lacking, IIRC.

If you really are unable to make the joins more performant, then I suggest you try some alternative data modeling approaches that do not require the union type. Largely, we can reference the mapping strategies employed to represent class hierarchies in RDBMSes. In this context, you are already using 'one table per type'. To consolidate, you could instead use a single table with a discriminator field, or a single table with a nullable field per type. Either of these approaches will of course require that you modify your schema.

[1] See the warning here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypesunionUnionTypes
[2] https://issues.apache.org/jira/browse/HIVE-15434
[3] https://issues.apache.org/jira/browse/SPARK-21529

Cheers - Elliot.

On 31 July 2017 at 20:21, Nishanth S <nishanth.2...@gmail.com> wrote:

> Hello All,
> I have a set of avro schemas (6 of them) which do not have any relation
> between them. The data in them is relatively small and are stored as 6
> different hive tables now. What I would want to do is to convert them
> into a single hive table using avro unions. Is that something doable? Some
> of our queries have joins to these tables and it is affecting performance.
> I am guessing one hive table will be a better approach. Can you chime in if
> you have done something similar? Any thoughts or pointers are highly
> appreciated.
>
> Thanks,
> Nishanth
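
P.S. To make the two alternative modeling approaches from my reply a little more concrete, here is a rough Python sketch that derives a combined Avro schema from the existing per-type schemas. The `Order` and `Customer` records, their fields, and all function names are hypothetical placeholders (I don't know your actual schemas), so treat this as an illustration of the idea rather than something to deploy as-is. It uses only the standard library and simply emits schema JSON.

```python
import json

# Hypothetical stand-ins for two of the six unrelated Avro record schemas.
ORDER = {
    "type": "record", "name": "Order",
    "fields": [{"name": "id", "type": "long"},
               {"name": "amount", "type": "double"}],
}
CUSTOMER = {
    "type": "record", "name": "Customer",
    "fields": [{"name": "id", "type": "long"},
               {"name": "email", "type": "string"}],
}


def nullable(avro_type):
    """Wrap a type as ["null", T] so the field may be left empty."""
    branches = avro_type if isinstance(avro_type, list) else [avro_type]
    return ["null"] + [b for b in branches if b != "null"]


def discriminator_schema(records, name="Combined"):
    """One flat record: a discriminator enum plus every type's fields, all nullable."""
    fields = [{
        "name": "record_type",
        "type": {"type": "enum", "name": "RecordType",
                 "symbols": [r["name"].upper() for r in records]},
    }]
    for r in records:
        prefix = r["name"].lower()
        for f in r["fields"]:
            # Prefix with the type name so unrelated schemas cannot clash.
            fields.append({"name": f"{prefix}_{f['name']}",
                           "type": nullable(f["type"]), "default": None})
    return {"type": "record", "name": name, "fields": fields}


def nullable_field_per_type_schema(records, name="Combined"):
    """One record with a nullable nested-record field per original type."""
    fields = [{"name": r["name"].lower(), "type": nullable(r), "default": None}
              for r in records]
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    print(json.dumps(discriminator_schema([ORDER, CUSTOMER]), indent=2))
    print(json.dumps(nullable_field_per_type_schema([ORDER, CUSTOMER]), indent=2))
```

Note that `["null", T]` is technically still an Avro union, but to my knowledge the Hive AvroSerDe maps this two-branch form to a plain nullable column rather than a `uniontype`, so it should avoid the problems listed above. With the discriminator variant, consumers filter on `record_type`; with the field-per-type variant, they check which nested field is non-null.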