Some options are covered here, although there is no definitive guidance as
far as I know:

https://cwiki.apache.org/confluence/display/Hive/Unit+Testing+Hive+SQL#UnitTestingHiveSQL-Modularisation
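One approach that needs no extra DDL is Hive's WITH clause (common table expressions). Each sub-SELECT gets a name inside the single statement, so nothing new is registered in the metastore. To keep the pieces in separate modules, a small script can combine them into one statement. The following is a minimal sketch under stated assumptions, not something taken from the linked page. `build_cte_query` is a hypothetical helper, the table and column names are made up, and each fragment is assumed to be a plain SELECT, for example read from its own .sql file:

```python
from typing import Dict


def build_cte_query(fragments: Dict[str, str], final_select: str) -> str:
    """Combine named SELECT fragments into one HiveQL WITH ... SELECT statement.

    Hypothetical helper. Fragments are emitted in dict insertion order
    (guaranteed in Python 3.7+).
    """
    for name in fragments:
        # CTE names become SQL identifiers; this is only a basic sanity check.
        if not name.isidentifier():
            raise ValueError(f"invalid CTE name: {name!r}")
    if not fragments:
        # Nothing to factor out: the final SELECT stands alone.
        return final_select.strip()
    ctes = ",\n".join(
        f"{name} AS (\n{sql.strip()}\n)" for name, sql in fragments.items()
    )
    return f"WITH {ctes}\n{final_select.strip()}"


if __name__ == "__main__":
    # Hypothetical tables and columns, for illustration only.
    fragments = {
        "orders_2016": "SELECT customer_id, amount FROM orders WHERE yr = 2016",
        "active_customers": "SELECT customer_id FROM customers WHERE active = true",
    }
    final_select = (
        "SELECT o.customer_id, SUM(o.amount) AS total\n"
        "FROM orders_2016 o\n"
        "JOIN active_customers c ON o.customer_id = c.customer_id\n"
        "GROUP BY o.customer_id"
    )
    print(build_cte_query(fragments, final_select))
```

Each fragment can be reviewed, or run on its own against test data, while Hive still receives a single DML statement and the metastore stays untouched.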

On 15 December 2016 at 17:08, Saumitra Shahapure <saumitra.offic...@gmail.com> wrote:

> Hello,
>
> We are running and maintaining a fairly large and complex Hive SELECT query
> right now. It's essentially a single SELECT query that JOINs the outputs of
> about ten other SELECT queries.
>
> The simplest refactoring we can think of is to break this query down into
> multiple views and then join the views. A similar option is to create
> intermediate tables.
>
> However, creating multiple DDLs in order to maintain a single DML is not
> very smooth. We would end up polluting the metadata database with views
> or intermediate tables that are used only in this ETL.
>
> What other efficient ways are there to maintain complex SQL queries written
> in Hive? Are there better ways to break a Hive query into multiple modules?
>
> -- Saumitra S. Shahapure
>
