Some options are covered here, although there is no definitive guidance as far as I know:
https://cwiki.apache.org/confluence/display/Hive/Unit+Testing+Hive+SQL#UnitTestingHiveSQL-Modularisation

On 15 December 2016 at 17:08, Saumitra Shahapure <saumitra.offic...@gmail.com> wrote:

> Hello,
>
> We are currently running and maintaining a fairly big and complex Hive
> SELECT query. It is basically a single SELECT query that JOINs the
> outputs of about ten other SELECT queries.
>
> The simplest way to refactor it that we can think of is to break this
> query down into multiple views and then join the views. Similarly, we
> could create intermediate tables.
>
> However, creating multiple DDLs in order to maintain a single DML is not
> very smooth. We would end up polluting the metadata database by creating
> views / intermediate tables that are used only in this ETL.
>
> What other efficient ways are there to maintain complex SQL queries
> written in Hive? Are there better ways to break a Hive query into
> multiple modules?
>
> --
> Saumitra S. Shahapure
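As a rough illustration of splitting such a query without adding anything to the metastore, one possibility is to name each sub-SELECT as a common table expression (a WITH clause, available in Hive 0.13 and later). This is only a sketch: the tables, columns, and filter values (orders, customers, and so on) are hypothetical and not taken from the original query.

```sql
-- Hypothetical tables and columns, used for illustration only.
-- Each CTE plays the role of one of the joined sub-SELECTs; it exists
-- only for the duration of this statement, so no view or table is created.
WITH recent_orders AS (
  SELECT customer_id,
         SUM(amount) AS total_amount
  FROM orders
  WHERE order_date >= '2016-01-01'
  GROUP BY customer_id
),
active_customers AS (
  SELECT customer_id,
         name
  FROM customers
  WHERE status = 'active'
)
-- The final SELECT joins the named pieces instead of inline subqueries.
SELECT c.customer_id,
       c.name,
       o.total_amount
FROM active_customers c
JOIN recent_orders o
  ON c.customer_id = o.customer_id;
```

CTEs keep everything in one statement, so they help with readability more than with reuse across scripts. If the intermediate results need to be shared by several statements in the same job, CREATE TEMPORARY TABLE ... AS SELECT (Hive 0.14 and later) may be worth a look, since temporary tables are visible only to the current session and are dropped when it ends.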