Hello,

At the 2018 DataWorks conference in Berlin, Hotels.com presented Waggle
Dance <https://github.com/HotelsDotCom/waggle-dance>, a tool for federating
multiple Hive clusters and providing the illusion of a unified data catalog
from disparate instances.

We’ve been running Waggle Dance in production for well over a year and it
has formed a critical part of our data platform architecture and
infrastructure. We believe that this type of functionality will be of
increasing importance as Hadoop and Hive workloads migrate to the cloud.
While Waggle Dance is one solution, significant benefits could be realized
if these kinds of abilities were an integral part of the Hive platform.

If this is of interest, I've created a proposal on the Hive wiki. It
outlines why we think such a feature is needed in Hive, the benefits gained
by offering it as a built-in feature, and a sketch of a possible
implementation. Our proposed implementation draws inspiration from the
remote table features present in some traditional RDBMSes, which may
already be familiar to you.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80452092

Feedback gratefully accepted,

Elliot.

Senior Engineer
Big Data Platform Team
Hotels.com
