Hey folks!

We are currently running Hive 1.1.0 on-prem and cannot easily upgrade it.
My team has been tasked with obtaining column-level lineage information so
that we can map the flow of data through our environment.

The current proposal is to connect to Hive from a Java service, prepend
EXPLAIN EXTENDED to each query, iterate through the result set, and then
parse the strings that Hive returns in order to establish the lineage.
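
To make the proposal concrete, here is a rough sketch of the fetching half
only, using plain java.sql. The class and method names (ExplainFetcher,
toExplain, fetchPlan) are made up for illustration, and I am assuming the
plan comes back as a single string column with one row per line of output.
The parsing step is not shown.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Hive library.
final class ExplainFetcher {

    // Strip surrounding whitespace and trailing semicolons, then prepend EXPLAIN EXTENDED.
    static String toExplain(String query) {
        String q = query.trim();
        while (q.endsWith(";")) {
            q = q.substring(0, q.length() - 1).trim();
        }
        return "EXPLAIN EXTENDED " + q;
    }

    // Run the rewritten query and collect the plan text.
    // Assumption: the first column of each row holds one line of the plan.
    static List<String> fetchPlan(Connection conn, String query) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(toExplain(query))) {
            while (rs.next()) {
                lines.add(rs.getString(1));
            }
        }
        return lines;
    }
}
```

The Connection would come from the Hive JDBC driver against our
HiveServer2 instance; that setup is omitted here.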

I have seen that Hive has a class called QueryPlan. Is it possible to
obtain an instance of it directly using the Hive libraries?

Has anyone done this before, or something similar? Does anyone have any
other suggestions as to how we might be able to do this, given the v1.1.0
constraint?

Kind regards,

Damien Hawes
