Hey folks! We're currently running Hive 1.1.0 on-prem and can't easily upgrade it. My team has been tasked with obtaining column-level lineage information so that we can map the flow of data through our environment.
The current proposal is to connect to Hive from a Java service, prepend EXPLAIN EXTENDED to each query, iterate through the result set, and parse the strings that Hive returns in order to establish the lineage. I have seen that Hive has a class called QueryPlan; would we be able to obtain an instance of it ourselves using the Hive libraries? Has anyone done this, or something similar, before? Does anyone have other suggestions for how we might do this, given the v1.1.0 constraint?

Kind regards,
Damien Hawes
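
P.S. To make the proposal concrete, here is a rough sketch of the EXPLAIN EXTENDED step as I currently picture it. It is not tested against our cluster. The class and method names (ExplainPlanFetcher, toExplainQuery, fetchPlanLines) are made up for illustration, and I am assuming that the Hive JDBC driver returns the plan as a single string column with one row per line of plan text. Parsing those lines into lineage is the part still to be figured out and is not shown.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites a query to EXPLAIN EXTENDED form and
// collects the plan text that comes back over a plain JDBC connection.
final class ExplainPlanFetcher {

    private ExplainPlanFetcher() {}

    // Strips surrounding whitespace and trailing semicolons, then
    // prepends EXPLAIN EXTENDED to the remaining statement.
    static String toExplainQuery(String sql) {
        String trimmed = sql.trim();
        while (trimmed.endsWith(";")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        return "EXPLAIN EXTENDED " + trimmed;
    }

    // Runs the rewritten query and returns the first column of every row.
    // Assumption: each row holds one line of the plan in column 1.
    static List<String> fetchPlanLines(Connection conn, String sql) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(toExplainQuery(sql))) {
            while (rs.next()) {
                lines.add(rs.getString(1));
            }
        }
        return lines;
    }
}
```

The Connection would come from something like DriverManager.getConnection with a jdbc:hive2:// URL for our HiveServer2 instance (host, port and credentials are placeholders here), with the Hive JDBC driver on the classpath. Keeping the rewrite in its own method means it can be checked without a cluster.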