[jira] [Created] (HIVE-3453) Hive query persistence / auditing

Matt Goeke (JIRA) Wed, 12 Sep 2012 15:35:09 -0700

Matt Goeke created HIVE-3453:
--------------------------------

             Summary: Hive query persistence / auditing
                 Key: HIVE-3453
                 URL: https://issues.apache.org/jira/browse/HIVE-3453
             Project: Hive
          Issue Type: Improvement
          Components: CLI, Logging
            Reporter: Matt Goeke
            Priority: Minor



Currently our Hive warehouse is open to querying from any of our business 
analysts and we pool them by user in the fair scheduler to prevent someone from 
hogging cluster resources. We are looking to start summarizing details of 
their queries so that we can see the common questions they ask and find ways 
to optimize our tables / submission process. One thought was to patch the Hive 
client / Thrift server to write the submitted queries to the DB that our 
metastore runs on; from there we could perform some simple analytics to roll up 
a view of how they use the warehouse over time. This doesn't seem like it would 
be too difficult, as the needed infrastructure is already in place, but any 
suggestions or comments would be greatly appreciated.

I am leaving the implementation notes pretty blank as I would like to see what 
others in the community who have more experience in this project would 
recommend. 

Additional information from a mailing list response:
Hey Matt,

We did something similar at Facebook to capture the information on who ran what 
on the clusters and dumped that out to an audit db. Specifically we were using 
Hive post-execution hooks to achieve that.

http://hive.apache.org/docs/r0.7.0/api/org/apache/hadoop/hive/ql/hooks/PostExecute.html

This gets called mostly from the Hive CLI.

I am not sure if the particular hook that we had implemented was contributed 
back, but this could potentially be a cool contribution :)

Ashish
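
Hive loads post-execution hooks from the class names listed in the `hive.exec.post.hooks` configuration property, and the hook receives information about the query that just ran through interfaces such as the `PostExecute` interface linked above. The standalone Java sketch below does not use Hive's classes at all: `PostExecHook`, `AuditSink`, `InMemorySink`, and `AuditingHook` are hypothetical names that only mimic the general shape of such a hook, so the control flow can be run without a cluster. A real hook would implement Hive's own interface and write to a JDBC table instead of an in-memory list.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AuditHookSketch {

    /** Hypothetical stand-in for a post-execution hook; Hive's real interfaces differ. */
    public interface PostExecHook {
        void run(String user, String queryText) throws Exception;
    }

    /** Hypothetical destination for audit rows (e.g. a table in the metastore's DB). */
    public interface AuditSink {
        void write(String user, String queryText, Instant when);
    }

    /** Keeps rows in memory so the sketch runs without a database. */
    public static final class InMemorySink implements AuditSink {
        private final List<String> rows = new ArrayList<>();

        @Override
        public void write(String user, String queryText, Instant when) {
            rows.add(user + "\t" + queryText);
        }

        public List<String> rows() {
            return Collections.unmodifiableList(rows);
        }
    }

    /** Records each query it is called with; sink failures are reported, not rethrown. */
    public static final class AuditingHook implements PostExecHook {
        private final AuditSink sink;

        public AuditingHook(AuditSink sink) {
            this.sink = sink;
        }

        @Override
        public void run(String user, String queryText) {
            try {
                sink.write(user, queryText, Instant.now());
            } catch (RuntimeException e) {
                // Best-effort auditing: a failed write should not fail the user's query.
                System.err.println("audit write failed: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        AuditingHook hook = new AuditingHook(sink);
        hook.run("alice", "SELECT count(1) FROM clicks");
        sink.rows().forEach(System.out::println);
    }
}
```

Catching the sink's exceptions keeps a failed audit write from failing an analyst's query; whether that trade-off is acceptable depends on how strict the auditing requirement is.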


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
