All,

I looked in the Hive JIRA and saw nothing like what we are looking to
implement, so I am interested in getting feedback as to whether there is
any overlap between this and any other current efforts:

Currently our Hive warehouse is open to querying from any of our business
analysts and we pool them by user in the fair scheduler to prevent someone
from hogging cluster resources. We are looking to start summarizing
details of their queries so that we can see the common questions they ask
and find ways to optimize our tables / submission process. One thought
was to patch the Hive client / thrift server to write out the submitted
queries to the DB that our metastore is on and from there we can perform
some simple analytics to roll up a view of how they use the warehouse over
time. This doesn't seem like it would be too difficult an effort, as the
needed infrastructure is already in place, but any suggestions or comments
on this would be greatly appreciated. Also, if this is interesting to anyone
else, we are happy to keep you in the loop on any patches we create.
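
To make the idea concrete, here is a minimal sketch of the logging table
and one rollup. It is purely illustrative: SQLite stands in for the DB the
metastore is on, and the table name query_log and the helpers open_log,
log_query and queries_per_user are hypothetical names, not anything that
exists in Hive today.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open the (hypothetical) query log store and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS query_log ("
        " id INTEGER PRIMARY KEY,"
        " username TEXT NOT NULL,"
        " submitted_at TEXT NOT NULL,"
        " query_text TEXT NOT NULL)"
    )
    return conn


def log_query(conn, username, query_text):
    """Record one submitted query; a patched client would call this on submit."""
    conn.execute(
        "INSERT INTO query_log (username, submitted_at, query_text)"
        " VALUES (?, ?, ?)",
        (username, datetime.now(timezone.utc).isoformat(), query_text),
    )
    conn.commit()


def queries_per_user(conn):
    """Roll up the log: number of queries per user, busiest users first."""
    return conn.execute(
        "SELECT username, COUNT(*) FROM query_log"
        " GROUP BY username ORDER BY COUNT(*) DESC, username"
    ).fetchall()


if __name__ == "__main__":
    conn = open_log()
    log_query(conn, "alice", "SELECT COUNT(*) FROM orders")
    log_query(conn, "bob", "SELECT * FROM customers LIMIT 10")
    log_query(conn, "alice", "SELECT region, SUM(total) FROM orders GROUP BY region")
    for user, n in queries_per_user(conn):
        print(user, n)
```

Storing the raw query text with a user and timestamp keeps the write path
trivial; rollups by table, time window, or query shape could then be added
as further queries over the same table.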

--
Matt Goeke
