Naveen Gangam created HIVE-21718:
------------------------------------

             Summary: Improve performance of UpdateInputAccessTimeHook
                 Key: HIVE-21718
                 URL: https://issues.apache.org/jira/browse/HIVE-21718
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
    Affects Versions: 2.1.1
            Reporter: Naveen Gangam
            Assignee: Naveen Gangam


Currently, Hive does not update the lastAccessTime property of any entity
when a query accesses it. Thus it is not possible to know when a table was
last accessed.
Hive does provide a configurable hook for HS2 that is executed as a pre-query
hook before the query runs. However, this hook is inefficient because, for
each table or partition whose access time it updates, it internally executes
an "alter table ... " command. This is a problem for two reasons:
1) For a query touching thousands of partitions, the hook takes a very long
time to update them.
2) Meanwhile, it holds up the original query from executing.

So even though we do not recommend using the hook, because the reward
(having lastAccessTime updated) is too small for the cost, we realize there
is no other means to achieve this.
We can, however, improve the performance of the hook significantly by adding
a new thrift API on HMS that updates lastAccessTime on the database rows
directly, instead of going through the HMS front end one entity at a time
(which leads to thousands of HMS calls, which in turn lead to several
thousand calls to the database).
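
To illustrate why updating the rows directly should help, here is a rough
sketch that contrasts one statement per entity with a small number of batched
statements. It is not the proposed HMS API: the "partitions" table, its
columns, the function names, and the batch size are all hypothetical, and an
in-memory SQLite database stands in for the metastore's backing database.

```python
import sqlite3


def make_store(num_partitions):
    """Create a toy stand-in for the metastore DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE partitions ("
        "part_id INTEGER PRIMARY KEY, part_name TEXT, last_access_time INTEGER)"
    )
    conn.executemany(
        "INSERT INTO partitions VALUES (?, ?, 0)",
        [(i, f"ds={i}") for i in range(num_partitions)],
    )
    conn.commit()
    return conn


def update_one_at_a_time(conn, part_ids, now):
    """Mimics the current behaviour: one update statement per entity."""
    statements = 0
    for part_id in part_ids:
        conn.execute(
            "UPDATE partitions SET last_access_time = ? WHERE part_id = ?",
            (now, part_id),
        )
        statements += 1
    conn.commit()
    return statements


def update_in_batches(conn, part_ids, now, batch_size=500):
    """Mimics the idea in this issue: update many rows per statement."""
    statements = 0
    for start in range(0, len(part_ids), batch_size):
        batch = part_ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        conn.execute(
            "UPDATE partitions SET last_access_time = ? "
            f"WHERE part_id IN ({placeholders})",
            [now, *batch],
        )
        statements += 1
    conn.commit()
    return statements
```

With 2000 partitions, update_one_at_a_time issues 2000 statements, while
update_in_batches with the default batch size issues 4. In the real system
each per-entity update is also a separate HMS call, so the saving would be
larger than this toy comparison suggests.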



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
