[https://issues.apache.org/jira/browse/HIVE-21960?focusedWorklogId=284385&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-284385]

ASF GitHub Bot logged work on HIVE-21960:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Jul/19 16:34
            Start Date: 29/Jul/19 16:34
    Worklog Time Spent: 10m 
      Work Description: ashutosh-bapat commented on pull request #735: HIVE-21960 : Avoid running stats updater and partition management task on a replicated table.
URL: https://github.com/apache/hive/pull/735#discussion_r308324905
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
 ##########
 @@ -220,6 +221,16 @@ private void stopWorkers() {
     String skipParam = table.getParameters().get(SKIP_STATS_AUTOUPDATE_PROPERTY);
     if ("true".equalsIgnoreCase(skipParam)) return null;
 
+    // If the table is being replicated into,
+    // 1. the stats are also replicated from the source, so we don't need those to be calculated
+    //    on the target again
+    // 2. updating stats requires a writeId to be created. Hence writeIds on source and target
+    //    can get out of sync when stats are updated. That can cause consistency issues.
+    String replTrgtParam = table.getParameters().get(ReplConst.REPL_TARGET_PROPERTY);
 
 Review comment:
   That window is very small, but nevertheless finite. Should we add last.repl.id with a value of 0 or -1 when creating the table, similar to what we do for the db?
   
   The other option is to check a db-level property, e.g. the checkpoint, but that means fetching the Database object every time we assess whether the stats updater or the partition management task needs to run on a table. For a corner case with such a small window, this looks quite expensive.
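For comparison, a sketch of the db-level option that simply counts how many times the Database object would be fetched. `fetchDbParams` is a hypothetical stand-in for the metastore call, and `DB_CHECKPOINT_PROPERTY` is a placeholder key, not Hive's actual checkpoint property name; the point is only that the fetch happens once per table assessed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class DbLevelCheckSketch {
    // Placeholder for the db-level "checkpoint" property mentioned in the review.
    static final String DB_CHECKPOINT_PROPERTY = "hypothetical.repl.checkpoint";

    /** Db-level variant: each assessment fetches the owning database's parameters. */
    static boolean isReplTargetViaDb(String dbName, Function<String, Map<String, String>> fetchDbParams) {
        // One metastore round trip per call: the cost raised in the review comment.
        Map<String, String> dbParams = fetchDbParams.apply(dbName);
        return dbParams.containsKey(DB_CHECKPOINT_PROPERTY);
    }

    /** Runs the check once per table in the database and returns the number of fetches made. */
    static int countFetches(String dbName, List<String> tableNames, Map<String, String> dbParams) {
        AtomicInteger fetches = new AtomicInteger();
        Function<String, Map<String, String>> fetch = db -> {
            fetches.incrementAndGet();
            return dbParams;
        };
        for (String ignoredTable : tableNames) {
            isReplTargetViaDb(dbName, fetch);
        }
        return fetches.get();
    }

    public static void main(String[] args) {
        Map<String, String> replDbParams = new HashMap<>();
        replDbParams.put(DB_CHECKPOINT_PROPERTY, "placeholder");
        System.out.println(countFetches("repl_db", List.of("t1", "t2", "t3"), replDbParams)); // 3
    }
}
```

The table-level check in the diff, by contrast, reads only `table.getParameters()`, which the stats updater already has in hand.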
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 284385)
    Time Spent: 1h 10m  (was: 1h)

> HMS tasks on replica
> --------------------
>
>                 Key: HIVE-21960
>                 URL: https://issues.apache.org/jira/browse/HIVE-21960
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2, repl
>    Affects Versions: 4.0.0
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21960.01.patch, HIVE-21960.02.patch, 
> HIVE-21960.03.patch, Replication and House keeping tasks.pdf
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> An HMS performs a number of housekeeping tasks. Assess:
>  # whether they need to be performed on the replicated data, and
>  # whether performing them on replicated data causes any issues, and if so, how to fix them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
