[ 
https://issues.apache.org/jira/browse/HIVE-21206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21206:
------------------------------------
    Description: 
Hive bootstrap replication of 1 TB of data, on-prem to on-prem, runs slower 
in Hive3 than in Hive2.

The time taken for replication is as follows:
|| Hive2 - Hive2 || Hive3 - Hive3 ||
| Bootstrap: 01h27m | Bootstrap: 03h45m |

Every MoveTask closes the metastore connection and opens a new one, which is 
causing the slowdown.
{code}
2019-02-08T12:28:30,174 INFO  [HiveServer2-Background-Pool: Thread-1134]: ql.Driver (:()) - Starting task [Stage-5:MOVE] in serial mode
2019-02-08T12:28:30,177 INFO  [HiveServer2-Background-Pool: Thread-1134]: exec.Task (:()) - Loading data to table nondefault.nondefault_table1 from hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table1/.hive-staging_hive_2019-02-08_12-28-23_584_1482331698286040936-3/-ext-10001
2019-02-08T12:28:30,189 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI thrift://ctr-e139-1542663976389-62755-01-000014.hwx.site:9083
2019-02-08T12:28:30,189 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current connections: 4
2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Connected to metastore.
2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.RetryingMetaStoreClient (:()) - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/ctr-e139-1542663976389-62755-01-000014.hwx.s...@hwqe.hortonworks.com (auth:KERBEROS) retries=24 delay=5 lifetime=0
2019-02-08T12:28:30,325 INFO  [org.apache.ranger.audit.queue.AuditBatchQueue1]: provider.BaseAuditHandler (:()) - Audit Status Log: name=hiveServer2.async.multi_dest.batch, finalDestination=hiveServer2.async.multi_dest.batch.solr, interval=01:00.002 minutes, events=2, succcessCount=1, totalEvents=56, totalSuccessCount=25
2019-02-08T12:28:30,520 INFO  [HiveServer2-Background-Pool: Thread-1134]: common.FileUtils (FileUtils.java:mkdir(580)) - Creating directory if it doesn't exist: hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table1/base_0000001
2019-02-08T12:28:31,245 INFO  [HiveServer2-Background-Pool: Thread-1134]: ql.Driver (:()) - Starting task [Stage-11:MOVE] in serial mode
2019-02-08T12:28:31,245 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Closed a connection to metastore, current connections: 3
2019-02-08T12:28:31,246 INFO  [HiveServer2-Background-Pool: Thread-1134]: exec.Task (:()) - Loading data to table nondefault.nondefault_table2 from hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table2/.hive-staging_hive_2019-02-08_12-28-23_810_7457138692783022870-3/-ext-10002
2019-02-08T12:28:31,327 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI thrift://ctr-e139-1542663976389-62755-01-000014.hwx.site:9083
2019-02-08T12:28:31,327 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
2019-02-08T12:28:31,336 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current connections: 4
2019-02-08T12:28:31,337 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.HiveMetaStoreClient (:()) - Connected to metastore.
2019-02-08T12:28:31,337 INFO  [HiveServer2-Background-Pool: Thread-1134]: metastore.RetryingMetaStoreClient (:()) - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive/ctr-e139-1542663976389-62755-01-000014.hwx.s...@hwqe.hortonworks.com (auth:KERBEROS) retries=24 delay=5 lifetime=0
2019-02-08T12:28:31,642 INFO  [HiveServer2-Background-Pool: Thread-1134]: common.FileUtils (FileUtils.java:mkdir(580)) - Creating directory if it doesn't exist: hdfs://mycluster1/warehouse/tablespace/managed/hive/nondefault.db/nondefault_table2/base_0000001
{code}
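
To check how often this pattern occurs across a full HiveServer2 log, the excerpt above can be tallied with a small script. The following is only a rough sketch, not part of Hive: the names `entries`, `summarize` and `SAMPLE` are hypothetical, and it assumes the message texts ("Starting task [...:MOVE]", "Trying to connect to metastore", "Opened a connection to metastore", "Closed a connection to metastore") appear exactly as in the excerpt. It rejoins hard-wrapped lines, counts MOVE tasks and connection opens/closes, and sums the gap between "Trying to connect" and "Opened a connection".

```python
import re
from datetime import datetime, timedelta

# A log entry starts with a timestamp such as 2019-02-08T12:28:30,189.
TS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}\s")


def _split(entry):
    """Split one joined entry into (timestamp, rest of the line)."""
    stamp, _, rest = entry.partition(" ")
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S,%f"), rest


def entries(lines):
    """Yield (timestamp, message) pairs, rejoining hard-wrapped lines."""
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if TS.match(line):
            if current is not None:
                yield _split(current)
            current = line
        elif current is not None:
            # Continuation of a wrapped entry.
            current += " " + line
    if current is not None:
        yield _split(current)


def summarize(lines):
    """Count MOVE tasks and metastore connection opens/closes."""
    move_tasks = opened = closed = 0
    connect_time = timedelta()
    pending = None  # timestamp of the last "Trying to connect" entry
    for ts, msg in entries(lines):
        if "Starting task" in msg and ":MOVE]" in msg:
            move_tasks += 1
        elif "Trying to connect to metastore" in msg:
            pending = ts
        elif "Opened a connection to metastore" in msg:
            opened += 1
            if pending is not None:
                connect_time += ts - pending
                pending = None
        elif "Closed a connection to metastore" in msg:
            closed += 1
    return {
        "move_tasks": move_tasks,
        "opened": opened,
        "closed": closed,
        "connect_seconds": connect_time.total_seconds(),
    }


# A few entries copied from the excerpt above, wrapped as in the mail.
SAMPLE = """\
2019-02-08T12:28:30,174 INFO  [HiveServer2-Background-Pool: Thread-1134]:
ql.Driver (:()) - Starting task [Stage-5:MOVE] in serial mode
2019-02-08T12:28:30,189 INFO  [HiveServer2-Background-Pool: Thread-1134]:
metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI
thrift://ctr-e139-1542663976389-62755-01-000014.hwx.site:9083
2019-02-08T12:28:30,206 INFO  [HiveServer2-Background-Pool: Thread-1134]:
metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current
connections: 4
2019-02-08T12:28:31,245 INFO  [HiveServer2-Background-Pool: Thread-1134]:
ql.Driver (:()) - Starting task [Stage-11:MOVE] in serial mode
2019-02-08T12:28:31,245 INFO  [HiveServer2-Background-Pool: Thread-1134]:
metastore.HiveMetaStoreClient (:()) - Closed a connection to metastore, current
connections: 3
2019-02-08T12:28:31,327 INFO  [HiveServer2-Background-Pool: Thread-1134]:
metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI
thrift://ctr-e139-1542663976389-62755-01-000014.hwx.site:9083
2019-02-08T12:28:31,336 INFO  [HiveServer2-Background-Pool: Thread-1134]:
metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current
connections: 4
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

On a real log, `summarize(open(path))` can be used instead of `SAMPLE`, where `path` is a placeholder for the HiveServer2 log file. An `opened` count that tracks `move_tasks` one for one would be consistent with the reconnect-per-MoveTask behaviour described here. Note that `connect_seconds` only covers the gap between the two log messages, not any other cost of re-establishing the client.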


  was:
Hive bootstrap replication of 1 TB of data, on-prem to on-prem 
(Hive2(Strict_Managed=false) to Hive3(Strict_Managed=true)), is running slower.

The time taken for replication is as follows:
|| Hive2 - Hive2 || Hive2 - Hive3 ||
| Bootstrap: 01h27m | Bootstrap: 03h45m |



> Bootstrap replication is slow as it opens lot of metastore connections.
> -----------------------------------------------------------------------
>
>                 Key: HIVE-21206
>                 URL: https://issues.apache.org/jira/browse/HIVE-21206
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, replication
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
