[jira] [Commented] (HIVE-21788) Support replication from hadoop-2 (hive 3.0 and below) on-prem cluster to hadoop-3 (hive 4 and above) cloud cluster

Ashutosh Bapat (JIRA) Mon, 03 Jun 2019 08:26:06 -0700


    [ https://issues.apache.org/jira/browse/HIVE-21788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854709#comment-16854709 ]


Ashutosh Bapat commented on HIVE-21788:
---------------------------------------

 
{noformat}
Collection<String> redactedProperties =
- jobConf.getStringCollection(MRJobConfig.MR_JOB_REDACTED_PROPERTIES);
+ jobConf.getStringCollection("mapreduce.job.redacted-properties");
 
 // Hide sensitive configuration values from MR HistoryUI by telling MR to redact the following list.
- jobConf.set(MRJobConfig.MR_JOB_REDACTED_PROPERTIES,
+ jobConf.set("mapreduce.job.redacted-properties",
 StringUtils.join(redactedProperties, COMMA));
 }{noformat}
 

Why do we need these changes? Aren't these constants also defined when Hadoop-2 is
used? This comment applies to every place where this change is repeated.

 
{noformat}
+ if (conf.get("mapreduce.framework.name") != null
+ && conf.get("mapreduce.framework.name").equals("yarn")) {{noformat}
{noformat}
+ jConf.set("yarn.scheduler.capacity.root.queues", "default");
+ jConf.set("yarn.scheduler.capacity.root.default.capacity", "100");
{noformat}
 
{noformat}
+ public int getJobTrackerPort() throws UnsupportedOperationException {
+ String address = conf.get("yarn.resourcemanager.address");{noformat}
 

 
{noformat}
+
+ if (!isLlap) { // Conf for non-llap
+ conf.setBoolean("hive.llap.io.enabled", false);
+ } else { // Conf for llap
+ conf.set("hive.llap.execution.mode", "only");{noformat}
 

 
{noformat}
+ conf.setInt("hive.tez.container.size", 128);{noformat}
Can we use a ConfVars entry or a similar static declaration instead of a
hard-coded string? This comment applies to every place where we use hard-coded
strings for config names. The problem with hard-coded config names is that if
a config is renamed in the future, we cannot reliably find and update every
place where it is used.
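
To illustrate the idea, here is a minimal sketch of declaring each key once and
referring to it through a constant. ConfKey and SimpleConf are hypothetical
stand-ins (not HiveConf.ConfVars or Hadoop's Configuration), and the default
values are illustrative only; the key strings are the ones from the patch
excerpts above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an enum such as HiveConf.ConfVars: each key string
// is declared exactly once. The defaults here are illustrative assumptions,
// not Hive's real defaults.
enum ConfKey {
    LLAP_IO_ENABLED("hive.llap.io.enabled", "true"),
    LLAP_EXECUTION_MODE("hive.llap.execution.mode", "none"),
    TEZ_CONTAINER_SIZE("hive.tez.container.size", "-1");

    final String varname;
    final String defaultValue;

    ConfKey(String varname, String defaultValue) {
        this.varname = varname;
        this.defaultValue = defaultValue;
    }
}

// Hypothetical minimal configuration wrapper; a plain map stands in for a
// Hadoop Configuration so the sketch has no external dependencies.
class SimpleConf {
    private final Map<String, String> values = new HashMap<>();

    void set(ConfKey key, String value) {
        values.put(key.varname, value);
    }

    void setInt(ConfKey key, int value) {
        set(key, Integer.toString(value));
    }

    void setBoolean(ConfKey key, boolean value) {
        set(key, Boolean.toString(value));
    }

    // Falls back to the default declared next to the key.
    String get(ConfKey key) {
        return values.getOrDefault(key.varname, key.defaultValue);
    }

    int getInt(ConfKey key) {
        return Integer.parseInt(get(key));
    }

    boolean getBoolean(ConfKey key) {
        return Boolean.parseBoolean(get(key));
    }

    // Mirrors the shape of the test setup in the patch, but without any
    // string literals at the call sites.
    static SimpleConf forTest(boolean isLlap) {
        SimpleConf conf = new SimpleConf();
        if (!isLlap) { // Conf for non-llap
            conf.setBoolean(ConfKey.LLAP_IO_ENABLED, false);
        } else { // Conf for llap
            conf.set(ConfKey.LLAP_EXECUTION_MODE, "only");
        }
        conf.setInt(ConfKey.TEZ_CONTAINER_SIZE, 128);
        return conf;
    }
}
```

With this shape, renaming a config means editing one enum entry, and a removed
entry shows up as a compile error at every use instead of a silently ignored
string.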

Can you please create a PR?

 

> Support replication from hadoop-2 (hive 3.0 and below) on-prem cluster to 
> hadoop-3 (hive 4 and above) cloud cluster
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21788
>                 URL: https://issues.apache.org/jira/browse/HIVE-21788
>             Project: Hive
>          Issue Type: Task
>          Components: HiveServer2, repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21788.01.patch
>
>
> In the case of replication to the cloud, both dump and load are executed in
> the source cluster. This push-based replication avoids computation at the
> target cloud cluster. If strict managed table is not set to true in the
> source cluster, the tables will be non-ACID. So during replication to a
> cluster with strict managed table enabled, the same migration logic as the
> upgrade tool has to be applied to the replicated data. This migration logic
> is implemented only in hive 4.0, so a hive 4.0 instance has to be started at
> the source cluster. If the source cluster has a hadoop-2 installation, hive 4
> has to be built with hadoop-2, which requires changes in the pom files and
> the shim files.
> 1. Change the pom.xml files to accept a profile for hadoop-2. If the hadoop-2 
> profile is set, the hadoop version should be set accordingly to hadoop-2.
> 2. In shim, create a new file for hadoop-2. Based on the profile, the 
> respective file will be included in the build.
> 3. Change artifactId hadoop-hdfs-client to hadoop-client, as in hadoop-2 the 
> jars are stored under the hadoop-client folder.
>  
>  
> Command to enable the hadoop-2 dependency: mvn clean install package 
> -DskipTests  -Pdist -pl '!standalone-metastore, !llap-common, !llap-client, 
> !llap-ext-client, !llap-tez, !llap-server, !hbase-handler, !service, !hplsql, 
> !kryo-registrator' -Phadoop-2.7
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
