[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573596#comment-14573596 ]
Mithun Radhakrishnan commented on HIVE-10754:
---------------------------------------------

I see what we're trying to achieve, but I still need help understanding how this change fixes the problem. (Sorry. :/)

Here's the relevant code from {{Job.java}} in Hadoop 2.6.

{code:java|title=Job.java|borderStyle=solid|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#FFFFCE}
  @Deprecated
  public Job(Configuration conf) throws IOException {
    this(new JobConf(conf));
  }

  Job(JobConf conf) throws IOException {
    super(conf, null);
    // propagate existing user credentials to job
    this.credentials.mergeAll(this.ugi.getCredentials());
    this.cluster = null;
  }

  public static Job getInstance(Configuration conf) throws IOException {
    // create with a null Cluster
    JobConf jobConf = new JobConf(conf);
    return new Job(jobConf);
  }
{code}

# The current implementation of {{HCatLoader.setLocation()}} calls {{new Job(Configuration)}}, which clones the {{JobConf}} inline and calls the package-private constructor {{Job(JobConf)}}.
# Your improved implementation of {{HCatLoader.setLocation()}} calls {{Job.getInstance()}}. This method clones the {{JobConf}} explicitly, and then calls the same package-private constructor {{Job(JobConf)}}.

bq. These two are different (JobConf is not cloned when we call new Job(conf)).

Both of these seem identical in effect to me. :/ There's no way for {{HCatLoader.setLocation()}} to call the {{Job(JobConf)}} constructor directly, because it's package-private, right?
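To make the comparison concrete, here is a minimal sketch of the two call paths. It does not use the real Hadoop classes: {{Conf}}, {{JobConfStandIn}} and {{JobStandIn}} are hypothetical stand-ins that only mirror the constructor chain quoted above, and the sketch assumes that constructing a {{JobConf}} from a {{Configuration}} copies the properties. Under that assumption, neither path shares state with the caller's configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class JobCloneSketch {

    // Hypothetical stand-in for Configuration (not the Hadoop class).
    public static class Conf {
        private final Map<String, String> props = new HashMap<>();

        public Conf() {}

        // Assumed copy semantics: the new object gets its own property map.
        public Conf(Conf other) { props.putAll(other.props); }

        public void set(String key, String value) { props.put(key, value); }

        public String get(String key) { return props.get(key); }
    }

    // Hypothetical stand-in for JobConf: built by copying a Conf.
    public static class JobConfStandIn extends Conf {
        public JobConfStandIn(Conf conf) { super(conf); }
    }

    // Hypothetical stand-in for Job, mirroring the three entry points above.
    public static class JobStandIn {
        private final JobConfStandIn conf;

        // Mirrors the deprecated public Job(Configuration): copies inline.
        public JobStandIn(Conf conf) { this(new JobConfStandIn(conf)); }

        // Mirrors the package-private Job(JobConf): stores what it is given.
        JobStandIn(JobConfStandIn conf) { this.conf = conf; }

        // Mirrors Job.getInstance(Configuration): copies explicitly.
        public static JobStandIn getInstance(Conf conf) {
            JobConfStandIn jobConf = new JobConfStandIn(conf);
            return new JobStandIn(jobConf);
        }

        public Conf getConfiguration() { return conf; }
    }

    public static void main(String[] args) {
        Conf original = new Conf();
        original.set("table", "tbl1");

        JobStandIn viaConstructor = new JobStandIn(original);
        JobStandIn viaFactory = JobStandIn.getInstance(original);

        // Mutate each job's private copy.
        viaConstructor.getConfiguration().set("table", "tbl2");
        viaFactory.getConfiguration().set("table", "tbl3");

        // The caller's configuration is untouched on both paths.
        System.out.println(original.get("table")); // prints tbl1
    }
}
```

If this matches what the real classes do, then switching from the constructor to {{Job.getInstance()}} would not by itself change whether the configuration is cloned.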
> Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-10754
>                 URL: https://issues.apache.org/jira/browse/HIVE-10754
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HCatalog
>    Affects Versions: 1.2.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-10754.patch
>
>
> {noformat}
> Create table tbl1 (key string, value string) stored as rcfile;
> Create table tbl2 (key string, value string);
> insert into tbl1 values( '1', '111');
> insert into tbl2 values('1', '2');
> {noformat}
> Pig script:
> {noformat}
> src_tbl1 = FILTER tbl1 BY (key == '1');
> prj_tbl1 = FOREACH src_tbl1 GENERATE
>            key as tbl1_key,
>            value as tbl1_value,
>            '333' as tbl1_v1;
>
> src_tbl2 = FILTER tbl2 BY (key == '1');
> prj_tbl2 = FOREACH src_tbl2 GENERATE
>            key as tbl2_key,
>            value as tbl2_value;
>
> dump prj_tbl1;
> dump prj_tbl2;
> result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
> prj_result = FOREACH result
>   GENERATE prj_tbl1::tbl1_key AS key1,
>            prj_tbl1::tbl1_value AS value1,
>            prj_tbl1::tbl1_v1 AS v1,
>            prj_tbl2::tbl2_key AS key2,
>            prj_tbl2::tbl2_value AS value2;
>
> dump prj_result;
> {noformat}
> The expected result is (1,111,333,1,2) while the result is (1,2,333,1,2). We need to clone the job instance in HCatLoader.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)