[jira] [Commented] (HIVE-10754) Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader

Mithun Radhakrishnan (JIRA) Thu, 04 Jun 2015 14:16:35 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573596#comment-14573596 ]


Mithun Radhakrishnan commented on HIVE-10754:
---------------------------------------------

I see what we're trying to achieve, but I still need help understanding how 
this change fixes the problem. (Sorry. :/) 

Here's the relevant code from {{Job.java}} from Hadoop 2.6.

{code:java|title=Job.java|borderStyle=solid|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#FFFFCE}
  @Deprecated
  public Job(Configuration conf) throws IOException {
    this(new JobConf(conf));
  }

  Job(JobConf conf) throws IOException {
    super(conf, null);
    // propagate existing user credentials to job
    this.credentials.mergeAll(this.ugi.getCredentials());
    this.cluster = null;
  }

  public static Job getInstance(Configuration conf) throws IOException {
    // create with a null Cluster
    JobConf jobConf = new JobConf(conf);
    return new Job(jobConf);
  }
{code}

# The current implementation of {{HCatLoader.setLocation()}} calls 
{{new Job(Configuration)}}, which copies the {{Configuration}} into a new 
{{JobConf}} inline and then calls the package-private constructor 
{{Job(JobConf)}}.
# Your improved implementation of {{HCatLoader.setLocation()}} calls 
{{Job.getInstance(Configuration)}}. This method copies the {{Configuration}} 
into a new {{JobConf}} explicitly, and then calls the same package-private 
constructor {{Job(JobConf)}}.

bq. These two are different (JobConf is not cloned when we call new Job(conf)).
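Both of these seem identical in effect to me. :/ In either case, the {{Job}} 
ends up holding a fresh {{JobConf}} copy rather than the caller's object. 
There's no way for {{HCatLoader.setLocation()}} to call the {{Job(JobConf)}} 
constructor directly, because it's package-private, right?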
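
To make the comparison concrete, here is a minimal sketch of the two code paths. It does not use Hadoop: {{Conf}}, {{JobConfModel}} and {{JobModel}} are hypothetical stand-ins that only mirror the constructor chain quoted above, and {{example.key}} is a made-up property name. The one assumption carried over from Hadoop is that constructing a {{JobConf}} from a {{Configuration}} copies the properties.

```java
import java.util.HashMap;
import java.util.Map;

public class JobCopyDemo {

    /** Hypothetical stand-in for Configuration; the copy constructor duplicates the properties. */
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for JobConf: building one from a Conf copies it. */
    static class JobConfModel extends Conf {
        JobConfModel(Conf conf) {
            super(conf);
        }
    }

    /** Hypothetical stand-in for Job, with the same three entry points as the quoted code. */
    static class JobModel {
        private final JobConfModel conf;

        // Mirrors the deprecated public Job(Configuration): copy, then delegate.
        JobModel(Conf conf) {
            this(new JobConfModel(conf));
        }

        // Mirrors the package-private Job(JobConf): keeps the object it is given.
        JobModel(JobConfModel conf) {
            this.conf = conf;
        }

        // Mirrors Job.getInstance(Configuration): copy, then call the constructor above.
        static JobModel getInstance(Conf conf) {
            JobConfModel jobConf = new JobConfModel(conf);
            return new JobModel(jobConf);
        }

        Conf getConfiguration() { return conf; }
    }

    /** Builds a job on either path, mutates the job's conf, and returns what the caller's conf still holds. */
    static String sharedValueAfter(boolean useGetInstance) {
        Conf shared = new Conf();
        shared.set("example.key", "original");
        JobModel job = useGetInstance ? JobModel.getInstance(shared) : new JobModel(shared);
        job.getConfiguration().set("example.key", "changed-by-job");
        return shared.get("example.key");
    }

    public static void main(String[] args) {
        System.out.println(sharedValueAfter(false)); // prints "original"
        System.out.println(sharedValueAfter(true));  // prints "original"
    }
}
```

In this model, both paths leave the caller's configuration untouched, which is why the two calls look equivalent to me. If the real behaviour differs, the difference would have to come from something this sketch leaves out.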


> Pig+Hcatalog doesn't work properly since we need to clone the Job instance in 
> HCatLoader
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-10754
>                 URL: https://issues.apache.org/jira/browse/HIVE-10754
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HCatalog
>    Affects Versions: 1.2.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-10754.patch
>
>
> {noformat}
> Create table tbl1 (key string, value string) stored as rcfile;
> Create table tbl2 (key string, value string);
> insert into tbl1 values( '1', '111');
> insert into tbl2 values('1', '2');
> {noformat}
> Pig script:
> {noformat}
> src_tbl1 = FILTER tbl1 BY (key == '1');
> prj_tbl1 = FOREACH src_tbl1 GENERATE
>            key as tbl1_key,
>            value as tbl1_value,
>            '333' as tbl1_v1;
>            
> src_tbl2 = FILTER tbl2 BY (key == '1');
> prj_tbl2 = FOREACH src_tbl2 GENERATE
>            key as tbl2_key,
>            value as tbl2_value;
>            
> dump prj_tbl1;
> dump prj_tbl2;
> result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
> prj_result = FOREACH result 
>       GENERATE  prj_tbl1::tbl1_key AS key1,
>                 prj_tbl1::tbl1_value AS value1,
>                 prj_tbl1::tbl1_v1 AS v1,
>                 prj_tbl2::tbl2_key AS key2,
>                 prj_tbl2::tbl2_value AS value2;
>                
> dump prj_result;
> {noformat}
> The expected result is (1,111,333,1,2), but the actual result is 
> (1,2,333,1,2). We need to clone the job instance in HCatLoader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
