[
https://issues.apache.org/jira/browse/HIVE-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099716#comment-14099716
]
Zhichun Wu commented on HIVE-4997:
----------------------------------
@ [~dintskirveli] :
Your approach attaches each InputInfo to its InputSplit in
HCatDelegatingInputFormat#getSplits and then generates the InputJobInfo in
HCatDelegatingInputFormat#createRecordReader from the attached InputInfo. That
means every map task has to query the Hive metastore service to generate its
InputJobInfo, so I think it may put significant load on the metastore when the
number of map tasks is large. Also, on a secure Hadoop cluster, each map task
has to acquire a delegation token in order to access the metastore service. The
current patch hasn't taken this into consideration.
Instead, I think we can generate the InputJobInfo for each table at the time
the table is added, then serialize the resulting Array<InputJobInfo> and attach
it to the job conf. getSplits and createRecordReader can then fetch each
InputJobInfo from the job conf, which avoids querying the metastore service in
the map phase. I've changed the usage for adding multiple input tables as
below:
{code}
HCatMultipleInputs.init(job);
HCatMultipleInputs.addInput(test_table1, "default", null, SequenceMapper.class);
HCatMultipleInputs.addInput(test_table2, null, "part='1'", TextMapper1.class);
HCatMultipleInputs.addInput(test_table2, null, "part='2'", TextMapper2.class);
HCatMultipleInputs.build();
{code}
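The idea of attaching the serialized inputs to the job conf can be sketched roughly as follows. This is not the code from HIVE-4997.4.patch: `TableInput` is a hypothetical stand-in for InputJobInfo, `JobConfInputs` and the key `example.multi.inputs` are made up for illustration, and a plain `Map<String, String>` stands in for the Hadoop job conf so that the example is self-contained. It only shows the pattern of resolving everything once on the client and letting tasks read it back without contacting the metastore.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for InputJobInfo: what a task needs to know about one input.
final class TableInput implements Serializable {
    private static final long serialVersionUID = 1L;
    final String database;
    final String table;
    final String filter;      // partition filter; may be null
    final String mapperClass; // name of the mapper to run for this input

    TableInput(String database, String table, String filter, String mapperClass) {
        this.database = database;
        this.table = table;
        this.filter = filter;
        this.mapperClass = mapperClass;
    }
}

// Hypothetical helper: stores the list of inputs under one conf key and reads it back.
final class JobConfInputs {
    static final String KEY = "example.multi.inputs"; // assumed key name

    // Client side: serialize all inputs once and attach them to the conf.
    static void store(Map<String, String> conf, List<TableInput> inputs) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new ArrayList<>(inputs));
            }
            // Base64 keeps the binary payload safe inside a string-valued conf.
            conf.put(KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Task side: decode the inputs from the conf; no metastore call is needed.
    @SuppressWarnings("unchecked")
    static List<TableInput> load(Map<String, String> conf) {
        String encoded = conf.get(KEY);
        if (encoded == null) {
            return new ArrayList<>();
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<TableInput>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, something like addInput would append to the list and build would call `store`, while getSplits and createRecordReader would call `load`. The trade-off is a larger job conf in exchange for no metastore traffic or delegation tokens in the map phase.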
I've uploaded HIVE-4997.4.patch, which is based on HIVE-4997.3.patch. It works
on our secure Hadoop 2.2.0 cluster, but it is only a working prototype that I'm
uploading to demonstrate the idea. I haven't put much thought into the code
quality or the design of this new feature.
> HCatalog doesn't allow multiple input tables
> --------------------------------------------
>
> Key: HIVE-4997
> URL: https://issues.apache.org/jira/browse/HIVE-4997
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog
> Affects Versions: 0.13.0
> Reporter: Daniel Intskirveli
> Fix For: 0.14.0
>
> Attachments: HIVE-4997.2.patch, HIVE-4997.3.patch, HIVE-4997.4.patch
>
>
> HCatInputFormat does not allow reading from multiple hive tables in the same
> MapReduce job.
--
This message was sent by Atlassian JIRA
(v6.2#6252)