[ 
https://issues.apache.org/jira/browse/HIVE-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739194#comment-13739194
 ] 

Navis commented on HIVE-4690:
-----------------------------

[~ashutoshc] I've investigated this a little.

CFIF (CombineFileInputFormat) in 20/20S accepts "mapred.max.split.size". But
the shims for 20/20S ignore it and apply only "mapred.min.split.size" to all of
the split-size settings (see HadoopShimsSecure.getSplits()). Even if it is set
manually, CFIF in 20/20S does not split a file smaller than the block size, so
it produces one split.

The shims for 23 use the same code as 20/20S, but CFIF in 23 reads its
configuration directly from the JobConf, so the setting takes effect. It can
also split a file smaller than the block size, making 22 splits.
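
To illustrate why the split count differs, here is a rough sketch in plain
Java. It is a deliberately simplified model, not CFIF's actual algorithm: the
class SplitCountSketch, the method countSplits, and the file length and split
size are all hypothetical, with the numbers chosen only so that the second case
yields 22 splits.

```java
public class SplitCountSketch {

    /**
     * Simplified model (an assumption, not Hadoop code): a file is cut into
     * chunks of at most maxSplitSize bytes. A maxSplitSize of 0 or less means
     * the limit is ignored and the whole file stays in one split.
     */
    static long countSplits(long fileLength, long maxSplitSize) {
        if (fileLength <= 0) {
            return 0; // an empty file produces no splits in this model
        }
        if (maxSplitSize <= 0) {
            return 1; // max split size ignored: one split for the file
        }
        // Ceiling division: number of chunks needed to cover the file.
        return (fileLength + maxSplitSize - 1) / maxSplitSize;
    }

    public static void main(String[] args) {
        long fileLength = 5_500L;   // hypothetical file size in bytes
        long maxSplitSize = 250L;   // hypothetical "mapred.max.split.size"

        // Max split size ignored (modeled as 0), as described for 20/20S.
        System.out.println(countSplits(fileLength, 0L));           // prints 1
        // Max split size honored, as described for 23.
        System.out.println(countSplits(fileLength, maxSplitSize)); // prints 22
    }
}
```

The real CFIF also groups blocks by node and rack and handles leftover chunks,
so the actual counts depend on more than this division.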

There should be a follow-up issue for setting "mapred.max.split.size", etc.
properly for CFIF.
                
> stats_partscan_1.q makes different result with different hadoop.mr.rev
> ----------------------------------------------------------------------
>
>                 Key: HIVE-4690
>                 URL: https://issues.apache.org/jira/browse/HIVE-4690
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.11.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-4690.D11163.1.patch
>
>
> stats_partscan_1.q uses mapred.min/max.split.size and logs number of files, 
> which can be different with different hadoop.mr.rev.
