[
https://issues.apache.org/jira/browse/HIVE-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233888#comment-13233888
]
Carl Steinbach commented on HIVE-2877:
--------------------------------------
There are two distinct problems:
1) Many of the queries in split_sample.q and sample_islocalmode_hook.q are
nondeterministic. This can be fixed by adding ORDER BY clauses.
2) The second problem is more serious. Both of the tests set
mapred.max.split.size=300 and hive.merge.smallfiles.avgsize=1 in an effort to
force the generation of multiple splits and multiple output files. However,
Hadoop 0.20 is incapable of generating splits smaller than the block size when
using CombineFileInputFormat, so only one split is generated. This has a
significant impact on the results of the TABLESAMPLE(x PERCENT). This issue was
fixed in MAPREDUCE-2046 which is included in 0.23.
Suggested Fixes:
# Make the queries deterministic
# Restrict these tests to Hadoop versions >= 0.22
> TABLESAMPLE(x PERCENT) tests fail on 0.22/0.23
> ----------------------------------------------
>
> Key: HIVE-2877
> URL: https://issues.apache.org/jira/browse/HIVE-2877
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Carl Steinbach
> Assignee: Carl Steinbach
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira