[ 
https://issues.apache.org/jira/browse/HIVE-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233888#comment-13233888
 ] 

Carl Steinbach commented on HIVE-2877:
--------------------------------------

There are two distinct problems:

1) Many of the queries in split_sample.q and sample_islocalmode_hook.q are 
nondeterministic. This can be fixed by adding ORDER BY clauses.

2) The second problem is more serious. Both tests set 
mapred.max.split.size=300 and hive.merge.smallfiles.avgsize=1 in an effort to 
force the generation of multiple splits and multiple output files. However, 
with CombineFileInputFormat, Hadoop 0.20 cannot generate splits smaller than 
the block size, so only one split is generated. This significantly changes the 
results of the TABLESAMPLE(x PERCENT) queries. The issue was fixed in 
MAPREDUCE-2046, which is included in 0.23.

Suggested Fixes: 
# Make the queries deterministic
# Restrict these tests to Hadoop versions >= 0.22
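
A minimal sketch of the first fix, assuming a hypothetical table ss_src with 
columns key and value (these names are illustrative and are not taken from the 
test files). The two set lines are the settings the tests already use. The 
ORDER BY only makes the order of the returned rows stable; which rows are 
sampled still depends on how many splits Hadoop generates.

```sql
-- Settings the tests use to try to force multiple splits and output files.
set mapred.max.split.size=300;
set hive.merge.smallfiles.avgsize=1;

-- Hypothetical table and columns, for illustration only.
-- Without the ORDER BY, the row order can vary from run to run.
SELECT key, value
FROM ss_src TABLESAMPLE(1 PERCENT) s
ORDER BY key, value;
```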

                
> TABLESAMPLE(x PERCENT) tests fail on 0.22/0.23
> ----------------------------------------------
>
>                 Key: HIVE-2877
>                 URL: https://issues.apache.org/jira/browse/HIVE-2877
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>            Assignee: Carl Steinbach
>



        
