[ 
https://issues.apache.org/jira/browse/HIVE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790809#comment-13790809
 ] 

Prasanth J commented on HIVE-5502:
----------------------------------

Hi [~brocknoland], it seems the test failure is not related to the size of the 
TestFileDump.testDump.orc file. The TestFileDump unit test contains two test 
cases, testDump() and testDictionaryThreshold(), and both create an ORC file 
with the same name (see the initialization of the testFilePath variable in 
openFileSystem()). This should be fixed so that each test case writes to its 
own file, named after the test method. 

I think you are seeing two different file sizes because in the passing run the 
file contains the output of testDictionaryThreshold(), whereas in the failing 
run it contains the output of testDump(). The size of 
TestFileDump.testDump.orc is not really important for these test cases, though. 
It is the contents of the orc-file-dump.out file that matter. A diff of the 
generated orc-file-dump.out against the golden file shows that the first stripe 
is expected to have 5000 rows but has only 4000. This is the reason for the 
test failure. 

I ran into similar non-determinism when running the test case from Eclipse 
versus from the console. From the console I always get the correct result, but 
from Eclipse it fails every time with the same issue (4000 rows vs. 5000 
rows). The golden file in this case might have been generated by running 
"ant test -Dtestcase=TestFileDump". Since you are now testing with Maven, there 
might be some difference between ANT_OPTS and MAVEN_OPTS. That's my guess.
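
To illustrate the file-name fix suggested above, here is a minimal sketch of 
deriving one ORC path per test method. The class PerTestOrcFile and its 
forTest() helper are hypothetical names made up for this example and are not 
part of the existing TestFileDump code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Hive: one output file per test method. */
final class PerTestOrcFile {
    private PerTestOrcFile() {}

    /**
     * Builds a path such as workDir/TestFileDump.testDump.orc, so that
     * testDump() and testDictionaryThreshold() no longer share a file.
     */
    static Path forTest(String workDir, String testClass, String testMethod) {
        return Paths.get(workDir, testClass + "." + testMethod + ".orc");
    }
}
```

If the test runs under JUnit 4, the method name could be supplied by a 
TestName rule (getMethodName()) instead of being hard-coded, assuming the 
rule is visible where testFilePath is initialized.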

Moving forward, there are two ways this can be fixed:
1) Implement a deterministic memory manager that doesn't depend on the 
available memory for ORC test cases
2) Overwrite the golden file when we move to Maven
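
As a rough illustration of option 1, the sketch below uses a deliberately 
simplified model in which a stripe is flushed once the buffered bytes reach a 
memory budget. It is not Hive's actual memory manager. All names here 
(DeterministicBudgetSketch, MemoryBudget, HeapFractionBudget, FixedBudget, 
stripeRowCounts) are hypothetical, and the byte sizes were picked only to 
reproduce the 5000 vs. 4000 row split.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model only; all names are hypothetical, not Hive classes. */
final class DeterministicBudgetSketch {
    private DeterministicBudgetSketch() {}

    interface MemoryBudget {
        long availableBytes();
    }

    /** Depends on the JVM heap, so it changes with -Xmx (ANT_OPTS vs MAVEN_OPTS). */
    static final class HeapFractionBudget implements MemoryBudget {
        private final double fraction;
        HeapFractionBudget(double fraction) { this.fraction = fraction; }
        @Override public long availableBytes() {
            return (long) (Runtime.getRuntime().maxMemory() * fraction);
        }
    }

    /** Fixed budget for tests: same answer on every machine and build tool. */
    static final class FixedBudget implements MemoryBudget {
        private final long bytes;
        FixedBudget(long bytes) { this.bytes = bytes; }
        @Override public long availableBytes() { return bytes; }
    }

    /** Returns the number of rows in each stripe under the given budget. */
    static List<Integer> stripeRowCounts(MemoryBudget budget, int totalRows, int bytesPerRow) {
        List<Integer> stripes = new ArrayList<>();
        long buffered = 0;
        int rowsInStripe = 0;
        for (int i = 0; i < totalRows; i++) {
            buffered += bytesPerRow;
            rowsInStripe++;
            if (buffered >= budget.availableBytes()) {
                // Budget reached: close the current stripe and start a new one.
                stripes.add(rowsInStripe);
                buffered = 0;
                rowsInStripe = 0;
            }
        }
        if (rowsInStripe > 0) {
            stripes.add(rowsInStripe); // final, partially filled stripe
        }
        return stripes;
    }
}
```

In this model, stripeRowCounts(new FixedBudget(500_000L), 12_000, 100) gives 
[5000, 5000, 2000], while a 400_000 byte budget gives [4000, 4000, 4000]. A 
test that injects a fixed budget would therefore produce the same stripe 
layout under ant, Maven and Eclipse, whereas the heap-based variant would not.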

> ORC TestFileDump is flaky
> -------------------------
>
>                 Key: HIVE-5502
>                 URL: https://issues.apache.org/jira/browse/HIVE-5502
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Brock Noland
>            Priority: Minor
>         Attachments: TestFileDump.tar.gz
>
>
> I found in my Maven work that TestFileDump is non-deterministic. For example, 
> sometimes the output ORC file is much larger:
> {noformat}
> pass:
> -rwxrwxrwx 1 brock brock 290055 Oct  9 12:02 TestFileDump.testDump.orc
> fail:
> -rwxrwxrwx 1 brock brock 1938634 Oct  9 12:08 TestFileDump.testDump.orc
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
