Xuefu Zhang created HIVE-4885:
---------------------------------

             Summary: Alternative object serialization for execution plan in hive testing
                 Key: HIVE-4885
                 URL: https://issues.apache.org/jira/browse/HIVE-4885
             Project: Hive
          Issue Type: Improvement
          Components: CLI
    Affects Versions: 0.11.0, 0.10.0
            Reporter: Xuefu Zhang
            Assignee: Xuefu Zhang


Currently, many test cases compare execution plans, such as those in the 
TestParse suite. XMLEncoder is used to serialize the plan generated by Hive 
and store it in a file for diff-based comparison. However, the output of 
XMLEncoder depends on the JDK implementation, which may change from version 
to version. Thus, upgrading the JDK can produce many spurious test failures. 
The following is an example of the diff generated when running Hive with 
JDK7:

{code}
Begin query: case_sensitivity.q
diff -a /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out
diff -a -b /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
3c3
<  <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask0">
---
>  <object id="MapRedTask0" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
12c12
<        <object class="java.util.ArrayList" id="ArrayList0">
---
>        <object id="ArrayList0" class="java.util.ArrayList">
14c14
<          <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask0">
---
>          <object id="MoveTask0" class="org.apache.hadoop.hive.ql.exec.MoveTask">
18c18
<              <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask1">
---
>              <object id="MoveTask1" class="org.apache.hadoop.hive.ql.exec.MoveTask">
22c22
<                  <object class="org.apache.hadoop.hive.ql.exec.StatsTask" id="StatsTask0">
---
>                  <object id="StatsTask0" class="org.apache.hadoop.hive.ql.exec.StatsTask">
60c60
<                  <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask1">
---
>                  <object id="MapRedTask1" class="org.apache.hadoop.hive.ql.exec.MapRedTask">

{code}

As can be seen, the only difference is the order of the attributes in the 
serialized XML document, yet it causes 50+ test failures in Hive.

We need a better plan comparison, or a different object serialization, to 
improve the situation.
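
One possible direction for the "better plan comparison" option, offered only 
as a rough sketch and not as the fix adopted for this issue: parse both files 
into DOM trees and compare them with Node.isEqualNode, which does not depend 
on attribute order. The class name PlanXmlCompare and the method samePlan are 
hypothetical and are not part of Hive. Whitespace-only text nodes are still 
compared here, so the two files are assumed to be indented identically.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Hive): compares two serialized plans
// structurally instead of line by line.
public class PlanXmlCompare {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // True when both documents have equal element trees. Attributes are
    // matched by name, so their textual order in the file does not matter.
    static boolean samePlan(String expectedXml, String actualXml) throws Exception {
        return parseRoot(expectedXml).isEqualNode(parseRoot(actualXml));
    }

    public static void main(String[] args) throws Exception {
        // The two variants of line 3 from the diff above, made self-closing
        // so that each is a complete document.
        String expected =
            "<object id=\"MapRedTask0\" class=\"org.apache.hadoop.hive.ql.exec.MapRedTask\"/>";
        String generated =
            "<object class=\"org.apache.hadoop.hive.ql.exec.MapRedTask\" id=\"MapRedTask0\"/>";

        System.out.println("text equal: " + expected.equals(generated));
        System.out.println("DOM equal:  " + samePlan(expected, generated));
    }
}
```

A comparison of this kind would replace only the "diff -a -b" step on the 
.q.xml files. The stored expected files would not need to be regenerated.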

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
