data corruption with multi-table insert
---------------------------------------

                 Key: HIVE-1968
                 URL: https://issues.apache.org/jira/browse/HIVE-1968
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.7.0
            Reporter: Joydeep Sen Sarma


i had to run a conversion process to compute a checksum (sum(hash(all-columns))) 
of a table and convert it to a different compression format. trying to be 
clever - i did both of them in a single pass by doing something equivalent 
to:

from (select col1, col2, hash(col1, col2) as val from table_to_be_converted) i
insert overwrite table table_to_be_generated select i.col1, i.col2
insert overwrite table table_to_be_converted_checksum select sum(hash(i.val));

the plan looked correct. however - the data produced was erroneous - the 
checksums and the data were both wrong (and consistent with each other). i know 
this because:
- the checksum computed by the above query didn't match the checksum on the 
input table when calculated separately
- the checksum of the data output by this query (first insert clause) didn't 
match the input table's checksum (neither the one computed by the query above 
nor the one computed separately)

later on - i broke up this query into two independent ones - and the data and 
checksums were good (i.e. they all matched up). so it seems like there's some 
data corruption happening in multi-table insert (MTI).
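
for reference, the split version would look roughly like the sketch below. it 
reuses the placeholder table and column names from the combined query above and 
is only an illustration of the shape, not the exact statements that were run:

```sql
-- query 1: conversion only, written as a plain single-table insert
insert overwrite table table_to_be_generated
select col1, col2
from table_to_be_converted;

-- query 2: checksum only, using the same subquery and aggregate
-- as the checksum clause of the combined multi-table insert
from (select hash(col1, col2) as val from table_to_be_converted) i
insert overwrite table table_to_be_converted_checksum
select sum(hash(i.val));
```

since each statement has a single insert clause, neither one goes through the 
multi-table insert path, which is what makes the comparison useful for 
narrowing down where the corruption comes from.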

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
