[ 
https://issues.apache.org/jira/browse/HIVE-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030360#comment-13030360
 ] 

Ning Zhang commented on HIVE-1968:
----------------------------------

@Joydeep, Yongqiang and I tried to reproduce the bug but couldn't. We 
tried different query patterns (1 map-only job + 1 mapreduce job, and dynamic 
partition inserts) on both small and large data sets. All of them worked as 
expected. So without a concrete example it's very hard to say whether this is 
a bug in multi-table inserts. Is there any chance you could dig into your 
query log and find the specific query?

> data corruption with multi-table insert
> ---------------------------------------
>
>                 Key: HIVE-1968
>                 URL: https://issues.apache.org/jira/browse/HIVE-1968
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Joydeep Sen Sarma
>
> i had to run a conversion process to compute a checksum 
> (sum(hash(all-columns))) of a table and convert it to a different compression 
> format. trying to be clever - i did both of them in a single pass by doing 
> something equivalent to:
> from (select col1, col2, hash(col1, col2) as val from table_to_be_converted) i
> insert overwrite table table_to_be_generated select i.col1, i.col2
> insert overwrite table table_to_be_converted_checksum select sum(hash(i.val));
> the plan looked correct. however - the data produced was erroneous - the 
> checksums and the data were both wrong (and consistent with each other). i 
> know this because:
> - the checksum computed by the above query didn't match the checksum on the 
> input table when calculated separately
> - the checksum of the data output by this query (first insert clause) didn't 
> match the input table's checksum (neither the one computed by the query 
> above, nor by the one computed separately)
> later on - i broke up this query into two independent ones - and the data and 
> checksums were good (ie. they all matched up). so seems like there's some 
> data corruption happening in MTI.
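
For reference, the split into independent queries described above might look 
roughly like the sketch below. It assumes the same table names as the 
multi-table insert, and it computes the checksum as sum(hash(col1, col2)) to 
match the prose description; the exact statements that were run are not given 
in the report, so treat this as an illustration only.

```sql
-- Sketch only: table and column names are taken from the multi-table
-- insert quoted above; the real statements were not included in the report.

-- Query 1: rewrite the data into the new table (new compression format).
insert overwrite table table_to_be_generated
select col1, col2
from table_to_be_converted;

-- Query 2: compute the input table's checksum in its own pass.
insert overwrite table table_to_be_converted_checksum
select sum(hash(col1, col2))
from table_to_be_converted;

-- Check: the checksum of the converted output should equal the value
-- stored by Query 2 if no rows were altered during the conversion.
select sum(hash(col1, col2))
from table_to_be_generated;
```

Comparing the last result against the stored checksum is the same check the 
report used to conclude that the single-pass version produced bad data.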

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
