[ https://issues.apache.org/jira/browse/HIVE-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030360#comment-13030360 ]
Ning Zhang commented on HIVE-1968:
----------------------------------

@Joydeep, Yongqiang and I were trying to reproduce the bug but couldn't. We
tried different query patterns (1 map-only job + 1 mapreduce job, and dynamic
partition inserts) and on small & large data sets. All these worked as
expected. So without a concrete example it's very hard to say it is a bug in
multi-table inserts. Do you have any chance to dig into your query log and
find out the specific query?

> data corruption with multi-table insert
> ---------------------------------------
>
>                 Key: HIVE-1968
>                 URL: https://issues.apache.org/jira/browse/HIVE-1968
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Joydeep Sen Sarma
>
> i had to run a conversion process to compute a checksum
> (sum(hash(all-columns)) of a table and convert it to a different compression
> format. trying to be clever - i did both of them in a single pass by doing
> something to the equivalent of:
>
> from (select col1, col2, hash(col1, col2) as val from table_to_be_converted) i
>   insert overwrite table table_to_be_generated select i.col1, i.col2
>   insert overwrite table table_to_be_converted_checksum select sum(hash(i.val));
>
> the plan looked correct. however - the data produced was erroneous - the
> checksums and the data were both wrong (and consistent with each other). i
> know this because:
> - the checksum computed by the above query didn't match the checksum on the
>   input table when calculated separately
> - the checksum of the data output by this query (first insert clause) didn't
>   match the input table's checksum (neither the one computed by the query
>   above, nor by the one computed separately)
>
> later on - i broke up this query into two independent ones - and the data and
> checksums were good (ie. they all matched up). so seems like there's some
> data corruption happening in MTI.
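
For anyone trying to reproduce this, the cross-check described in the quoted report could be sketched roughly as below. This is only an illustration under assumptions: the table and column names are the placeholders from the report, the actual query that hit the problem is not known, and the same checksum expression has to be used in every step for the comparison to mean anything.

```sql
-- Sketch only: independent statements instead of one multi-table insert.
-- Table/column names are the placeholders used in the report.

-- 1. Checksum of the input table, computed on its own.
SELECT sum(hash(col1, col2)) FROM table_to_be_converted;

-- 2. Conversion as a standalone single-table insert.
INSERT OVERWRITE TABLE table_to_be_generated
SELECT col1, col2 FROM table_to_be_converted;

-- 3. Checksum of the converted output; it should equal the value from
--    step 1 if no rows were lost or altered during the conversion.
SELECT sum(hash(col1, col2)) FROM table_to_be_generated;
```

If steps 1 and 3 agree but a checksum produced by the combined multi-table insert differs, that would point at the multi-table path rather than at the data or the hash function.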