[ https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557993#comment-13557993 ]
Hudson commented on HIVE-3699: ------------------------------ Integrated in Hive-trunk-h0.21 #1923 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1923/]) HIVE-3699 Multiple insert overwrite into multiple tables query stores same results in all tables (Navis via namit) (Revision 1435484) Result = SUCCESS namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435484 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java * /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby.q * /hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out * /hive/trunk/ql/src/test/results/clientpositive/multi_insert.q.out * /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby.q.out * /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out > Multiple insert overwrite into multiple tables query stores same results in > all tables > -------------------------------------------------------------------------------------- > > Key: HIVE-3699 > URL: https://issues.apache.org/jira/browse/HIVE-3699 > Project: Hive > Issue Type: Bug > Components: Query Processor > Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): > hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch > Reporter: Alexandre Fouché > Assignee: Navis > Fix For: 0.11.0 > > Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, > HIVE-3699.D7743.3.patch, HIVE-3699_hive-0.9.1.patch.txt > > > (Note: This might be related to HIVE-2750) > I am doing a query with multiple INSERT OVERWRITE to multiple tables in order > to scan the dataset only 1 time, and i end up having all these tables with > the same content ! It seems the GROUP BY query that returns results is > overwriting all the temp tables. > Weird enough, if i had further GROUP BY queries into additional temp tables, > grouped by a different field, then all temp tables, even the ones that would > have been wrong content are all correctly populated. > This is the misbehaving query: > FROM nikon > INSERT OVERWRITE TABLE e1 > SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions > WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid > INSERT OVERWRITE TABLE e2 > SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues > WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid > ; > It launches only one MR job and here are the results. Why does table 'e1' > contains results from table 'e2' ?! Table 'e1' should have been empty (see > individual SELECTs further below) > hive> SELECT * from e1; > OK > NULL 2 > 1627575 25 > 1627576 70 > 1690950 22 > 1690952 42 > 1696705 199 > 1696706 66 > 1696730 229 > 1696759 85 > 1696893 218 > Time taken: 0.229 seconds > hive> SELECT * from e2; > OK > NULL 2 > 1627575 25 > 1627576 70 > 1690950 22 > 1690952 42 > 1696705 199 > 1696706 66 > 1696730 229 > 1696759 85 > 1696893 218 > Time taken: 0.11 seconds > Here is are the result to the indiviual queries (only the second query > returns a result set): > hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM > nikon > WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid; > (...) > OK > <- There are no results, this is normal > Time taken: 41.471 seconds > hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon > WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid; > (...) > OK > NULL 2 > 1627575 25 > 1627576 70 > 1690950 22 > 1690952 42 > 1696705 199 > 1696706 66 > 1696730 229 > 1696759 85 > 1696893 218 > Time taken: 39.607 seconds > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira