[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548842#comment-13548842 ]
Hudson commented on HIVE-3552:
------------------------------

Integrated in hive-trunk-hadoop1 #4 (See [https://builds.apache.org/job/hive-trunk-hadoop1/4/])
    HIVE-3552. performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys. (Revision 1430979)

     Result = ABORTED
kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3552
>                 URL: https://issues.apache.org/jira/browse/HIVE-3552
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3552.10.patch, hive.3552.11.patch,
>                      hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch,
>                      hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch,
>                      hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch,
>                      hive.3552.9.patch
>
>
> This is a follow-up to HIVE-3433.
> Had an offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say 8 columns. Each input row would then generate
> 2^8 = 256 rows for the hash table, which may kill the current group by
> implementation.
> A better implementation would be to add an additional MR job. In the first
> MR job, perform the group by as if there were no cube. In the second MR job,
> perform the cube. The assumption is that the group by would have decreased
> the output data significantly, and that the rows would appear in the order of
> the grouping keys, which gives a higher probability of hitting the hash table.
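The scaling argument in the issue description can be illustrated with a small in-memory sketch. This is not Hive's implementation: the function names (`grouping_sets`, `expand`, `cube_single_stage`, `cube_two_stage`) are hypothetical, the aggregate is assumed to be a plain COUNT(*), rolled-up columns are marked with `None` instead of a grouping ID, and the two MR jobs are modelled as two passes over a Python list. It only shows why grouping first shrinks the number of rows that have to be expanded into grouping sets.

```python
from collections import Counter
from itertools import combinations


def grouping_sets(n_keys):
    """Return the 2**n_keys subsets of key positions produced by a cube."""
    positions = range(n_keys)
    return [frozenset(c) for r in range(n_keys + 1)
            for c in combinations(positions, r)]


def expand(key, sets):
    """Yield one masked copy of `key` per grouping set (None = rolled up)."""
    for kept in sets:
        yield tuple(v if i in kept else None for i, v in enumerate(key))


def cube_single_stage(rows, n_keys):
    """Expand every input row directly; count the expanded rows."""
    sets = grouping_sets(n_keys)
    table, expanded = Counter(), 0
    for key in rows:
        for masked in expand(key, sets):
            table[masked] += 1
            expanded += 1
    return table, expanded


def cube_two_stage(rows, n_keys):
    """Group by with no cube first, then expand only the grouped rows."""
    sets = grouping_sets(n_keys)
    grouped = Counter(rows)  # stage 1: ordinary group by
    table, expanded = Counter(), 0
    for key, count in sorted(grouped.items()):  # stage 2: keys in sorted order
        for masked in expand(key, sets):
            table[masked] += count
            expanded += 1
    return table, expanded


# 1000 input rows over 3 binary key columns, so only 8 distinct key tuples.
rows = [(a % 2, b % 2, c % 2)
        for a in range(10) for b in range(10) for c in range(10)]
single, single_expanded = cube_single_stage(rows, 3)
two, two_expanded = cube_two_stage(rows, 3)

print(len(grouping_sets(8)))          # 256 grouping sets for a cube on 8 columns
print(single == two)                  # True: both plans give the same counts
print(single_expanded, two_expanded)  # 8000 64
```

In this toy data set the first pass collapses 1000 rows to 8 groups, so the cube expansion produces 64 rows instead of 8000. The benefit depends on how much the plain group by reduces the data, which is the assumption stated in the issue description.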