[ https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801766#comment-13801766 ]

Hudson commented on HIVE-4957:
------------------------------

ABORTED: Integrated in Hive-trunk-hadoop2 #515 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/515/])
HIVE-4957 - Restrict number of bit vectors, to prevent out of Java heap memory (Shreepadma Venugopalan via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1534337)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
* /hive/trunk/ql/src/test/queries/clientnegative/compute_stats_long.q
* /hive/trunk/ql/src/test/results/clientnegative/compute_stats_long.q.out


> Restrict number of bit vectors, to prevent out of Java heap memory
> ------------------------------------------------------------------
>
>                 Key: HIVE-4957
>                 URL: https://issues.apache.org/jira/browse/HIVE-4957
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Brock Noland
>            Assignee: Shreepadma Venugopalan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-4957.1.patch, HIVE-4957.2.patch
>
>
> Normally, increasing the number of bit vectors increases calculation accuracy. For example,
> {noformat}
> select compute_stats(a, 40) from test_hive;
> {noformat}
> generally gives better accuracy than
> {noformat}
> select compute_stats(a, 16) from test_hive;
> {noformat}
> But a larger number of bit vectors also makes the query run slower. Once the number of
> bit vectors exceeds 50, adding more no longer improves accuracy, but it still
> increases memory usage and crashes Hive if the number is too large. Hive currently
> does not prevent a user from passing a ridiculously large number of bit vectors in a
> 'compute_stats' query.
> One example:
> {noformat}
> select compute_stats(a, 999999999) from column_eight_types;
> {noformat}
> crashes Hive.
> {noformat}
> 2012-12-20 23:21:52,247 Stage-1 map = 0%, reduce = 0%
> 2012-12-20 23:22:11,315 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.29 sec
> MapReduce Total cumulative CPU time: 290 msec
> Ended Job = job_1354923204155_0777 with errors
> Error during job, obtaining debugging information...
> Job Tracking URL: http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
> Examining task ID: task_1354923204155_0777_m_000000 (and more) from job job_1354923204155_0777
> Task with the most failures(4):
> -----
> Task ID:
>   task_1354923204155_0777_m_000000
> URL:
>   http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777&tipid=task_1354923204155_0777_m_000000
> -----
> Diagnostic Messages for this Task:
> Error: Java heap space
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
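
For readers who want to picture the kind of guard the issue title describes, here is a minimal Java sketch of rejecting an out-of-range bit-vector count before any memory is allocated. It is not taken from the committed patch: the class name BitVectorLimit, the method validateNumBitVectors, and the cap of 1024 are all assumptions made for illustration, and the real check in GenericUDAFComputeStats.java may be structured differently.

```java
// Illustrative sketch only. The class name, method name, and cap value are
// assumptions; none of them is taken from the HIVE-4957 patch.
final class BitVectorLimit {

    // Hypothetical upper bound on the bit-vector count, chosen for illustration.
    static final int MAX_BIT_VECTORS = 1024;

    /**
     * Returns the requested count unchanged when it is within range and
     * throws before any bit vectors would be allocated otherwise.
     */
    static int validateNumBitVectors(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException(
                "Number of bit vectors must be positive, got " + requested);
        }
        if (requested > MAX_BIT_VECTORS) {
            throw new IllegalArgumentException(
                "Number of bit vectors " + requested
                    + " exceeds the maximum of " + MAX_BIT_VECTORS);
        }
        return requested;
    }

    public static void main(String[] args) {
        // A modest count, as in compute_stats(a, 40), passes through.
        System.out.println(validateNumBitVectors(40));

        // The count from the failing query is rejected up front.
        try {
            validateNumBitVectors(999999999);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at argument validation turns the "Java heap space" task failure into an immediate, descriptive error, which is presumably what the clientnegative test compute_stats_long.q exercises.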