[ 
https://issues.apache.org/jira/browse/HIVE-18445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325380#comment-16325380
 ] 

Laszlo Bodor edited comment on HIVE-18445 at 1/19/18 1:02 PM:
--------------------------------------------------------------

The issue is that the test is meant to exhaust the local mapper task's 
memory, and to achieve this it sets a parameter at the beginning...
{code:java}
set hive.mapjoin.localtask.max.memory.usage = 0.0001;
{code}
...so the task can use only 0.01% of the process's memory. That is fine for 
testing memory exhaustion, but the problem is that the setting applies to 
every query in the file.
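
As a rough illustration of why such a low limit can trip any map join and not 
only the intended one, here is a small sketch of the arithmetic. It assumes the 
local task compares used heap divided by max heap against the configured 
fraction; the function name and the heap figures are made up for the example 
and are not taken from Hive or from the failing run.

```python
# Illustrative only: the function name and the heap figures are assumptions,
# not values taken from Hive or from the failing test run.

def exceeds_local_task_limit(used_bytes: int, max_heap_bytes: int, max_usage: float) -> bool:
    """Return True if the used fraction of the heap is above the configured limit."""
    return used_bytes / max_heap_bytes > max_usage


MAX_HEAP = 1024 ** 3   # assume a 1 GiB heap for the local task
LIMIT = 0.0001         # hive.mapjoin.localtask.max.memory.usage from the test

# Bytes the task may use before the limit is exceeded (roughly 105 KiB here).
budget_bytes = MAX_HEAP * LIMIT
print(f"budget: {budget_bytes / 1024:.1f} KiB")

# Even a modest 50 MiB of used heap is far over the 0.0001 limit...
print(exceeds_local_task_limit(50 * 1024 ** 2, MAX_HEAP, LIMIT))
# ...while the same usage is well within a limit of 0.9.
print(exceeds_local_task_limit(50 * 1024 ** 2, MAX_HEAP, 0.9))
```

Under these assumptions the budget is so small that whichever map join runs 
first is likely to exceed it, which would match the first statement failing.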

According to the q.out file, memory exhaustion is expected when the second 
query runs:
{code:java}
FROM src src1
JOIN src src2 ON (src1.key = src2.key)
JOIN src src3 ON (src1.key + src2.key = src3.key)
INSERT OVERWRITE TABLE dest_j2 SELECT src1.key, src3.value;
{code}
But when the test fails, it fails on the first statement (which is not supposed 
to fail):
{code:java}
FROM srcpart src1 JOIN src src2 ON (src1.key = src2.key)
INSERT OVERWRITE TABLE dest1 SELECT src1.key, src2.value
WHERE (src1.ds = '2008-04-08' OR src1.ds = '2008-04-09')
  AND (src1.hr = '12' OR src1.hr = '11');
{code}
I think the best practice would be to set the parameter right before the 
target query and reset it to the default (or a higher value) afterwards, like:
{code:java}
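-- Lower the limit only for the query that is expected to exhaust memory.
set hive.mapjoin.localtask.max.memory.usage = 0.0001;

FROM src src1
JOIN src src2 ON (src1.key = src2.key)
JOIN src src3 ON (src1.key + src2.key = src3.key)
INSERT OVERWRITE TABLE dest_j2 SELECT src1.key, src3.value;

-- Restore a higher limit so that later statements are not affected.
set hive.mapjoin.localtask.max.memory.usage = 0.9;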
{code}


> qtests: auto_join25.q fails permanently
> ---------------------------------------
>
>                 Key: HIVE-18445
>                 URL: https://issues.apache.org/jira/browse/HIVE-18445
>             Project: Hive
>          Issue Type: Bug
>          Components: Tests
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] 
> (batchId=72)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
