[
https://issues.apache.org/jira/browse/HIVE-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967105#action_12967105
]
Namit Jain commented on HIVE-1830:
----------------------------------
After HIVE-1642, joins are automatically converted into map-joins at physical
optimization time.
However, this may lead to problems.
For example, consider the query:
select T1.val, count(1) from T1 join T2 on T1.key=T2.key group by T1.val
This will have 2 map-reduce jobs, one for the join and the other for group by.
Before HIVE-1642, the partial aggregation for the group by was performed in
the reducer where the join is performed.
However, after HIVE-1642, it is performed in the mapper. The local task
checks only that there is enough memory to hold the map-join data. It does
not take into account the memory needed for the partial group by.
So, when a group by follows a join, it is a good idea to lower the memory
limit that the local task uses to validate whether the small table fits in
memory. The limit can be controlled by a new configuration parameter with
some default: say 70% of total memory (instead of 90%).
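
The threshold check described above can be sketched as follows. This is only
an illustration of the idea, not Hive's code: the function name, the constant
names, and the byte counts are hypothetical, and the 0.90 and 0.70 values are
taken from the percentages mentioned in this comment.

```python
# Hypothetical sketch; names are illustrative and not taken from Hive.
DEFAULT_THRESHOLD = 0.90               # fraction of memory the small table may use
FOLLOWED_BY_GROUP_BY_THRESHOLD = 0.70  # lower limit when a group by follows the join


def small_table_fits(used_bytes, max_bytes, followed_by_group_by):
    """Return True if the local task may keep loading the small table."""
    if followed_by_group_by:
        threshold = FOLLOWED_BY_GROUP_BY_THRESHOLD
    else:
        threshold = DEFAULT_THRESHOLD
    return used_bytes <= threshold * max_bytes
```

With these assumed values, a small table that occupies 80% of memory passes
the check for a plain map-join but is rejected when a group by follows, which
leaves headroom for the partial aggregation in the mapper.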
Also, the group by may still run out of memory, so it might be a good idea to
check for free memory in the group by and periodically flush memory.
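
A minimal sketch of the periodic flush idea, assuming a map-side count(1) per
key. The class and function names are hypothetical, and the entry limit is
only a stand-in for a real free-memory check. Flushed partial counts are
merged downstream, so the final result is unchanged.

```python
from collections import Counter


class PartialCount:
    """Map-side partial count(1) per key with a bounded hash table."""

    def __init__(self, max_entries, emit):
        self.max_entries = max_entries  # stand-in for a free-memory check
        self.emit = emit                # callback that sends (key, count) downstream
        self.table = {}

    def add(self, key):
        self.table[key] = self.table.get(key, 0) + 1
        # Flush as soon as the table reaches the limit instead of growing further.
        if len(self.table) >= self.max_entries:
            self.flush()

    def flush(self):
        for key, count in self.table.items():
            self.emit(key, count)
        self.table.clear()


def run(rows, max_entries):
    """Aggregate rows on the 'map side', then merge partials as a reducer would."""
    partials = []
    agg = PartialCount(max_entries, lambda k, c: partials.append((k, c)))
    for val in rows:
        agg.add(val)
    agg.flush()  # emit whatever is left at the end of the input
    final = Counter()
    for key, count in partials:
        final[key] += count
    return partials, dict(final)
```

The trade-off is that early flushes send more partial rows to the reducer in
exchange for a bounded hash table in the mapper.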
> mappers in group followed by joins may die OOM
> ----------------------------------------------
>
> Key: HIVE-1830
> URL: https://issues.apache.org/jira/browse/HIVE-1830
> Project: Hive
> Issue Type: Bug
> Reporter: Namit Jain
> Assignee: Liyin Tang
>