[ 
https://issues.apache.org/jira/browse/HIVE-27858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817252#comment-17817252
 ] 

Ryu Kobayashi commented on HIVE-27858:
--------------------------------------

[~okumin] and I investigated and found that the root cause was combinatorial 
explosion of execution plans. [^query.sql] has 20 nested CTEs, and after 
expansion, 40,000 Operators, including 4,000 TableScanOperators, are generated. 
The final memory consumption is also huge, since each has a copied String, 
HashMap, and ColumnInfo. The same issue happens even if it reduce the number of 
columns.

heap dump:

!image-2024-02-14-17-30-59-359.png|width=482,height=213!

Operators:

!image-2024-02-14-17-31-32-714.png|width=488,height=21!

We believe that CTE materialization is currently the only effective way. This 
query will succeed if we set `hive.optimize.cte.materialize.threshold` and 
`hive.optimize.cte.materialize.full.aggregate.only=false`. However, due to a 
known bug(HIVE-24167) in CTE materialization, 
`hive.cbo.fallback.strategy=NEVER` will fail. Therefore, we need to disable CBO 
or set `hive.cbo.fallback.strategy=CONSERVATIVE`. And also, re-investigate, we 
found that this issue also happens in Hive 2 when 
`hive.optimize.cte.materialize.threshold=-1`. So, it is considered not to be 
degraded.

In summary, this query will succeed even in Hive 4 if I set the following 
parameters: Therefore, I think that this ticket itself can be closed without 
fix. What do you think?
{code:java}
set hive.optimize.cte.materialize.threshold=3;
set hive.optimize.cte.materialize.full.aggregate.only=false;
set hive.cbo.fallback.strategy=CONSERVATIVE ;{code}

> OOM happens when selecting many columns and  JOIN.
> --------------------------------------------------
>
>                 Key: HIVE-27858
>                 URL: https://issues.apache.org/jira/browse/HIVE-27858
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Ryu Kobayashi
>            Assignee: okumin
>            Priority: Critical
>              Labels: hive-4.0.0-must
>             Fix For: Not Applicable
>
>         Attachments: HIVE-27858.full.q, HIVE-27858.less.columns.q, ddl.sql, 
> query.sql
>
>
> OOM happens when executing [^query.sql] using a table in [^ddl.sql]. These 
> did not happen in Hive 2 previously.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to