[jira] [Commented] (HIVE-19439) MapWork shouldn't be reused when Spark task fails during initialization

Rui Li (JIRA) Mon, 07 May 2018 19:37:32 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-19439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466772#comment-16466772
 ]


Rui Li commented on HIVE-19439:
-------------------------------

Hi [~vihangk1], the task is retried by Spark, and it calls 
SparkMapRecordHandler::init to initialize the map operator. This is where we 
retrieve the MapWork [from 
cache|https://github.com/apache/hive/blob/rel/release-2.2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java#L75].
I'm not sure whether we have a way to reset the operators to the UNINIT state. 
If not, I guess we'll have to clear the cache when initialization fails.
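
To make the second option concrete, here is a minimal, self-contained sketch of "clear the cache when initialization fails". It does not use Hive's classes: Plan, PlanCache, initTask and the "map-plan" key are hypothetical stand-ins for MapWork, the work cache and SparkMapRecordHandler::init, and the two-operator plan is only an assumption to show a partially initialized state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class InitRetrySketch {
    enum State { UNINIT, INIT }

    /** Hypothetical stand-in for a cached plan with two operators; not Hive's MapWork. */
    static class Plan {
        State first = State.UNINIT;
        State second = State.UNINIT;
        private final boolean failOnSecond;

        Plan(boolean failOnSecond) {
            this.failOnSecond = failOnSecond;
        }

        void initialize() {
            // An operator may only be initialized from the UNINIT state.
            if (first != State.UNINIT) {
                throw new IllegalStateException("operator already in INIT state");
            }
            first = State.INIT;
            if (failOnSecond) {
                // The plan is now partially initialized: first is INIT, second is UNINIT.
                throw new RuntimeException("simulated init failure");
            }
            second = State.INIT;
        }
    }

    /** Hypothetical stand-in for the per-executor plan cache. */
    static class PlanCache {
        private final Map<String, Plan> cache = new HashMap<>();

        Plan get(String key, Supplier<Plan> loader) {
            return cache.computeIfAbsent(key, k -> loader.get());
        }

        void clear(String key) {
            cache.remove(key);
        }
    }

    /** Fetch the plan from the cache and initialize it; evict it if initialization fails. */
    static void initTask(PlanCache cache, String key, Supplier<Plan> loader) {
        Plan plan = cache.get(key, loader);
        try {
            plan.initialize();
        } catch (RuntimeException e) {
            // Drop the partially initialized plan so a retried task loads a fresh copy.
            cache.clear(key);
            throw e;
        }
    }

    public static void main(String[] args) {
        PlanCache cache = new PlanCache();
        int[] loads = {0};
        // The first loaded plan fails during init; later loads succeed.
        Supplier<Plan> loader = () -> new Plan(loads[0]++ == 0);

        try {
            initTask(cache, "map-plan", loader);
        } catch (RuntimeException e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        // The retry gets a fresh plan because the failed one was evicted.
        initTask(cache, "map-plan", loader);
        System.out.println("retry succeeded with a freshly loaded plan");
    }
}
```

Without the cache.clear(key) call in the catch block, the retry in this sketch would get the same Plan back and fail with the IllegalStateException, which mirrors the problem described in this issue.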

> MapWork shouldn't be reused when Spark task fails during initialization
> -----------------------------------------------------------------------
>
>                 Key: HIVE-19439
>                 URL: https://issues.apache.org/jira/browse/HIVE-19439
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Priority: Major
>
> Issue identified in HIVE-19388. When a Spark task fails while initializing 
> the map operator, the task is retried with the same MapWork retrieved from 
> cache. This can be problematic because the MapWork may be partially 
> initialized, e.g. some operators are already in INIT state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
