[ https://issues.apache.org/jira/browse/HIVE-19439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466772#comment-16466772 ]
Rui Li commented on HIVE-19439:
-------------------------------

Hi [~vihangk1], the task is retried by Spark, and the retry calls SparkMapRecordHandler::init to initialize the map operator. This is where we retrieve the MapWork [from cache|https://github.com/apache/hive/blob/rel/release-2.2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java#L75]. I'm not sure whether we have a way to reset the operators to UNINIT state. If not, I guess we have to clear the cache when initialization fails.

> MapWork shouldn't be reused when Spark task fails during initialization
> -----------------------------------------------------------------------
>
>                 Key: HIVE-19439
>                 URL: https://issues.apache.org/jira/browse/HIVE-19439
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Priority: Major
>
> Issue identified in HIVE-19388. When a Spark task fails while initializing
> the map operator, the task is retried with the same MapWork retrieved from
> cache. This can be problematic because the MapWork may be partially
> initialized, e.g. some operators are already in INIT state.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
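
To make the "clear the cache when initialization fails" option concrete, here is a minimal, self-contained sketch. It is not Hive code: `RetrySafeWorkCache`, `Work`, `State` and the `"map.xml"` key are hypothetical stand-ins for the plan cache, MapWork and operator state, and the assumption that re-initializing an already-INIT operator throws is a simplification made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins only; none of these are actual Hive classes.
final class RetrySafeWorkCache {
    enum State { UNINIT, INIT }

    /** Toy stand-in for a MapWork whose operators carry initialization state. */
    static final class Work {
        State state = State.UNINIT;

        void initialize(boolean failMidway) {
            // Assumption: initializing an operator that is not UNINIT is an error.
            if (state != State.UNINIT) {
                throw new IllegalStateException("operator already in " + state + " state");
            }
            state = State.INIT; // part of the operator tree is now initialized
            if (failMidway) {
                throw new RuntimeException("simulated failure during initialization");
            }
        }
    }

    private final Map<String, Work> cache = new ConcurrentHashMap<>();

    /** Returns the cached work for the key, creating it on first access. */
    Work getOrLoad(String key) {
        return cache.computeIfAbsent(key, k -> new Work());
    }

    /** Initializes the cached work; on failure, evicts it so a retry starts fresh. */
    Work init(String key, boolean failMidway) {
        Work work = getOrLoad(key);
        try {
            work.initialize(failMidway);
            return work;
        } catch (RuntimeException e) {
            // Drop the partially initialized instance before rethrowing.
            cache.remove(key, work);
            throw e;
        }
    }
}
```

With the eviction in the catch block, a first `init("map.xml", true)` fails and a retried `init("map.xml", false)` gets a new `Work` in UNINIT state. Without the eviction, the retry would pick up the same instance, which is already marked INIT.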