[ https://issues.apache.org/jira/browse/HIVE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066114#comment-14066114 ]
Rui Li commented on HIVE-7431:
------------------------------

[~xuefuz] This failure happens when I run select count(*) or max/min queries on a table. The Spark cluster is deployed in standalone mode. I added some logging to debug the issue and found that it is caused by setting the parent of a TS more than once. Take the malformed operator tree I mentioned earlier as an example: when MAP (69) is created, we set TS (65) as its child and set TS (65)'s parent to MAP (69). But later, when MAP (71) is created, TS (65) is set as its child again. At that point TS (65)'s parent list no longer contains MAP (69), which triggers the exception. I'm not familiar with how the map operator is structured, but I suppose a TS shouldn't be assigned to multiple MAPs, right? Please note that the successful tasks don't have such a malformed tree. I'm still working to find the root cause. Any thoughts on this issue? (A minimal sketch of the suspected parent-list corruption follows at the end of this message.)

> When run on spark cluster, some spark tasks may fail
> ----------------------------------------------------
>
>                 Key: HIVE-7431
>                 URL: https://issues.apache.org/jira/browse/HIVE-7431
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>
> When running queries on spark, some spark tasks fail (usually the first
> couple of tasks) with the following stack trace:
> {quote}
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:60)
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:35)
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
> org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
> org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> ...
> {quote}
> Observed for spark standalone cluster. Not verified for spark on yarn or
> mesos.
> NO PRECOMMIT TESTS. This is for spark branch only.
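To make the malformed tree concrete, here is a minimal, self-contained sketch of the suspected failure mode. The Op class and its method names are hypothetical (this is not the actual org.apache.hadoop.hive.ql.exec.Operator API), and it assumes the second MAP replaces the TS's parent list rather than appending to it; in the real run the exception surfaces in ExecMapper.configure.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ParentListDemo {

    /** Simplified operator node; hypothetical, not Hive's Operator class. */
    static class Op {
        final String name;
        final List<Op> parents = new ArrayList<Op>();
        final List<Op> children = new ArrayList<Op>();

        Op(String name) { this.name = name; }

        /**
         * Mirrors the suspected bug: adopting a child replaces its parent
         * list instead of appending to it, so any earlier parent is dropped.
         */
        void adoptChild(Op child) {
            children.add(child);
            child.parents.clear();          // earlier parent, e.g. MAP (69), is lost here
            child.parents.add(this);
        }

        /**
         * Stand-in for the consistency check that fails during task setup:
         * every child must list this operator back as a parent.
         */
        void initChildren() {
            for (Op child : children) {
                if (!child.parents.contains(this)) {
                    throw new IllegalStateException(child.name
                            + " does not list " + name + " as a parent");
                }
            }
        }
    }

    public static void main(String[] args) {
        Op ts65 = new Op("TS (65)");
        Op map69 = new Op("MAP (69)");
        Op map71 = new Op("MAP (71)");

        map69.adoptChild(ts65);   // TS (65).parents == [MAP (69)]
        map71.adoptChild(ts65);   // TS (65).parents == [MAP (71)]; MAP (69) gone

        map71.initChildren();     // passes: the last adopter is still consistent
        map69.initChildren();     // throws: TS (65) no longer knows MAP (69)
    }
}
{code}

Running this, map71.initChildren() passes but map69.initChildren() throws IllegalStateException ("TS (65) does not list MAP (69) as a parent"), matching the observation that TS (65)'s parent list no longer contains MAP (69) once the second MAP adopts it.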