[ https://issues.apache.org/jira/browse/HIVE-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066114#comment-14066114 ]
Rui Li commented on HIVE-7431:
------------------------------

[~xuefuz] This failure happens when I run select count(*) or max/min queries on a table. The Spark cluster is deployed in standalone mode. I added some logging to debug the issue and found that it is caused by setting the parent of a TS more than once. Take the malformed operator tree I mentioned earlier as an example: when MAP (69) is created, we set TS (65) as its child and set TS (65)'s parent to MAP (69). But later, when MAP (71) is created, TS (65) is set as its child again. At that point TS (65)'s parent list no longer contains MAP (69), which triggers the exception. I'm not familiar with how the map operator is structured, but I suppose a TS shouldn't be assigned to multiple MAPs, right? Please note that the successful tasks don't have such a malformed tree. I'm still working to find the root cause. Any thoughts on this issue? (A minimal sketch of the suspected parent-list corruption follows at the end of this message.)

> When run on spark cluster, some spark tasks may fail
> ----------------------------------------------------
>
>                 Key: HIVE-7431
>                 URL: https://issues.apache.org/jira/browse/HIVE-7431
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>
> When running queries on spark, some spark tasks fail (usually the first
> couple of tasks) with the following stack trace:
> {quote}
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:60)
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:35)
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
> org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
> org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> ...
> {quote}
> Observed for spark standalone cluster. Not verified for spark on yarn or
> mesos.
> NO PRECOMMIT TESTS. This is for spark branch only.
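To make the malformed tree concrete, here is a minimal, self-contained sketch of the suspected failure mode. The Op class and its method names are hypothetical (this is not the actual org.apache.hadoop.hive.ql.exec.Operator API), and it assumes the second MAP replaces the TS's parent list rather than appending to it; in the real run the exception surfaces in ExecMapper.configure.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ParentListDemo {

    /** Simplified operator node; hypothetical, not Hive's Operator class. */
    static class Op {
        final String name;
        final List<Op> parents = new ArrayList<Op>();
        final List<Op> children = new ArrayList<Op>();

        Op(String name) { this.name = name; }

        /**
         * Mirrors the suspected bug: adopting a child replaces its parent
         * list instead of appending to it, so any earlier parent is dropped.
         */
        void adoptChild(Op child) {
            children.add(child);
            child.parents.clear();          // earlier parent, e.g. MAP (69), is lost here
            child.parents.add(this);
        }

        /**
         * Stand-in for the consistency check that fails during task setup:
         * every child must list this operator back as a parent.
         */
        void initChildren() {
            for (Op child : children) {
                if (!child.parents.contains(this)) {
                    throw new IllegalStateException(child.name
                            + " does not list " + name + " as a parent");
                }
            }
        }
    }

    public static void main(String[] args) {
        Op ts65 = new Op("TS (65)");
        Op map69 = new Op("MAP (69)");
        Op map71 = new Op("MAP (71)");

        map69.adoptChild(ts65);   // TS (65).parents == [MAP (69)]
        map71.adoptChild(ts65);   // TS (65).parents == [MAP (71)]; MAP (69) gone

        map71.initChildren();     // passes: the last adopter is still consistent
        map69.initChildren();     // throws: TS (65) no longer knows MAP (69)
    }
}
{code}

Running this, map71.initChildren() passes but map69.initChildren() throws IllegalStateException ("TS (65) does not list MAP (69) as a parent"), matching the observation that TS (65)'s parent list no longer contains MAP (69) once the second MAP adopts it.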