Hi everyone. I'm running into an issue with SparkContext when running on
YARN. The issue can be reproduced with the following steps in the
spark-shell (version 1.4.1):

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@7b965dee

*Note the @7b965dee suffix, which identifies this sc instance.

(Then run yarn application -kill <application-id> against the corresponding
YARN application.)

scala> val rdd = sc.parallelize(List(1,2,3))
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
  at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
  at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1914)
  at org.apache.spark.SparkContext.parallelize$default$2(SparkContext.scala:695)
  ... 49 elided

(Great, the SparkContext has been stopped by the killed yarn application, as
expected.)

Alternatively:

scala> sc.stop()
15/07/29 12:10:14 INFO SparkContext: SparkContext already stopped.

(OK, so it's confirmed that it has been stopped.)

scala> org.apache.spark.SparkContext.getOrCreate
res3: org.apache.spark.SparkContext = org.apache.spark.SparkContext@7b965dee

(Hm, that's the same SparkContext; note the identical @7b965dee suffix.)

The issue here is that SparkContext.getOrCreate returns the active
SparkContext if one exists, and otherwise creates a new one. Here it
returns the original SparkContext, which means the context we verified was
stopped above is still registered as the active one. How can we recover
from this? We can't use the current context once it has been stopped, and
we can't seem to create a new one because the old one is still marked as
active (unless we allow multiple contexts with the
spark.driver.allowMultipleContexts flag, but that's a band-aid solution).
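
For reference, the band-aid mentioned above might look roughly like this in
the shell. This is only a sketch: the app name and the sc2 variable are
placeholders made up for illustration, and it assumes the master and other
settings are picked up from the properties the shell was launched with.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder app name; not taken from the original session.
val conf = new SparkConf()
  .setAppName("replacement-context")
  // Tells the constructor to warn instead of failing when another context is still marked active.
  .set("spark.driver.allowMultipleContexts", "true")

// sc2 is a hypothetical name for the replacement context.
val sc2 = new SparkContext(conf)
```

This sidesteps the check rather than fixing it: the stopped context stays
registered as active, so getOrCreate would presumably keep returning it.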

Digging a little deeper, in the body of the stop() method of SparkContext,
it seems we never reach the clearActiveContext() call at the end, which
would have marked the context as inactive. Any later call to stop() exits
early because the stopped variable is already true (hence the
"SparkContext already stopped." log message), so I don't see any other way
to mark the context as not active. Something about how the SparkContext was
stopped after the YARN application was killed is preventing it from
cleaning up properly.
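
To illustrate what I think is happening, here is a toy model of the
pattern. It is not Spark's actual source: ToyContext, failDuringCleanup and
clearActive are names made up for this sketch, and the assumption is that
some cleanup step inside stop() throws before the active-context reference
is cleared.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

// Toy stand-in for a context; NOT Spark's SparkContext.
final class ToyContext(failDuringCleanup: Boolean) {
  private val stopped = new AtomicBoolean(false)

  def isStopped: Boolean = stopped.get()

  def stop(): Unit = {
    // The flag flips first, so second and later calls return early.
    if (!stopped.compareAndSet(false, true)) {
      println("ToyContext already stopped.")
      return
    }
    // Hypothetical cleanup step that throws (assumption: a backend that is already gone).
    if (failDuringCleanup) throw new IllegalStateException("cleanup failed")
    // Only reached when every cleanup step above succeeded.
    ToyContext.clearActive()
  }
}

object ToyContext {
  // Reference to the context currently considered active, or null if none.
  private val active = new AtomicReference[ToyContext](null)

  def clearActive(): Unit = active.set(null)

  // Returns the active context if one is registered, otherwise creates and registers one.
  def getOrCreate(failDuringCleanup: Boolean): ToyContext = synchronized {
    if (active.get() == null) active.set(new ToyContext(failDuringCleanup))
    active.get()
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val first = ToyContext.getOrCreate(failDuringCleanup = true)
    // stop() marks the context stopped, then throws before clearActive() runs.
    try first.stop() catch { case _: IllegalStateException => () }
    val second = ToyContext.getOrCreate(failDuringCleanup = false)
    println(first.isStopped) // prints true
    println(second eq first) // prints true: the stopped instance is handed back
  }
}
```

In this model, moving the clearActive() call into a finally block (or
guarding each cleanup step so a failure cannot skip the rest) would let
getOrCreate build a fresh context. Whether that matches the real stop()
code path is exactly what I'm unsure about.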

Any ideas about this?

Thanks,

Andres



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/stopped-SparkContext-remaining-active-tp24065.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
