GitHub user willb opened a pull request:
https://github.com/apache/spark/pull/143
SPARK-897: preemptively serialize closures
These commits cause `ClosureCleaner.clean` to attempt to serialize the
cleaned closure with the default closure serializer and throw a
`SparkException` if the closure is not serializable.
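As a rough illustration of the check described above (a minimal sketch, not the code from these commits; the helper object and method names are invented here), one could serialize the cleaned closure with the closure serializer from `SparkEnv` and rethrow any failure as a `SparkException`:

```scala
import java.io.NotSerializableException

import org.apache.spark.{SparkEnv, SparkException}

// Illustrative helper: serialize a cleaned closure up front and fail fast
// instead of waiting for the job to be submitted. Assumes a running
// SparkEnv (i.e. an active SparkContext).
object SerializabilityCheck {
  def ensureSerializable(closure: AnyRef): Unit = {
    try {
      SparkEnv.get.closureSerializer.newInstance().serialize(closure)
    } catch {
      case e: NotSerializableException =>
        throw new SparkException("Closure is not serializable: " + e, e)
    }
  }
}
```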
GitHub user willb commented on a diff in the pull request:
https://github.com/apache/spark/pull/143#discussion_r10622998
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -533,7 +533,7 @@ abstract class DStream[T: ClassTag] (
GitHub user willb commented on the pull request:
https://github.com/apache/spark/pull/143#issuecomment-37686459
Yes, my understanding of SPARK-897 is that the issue is ensuring that
serializability errors are reported to the user as soon as possible. And
essentially what these commits do is run that check as soon as a closure is
cleaned, instead of waiting for the job to be submitted.
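To make the timing concrete, here is a hedged illustration (the class and field names are invented): a transformation whose closure drags in a non-serializable enclosing object.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// `Holder` is not Serializable, and `_ * factor` refers to a field, so the
// closure passed to map() captures `this` and cannot be serialized.
class Holder(sc: SparkContext) {
  val factor = 2

  def scaled: RDD[Int] =
    sc.parallelize(1 to 10).map(_ * factor)

  // Without a preemptive check, the failure surfaces only when an action
  // such as scaled.collect() submits a job; with a check in
  // ClosureCleaner.clean, it surfaces at the map() call above.
}
```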
GitHub user willb commented on the pull request:
https://github.com/apache/spark/pull/143#issuecomment-37690238
Here's what I was thinking about that: I left the check in `DAGScheduler`
in place because preemptive checking is optional (and indeed not done
everywhere) and it still acts as a backstop for closures that were never
checked preemptively.
GitHub user willb commented on the pull request:
https://github.com/apache/spark/pull/143#issuecomment-37693355
A configuration option makes sense to me and I'm happy to add it. Let me
know if you have strong feelings about what it should be called.
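A hedged sketch of how such an option could gate the check follows; the property name `spark.closureCleaner.checkSerializable`, the object name, and the method shape are placeholders, since the thread leaves the actual name open.

```scala
import java.io.NotSerializableException

import org.apache.spark.{SparkConf, SparkEnv, SparkException}

object CheckedCleaner {
  // Placeholder property name; the real name was still being discussed.
  private val CheckProperty = "spark.closureCleaner.checkSerializable"

  def clean(conf: SparkConf, closure: AnyRef): AnyRef = {
    // ... the existing closure-cleaning logic would run here ...
    if (conf.getBoolean(CheckProperty, true)) {
      try {
        SparkEnv.get.closureSerializer.newInstance().serialize(closure)
      } catch {
        case e: NotSerializableException =>
          throw new SparkException("Closure is not serializable: " + e, e)
      }
    }
    closure
  }
}
```

Setting the property to false would restore the current behavior of failing only when the job is submitted.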