[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/143 SPARK-897: preemptively serialize closures. These commits cause `ClosureCleaner.clean` to attempt to serialize the cleaned closure with the default closure serializer and throw a `SparkException` if that serialization fails.
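For context, Spark's default closure serializer is Java serialization, so the preemptive check amounts to a trial write of the cleaned closure. A minimal standalone sketch of the idea follows; the `ClosureCheck` object and its `ensureSerializable` helper are hypothetical names for illustration, not the PR's actual code:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical standalone sketch: attempt to serialize a cleaned closure
// immediately, mirroring what a preemptive check in ClosureCleaner.clean does.
object ClosureCheck {
  def ensureSerializable(closure: AnyRef): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(closure) // fails fast if the closure captures anything non-serializable
    } catch {
      case e: NotSerializableException =>
        throw new RuntimeException(s"Task not serializable: ${e.getMessage}", e)
    } finally {
      out.close()
    }
  }
}
```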

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/143#discussion_r10622998 (Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala, hunk @@ -533,7 +533,7 @@ in `abstract class DStream[T: ClassTag]`)

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37686459 Yes, my understanding of SPARK-897 is that the issue is ensuring serializability errors are reported to the user as soon as possible. And essentially what these commits do is perform that serialization as soon as the closure is cleaned, so any failure surfaces immediately rather than at job submission.
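To make the timing difference concrete, here is a hypothetical example of a closure that captures a non-serializable field; with the preemptive check (using the `ClosureCheck` sketch above), the error is raised at cleaning time rather than when the scheduler later serializes the task:

```scala
import java.net.Socket

// Hypothetical example: `socket` is not Serializable, and the closure
// captures the enclosing instance, so any attempt to serialize it fails.
class Holder {
  val socket = new Socket()
  def makeClosure(): Int => Int = (x: Int) => x + socket.getPort
}

// With a preemptive check, this throws here, at cleaning time,
// instead of later when the task is shipped:
// ClosureCheck.ensureSerializable(new Holder().makeClosure())
```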

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37690238 Here's what I was thinking about that: I left the check in `DAGScheduler` in place because preemptive checking is optional (and indeed not done everywhere), and it therefore remains the last point where a non-serializable closure can be caught before tasks ship.
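The reasoning here implies two checkpoints. A rough sketch under that reading, with simplified and hypothetical signatures (Spark's real `clean` does reflective cleanup of outer references that this omits, and `ClosureCheck` is the sketch defined earlier):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Checkpoint 1: optional preemptive check when a closure is cleaned.
def clean[F <: AnyRef](f: F, checkSerializable: Boolean = true): F = {
  // ... reflective cleaning of unneeded outer references elided ...
  if (checkSerializable) ClosureCheck.ensureSerializable(f)
  f
}

// Checkpoint 2: the scheduler-side serialization still happens at
// submission, catching any closure that bypassed the preemptive check.
def serializeTask(taskClosure: AnyRef): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(taskClosure)
  out.close()
  bytes.toByteArray
}
```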

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37693355 A configuration option makes sense to me and I'm happy to add it. Let me know if you have strong feelings about what it should be called.
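The thread leaves the option's name open. A hedged sketch of how such a flag might gate the check; the key `spark.closure.checkSerializable` and the tiny `Conf` class are invented for illustration and do not reflect the name the project settled on:

```scala
// Hypothetical config wiring; the flag name is illustrative only.
class Conf(settings: Map[String, String]) {
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

def cleanWithConf[F <: AnyRef](f: F, conf: Conf): F = {
  if (conf.getBoolean("spark.closure.checkSerializable", default = true))
    ClosureCheck.ensureSerializable(f)
  f
}
```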