[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37711356 I know @pwendell has expressed concern about config option bloat so maybe he has an opinion here... I would be in favor of not adding a config option because it's a […]

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37693355 A configuration option makes sense to me and I'm happy to add it. Let me know if you have strong feelings about what it should be called.

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37691023 I was thinking maybe we want a config option for this - which is on by default, but can be turned off. What do you guys think?
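
For context, the switch rxin describes would presumably be an ordinary boolean entry read from `SparkConf`, on unless explicitly disabled. A minimal sketch, assuming a hypothetical key name that nothing in this thread actually settles on:

```scala
import org.apache.spark.SparkConf

object ProactiveCheckFlag {
  // "spark.closureSerializer.checkOnClean" is a made-up key for illustration;
  // the thread leaves the option's name (and its existence) undecided.
  def enabled(conf: SparkConf): Boolean =
    conf.getBoolean("spark.closureSerializer.checkOnClean", true)  // on by default
}

// A user who wanted the old behavior could then opt out, e.g.:
//   conf.set("spark.closureSerializer.checkOnClean", "false")
```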

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37690238 Here's what I was thinking about that: I left the check in `DAGScheduler` in place because preemptive checking is optional (and indeed not done everywhere) and it seems […]
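
The `DAGScheduler` check being discussed is, in essence, a one-off serialization of a task at stage-submission time, so the failure carries a clear message instead of surfacing later on an executor. A minimal sketch of that shape (the method and its parameters are illustrative, not the actual `DAGScheduler` code):

```scala
import java.io.NotSerializableException
import org.apache.spark.SparkEnv

object SubmitTimeCheck {
  // Serialize one representative task closure up front; on failure, report
  // via the supplied callback (DAGScheduler would abort the stage here).
  def taskIsSerializable(taskClosure: AnyRef, abort: String => Unit): Boolean = {
    try {
      SparkEnv.get.closureSerializer.newInstance().serialize(taskClosure)
      true
    } catch {
      case e: NotSerializableException =>
        abort("Task not serializable: " + e.toString)
        false
    }
  }
}
```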

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37686977 Ah sorry I didn't see that clean() gets called when the RDD is created and not just when the job is submitted. I think the check in DAGScheduler should be removed […]

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
Github user willb commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37686459 Yes, my understanding of SPARK-897 is that the issue is ensuring serializability errors are reported to the user as soon as possible. And essentially what these commits do […]
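
As a concrete illustration of "as soon as possible" (an example written for this summary, not code from the PR, and assuming the change behaves as described): a closure that captures an unserializable object should now fail when the transformation is defined rather than when an action submits the job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object EarlyFailureExample {
  class Unserializable  // deliberately does not extend Serializable

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("early-failure"))
    val captured = new Unserializable

    // With proactive checking in ClosureCleaner.clean, the "not serializable"
    // error is expected to be raised here, when map() cleans the closure...
    val rdd = sc.parallelize(1 to 10).map { x => captured.hashCode; x + 1 }

    // ...rather than here, when the job is actually submitted and tasks are
    // serialized for execution.
    println(rdd.count())
    sc.stop()
  }
}
```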

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
Github user willb commented on a diff in the pull request: https://github.com/apache/spark/pull/143#discussion_r10622998 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -533,7 +533,7 @@ abstract class DStream[T: ClassTag] ( * on […]

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37685303 I'm not sure this fixes the problem Reynold was referring to in his pull request. If you look in DAGScheduler.scala, on line 773, it does essentially the same thing […]

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/143#discussion_r10622197 --- Diff: core/src/test/scala/org/apache/spark/serializer/ProactiveClosureSerializationSuite.scala --- @@ -0,0 +1,79 @@ +/* + * Licensed to the […]

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/143#discussion_r10622183 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -533,7 +533,7 @@ abstract class DStream[T: ClassTag] ( […]

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37681460 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13183/

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37681456 Merged build finished.

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37676118 Merged build started.

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/143#issuecomment-37676117 Merged build triggered.

[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-03-14 Thread willb
GitHub user willb opened a pull request: https://github.com/apache/spark/pull/143 SPARK-897: preemptively serialize closures. These commits cause `ClosureCleaner.clean` to attempt to serialize the cleaned closure with the default closure serializer and throw a `SparkException` if […]
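
In outline, the behavior described above amounts to a serialize-and-discard pass over the cleaned closure. A minimal sketch of that pattern under those assumptions (the helper's name and its wiring into `ClosureCleaner.clean` are not taken from the patch itself):

```scala
import java.io.NotSerializableException
import org.apache.spark.{SparkEnv, SparkException}

object SerializabilityCheck {
  // Attempt one round of serialization with the configured closure serializer.
  // The resulting bytes are discarded; the only purpose is to fail fast with a
  // SparkException if the closure cannot be serialized.
  def ensureSerializable(func: AnyRef): Unit = {
    try {
      SparkEnv.get.closureSerializer.newInstance().serialize(func)
    } catch {
      case e: NotSerializableException =>
        throw new SparkException("Task not serializable: " + e.toString, e)
    }
  }
}
```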