Re: NotSerializableException in Spark Streaming

2014-12-15 Thread Nicholas Chammas
This still seems to be broken. In 1.1.1, it errors immediately on this line (from the above repro script): liveTweets.map(t => noop(t)).print() The stack trace is: org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureClean

Re: NotSerializableException in Spark Streaming

2014-07-24 Thread Nicholas Chammas
Yep, here goes! Here are my environment vitals: - Spark 1.0.0 - EC2 cluster with 1 slave spun up using spark-ec2 - twitter4j 3.0.3 - spark-shell called with --jars argument to load spark-streaming-twitter_2.10-1.0.0.jar as well as all the twitter4j jars. Now, while I’m in the S

Re: NotSerializableException in Spark Streaming

2014-07-15 Thread Tathagata Das
I am very curious though. Can you post a concise code example which we can run to reproduce this problem? TD On Tue, Jul 15, 2014 at 2:54 PM, Tathagata Das wrote: > I am not entire sure off the top of my head. But a possible (usually > works) workaround is to define the function as a val inste

Re: NotSerializableException in Spark Streaming

2014-07-15 Thread Tathagata Das
I am not entire sure off the top of my head. But a possible (usually works) workaround is to define the function as a val instead of a def. For example def func(i: Int): Boolean = { true } can be written as val func = (i: Int) => { true } Hope this helps for now. TD On Tue, Jul 15, 2014 at 9

Re: NotSerializableException in Spark Streaming

2014-07-15 Thread Nicholas Chammas
Hey Diana, Did you ever figure this out? I’m running into the same exception, except in my case the function I’m calling is a KMeans model.predict(). In regular Spark it works, and Spark Streaming without the call to model.predict() also works, but when put together I get this serialization exce

NotSerializableException in Spark Streaming

2014-05-14 Thread Diana Carroll
Hey all, trying to set up a pretty simple streaming app and getting some weird behavior. First, a non-streaming job that works fine: I'm trying to pull out lines of a log file that match a regex, for which I've set up a function: def getRequestDoc(s: String): String = { "KBDOC-[0-9]*".r.find