Re: SparkContext & Threading

2015-06-06 Thread William Briggs
Hi Lee, I'm stuck with only mobile devices for correspondence right now, so I can't get to a shell to play with this issue - this is all supposition; I think the lambdas are closing over the context because it's a constructor parameter to your Runnable class, which is why inlining the lambdas in
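
The capture behavior Will is describing can be reproduced without Spark at all: on the JVM, a lambda that reads an instance field captures `this`, so serializing the lambda drags in the whole enclosing object. A minimal Java sketch of the idea (the `Context`, `Job`, and `serializes` names are illustrative stand-ins, not from the original thread; `Context` plays the role of a non-serializable SparkContext):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class CaptureDemo {
    // A serializable function type, analogous to what Spark requires of closures.
    interface SerFn<T, R> extends Function<T, R>, Serializable {}

    // Stand-in for SparkContext: deliberately NOT Serializable.
    static class Context {}

    static class Job {
        private final Context ctx; // constructor parameter, kept as a field

        Job(Context ctx) { this.ctx = ctx; }

        // Reading the field makes the lambda capture `this`, dragging the
        // non-serializable Context along with it.
        SerFn<Integer, Integer> capturing() {
            return x -> x + (ctx == null ? 0 : 1);
        }

        // This lambda touches no instance state, so it captures nothing.
        SerFn<Integer, Integer> selfContained() {
            return x -> x + 1;
        }
    }

    // Attempts a serialization round and reports whether it succeeded.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Job job = new Job(new Context());
        System.out.println(serializes(job.capturing()));     // false: pulls in Job and its Context
        System.out.println(serializes(job.selfContained())); // true
    }
}
```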

Re: SparkContext & Threading

2015-06-06 Thread Lee McFadden
Hi Will, That doesn't seem to be the case and was part of the source of my confusion. The code currently in the run method of the runnable works perfectly fine with the lambda expressions when it is invoked from the main method. They also work when they are invoked from within a separate method on

Re: SparkContext & Threading

2015-06-06 Thread Will Briggs
Hi Lee, it's actually not related to threading at all - you would still have the same problem even if you were using a single thread. See this section ( https://spark.apache.org/docs/latest/programming-guide.html#passing-functions-to-spark) of the Spark docs. On June 5, 2015, at 5:12 PM, Lee M

Re: SparkContext & Threading

2015-06-05 Thread Lee McFadden
On Fri, Jun 5, 2015 at 2:05 PM Will Briggs wrote: > Your lambda expressions on the RDDs in the SecondRollup class are closing > around the context, and Spark has special logic to ensure that all > variables in a closure used on an RDD are Serializable - I hate linking to > Quora, but there's a go

Re: SparkContext & Threading

2015-06-05 Thread Will Briggs
Your lambda expressions on the RDDs in the SecondRollup class are closing around the context, and Spark has special logic to ensure that all variables in a closure used on an RDD are Serializable - I hate linking to Quora, but there's a good explanation here: http://www.quora.com/What-does-Clos
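
The rule Will states - every variable captured by a closure used on an RDD must be Serializable - applies to locals as well as fields. A small Java sketch of just the capture rule (names are illustrative, not from the thread): a lambda capturing a non-serializable local fails to serialize even though it never touches `this`, while one capturing only a `String` is fine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureVarDemo {
    interface SerFn<T, R> extends Function<T, R>, Serializable {}

    // Hypothetical non-serializable resource (think SparkContext).
    static class Context {}

    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        String label = "rollup"; // String IS Serializable

        SerFn<Integer, String> bad  = x -> ctx + ":" + x;   // captures ctx
        SerFn<Integer, String> good = x -> label + ":" + x; // captures a String

        System.out.println(serializes(bad));  // false
        System.out.println(serializes(good)); // true
    }
}
```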

Re: SparkContext & Threading

2015-06-05 Thread Lee McFadden
On Fri, Jun 5, 2015 at 1:00 PM Igor Berman wrote: > Lee, what cluster do you use? standalone, yarn-cluster, yarn-client, mesos? > Spark standalone, v1.2.1.

Re: SparkContext & Threading

2015-06-05 Thread Lee McFadden
On Fri, Jun 5, 2015 at 12:58 PM Marcelo Vanzin wrote: > You didn't show the error so the only thing we can do is speculate. You're > probably sending the object that's holding the SparkContext reference over > the network at some point (e.g. it's used by a task run in an executor), > and that's w
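
Marcelo's diagnosis - the object *holding* the SparkContext reference is what gets shipped - suggests the usual remedy: copy only the serializable values a closure needs into locals before defining it. A hedged Java sketch (class and field names are hypothetical, not from Lee's code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ExtractLocalDemo {
    interface SerFn<T, R> extends Function<T, R>, Serializable {}

    // Non-serializable, like a SparkContext.
    static class Context {}

    static class Rollup {
        private final Context ctx = new Context();
        private final int factor = 3;

        // Captures `this` (and therefore ctx) just to read `factor`.
        SerFn<Integer, Integer> naive() {
            return x -> x * factor;
        }

        // Copy the field into a local first: only the int is captured.
        SerFn<Integer, Integer> fixed() {
            final int f = factor;
            return x -> x * f;
        }
    }

    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Rollup r = new Rollup();
        System.out.println(serializes(r.naive())); // false: closure holds the Rollup
        System.out.println(serializes(r.fixed())); // true: closure holds only an int
        System.out.println(r.fixed().apply(2));    // 6
    }
}
```

The same pattern in Scala is the familiar `val f = this.factor` line placed just above the RDD operation.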

Re: SparkContext & Threading

2015-06-05 Thread Igor Berman
Lee, what cluster do you use? standalone, yarn-cluster, yarn-client, mesos? In yarn-cluster mode the driver program is executed inside one of the nodes in the cluster, so it might be that driver code needs to be serialized to be sent to some node On 5 June 2015 at 22:55, Lee McFadden wrote: > > On Fri, Jun 5, 2

Re: SparkContext & Threading

2015-06-05 Thread Marcelo Vanzin
On Fri, Jun 5, 2015 at 12:55 PM, Lee McFadden wrote: > Regarding serialization, I'm still confused as to why I was getting a > serialization error in the first place as I'm executing these Runnable > classes from a java thread pool. I'm fairly new to Scala/JVM world and > there doesn't seem to b

Re: SparkContext & Threading

2015-06-05 Thread Lee McFadden
On Fri, Jun 5, 2015 at 12:30 PM Marcelo Vanzin wrote: > Ignoring the serialization thing (seems like a red herring): > People seem surprised that I'm getting the Serialization exception at all - I'm not convinced it's a red herring per se, but on to the blocking issue... > > You might be using

Re: SparkContext & Threading

2015-06-05 Thread Marcelo Vanzin
Ignoring the serialization thing (seems like a red herring): On Fri, Jun 5, 2015 at 11:48 AM, Lee McFadden wrote: > 15/06/05 11:35:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, > localhost): java.lang.NoSuchMethodError: > org.apache.spark.executor.TaskMetrics.inputMetrics_$eq(Lscala

Re: SparkContext & Threading

2015-06-05 Thread Igor Berman
+1 to the question about serialization. SparkContext is still in the driver process (even if it has several threads from which you submit jobs). As for the problem, check your classpath, Scala version, Spark version, etc. Such errors usually happen when there is some conflict in the classpath. Maybe you compiled
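
Igor's classpath diagnosis fits the symptom: a `NoSuchMethodError` on an internal class like `TaskMetrics` usually means the application was compiled against one Spark version but runs against another. A sketch of the usual sbt remedy (version numbers are illustrative, chosen to match the standalone v1.2.1 Lee reported; Spark 1.2.x is built against Scala 2.10):

```
// build.sbt (illustrative): compile against the same Spark version the
// cluster runs, and mark it "provided" so the application jar does not
// bundle a second, conflicting copy of Spark.
scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
```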

Re: SparkContext & Threading

2015-06-05 Thread Lee McFadden
You can see an example of the constructor for the class which executes a job in my opening post. I'm attempting to instantiate and run the class using the code below:

```
val conf = new SparkConf()
  .setAppName(appNameBase.format("Test"))
val connector = CassandraConnector(conf)
```

Re: SparkContext & Threading

2015-06-05 Thread Marcelo Vanzin
On Fri, Jun 5, 2015 at 11:48 AM, Lee McFadden wrote: > Initially I had issues passing the SparkContext to other threads as it is > not serializable. Eventually I found that adding the @transient annotation > prevents a NotSerializableException. > This is really puzzling. How are you passing the
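
Part of what makes `@transient` puzzling here is what it actually does: a transient field is simply skipped during serialization and comes back `null` after deserialization, so the annotation silences the exception without ever shipping the context. A small Java sketch using the equivalent `transient` keyword (`Context` and `Holder` are hypothetical names):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    // Non-serializable, like a SparkContext.
    static class Context {}

    static class Holder implements Serializable {
        transient Context ctx = new Context(); // skipped during serialization
    }

    // Serializes an object to bytes and reads it back.
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Holder copy = roundTrip(new Holder()); // no NotSerializableException...
        System.out.println(copy.ctx == null);  // true: ...but the field did not travel
    }
}
```

This is consistent with Marcelo's puzzlement: if the closure ever actually ran where the context had been "shipped", it would find a null reference rather than a usable SparkContext.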