Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-28 Thread Tathagata Das
The cleaner TTL was introduced as a "brute force" method to clean all old data and metadata in the system, so that the system can run 24/7. The cleaner TTL should be set to a large value, so that RDDs older than that are not used. Though there are some cases where you may want to use an RDD again a
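A minimal sketch of setting the cleaner TTL via SparkConf; the app name, TTL value, and batch interval below are illustrative assumptions, not taken from this thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // spark.cleaner.ttl is in seconds; pick a value comfortably larger
    // than the lifetime of any RDD or window your streaming job uses.
    val conf = new SparkConf()
      .setAppName("kafka-streaming-example")   // hypothetical name
      .set("spark.cleaner.ttl", "3600")        // illustrative value
    val ssc = new StreamingContext(conf, Seconds(10))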

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-27 Thread Evgeny Shishkin
On 28 Mar 2014, at 01:44, Tathagata Das wrote: > The more I think about it, the problem is not about /tmp; it's more about the > workers not having enough memory. Blocks of received data could be falling > out of memory before they are processed. > BTW, what is the storage level that you a

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-27 Thread Tathagata Das
The more I think about it, the problem is not about /tmp; it's more about the workers not having enough memory. Blocks of received data could be falling out of memory before they are processed. BTW, what is the storage level that you are using for your input stream? If you are using MEMORY_ONLY,
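As a hedged sketch of how the receiver's storage level is chosen at stream-creation time (the ZooKeeper quorum, group id, and topic map are placeholders), using a level that spills to disk rather than dropping blocks:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kafka.KafkaUtils

    // MEMORY_AND_DISK_SER_2 keeps serialized, replicated blocks and spills
    // them to disk under memory pressure, so received data is not silently
    // dropped before it is processed (unlike a memory-only level).
    val stream = KafkaUtils.createStream(
      ssc,                        // StreamingContext from the job setup
      "zk1:2181,zk2:2181",        // ZooKeeper quorum (placeholder)
      "my-consumer-group",        // consumer group id (placeholder)
      Map("my-topic" -> 1),       // topic -> receiver threads (placeholder)
      StorageLevel.MEMORY_AND_DISK_SER_2)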

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-27 Thread Scott Clasen
Heh, sorry, that wasn't a clear question. I know 'how' to set it but don't know what value to use in a Mesos cluster; since the processes are running in LXC containers they won't be sharing a filesystem (or machine, for that matter). I can't use an s3n:// URL for local dir, can I?

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-27 Thread Tathagata Das
spark.local.dir should be specified in the same way as other configuration parameters. On Thu, Mar 27, 2014 at 10:32 AM, Scott Clasen wrote: > I think now that this is because spark.local.dir is defaulting to /tmp, and > since the tasks a
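For illustration, one way to pin spark.local.dir when building the configuration; the path is hypothetical and must be a real local filesystem directory on each worker (an s3n:// URL would not work, since shuffle and block files are written and read locally):

    import org.apache.spark.SparkConf

    // Hypothetical local path that must exist inside each Mesos/LXC
    // container; Spark writes shuffle and block files here and reads
    // them back on the same machine.
    val conf = new SparkConf()
      .set("spark.local.dir", "/mnt/spark-local")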

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-27 Thread Scott Clasen
I think now that this is because spark.local.dir is defaulting to /tmp, and since the tasks are not running on the same machine, the file is not found when the second task takes over. How do you set spark.local.dir appropriately when running on Mesos?

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-26 Thread Scott Clasen
The web-ui shows 3 executors: the driver and one Spark task on each worker. I do see that there were 8 successful tasks and the ninth failed like so... java.lang.Exception (java.lang.Exception: Could not compute split, block input-0-1395860790200 not found) org.apache.spark.rdd.BlockRDD.compute(B

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-26 Thread Tathagata Das
Does the Spark application's web-ui give any indication of what kind of resources it is getting from Mesos? TD On Wed, Mar 26, 2014 at 12:18 PM, Scott Clasen wrote: > I have a Mesos cluster which runs Marathon. > > I am using Marathon to launch a long-running Spark Streaming job which > consumes