I don't think it solves Cody's problem, which still needs more
investigation, but I believe it does solve the problem you described
earlier.
I just confirmed with the Mesos folks that we no longer need the minimum
memory requirement, so we'll be dropping that soon and the workaround
might not be needed f
My problem is that I'm not sure this workaround would solve things, given the
issue described here (where there was a lot of memory free but it didn't get
re-offered). If you think it does, it would be good to explain why it behaves
like that.
Matei
On August 25, 2014 at 2:28:18 PM, Timothy Chen
Hi Matei,
I'm going to investigate from both the Mesos and Spark sides and will
hopefully have a good long-term solution. In the meantime, having a
workaround to start with is going to unblock folks.
Tim
On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia wrote:
> Anyway it would be good if someone from the
Anyway, it would be good if someone from the Mesos side investigates this and
proposes a solution. The 32 MB per task hack isn't completely foolproof either
(e.g. people might allocate all the RAM to their executor and thus stop being
able to launch tasks), so maybe we wait on a Mesos fix for this.
This is kind of weird then, seems perhaps unrelated to this issue (or at least
to the way I understood it). Is the problem maybe that Mesos saw 0 MB being
freed and didn't re-offer the machine *even though there was more than 32 MB
free overall*?
Matei
On August 25, 2014 at 12:59:59 PM, Cody K
I definitely saw a case where
a. the only job running was a 256m shell
b. I started a 2g job
c. a little while later the same user as in a started another 256m shell
My job immediately stopped making progress. Once user a killed his shells,
it started again.
This is on nodes with ~15 GB of memory.
BTW it seems to me that even without that patch, you should be getting tasks
launched as long as you leave at least 32 MB of memory free on each machine
(that is, the sum of the executor memory sizes is not exactly the same as the
total size of the machine). Then Mesos will be able to re-offer the machine.
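The headroom condition above can be checked with plain arithmetic (illustrative only, not Spark/Mesos code; the ~15 GB node size comes from Cody's report and the 32 MB floor is the per-task minimum discussed in this thread):

```shell
node_mem_mb=$((15 * 1024))       # approximate node size reported in the thread
min_task_mem_mb=32               # per-task memory floor discussed above

allocated=$((256 + 2048 + 256))  # two 256 MB shells plus one 2 GB job
free=$((node_mem_mb - allocated))

# Tasks can only launch if the leftover memory clears the 32 MB floor.
if [ "$free" -ge "$min_task_mem_mb" ]; then
  echo "machine still offerable (${free} MB free)"
else
  echo "executors consumed everything; no offers"
fi
```

In Cody's scenario there is far more than 32 MB left over, which is why the stall he saw looks unrelated to the per-task floor.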
We have not tried the workaround because there are other bugs there
that affected our setup, though it seems it would help.
On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen wrote:
> +1 to have the workaround in.
>
> I'll be investigating from the Mesos side too.
>
> Tim
>
> On Sun, Aug 24,
+1 to have the workaround in.
I'll be investigating from the Mesos side too.
Tim
On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia wrote:
> Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too bad
> that this happens in fine-grained mode -- would be really good to fix. I'll
Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too bad
that this happens in fine-grained mode -- would be really good to fix. I'll see
if we can get the workaround in https://github.com/apache/spark/pull/1860 into
Spark 1.1. Incidentally, have you tried that?
Matei
On Augu
Hi Matei,
We have an analytics team that uses the cluster on a daily basis. They use
two 'run modes':
1) For running actual queries, they set spark.executor.memory to
something between 4 and 8 GB of RAM per worker.
2) A shell that takes a minimal amount of memory on workers (128MB) for
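The two run modes above could be launched roughly as follows (a sketch only; the exact spark-shell invocation is my assumption, though spark.executor.memory is the property named in the thread):

```shell
# Mode 1: real queries -- each executor gets 4-8 GB per worker
spark-shell --conf spark.executor.memory=8g

# Mode 2: long-lived lightweight shell -- minimal per-worker footprint
spark-shell --conf spark.executor.memory=128m
```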
Hey Gary, just as a workaround, note that you can use Mesos in coarse-grained
mode by setting spark.mesos.coarse=true. Then it will hold onto CPUs for the
duration of the job.
Matei
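As a sketch, the coarse-grained workaround amounts to a single property (the Mesos master URL below is a placeholder assumption):

```shell
# Coarse-grained mode: Spark acquires CPUs once and holds them for the
# whole job, instead of taking and releasing them per fine-grained task.
spark-shell \
  --master mesos://zk://host:2181/mesos \
  --conf spark.mesos.coarse=true
```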
On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com) wrote:
I just wanted to bring up a signifi