I tried using org.apache.spark.util.collection.BitSet instead of
RoaringBitMap; it saves about 20% of the memory but runs much slower.
For the 200K-task job,
RoaringBitMap uses 3 Long[1024] and 1 Short[3392]
= 3*64*1024 + 16*3392 = 250880 (bits)
BitSet uses 1 Long[3125] = 3125*64 = 200000 (bits)
Memory s
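For reference, a quick sketch of that arithmetic in Scala, including a rough
driver-side total if ~200K map statuses each carry one such bitmap (the
per-status figures come from the numbers above; the totals are estimates, not
measurements):

// Per-MapStatus bitmap size for ~200K reduce partitions, from the figures above.
val roaringBits = 3 * 64 * 1024 + 16 * 3392   // 3 x Long[1024] + 1 x Short[3392] = 250880 bits
val bitSetBits  = 3125 * 64                   // 1 x Long[3125]                   = 200000 bits

// Saving from switching to BitSet: roughly 20%, as reported.
val saving = 1.0 - bitSetBits.toDouble / roaringBits                // ~0.20

// Rough driver-side total if ~200K map statuses each hold one such bitmap.
val roaringTotalGB = 200000L * roaringBits / 8 / math.pow(1024, 3)  // ~5.8 GB
val bitSetTotalGB  = 200000L * bitSetBits  / 8 / math.pow(1024, 3)  // ~4.7 GB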
well, it was -08, and ssh stopped working (according to the alerts)
just as i was logging in to kill off any errant processes. i've taken
that worker offline in jenkins and will be rebooting it asap.
on a positive note, i was able to clear out -07 before anything
horrible happened to that one.
O
Hi all, I want to launch a Spark job on YARN from Java, but it seems that there
is no way to set numExecutors in the SparkLauncher class. Is there any way to
set numExecutors? Thanks
qinggangwa...@gmail.com
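One option, as far as I know, is to pass the executor count as a Spark conf; on
YARN it maps to spark.executor.instances. A minimal sketch (shown in Scala, but
SparkLauncher is a plain Java API, so the same calls work from Java; the jar
path and class name are placeholders):

import org.apache.spark.launcher.SparkLauncher

val process = new SparkLauncher()
  .setAppResource("/path/to/your-app.jar")        // placeholder: your application jar
  .setMainClass("com.example.YourApp")            // placeholder: your main class
  .setMaster("yarn-cluster")
  .setConf("spark.executor.instances", "10")      // number of executors requested on YARN
  .setConf(SparkLauncher.EXECUTOR_MEMORY, "4g")
  .launch()                                       // returns a java.lang.Process
process.waitFor()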
With Jerry's permission, sending this back to the dev list to close the
loop.
-- Forwarded message --
From: Jerry Lam
Date: Tue, Oct 20, 2015 at 3:54 PM
Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ...
To: Reynold Xin
Yup, coarse grained mode works just fine. :
ok, based on the timing, i *think* this might be the culprit:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/3814/console
On Tue, Oct 20, 2015 at 3:35 PM, shane knapp wrote:
> -06 just kinda came back...
>
> [root@amp-jenkins-w
amp-jenkins-worker-06 is back up.
my next bets are on -07 and -08... :\
https://amplab.cs.berkeley.edu/jenkins/computer/
On Tue, Oct 20, 2015 at 3:39 PM, shane knapp wrote:
> here's the related stack trace from dmesg... UID 500 is jenkins.
>
> Out of memory: Kill process 142764 (java) score 4
here's the related stack trace from dmesg... UID 500 is jenkins.
Out of memory: Kill process 142764 (java) score 40 or sacrifice child
Killed process 142764, UID 500, (java) total-vm:24685036kB,
anon-rss:5730824kB, file-rss:64kB
Uhhuh. NMI received for unknown reason 21 on CPU 0.
Do you have a st
-06 just kinda came back...
[root@amp-jenkins-worker-06 ~]# uptime
15:29:07 up 26 days, 7:34, 2 users, load average: 1137.91, 1485.69, 1635.89
the builds that, from looking at the process table, seem to be at
fault are the Spark-Master-Maven-pre-yarn matrix builds, and possibly
a Spark-Master
starting this saturday (oct 17) we began getting alerts on the
jenkins workers that various processes were dying (specifically ssh).
since then, we've had half of our workers OOM due to java processes
and have now had to reboot two of them (-05 and -06).
if we look at the current machine that's
Hi Reynold,
Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers but
sometimes it does not. For one particular job, it failed all the time with
the acquire-memory issue. I'm using spark on mesos with fine grained mode.
Does it make a difference?
Best Regards,
Jerry
On Tue, Oct 20
Jerry - I think that's been fixed in 1.5.1. Do you still see it?
On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam wrote:
> I disabled it because of the "Could not acquire 65536 bytes of memory" error.
> It caused the job to fail. So for now, I'm not touching it.
>
> On Tue, Oct 20, 2015 at 4:48 PM, charmee
I disabled it because of the "Could not acquire 65536 bytes of memory" error.
It caused the job to fail. So for now, I'm not touching it.
On Tue, Oct 20, 2015 at 4:48 PM, charmee wrote:
> We had disabled Tungsten after we found a few performance issues, but had to
> re-enable it because we found that
We had disabled Tungsten after we found a few performance issues, but had to
re-enable it because we found that with a large number of group-by fields,
the shuffle keeps failing when Tungsten is disabled.
Here is an excerpt from one of our engineers with his analysis.
With Tungsten Enabled (
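For anyone following along: in Spark 1.5 the switch in question is, as far as I
know, spark.sql.tungsten.enabled (on by default). A minimal sketch of toggling
it (app name is illustrative):

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("tungsten-toggle"))
val sqlContext = new SQLContext(sc)

// Turning Tungsten off for this session reproduces the configuration that
// led to the shuffle failures described above.
sqlContext.setConf("spark.sql.tungsten.enabled", "false")

// Equivalently, it can be passed at submit time:
//   spark-submit --conf spark.sql.tungsten.enabled=false ...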
Hi Prakhar,
I see your problem now: you expected that an executor killed by the heartbeat
mechanism would be launched again, but it seems it is not. I think this problem
is fixed in Spark 1.5; you can check this JIRA:
https://issues.apache.org/jira/browse/SPARK-8119
Thanks
Saisai
2015
Thanks Sai for the input.
So the problem is: I start my job with a fixed number of executors, but when a
host running my executors becomes unreachable, the driver reduces the total
number of executors and never increases it again.
I have a repro for the issue; attaching logs:
Running spark job is co
In our case, we are dealing with 20 TB of text data, which is split into about
200K map tasks and 200K reduce tasks, and our driver's memory is 15 GB.
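If it helps, one mitigation sketch (paths and numbers below are illustrative,
not taken from this job) is to give the driver more headroom and/or use fewer,
larger reduce partitions, so that each MapStatus tracks fewer blocks:

import org.apache.spark.{SparkConf, SparkContext}

// Submit-time knob (illustrative value):
//   spark-submit --driver-memory 30g ...

val sc = new SparkContext(new SparkConf().setAppName("fewer-partitions"))
val pairs = sc.textFile("hdfs:///path/to/20tb-input")   // placeholder input path
  .map(line => (line.split("\t")(0), 1L))

// 20K reduce partitions instead of 200K: each partition is larger, but every
// MapStatus bitmap on the driver now tracks 10x fewer blocks.
val counts = pairs.reduceByKey(_ + _, 20000)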
How big is your driver heap size? And any reason why you'd need 200k map
and 200k reduce tasks?
On Mon, Oct 19, 2015 at 11:59 PM, yaoqin wrote:
> Hi everyone,
>
> When I run a Spark job that contains quite a lot of tasks (in my case
> 200,000 * 200,000), the driver hits an OOM mainly caused by t
Hi all,
I noticed that in ml.classification.LogisticRegression, users are not
allowed to set initial coefficients, while it is supported in
mllib.classification.LogisticRegressionWithSGD.
Sometimes we know specific coefficients are close to the final optima.
e.g., we usually pick yesterday's outp
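For reference, the mllib entry point referred to above has an overload of run()
that accepts an initial weight vector, so a warm start from a previous day's
model looks roughly like this (data and weights here are placeholders):

import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Warm-start sketch: reuse yesterday's coefficients as the starting point.
def warmStart(training: RDD[LabeledPoint],
              yesterdaysWeights: Array[Double]): LogisticRegressionModel = {
  val lr = new LogisticRegressionWithSGD()   // default step size, iterations, etc.
  lr.run(training, Vectors.dense(yesterdaysWeights))
}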
Hi everyone,
When I run a Spark job that contains quite a lot of tasks (in my case
200,000 * 200,000), the driver hits an OOM mainly caused by the MapStatus
objects. As shown in the picture below, the RoaringBitmap used to mark which
blocks are empty seems to use too much memory.
Are there any