quick update:
things are looking slightly... better. the number of failing builds
due to GC overhead has decreased slightly since the reboots last
week... in fact, in the last three days the only builds to be
affected are spark-master-test-maven-hadoop-2.7 (three failures) and
spark-master-test
(adding michael armbrust and josh rosen for visibility)
ok. roughly 9% of all spark test builds (including both PRB builds)
are failing due to GC overhead limits.
$ wc -l SPARK_TEST_BUILDS GC_FAIL
1350 SPARK_TEST_BUILDS
125 GC_FAIL
here are the affected builds (over the past ~2 weeks):
$ sor
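(for reference, one way a per-job breakdown of the affected builds could be generated -- a sketch only, not the command that was truncated above, and it assumes GC_FAIL holds one <job-name>/<build-number> entry per failing build:)
# sketch only: count GC-overhead failures per jenkins job, most-affected first
$ cut -d/ -f1 GC_FAIL | sort | uniq -c | sort -rn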
On Fri, Jan 6, 2017 at 12:20 PM, shane knapp wrote:
> FYI, this is happening across all spark builds... not just the PRB.
s/all/almost all/
FYI, this is happening across all spark builds... not just the PRB.
i'm compiling a report now and will email that out this afternoon.
:(
On Thu, Jan 5, 2017 at 9:00 PM, shane knapp wrote:
> unsurprisingly, we had another GC:
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder
unsurprisingly, we had another GC:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70949/console
so, definitely not the system (everything looks hunky dory on the build node).
> It can always be some memory leak; if we increase the memory settings
> and OOMs still happen, that
But is there any non-memory-leak reason why the tests should need more
memory? In theory each test should be cleaning up its own Spark Context
etc., right? My memory is that OOM issues in the tests in the past have been
indicative of memory leaks somewhere.
I do agree that it doesn't seem likely
On Thu, Jan 5, 2017 at 4:58 PM, Kay Ousterhout wrote:
> But is there any non-memory-leak reason why the tests should need more
> memory? In theory each test should be cleaning up its own Spark Context
> etc., right? My memory is that OOM issues in the tests in the past have been
> indicative of memory leaks somewhere.
Seems like the OOM is coming from tests, which most probably means
it's not an infrastructure issue. Maybe tests just need more memory
these days and we need to update maven / sbt scripts.
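(If it does come to raising the limits, a rough sketch of the knobs involved -- the values below are placeholders, not recommendations, and since the test JVMs are forked, their heap is actually controlled by the argLine / javaOptions entries in the maven and sbt build files rather than by these variables:)
# placeholders only -- heap for the maven and sbt JVMs themselves
export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=512m"
export SBT_OPTS="-Xmx4g"
# forked test JVMs get their heap from argLine (maven) / javaOptions (sbt) in the build files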
On Thu, Jan 5, 2017 at 1:19 PM, shane knapp wrote:
> as of first thing this morning, here's the list of recent GC overhead build failures:
Thanks for looking into this Shane!
On Thu, Jan 5, 2017 at 1:19 PM, shane knapp wrote:
> as of first thing this morning, here's the list of recent GC overhead
> build failures:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70891/console
> https://amplab.cs.berkeley.edu/
as of first thing this morning, here's the list of recent GC overhead
build failures:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70891/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70874/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPul
preliminary findings: seems to be transient, and affecting 4% of
builds from late december until now (which is as far back as we keep
build records for the PRB builds).
408 builds
16 builds.gc <--- failures
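(for anyone who wants to reproduce those counts, a rough sketch against the on-disk PRB build records -- the paths assume the standard jenkins layout, so treat the exact locations as assumptions:)
$ cd $JENKINS_HOME/jobs/SparkPullRequestBuilder/builds
$ ls -d [0-9]* | wc -l                                # total builds kept on disk
$ grep -l "GC overhead limit exceeded" */log | wc -l  # builds that hit the GC limit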
it's also happening across all workers at about the same rate.
and best of all, the
nope, no changes to jenkins in the past few months. ganglia graphs
show higher, but not worrying, memory usage on the workers when the
jobs failed...
i'll take a closer look later tonite/first thing tomorrow morning.
shane
On Tue, Jan 3, 2017 at 4:35 PM, Kay Ousterhout wrote:
> I've noticed a