quick update:
things are looking slightly... better. the number of failing builds
due to GC overhead has decreased slightly since the reboots last
week... in fact, in the last three days the only builds to be
affected are spark-master-test-maven-hadoop-2.7 (three failures) and
spark-master-test
(adding michael armbrust and josh rosen for visibility)
ok. roughly 9% of all spark test builds (including both PRB builds)
are failing due to GC overhead limits.
$ wc -l SPARK_TEST_BUILDS GC_FAIL
1350 SPARK_TEST_BUILDS
125 GC_FAIL
here are the affected builds (over the past ~2 weeks):
$ sor
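(for reference, one way a per-job breakdown of the affected builds could be generated -- a sketch only, not the command that was truncated above, and it assumes GC_FAIL holds one <job-name>/<build-number> entry per failing build:)
# sketch only: count GC-overhead failures per jenkins job, most-affected first
$ cut -d/ -f1 GC_FAIL | sort | uniq -c | sort -rn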
On Fri, Jan 6, 2017 at 12:20 PM, shane knapp wrote:
> FYI, this is happening across all spark builds... not just the PRB.
s/all/almost all/
FYI, this is happening across all spark builds... not just the PRB.
i'm compiling a report now and will email that out this afternoon.
:(
On Thu, Jan 5, 2017 at 9:00 PM, shane knapp wrote:
> unsurprisingly, we had another GC:
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder
unsurprisingly, we had another GC:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70949/console
so, definitely not the system (everything looks hunky dory on the build node).
> It can always be some memory leak; if we increase the memory settings
> and OOMs still happen, that
But is there any non-memory-leak reason why the tests should need more
memory? In theory each test should be cleaning up its own Spark Context
etc., right? My memory is that OOM issues in the tests in the past have been
indicative of memory leaks somewhere.
I do agree that it doesn't seem likely
On Thu, Jan 5, 2017 at 4:58 PM, Kay Ousterhout wrote:
> But is there any non-memory-leak reason why the tests should need more
> memory? In theory each test should be cleaning up its own Spark Context
> etc., right? My memory is that OOM issues in the tests in the past have been
> indicative of memory leaks somewhere.
Seems like the OOM is coming from tests, which most probably means
it's not an infrastructure issue. Maybe tests just need more memory
these days and we need to update maven / sbt scripts.
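(If it does come to raising the limits, a rough sketch of the knobs involved -- the values below are placeholders, not recommendations, and since the test JVMs are forked, their heap is actually controlled by the argLine / javaOptions entries in the maven and sbt build files rather than by these variables:)
# placeholders only -- heap for the maven and sbt JVMs themselves
export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=512m"
export SBT_OPTS="-Xmx4g"
# forked test JVMs get their heap from argLine (maven) / javaOptions (sbt) in the build files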
On Thu, Jan 5, 2017 at 1:19 PM, shane knapp wrote:
> as of first thing this morning, here's the list of recent GC overhead build failures:
Thanks for looking into this Shane!
On Thu, Jan 5, 2017 at 1:19 PM, shane knapp wrote:
> as of first thing this morning, here's the list of recent GC overhead
> build failures:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70891/console
> https://amplab.cs.berkeley.edu/
as of first thing this morning, here's the list of recent GC overhead
build failures:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70891/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70874/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPul
preliminary findings: seems to be transient, and affecting 4% of
builds from late december until now (which is as far back as we keep
build records for the PRB builds).
408 builds
16 builds.gc <--- failures
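(for anyone who wants to reproduce those counts, a rough sketch against the on-disk PRB build records -- the paths assume the standard jenkins layout, so treat the exact locations as assumptions:)
$ cd $JENKINS_HOME/jobs/SparkPullRequestBuilder/builds
$ ls -d [0-9]* | wc -l                                # total builds kept on disk
$ grep -l "GC overhead limit exceeded" */log | wc -l  # builds that hit the GC limit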
it's also happening across all workers at about the same rate.
and best of all, the
nope, no changes to jenkins in the past few months. ganglia graphs
show higher, but not worrying, memory usage on the workers when the
jobs failed...
i'll take a closer look later tonite/first thing tomorrow morning.
shane
On Tue, Jan 3, 2017 at 4:35 PM, Kay Ousterhout wrote:
> I've noticed a