working on it. we'll have intermittent downtime for the next ~30 mins.

On Sun, May 21, 2017 at 12:01 PM, shane knapp <skn...@berkeley.edu> wrote:
> yeah. i noticed that and restarted it a few minutes ago. i'll have
> some time later this afternoon to take a closer look... :\
>
> On Sun, May 21, 2017 at 9:08 AM, Kazuaki Ishizaki <ishiz...@jp.ibm.com> wrote:
>> It had been looking fine these past few days. However, it seems to be
>> going down slowly again...
>>
>> When I tried to view the console log (e.g.
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77149/consoleFull),
>> the server returned "proxy error."
>>
>> Regards,
>> Kazuaki Ishizaki
>>
>> From: shane knapp <skn...@berkeley.edu>
>> To: Sean Owen <so...@cloudera.com>
>> Cc: dev <dev@spark.apache.org>
>> Date: 2017/05/20 09:43
>> Subject: Re: [build system] jenkins got itself wedged...
>>
>> last update of the week:
>>
>> things are looking great... we're GCing happily and staying well
>> within our memory limits.
>>
>> i'm going to do one more restart after the two pull request builds
>> finish to re-enable backups, and call it a weekend. :)
>>
>> shane
>>
>> On Fri, May 19, 2017 at 8:29 AM, shane knapp <skn...@berkeley.edu> wrote:
>>> this is hopefully my final email on the subject... :)
>>>
>>> things seem to have settled down after my GC tuning, and system
>>> load/cpu usage/memory have been nice and flat all night. i'll continue
>>> to keep an eye on things, but it looks like we've weathered the worst
>>> part of the storm.
>>>
>>> On Thu, May 18, 2017 at 6:40 PM, shane knapp <skn...@berkeley.edu> wrote:
>>>> after needing another restart this afternoon, i did some homework and
>>>> aggressively twiddled some GC settings[1]. since then, things have
>>>> definitely smoothed out w/regards to memory and cpu usage spikes.
>>>>
>>>> i've attached a screenshot of slightly happier-looking graphs.
>>>>
>>>> still keeping an eye on things, and hoping that i can go back to being
>>>> a lurker... ;)
>>>>
>>>> shane
>>>>
>>>> 1 - https://jenkins.io/blog/2016/11/21/gc-tuning/
>>>>
>>>> On Thu, May 18, 2017 at 11:20 AM, shane knapp <skn...@berkeley.edu> wrote:
>>>>> ok, more updates:
>>>>>
>>>>> 1) i audited all of the builds, and found that the spark-*-compile-*
>>>>> and spark-*-test-* jobs were set to the identical cron time trigger,
>>>>> so josh rosen and i updated them to run at H/5 (instead of */5). load
>>>>> balancing ftw.
>>>>>
>>>>> 2) the jenkins master is now running on java8, which has moar bettar
>>>>> GC management under the hood.
>>>>>
>>>>> i'll be keeping an eye on this today, and if we start seeing GC
>>>>> overhead failures, i'll start doing more GC performance tuning.
>>>>> thankfully, cloudbees has a relatively decent guide that i'll be
>>>>> following here: https://jenkins.io/blog/2016/11/21/gc-tuning/
>>>>>
>>>>> shane
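For reference, the H/5 vs. */5 change above, in Jenkins cron trigger syntax (the spec strings are standard Jenkins syntax; the comments are a sketch of the intent, not the actual job configs):

    # before: every job with this spec fires at the same wall-clock
    # minutes (:00, :05, :10, ...), so they all hit the master at once
    */5 * * * *

    # after: H hashes each job's name to a stable offset within the
    # five-minute window, so identically-configured jobs are spread out
    H/5 * * * *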
>>>>> On Thu, May 18, 2017 at 8:39 AM, shane knapp <skn...@berkeley.edu> wrote:
>>>>>> yeah, i spoke too soon. jenkins is still misbehaving, but FINALLY i'm
>>>>>> getting some error messages in the logs... it looks like jenkins is
>>>>>> thrashing on GC.
>>>>>>
>>>>>> now that i know what's up, i should be able to get this sorted today.
>>>>>>
>>>>>> On Thu, May 18, 2017 at 12:39 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>> I'm not sure if it's related, but I still can't get Jenkins to test
>>>>>>> PRs. For example, triggering it through the spark-prs.appspot.com UI
>>>>>>> gives me...
>>>>>>>
>>>>>>> https://spark-prs.appspot.com/trigger-jenkins/18012
>>>>>>>
>>>>>>> Internal Server Error
>>>>>>>
>>>>>>> That might be from the appspot app, though?
>>>>>>>
>>>>>>> But posting "Jenkins test this please" on PRs doesn't seem to work,
>>>>>>> and I can't reach Jenkins:
>>>>>>>
>>>>>>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
>>>>>>>
>>>>>>> On Thu, May 18, 2017 at 12:44 AM, shane knapp <skn...@berkeley.edu> wrote:
>>>>>>>> after another couple of restarts due to high load and system
>>>>>>>> unresponsiveness, i finally found the most likely culprit:
>>>>>>>>
>>>>>>>> a typo in the jenkins config where the java heap size was configured.
>>>>>>>> instead of -Xmx16g, we had -Dmx16G... which could easily explain the
>>>>>>>> random and non-deterministic system hangs we've had over the past
>>>>>>>> couple of years.
>>>>>>>>
>>>>>>>> anyways, it's been corrected and the master seems to be humming along,
>>>>>>>> for real this time, w/o issue. i'll continue to keep an eye on this
>>>>>>>> for the rest of the week, but things are looking MUCH better now.
>>>>>>>>
>>>>>>>> sorry again for the interruptions in service.
>>>>>>>>
>>>>>>>> shane
>>>>>>>>
>>>>>>>> On Wed, May 17, 2017 at 9:59 AM, shane knapp <skn...@berkeley.edu> wrote:
>>>>>>>> > ok, we're back up, system load looks cromulent, and we're happily
>>>>>>>> > building (again).
>>>>>>>> >
>>>>>>>> > shane
>>>>>>>> >
>>>>>>>> > On Wed, May 17, 2017 at 9:50 AM, shane knapp <skn...@berkeley.edu> wrote:
>>>>>>>> >> i'm going to need to perform a quick reboot of the jenkins master.
>>>>>>>> >> it looks like it's hung again.
>>>>>>>> >>
>>>>>>>> >> sorry about this!
>>>>>>>> >>
>>>>>>>> >> shane
>>>>>>>> >>
>>>>>>>> >> On Tue, May 16, 2017 at 12:55 PM, shane knapp <skn...@berkeley.edu> wrote:
>>>>>>>> >>> ...but just now i started getting alerts on system load, which
>>>>>>>> >>> was rather high. i had to kick jenkins again, and will keep an
>>>>>>>> >>> eye on the master in case it needs another reboot.
>>>>>>>> >>>
>>>>>>>> >>> sorry about the interruption of service...
>>>>>>>> >>>
>>>>>>>> >>> shane
>>>>>>>> >>>
>>>>>>>> >>> On Tue, May 16, 2017 at 8:18 AM, shane knapp <skn...@berkeley.edu> wrote:
>>>>>>>> >>>> ...so i kicked it and it's now back up and happily building.
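To make the heap-size fix concrete, here is a sketch of what the corrected JVM options might look like in a Jenkins init config (the file path and the extra G1 flags are illustrative assumptions in the spirit of the jenkins.io GC-tuning guide linked above, not the actual amplab config):

    # /etc/sysconfig/jenkins (typical RHEL-style path; assumed here)

    # broken: -Dmx16G only defines a java system property literally named
    # "mx16G", so the JVM silently ran with its default max heap
    #JENKINS_JAVA_OPTIONS="-Dmx16G"

    # fixed: -Xmx16g actually caps the heap at 16 GB; the G1 flags are
    # the kind of settings the GC-tuning guide recommends
    JENKINS_JAVA_OPTIONS="-Xmx16g -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled"

After a restart, checking the running process (e.g. ps aux | grep jenkins) and confirming that -Xmx16g appears in its command line is a quick sanity check that the flag actually took effect.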
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org