yeah, i spoke too soon. jenkins is still misbehaving, but FINALLY i'm getting some error messages in the logs... looks like jenkins is thrashing on GC.
now that i know what's up, i should be able to get this sorted today.

On Thu, May 18, 2017 at 12:39 AM, Sean Owen <so...@cloudera.com> wrote:
> I'm not sure if it's related, but I still can't get Jenkins to test PRs. For
> example, triggering it through the spark-prs.appspot.com UI gives me...
>
> https://spark-prs.appspot.com/trigger-jenkins/18012
>
> Internal Server Error
>
> That might be from the appspot app though?
>
> But posting "Jenkins test this please" on PRs doesn't seem to work, and I
> can't reach Jenkins:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
>
> On Thu, May 18, 2017 at 12:44 AM shane knapp <skn...@berkeley.edu> wrote:
>>
>> after another couple of restarts due to high load and system
>> unresponsiveness, i finally found what is the most likely culprit:
>>
>> a typo in the jenkins config where the java heap size was configured.
>> instead of -Xmx16g, we had -Dmx16G... which could easily explain the
>> random and non-deterministic system hangs we've had over the past
>> couple of years.
>>
>> anyways, it's been corrected and the master seems to be humming along,
>> for real this time, w/o issue. i'll continue to keep an eye on this
>> for the rest of the week, but things are looking MUCH better now.
>>
>> sorry again for the interruptions in service.
>>
>> shane
>>
>> On Wed, May 17, 2017 at 9:59 AM, shane knapp <skn...@berkeley.edu> wrote:
>> > ok, we're back up, system load looks cromulent and we're happily
>> > building (again).
>> >
>> > shane
>> >
>> > On Wed, May 17, 2017 at 9:50 AM, shane knapp <skn...@berkeley.edu>
>> > wrote:
>> >> i'm going to need to perform a quick reboot on the jenkins master. it
>> >> looks like it's hung again.
>> >>
>> >> sorry about this!
>> >>
>> >> shane
>> >>
>> >> On Tue, May 16, 2017 at 12:55 PM, shane knapp <skn...@berkeley.edu>
>> >> wrote:
>> >>> ...but just now i started getting alerts on system load, which was
>> >>> rather high. i had to kick jenkins again, and will keep an eye on the
>> >>> master and possible need to reboot.
>> >>>
>> >>> sorry about the interruption of service...
>> >>>
>> >>> shane
>> >>>
>> >>> On Tue, May 16, 2017 at 8:18 AM, shane knapp <skn...@berkeley.edu>
>> >>> wrote:
>> >>>> ...so i kicked it and it's now back up and happily building.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
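
for anyone wanting to guard against the kind of typo described in the quoted message (-Dmx16G where -Xmx16g was intended): the JVM reads any -D<name> argument as a system-property definition, so -Dmx16G sets no heap limit and is silently accepted. below is a minimal sketch of a check that could be run over a Java options string. the function name `check_heap_flags` and the look-alike pattern are hypothetical, made up for illustration; nothing here is taken from the actual jenkins config.

```python
import re

# Matches an explicit max-heap flag such as -Xmx16g or -Xmx512m
# (only the common k/m/g suffixes are handled in this sketch).
XMX_RE = re.compile(r"^-Xmx\d+[kKmMgG]?$")

# Hypothetical heuristic: a -D property whose name looks like a heap flag,
# e.g. -Dmx16G or -DXmx16g. The JVM treats these as system properties.
LOOKALIKE_RE = re.compile(r"^-DX?m[xs]\d+[kKmMgG]?$")


def check_heap_flags(java_opts: str) -> dict:
    """Split an options string on whitespace and classify heap-related tokens."""
    tokens = java_opts.split()
    return {
        # real max-heap flags that the JVM will honour
        "max_heap": [t for t in tokens if XMX_RE.match(t)],
        # tokens that were probably meant to be heap flags but are not
        "suspicious": [t for t in tokens if LOOKALIKE_RE.match(t)],
    }


if __name__ == "__main__":
    # The two option strings here are illustrative, not the real config.
    print(check_heap_flags("-Djava.awt.headless=true -Dmx16G"))
    print(check_heap_flags("-Djava.awt.headless=true -Xmx16g"))
```

an empty "max_heap" list together with a non-empty "suspicious" list is the situation from the thread: the JVM starts fine but falls back to its default maximum heap size.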