yeah, i spoke too soon.  jenkins is still misbehaving, but FINALLY i'm
getting some error messages in the logs...   looks like jenkins is
thrashing on GC.

now that i know what's up, i should be able to get this sorted today.
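
for anyone following along, here's a rough sketch (python, purely illustrative, not something we actually run) of a sanity check for the kind of typo described in the quoted message below, -Dmx16G instead of -Xmx16g. the JVM accepts the bad flag silently because -D just defines a system property, so the heap cap never gets set and the JVM falls back to its default max heap. the function name and the "suspicious" patterns are my own assumptions:

```python
import re

# -Xmx<size>[k|m|g] is the JVM option that sets the maximum heap size.
XMX_RE = re.compile(r"^-Xmx(\d+)([kKmMgG]?)$")
# Assumption: a -D property whose name looks like a mangled heap flag
# (e.g. -Dmx16G) is almost certainly a typo for -Xmx/-Xms.
SUSPECT_RE = re.compile(r"^-D(mx|ms|Xmx|Xms)\d+[kKmMgG]?$")

UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def check_heap_args(args):
    """Return (max_heap_bytes or None, list of suspicious-looking args)."""
    max_heap = None
    suspects = []
    for arg in args:
        match = XMX_RE.match(arg)
        if match:
            # A later -Xmx overrides an earlier one, so keep the last value.
            max_heap = int(match.group(1)) * UNITS[match.group(2).lower()]
        elif SUSPECT_RE.match(arg):
            suspects.append(arg)
    return max_heap, suspects


if __name__ == "__main__":
    # Hypothetical argument list, not the real jenkins config.
    heap, suspects = check_heap_args(["-Djava.awt.headless=true", "-Dmx16G"])
    if heap is None:
        print("no -Xmx found; JVM will use its default max heap")
    for arg in suspects:
        print("suspicious flag (typo for -Xmx/-Xms?):", arg)
```

running something like this against the java args in a service config before a restart would flag both the missing -Xmx and the stray -D property.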

On Thu, May 18, 2017 at 12:39 AM, Sean Owen <so...@cloudera.com> wrote:
> I'm not sure if it's related, but I still can't get Jenkins to test PRs. For
> example, triggering it through the spark-prs.appspot.com UI gives me...
>
> https://spark-prs.appspot.com/trigger-jenkins/18012
>
> Internal Server Error
>
> That might be from the appspot app though?
>
> But posting "Jenkins test this please" on PRs doesn't seem to work, and I
> can't reach Jenkins:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
>
> On Thu, May 18, 2017 at 12:44 AM shane knapp <skn...@berkeley.edu> wrote:
>>
>> after another couple of restarts due to high load and system
>> unresponsiveness, i finally found what is the most likely culprit:
>>
>> a typo in the jenkins config where the java heap size was configured.
>> instead of -Xmx16g, we had -Dmx16G...  which could easily explain the
>> random and non-deterministic system hangs we've had over the past
>> couple of years.
>>
>> anyways, it's been corrected and the master seems to be humming along,
>> for real this time, w/o issue.  i'll continue to keep an eye on this
>> for the rest of the week, but things are looking MUCH better now.
>>
>> sorry again for the interruptions in service.
>>
>> shane
>>
>> On Wed, May 17, 2017 at 9:59 AM, shane knapp <skn...@berkeley.edu> wrote:
>> > ok, we're back up, system load looks cromulent and we're happily
>> > building (again).
>> >
>> > shane
>> >
>> > On Wed, May 17, 2017 at 9:50 AM, shane knapp <skn...@berkeley.edu>
>> > wrote:
>> >> i'm going to need to perform a quick reboot on the jenkins master.  it
>> >> looks like it's hung again.
>> >>
>> >> sorry about this!
>> >>
>> >> shane
>> >>
>> >> On Tue, May 16, 2017 at 12:55 PM, shane knapp <skn...@berkeley.edu>
>> >> wrote:
>> >>> ...but just now i started getting alerts on system load, which was
>> >>> rather high.  i had to kick jenkins again, and will keep an eye on the
>> >>> master in case it needs a reboot.
>> >>>
>> >>> sorry about the interruption of service...
>> >>>
>> >>> shane
>> >>>
>> >>> On Tue, May 16, 2017 at 8:18 AM, shane knapp <skn...@berkeley.edu>
>> >>> wrote:
>> >>>> ...so i kicked it and it's now back up and happily building.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>
