Yeah, that's what I figured -- those workers are under load. Thanks.

On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ <skn...@berkeley.edu> wrote:

> only 125561, 125562 and 125564 were impacted by -9.
>
> 125565 exited w/a status of 143, i.e. signal 15 (143 - 128, SIGTERM),
> which means the process was terminated for unknown reasons.
>
> 125563 looks like mima failed due to a bunch of errors.
>
> i just spot checked a bunch of recent failed PRB builds from today and
> they all seemed to be legit.
>
> another thing that might be happening is an overload of PRB builds on the
> workers due to the backlog...  the workers are under a LOT of load right
> now, and i can put some rate limiting in to see if that helps out.
>
> shane
>
> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin <ukby.1...@gmail.com> wrote:
>
>> For example, builds 125561 through 125565 were all impacted by kill -9.
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>>
>> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ <skn...@berkeley.edu>
>> wrote:
>>
>>> define "a lot" and provide some links to those builds, please.  there
>>> are roughly 2000 builds per day, and i can't do more than keep a cursory
>>> eye on things.
>>>
>>> the infrastructure that the tests run on hasn't changed one bit on any
>>> of the workers, and 'kill -9' could be a timeout, flakiness caused by old
>>> build processes remaining on the workers after the master went down, or me
>>> trying to clean things up w/o a reboot.  or, perhaps, something wrong w/the
>>> infra.  :)
>>>
>>> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin <ukby.1...@gmail.com> wrote:
>>>
>>>> Agreed, but I’ve seen a lot of builds killed by signal 9. I'm assuming
>>>> that's infrastructure?
>>>>
>>>> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ <skn...@berkeley.edu>
>>>> wrote:
>>>>
>>>>> yeah, i can't do much for flaky tests...  just flaky infrastructure.
>>>>>
>>>>>
>>>>> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> A couple of flaky tests can happen; that's normal. It seems to have
>>>>>> gotten better now, at least. I will keep monitoring the builds.
>>>>>>
>>>>>> On Fri, Jul 10, 2020 at 4:33 PM ukby1234 <ukby.1...@gmail.com> wrote:
>>>>>>
>>>>>>> Looks like Jenkins still isn't stable. My PR has failed twice in a
>>>>>>> row:
>>>>>>>
>>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>>>>>
>>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport
>>>>>>>
>>>>>
>>>>> --
>>>>> Shane Knapp
>>>>> Computer Guy / Voice of Reason
>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>> https://rise.cs.berkeley.edu
>>>>>
>>>>
>>>
>>
>
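For reference, here is a minimal sketch of the exit-status arithmetic from shane's message (143 - 128 = 15), assuming the usual shell convention that a status of 128 + N means the process was killed by signal N. The helper name describe_exit_status is hypothetical and is not part of the Jenkins setup; signal names come from whatever platform the script runs on.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: the shell reports 128 + N when a process dies from signal N.
    Statuses that do not map to a known signal are reported as plain codes.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number on this platform, so treat it as an exit code.
            return f"exited with code {status}"
        return f"terminated by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 143 is the status discussed for build 125565; 137 corresponds to kill -9.
    for status in (143, 137):
        print(status, "->", describe_exit_status(status))
```

This only decodes the number. It says nothing about what sent the signal, which is why a status of 143 on its own still leaves the cause unknown.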