One more:

https://source.cloud.google.com/results/invocations/8f6a185e-0258-4f9c-b573-e79d1a86a5a1/targets/gem5%2Fgcp_ubuntu%2Fpresubmit/log

On Thu, Aug 5, 2021 at 3:02 PM Gabe Black <gabe.bl...@gmail.com> wrote:

> Yeah, I wouldn't be surprised if it was specific to kokoro, or something
> about how its networking is set up. Those failures seem to have stopped
> happening now, or at least are happening much less. I don't remember seeing
> one for a while at least. Hopefully we can avoid making our timeout any
> longer! Thanks for looking into it, Bobby.
>
> Gabe
>
> On Thu, Aug 5, 2021 at 1:16 PM Bobby Bruce <bbr...@ucdavis.edu> wrote:
>
>> That theory could be true, and I certainly don't have any better ideas,
>> though I've never observed any hang on my local machine when recreating
>> this issue. It could be something specific to Kokoro.
>>
>> I've fixed the gem5art error here:
>> https://gem5-review.googlesource.com/c/public/gem5/+/49044. We can see
>> if this fixes the timeout issue. If the timeout error persists, we can
>> consider increasing the timeout:
>> https://gem5-review.googlesource.com/c/public/gem5/+/48443
>>
>>
>> --
>> Dr. Bobby R. Bruce
>> Room 3050,
>> Kemper Hall, UC Davis
>> Davis,
>> CA, 95616
>>
>> web: https://www.bobbybruce.net
>>
>>
>> On Thu, Jul 22, 2021 at 8:33 PM Gabe Black <gabe.bl...@gmail.com> wrote:
>>
>>> Another possibility is that while the gem5-art error may not actually
>>> kill the run, it may, for instance, have failed trying to download
>>> something with a generous timeout, and waiting for that timeout pushed the
>>> rest of the run out enough to trip the timeout? Just a thought. I haven't
>>> checked exhaustively, but it feels like the timeout always goes along with
>>> the gem5-art error message.
>>>
>>> Gabe
>>>
>>> On Thu, Jul 22, 2021 at 5:27 PM Gabe Black <gabe.bl...@gmail.com> wrote:
>>>
>>>> Ok, thanks. I don't know if you saw the CL I put up recently where the
>>>> src/base/cprintftime.cc executable (the one built from that source) was
>>>> broken, which made kokoro fail. The breakage was real and worth fixing, but
>>>> I'm not sure why kokoro was trying to build it in the first place? Maybe
>>>> sometimes kokoro tries building things that we didn't really want it to.
>>>>
>>>> In my recent scons hacking, I ran into that accidentally when
>>>> build/X86/${BLAHBLAH} expanded into build/X86/ because that variable didn't
>>>> exist, so scons went off and started building EVERYTHING it knew about below
>>>> build/X86/. Hypothetically, that could explain the long build times and the
>>>> building of that random other binary? Maybe we have some sort of race
>>>> condition where a target expands to an empty string?
>>>>
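
A minimal Python sketch of the failure mode described above, not SCons's actual implementation. The `expand` helper is hypothetical and only mimics a substitution in which an undefined `${NAME}` silently becomes an empty string, so the target collapses to its parent directory. `BLAHBLAH` is the placeholder name from the message, `gem5.opt` is just an example value, and `is_safe_target` is an assumed guard rather than anything in the gem5 build.

```python
import re

# Matches ${NAME} style references inside a target string.
_VAR = re.compile(r"\$\{(\w+)\}")


def expand(template, env):
    """Replace each ${NAME} with env[NAME]; undefined names become ''."""
    return _VAR.sub(lambda m: env.get(m.group(1), ""), template)


def is_safe_target(template, env):
    """Hypothetical guard: accept a target only if every variable is defined."""
    return all(name in env for name in _VAR.findall(template))


if __name__ == "__main__":
    template = "build/X86/${BLAHBLAH}"
    # With the variable undefined, the target degrades to the whole directory.
    print(repr(expand(template, {})))
    # With it defined, the target names a single file.
    print(repr(expand(template, {"BLAHBLAH": "gem5.opt"})))
    # The guard flags the undefined case before anything is built.
    print(is_safe_target(template, {}))
```

A check along these lines, run before targets are handed to the build tool, would turn a silent "build everything under build/X86/" into an explicit error.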
>>>> Gabe
>>>>
>>>> On Thu, Jul 22, 2021 at 3:04 PM Bobby Bruce <bbr...@ucdavis.edu> wrote:
>>>>
>>>>> Ok, so I did look into this today and didn't find anything. On my
>>>>> desktop machine the difference in running the pre-submit tests from the
>>>>> stable branch and develop branch (including building the binaries) was only
>>>>> 10 minutes so we've really not done anything to increase the build/test
>>>>> times to a significant extent. My running theory is Kokoro was running
>>>>> slower (??? I have no idea what Kokoro is actually doing or running on
>>>>> behind the scenes so I don't know whether that makes sense, but I cannot
>>>>> think of any other explanation). I don't like the solution, but I've
>>>>> submitted a patch to increase the timeout to 7 hours which should stop this
>>>>> timeout event from happening:
>>>>> https://gem5-review.googlesource.com/c/public/gem5/+/48443
>>>>>
>>>>> I still haven't looked into the gem5 error yet but I'm pretty
>>>>> confident this shouldn't interfere with the presubmit validation.
>>>>>
>>>>> --
>>>>> Dr. Bobby R. Bruce
>>>>> Room 3050,
>>>>> Kemper Hall, UC Davis
>>>>> Davis,
>>>>> CA, 95616
>>>>>
>>>>> web: https://www.bobbybruce.net
>>>>>
>>>>>
>>>>> On Wed, Jul 21, 2021 at 5:37 PM Gabe Black <gabe.bl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Ok thanks, Bobby. Please let me know if you find anything, especially
>>>>>> if it looks like it's a bug in kokoro itself somehow.
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> On Wed, Jul 21, 2021 at 3:52 PM Bobby Bruce <bbr...@ucdavis.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> There's definitely something funny going on with the gem5art tests
>>>>>>> there but I believe that error is happening without triggering a non-zero
>>>>>>> exit code. The gem5art test script uses `set -e`, which means the
>>>>>>> script should exit immediately after a failure, yet it doesn't. The testing
>>>>>>> also continues on to the other tests. I'll look into this.
>>>>>>>
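
The thread does not identify why the script keeps going, so the following is only a sketch of two general shell situations in which `set -e` does not stop a script after a failing command. It is not taken from the gem5art test script. The one-line scripts are made up for illustration, `run_sh` is a hypothetical helper, and the example assumes a POSIX `sh` is available on the PATH.

```python
import subprocess


def run_sh(script):
    """Run a script with `sh -c` and return (exit code, stdout)."""
    proc = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Baseline: a plain failing command aborts the script under set -e.
    print(run_sh("set -e; false; echo reached"))
    # A failure in a non-final pipeline stage is hidden: the pipeline's
    # status is that of its last command, so set -e never triggers.
    print(run_sh("set -e; false | true; echo reached"))
    # A failure on the left of || counts as handled, so the script continues.
    print(run_sh("set -e; false || echo handled; echo reached"))
```

If the error message in the log comes from a command in one of these positions, the script would print the error and still exit with status zero, which would match the behaviour described above.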
>>>>>>> In the example you linked, the issue appears to be because it has
>>>>>>> reached the 6 hour timeout. We could increase the timeout to fix this, but
>>>>>>> I'd like to know why our build/test times have increased enough to push us
>>>>>>> over the 6 hour line. I'll see if I can figure it out as well.
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Bobby R. Bruce
>>>>>>> Room 3050,
>>>>>>> Kemper Hall, UC Davis
>>>>>>> Davis,
>>>>>>> CA, 95616
>>>>>>>
>>>>>>> web: https://www.bobbybruce.net
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 21, 2021 at 2:51 PM Gabe Black via gem5-dev <
>>>>>>> gem5-dev@gem5.org> wrote:
>>>>>>>
>>>>>>>> I've seen many kokoro failures lately, including this one which
>>>>>>>> seems to be from a problem in gem5-art? Any idea what's going on?
>>>>>>>>
>>>>>>>>
>>>>>>>> https://source.cloud.google.com/results/invocations/caae5aad-91a6-4c6e-9fbe-20962f9c5519/targets/gem5%2Fgcp_ubuntu%2Fpresubmit/log
>>>>>>>> _______________________________________________
>>>>>>>> gem5-dev mailing list -- gem5-dev@gem5.org
>>>>>>>> To unsubscribe send an email to gem5-dev-le...@gem5.org
>>>>>>>
>>>>>>>