One more: https://source.cloud.google.com/results/invocations/8f6a185e-0258-4f9c-b573-e79d1a86a5a1/targets/gem5%2Fgcp_ubuntu%2Fpresubmit/log
On Thu, Aug 5, 2021 at 3:02 PM Gabe Black <gabe.bl...@gmail.com> wrote:

> Yeah, I wouldn't be surprised if it was specific to kokoro, or something
> about how its networking is set up. Those failures seem to have stopped
> happening now, or at least are happening much less. I don't remember seeing
> one for a while at least. Hopefully we can avoid making our timeout any
> longer! Thanks for looking into it, Bobby.
>
> Gabe
>
> On Thu, Aug 5, 2021 at 1:16 PM Bobby Bruce <bbr...@ucdavis.edu> wrote:
>
>> That theory could be true, and I certainly don't have any better ideas,
>> though I've never observed any hang on my local machine when recreating
>> this issue. It could be something specific to Kokoro.
>>
>> I've fixed the gem5art error here:
>> https://gem5-review.googlesource.com/c/public/gem5/+/49044. We can see
>> if this fixes the timeout issue. If the timeout error persists we can
>> consider increasing the timeout:
>> https://gem5-review.googlesource.com/c/public/gem5/+/48443
>>
>> --
>> Dr. Bobby R. Bruce
>> Room 3050,
>> Kemper Hall, UC Davis
>> Davis,
>> CA, 95616
>>
>> web: https://www.bobbybruce.net
>>
>> On Thu, Jul 22, 2021 at 8:33 PM Gabe Black <gabe.bl...@gmail.com> wrote:
>>
>>> Another possibility is that while the gem5-art error may not actually
>>> kill the run, it may, for instance, have failed trying to download
>>> something with a generous timeout, and waiting for that timeout pushed the
>>> rest of the run out enough to trip the timeout? Just a thought. I haven't
>>> checked exhaustively, but it feels like the timeout always goes along with
>>> the gem5-art error message.
>>>
>>> Gabe
>>>
>>> On Thu, Jul 22, 2021 at 5:27 PM Gabe Black <gabe.bl...@gmail.com> wrote:
>>>
>>>> Ok, thanks. I don't know if you saw the CL I put up recently where the
>>>> src/base/cprintftime.cc executable (the one built from that source) was
>>>> broken, which made kokoro fail. The breakage was real and worth fixing, but
>>>> I'm not sure why kokoro was trying to build it in the first place? Maybe
>>>> sometimes kokoro tries building things that we didn't really want it to.
>>>>
>>>> In my recent scons hacking, I ran into that accidentally when
>>>> build/X86/${BLAHBLAH} expanded into build/X86/ because that variable didn't
>>>> exist, so scons went off and started building EVERYTHING it knew about below
>>>> build/X86/. Hypothetically, that could explain the long build times and the
>>>> building of that random other binary? Maybe we have some sort of race
>>>> condition where a target expands to an empty string?
>>>>
>>>> Gabe
>>>>
>>>> On Thu, Jul 22, 2021 at 3:04 PM Bobby Bruce <bbr...@ucdavis.edu> wrote:
>>>>
>>>>> Ok, so I did look into this today and didn't find anything. On my
>>>>> desktop machine the difference in running the pre-submit tests from the
>>>>> stable branch and develop branch (including building the binaries) was
>>>>> only 10 minutes so we've really not done anything to increase the
>>>>> build/test times to a significant extent. My running theory is Kokoro was
>>>>> running slower (??? I have no idea what Kokoro is actually doing or
>>>>> running on behind the scenes so I don't know whether that makes sense,
>>>>> but I cannot think of any other explanation). I don't like the solution,
>>>>> but I've submitted a patch to increase the timeout to 7 hours which
>>>>> should stop this timeout event from happening:
>>>>> https://gem5-review.googlesource.com/c/public/gem5/+/48443
>>>>>
>>>>> I still haven't looked into the gem5 error yet but I'm pretty
>>>>> confident this shouldn't interfere with the presubmit validation.
>>>>>
>>>>> --
>>>>> Dr. Bobby R. Bruce
>>>>> Room 3050,
>>>>> Kemper Hall, UC Davis
>>>>> Davis,
>>>>> CA, 95616
>>>>>
>>>>> web: https://www.bobbybruce.net
>>>>>
>>>>> On Wed, Jul 21, 2021 at 5:37 PM Gabe Black <gabe.bl...@gmail.com> wrote:
>>>>>
>>>>>> Ok thanks, Bobby. Please let me know if you find anything, especially
>>>>>> if it looks like it's a bug in kokoro itself somehow.
>>>>>>
>>>>>> Gabe
>>>>>>
>>>>>> On Wed, Jul 21, 2021 at 3:52 PM Bobby Bruce <bbr...@ucdavis.edu> wrote:
>>>>>>
>>>>>>> There's definitely something funny going on with the gem5art tests
>>>>>>> there but I believe that error is happening without triggering a
>>>>>>> non-zero exit code. The gem5art test script is set to `set -e`, which
>>>>>>> means the script should exit immediately after a failure, yet it
>>>>>>> doesn't. The testing also continues onto the other tests. I'll look
>>>>>>> into this.
>>>>>>>
>>>>>>> In the example you linked, the issue appears to be because it has
>>>>>>> reached the 6 hour timeout. We could increase the timeout to fix this,
>>>>>>> but I'd like to know why our build/test times have increased enough to
>>>>>>> push us over the 6 hour line. I'll see if I can figure it out as well.
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Bobby R. Bruce
>>>>>>> Room 3050,
>>>>>>> Kemper Hall, UC Davis
>>>>>>> Davis,
>>>>>>> CA, 95616
>>>>>>>
>>>>>>> web: https://www.bobbybruce.net
>>>>>>>
>>>>>>> On Wed, Jul 21, 2021 at 2:51 PM Gabe Black via gem5-dev
>>>>>>> <gem5-dev@gem5.org> wrote:
>>>>>>>
>>>>>>>> I've seen many kokoro failures lately, including this one which
>>>>>>>> seems to be from a problem in gem5-art? Any idea what's going on?
>>>>>>>>
>>>>>>>> https://source.cloud.google.com/results/invocations/caae5aad-91a6-4c6e-9fbe-20962f9c5519/targets/gem5%2Fgcp_ubuntu%2Fpresubmit/log
>>>>>>>> _______________________________________________
>>>>>>>> gem5-dev mailing list -- gem5-dev@gem5.org
>>>>>>>> To unsubscribe send an email to gem5-dev-le...@gem5.org
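
The empty-expansion scenario Gabe describes above (build/X86/${BLAHBLAH} collapsing to build/X86/ when the variable is undefined) can be illustrated with a small sketch. This is not SCons code: expand_lenient and expand_strict are hypothetical helpers written for illustration, and ISA and BLAHBLAH are placeholder variable names. The sketch only shows how a substitution that silently maps undefined variables to an empty string turns a file target into a directory path, and how a strict variant would surface the problem instead.

```python
import re

# Hypothetical stand-in for "$VAR" / "${VAR}" style substitution.
# It is NOT the SCons implementation, only an illustration of the idea.
_VAR = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand_lenient(template, variables):
    """Expand variables, silently replacing undefined ones with ''."""
    def repl(match):
        name = match.group(1) or match.group(2)
        return str(variables.get(name, ""))
    return _VAR.sub(repl, template)


def expand_strict(template, variables):
    """Expand variables, raising KeyError if any of them is undefined."""
    def repl(match):
        name = match.group(1) or match.group(2)
        if name not in variables:
            raise KeyError(f"undefined variable {name} in target {template}")
        return str(variables[name])
    return _VAR.sub(repl, template)


variables = {"ISA": "X86"}  # BLAHBLAH is deliberately left undefined

# The intended file target degrades into the whole build directory.
print(expand_lenient("build/${ISA}/${BLAHBLAH}", variables))  # build/X86/

# The strict variant refuses to produce a target at all.
try:
    expand_strict("build/${ISA}/${BLAHBLAH}", variables)
except KeyError as err:
    print("refused:", err)
```

A check along the lines of the strict variant, applied to target names before they reach the build tool, is one possible way to confirm or rule out the empty-target theory.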