This seems to have fixed the problem. Please let me know if you see any further issues.

On Tue, Apr 11, 2023 at 3:51 PM Danny McCormick <dannymccorm...@google.com> wrote:

> I went ahead and made the limit 40 runs on the following jobs (PR
> <https://github.com/apache/beam/pull/26224>):
>
> beam_PostCommit_Go_VR_Flink
> beam_PostCommit_Java_Nexmark_Flink
> beam_PostCommit_Python_Examples_Flink
> beam_PreCommit_Java_*
> beam_PreCommit_Python_*
> beam_PreCommit_SQL_*
>
> It doesn't quite stick to my proposed 5.0 GB limit, but all of these are
> >2.5GB.
>
> I'm not sure how long it will take for this to take effect (my guess is it
> will happen lazily as jobs are run).
>
> Thanks,
> Danny
>
> On Tue, Apr 11, 2023 at 11:49 AM Danny McCormick <
> dannymccorm...@google.com> wrote:
>
>> > Regarding the "(and not guaranteed to work)" part, is the resolution
>> that the memory issues may still persist and we restore the normal
>> retention limit (and we look for another fix), or that we never restore
>> back to the normal retention limit?
>>
>> Mostly, I'm just not 100% certain that this is the only source of disk
>> space pressure. I think it should work, but I have no way of testing that
>> hypothesis (other than doing it).
>>
>> > Also, considering the number of flaky tests in general [1], code
>> coverage might not be the pressing issue. Should it be disabled everywhere
>> in favor of more reliable / faster builds? Unless Devs here are willing to
>> commit on taking actions, it doesn’t seem to provide too much value
>> recording these numbers as part of the normal pre commit jobs?
>>
>> I think most flakes are unrelated to this issue, so I don't think
>> removing code coverage is going to solve our problems here. If we need to
>> remove all code coverage to fix the issues we're currently experiencing,
>> then I think that is definitely worth it (at least until we can find a
>> better way to do coverage). But I'm not sure if that will be necessary yet.
>>
>> > Is there a technical reason we can't migrate Java code coverage over
>> to the Codecov tool/Actions like we have with Go and Python?
>>
>> I have no context on this and will defer to others.
>>
>> On Tue, Apr 11, 2023 at 11:27 AM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Is there a technical reason we can't migrate Java code coverage over to
>>> the Codecov tool/Actions like we have with Go and Python?
>>>
>>> On Tue, Apr 11, 2023 at 11:25 AM Moritz Mack <mm...@talend.com> wrote:
>>>
>>>> Yes, sorry Robert for being so unspecific. With everywhere I meant Java
>>>> only, my bad!
>>>>
>>>> On 11.04.23, 17:17, "Robert Burke" <rob...@frantil.com> wrote:
>>>>
>>>> The coverage issue is only with the Java builds in specific.
>>>>
>>>> Go and Python have their coverage numbers codecov uploads done in
>>>> GitHub Actions instead.
>>>>
>>>> On Tue, Apr 11, 2023, 8:14 AM Moritz Mack <mm...@talend.com> wrote:
>>>>
>>>> Thanks so much for looking into this!
>>>>
>>>> I’m absolutely +1 for removing Jenkins related friction and the
>>>> proposed changes sound legitimate.
>>>>
>>>> Also, considering the number of flaky tests in general [1], code
>>>> coverage might not be the pressing issue. Should it be disabled everywhere
>>>> in favor of more reliable / faster builds? Unless Devs here are willing to
>>>> commit on taking actions, it doesn’t seem to provide too much value
>>>> recording these numbers as part of the normal pre commit jobs?
>>>>
>>>> Kind regards,
>>>>
>>>> Moritz
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3Aflake
>>>>
>>>> On 11.04.23, 16:24, "Danny McCormick via dev" <dev@beam.apache.org>
>>>> wrote:
>>>>
>>>> *tl;dr - I want to temporarily reduce the number of builds that we
>>>> retain to reduce pressure on Jenkins*
>>>>
>>>> Hey everyone, over the past few days our Jenkins runs have been
>>>> particularly flaky across the board, with errors like the following showing
>>>> up all over the place [1]:
>>>>
>>>> java.nio.file.FileSystemException:
>>>> /home/jenkins/jenkins-home/jobs/beam_PreCommit_Python_Phrase/builds/3352/changelog.xml:
>>>> No space left on device [2]
>>>>
>>>> These errors indicate that we're out of space on the Jenkins master
>>>> node. After some digging (thanks @Yi Hu <ya...@google.com> @Ahmet Altay
>>>> <al...@google.com> and @Bruno Volpato <bvolp...@google.com> for
>>>> contributing), we've determined that at least one large contributing issue
>>>> is that some of our builds are eating up too much space. For example, our
>>>> beam_PreCommit_Java_Commit build is taking up 28GB of space by itself (this
>>>> is just one example).
>>>>
>>>> @Yi Hu <ya...@google.com> found one change around code coverage that
>>>> is likely heavily contributing to the problem and rolled that back [3]. We
>>>> can continue to find other contributing factors here.
>>>>
>>>> In the meantime, to get us back to healthy *I propose that we reduce
>>>> the number of builds that we are retaining to 40 for all jobs that are
>>>> using a large amount of storage (>5GB)*. This will hopefully allow us
>>>> to return Jenkins to a normal functioning state, though it will do so at
>>>> the cost of a significant amount of build history (right now, for example,
>>>> beam_PreCommit_Java_Commit is at 400 retained builds). We could restore the
>>>> normal retention limit once the underlying problem is resolved. Given that
>>>> this is irreversible (and not guaranteed to work), I wanted to gather
>>>> feedback before doing this. Personally, I rarely use builds that old, but
>>>> others may feel differently.
>>>>
>>>> Please let me know if you have any objections or support for this
>>>> proposal.
>>>>
>>>> Thanks,
>>>>
>>>> Danny
>>>>
>>>> [1] Tracking issue: https://github.com/apache/beam/issues/26197
>>>>
>>>> [2] Example run with this error:
>>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/3352/console
>>>>
>>>> [3] Rollback PR: https://github.com/apache/beam/pull/26199
>>>>
>>>> *As a recipient of an email from the Talend Group, your personal data
>>>> will be processed by our systems.
>>>> Please see our Privacy Notice
>>>> <https://www.talend.com/privacy-policy/> for more information about our
>>>> collection and use of your personal information, our security practices,
>>>> and your data protection rights, including any rights you may have to
>>>> object to automated-decision making or profiling we use to analyze support
>>>> or marketing related communications. To manage or discontinue promotional
>>>> communications, use the communication preferences portal
>>>> <https://info.talend.com/emailpreferencesen.html>. To exercise your data
>>>> protection rights, use the privacy request form
>>>> <https://talend.my.onetrust.com/webform/ef906c5a-de41-4ea0-ba73-96c079cdd15a/b191c71d-f3cb-4a42-9815-0c3ca021704cl>.
>>>> Contact us here <https://www.talend.com/contact/> or by mail to either of
>>>> our co-headquarters: Talend, Inc.: 400 South El Camino Real, Ste 1400, San
>>>> Mateo, CA 94402; Talend SAS: 5/7 rue Salomon De Rothschild, 92150 Suresnes,
>>>> France*