This seems to have fixed the problem. Please let me know if you see any
further issues.

On Tue, Apr 11, 2023 at 3:51 PM Danny McCormick <dannymccorm...@google.com>
wrote:

> I went ahead and made the limit 40 runs on the following jobs (PR
> <https://github.com/apache/beam/pull/26224>):
>
> beam_PostCommit_Go_VR_Flink
> beam_PostCommit_Java_Nexmark_Flink
> beam_PostCommit_Python_Examples_Flink
> beam_PreCommit_Java_*
> beam_PreCommit_Python_*
> beam_PreCommit_SQL_*
>
> It doesn't quite stick to my proposed 5.0 GB limit, but all of these are
> over 2.5 GB.
>
> I'm not sure how long it will take for this to take effect (my guess is it
> will happen lazily as jobs are run).
>
> Thanks,
> Danny
>
> On Tue, Apr 11, 2023 at 11:49 AM Danny McCormick <
> dannymccorm...@google.com> wrote:
>
>> > Regarding the "(and not guaranteed to work)" part, is the resolution
>> that the memory issues may still persist and we restore the normal
>> retention limit (and we look for another fix), or that we never restore
>> back to the normal retention limit?
>>
>> Mostly, I'm just not 100% certain that this is the only source of disk
>> space pressure. I think it should work, but I have no way of testing that
>> hypothesis (other than doing it).
>>
>> > Also, considering the number of flaky tests in general [1], code
>> coverage might not be the pressing issue. Should it be disabled everywhere
>> in favor of more reliable / faster builds? Unless devs here are willing to
>> commit to taking action, recording these numbers as part of the normal
>> pre-commit jobs doesn't seem to provide much value.
>>
>> I think most flakes are unrelated to this issue, so I don't think
>> removing code coverage is going to solve our problems here. If we need to
>> remove all code coverage to fix the issues we're currently experiencing,
>> then I think that is definitely worth it (at least until we can find a
>> better way to do coverage). But I'm not sure if that will be necessary yet.
>>
>> > Is there a technical reason we can't migrate Java code coverage over
>> to the Codecov tool/Actions like we have with Go and Python?
>>
>> I have no context on this and will defer to others.
>>
>> On Tue, Apr 11, 2023 at 11:27 AM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Is there a technical reason we can't migrate Java code coverage over to
>>> the Codecov tool/Actions like we have with Go and Python?
>>>
>>> On Tue, Apr 11, 2023 at 11:25 AM Moritz Mack <mm...@talend.com> wrote:
>>>
>>>> Yes, sorry Robert for being so unspecific. By "everywhere" I meant Java
>>>> only, my bad!
>>>>
>>>>
>>>>
>>>> On 11.04.23, 17:17, "Robert Burke" <rob...@frantil.com> wrote:
>>>>
>>>>
>>>>
>>>> The coverage issue is specific to the Java builds.
>>>>
>>>>
>>>>
>>>> Go and Python instead upload their coverage numbers to Codecov from
>>>> GitHub Actions.
>>>>
>>>>
>>>>
>>>> On Tue, Apr 11, 2023, 8:14 AM Moritz Mack <mm...@talend.com> wrote:
>>>>
>>>> Thanks so much for looking into this!
>>>>
>>>> I’m absolutely +1 for removing Jenkins-related friction, and the
>>>> proposed changes sound reasonable.
>>>>
>>>>
>>>>
>>>> Also, considering the number of flaky tests in general [1], code
>>>> coverage might not be the pressing issue. Should it be disabled everywhere
>>>> in favor of more reliable / faster builds? Unless devs here are willing to
>>>> commit to taking action, recording these numbers as part of the normal
>>>> pre-commit jobs doesn’t seem to provide much value.
>>>>
>>>>
>>>>
>>>> Kind regards,
>>>>
>>>> Moritz
>>>>
>>>>
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3Aflake
>>>>
>>>>
>>>>
>>>> On 11.04.23, 16:24, "Danny McCormick via dev" <dev@beam.apache.org>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> *tl;dr - I want to temporarily reduce the number of builds that we
>>>> retain to reduce pressure on Jenkins*
>>>>
>>>>
>>>>
>>>> Hey everyone, over the past few days our Jenkins runs have been
>>>> particularly flaky across the board, with errors like the following showing
>>>> up all over the place [1]:
>>>>
>>>>
>>>>
>>>> java.nio.file.FileSystemException: 
>>>> /home/jenkins/jenkins-home/jobs/beam_PreCommit_Python_Phrase/builds/3352/changelog.xml:
>>>>  No space left on device [2]
>>>>
>>>>
>>>>
>>>> These errors indicate that we're out of space on the Jenkins master
>>>> node. After some digging (thanks @Yi Hu <ya...@google.com> @Ahmet Altay
>>>> <al...@google.com> and @Bruno Volpato <bvolp...@google.com> for
>>>> contributing), we've determined that at least one large contributing issue
>>>> is that some of our builds are eating up too much space. For example, our
>>>> beam_PreCommit_Java_Commit build is taking up 28GB of space by itself (this
>>>> is just one example).
>>>>
>>>>
>>>>
>>>> @Yi Hu <ya...@google.com> found one change around code coverage that
>>>> is likely heavily contributing to the problem and rolled that back [3]. We
>>>> can continue to find other contributing factors here.
>>>>
>>>>
>>>>
>>>> In the meantime, to get us back to healthy *I propose that we reduce
>>>> the number of builds that we are retaining to 40 for all jobs that are
>>>> using a large amount of storage (>5GB)*. This will hopefully allow us
>>>> to return Jenkins to a normal functioning state, though it will do so at
>>>> the cost of a significant amount of build history (right now, for example,
>>>> beam_PreCommit_Java_Commit is at 400 retained builds). We could restore the
>>>> normal retention limit once the underlying problem is resolved. Given that
>>>> this is irreversible (and not guaranteed to work), I wanted to gather
>>>> feedback before doing this. Personally, I rarely use builds that old, but
>>>> others may feel differently.
>>>>
>>>>
>>>>
>>>> Please let me know if you have any objections or support for this
>>>> proposal.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Danny
>>>>
>>>>
>>>>
>>>> [1] Tracking issue: https://github.com/apache/beam/issues/26197
>>>>
>>>> [2] Example run with this error:
>>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/3352/console
>>>>
>>>> [3] Rollback PR: https://github.com/apache/beam/pull/26199
>>>>
>>>> *As a recipient of an email from the Talend Group, your personal data
>>>> will be processed by our systems. Please see our Privacy Notice
>>>> <https://www.talend.com/privacy-policy/> for more information about our
>>>> collection and use of your personal information, our security practices,
>>>> and your data protection rights, including any rights you may have to
>>>> object to automated-decision making or profiling we use to analyze support
>>>> or marketing related communications. To manage or discontinue promotional
>>>> communications, use the communication preferences portal
>>>> <https://info.talend.com/emailpreferencesen.html>. To exercise your data
>>>> protection rights, use the privacy request form
>>>> <https://urldefense.com/v3/__https:/talend.my.onetrust.com/webform/ef906c5a-de41-4ea0-ba73-96c079cdd15a/b191c71d-f3cb-4a42-9815-0c3ca021704cl__;!!CiXD_PY!RkH_w2sEQbwhDcEU-WRkzIjHDKJthN2D60BB0rHBqsKCU1xJyZrnMH2LbYXQMfExzOwHEBi0T6bp$>.
>>>> Contact us here <https://www.talend.com/contact/>or by mail to either of
>>>> our co-headquarters: Talend, Inc.: 400 South El Camino Real, Ste 1400, San
>>>> Mateo, CA 94402; Talend SAS: 5/7 rue Salomon De Rothschild, 92150 Suresnes,
>>>> France *
>>>>
>>>
