Hi all,

Following up on this thread: some tips on validating RCs are now documented [1]. Please do add any instructions, especially for more SDK/runner-specific combinations.
I'll now take a closer look at the automation discussed above in this thread.

Thanks,
Svetak

[1] https://github.com/apache/beam/pull/29595

On Wed, Oct 25, 2023 at 9:52 AM Danny McCormick via dev <dev@beam.apache.org> wrote:

> One easy and standard way to make it more resilient would be to make it idempotent instead of counting on uptime or receiving any particular event.

Yep, agreed that this wouldn't be super hard if someone wants to take it on. Basically it would just be updating the tool to run on a schedule and look for issues that have been closed as completed in the last N days (more or less this query: https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aclosed+reason%3Acompleted+created%3A%3E2023-01-01+). I have seen some milestones intentionally removed from issues after the bot adds them (probably because it's non-obvious that you can mark an issue as not planned instead), so we'd probably want to account for that and no-op if a milestone was removed post-close.

One downside of this approach is that you significantly increase the chances of an issue being assigned to the wrong milestone if it comes in around the cut; you'd need to either account for this by checking out the repo to get the version at the time the issue was closed (expensive/non-trivial) or live with this downside. It's probably an OK downside to live with.

You could also do a hybrid approach where you run on issue close and also run a scheduled or manual pre-release step to clean up any stragglers. This would be the most robust option.

On Wed, Oct 25, 2023 at 7:43 AM Kenneth Knowles <k...@apache.org> wrote:

Agree. As long as we are getting enough of them, then our records, as well as any automation depending on them, are fine. One easy and standard way to make it more resilient would be to make it idempotent instead of counting on uptime or receiving any particular event.
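For reference, the scheduled, idempotent sweep described above could look roughly like the following sketch. The `Issue` shape, its field names, and the `needs_milestone`/`sweep` helpers are illustrative assumptions, not Beam's actual tooling; a real implementation would pull this state from the GitHub API, including timeline events to detect a post-close milestone removal:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Issue:
    number: int
    state_reason: str                     # "completed" or "not_planned"
    milestone: Optional[str]              # currently assigned milestone, if any
    milestone_removed_after_close: bool   # derived from the issue's timeline

def needs_milestone(issue: Issue) -> bool:
    """Idempotent check: should the sweep assign a milestone to this issue?"""
    if issue.state_reason != "completed":
        return False  # "not planned" issues are deliberately left alone
    if issue.milestone is not None:
        return False  # already assigned; re-running changes nothing
    if issue.milestone_removed_after_close:
        return False  # a human removed it on purpose; no-op
    return True

def sweep(issues: List[Issue], current_milestone: str) -> List[Tuple[int, str]]:
    """Return the (issue number, milestone) assignments this run would make."""
    return [(i.number, current_milestone) for i in issues if needs_milestone(i)]
```

Because the decision depends only on the current issue state, re-running the sweep on any schedule is safe, which is the idempotency property being discussed here.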
Kenn

On Tue, Oct 24, 2023 at 2:58 PM Danny McCormick <dannymccorm...@google.com> wrote:

Looks like for some reason the workflow didn't trigger. This is running on GitHub's hosted runners, so my best guess is an outage.

Looking at a more refined query, this year there have been 14 issues that were missed by the automation (3 had their milestone manually removed) - https://github.com/apache/beam/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed+reason%3Acompleted+created%3A%3E2023-01-01 - out of 605 total - https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aclosed+reason%3Acompleted+created%3A%3E2023-01-01+ - and as best I can tell there were a small number of workflow flakes, and then GHA didn't correctly trigger a few.

If we wanted, we could set up some recurring automation to go through and try to pick up the ones without milestones (or modify our existing automation to be more tolerant of failures), but it doesn't seem super urgent to me (feel free to disagree). I don't think this piece needs to be perfect.

On Tue, Oct 24, 2023 at 2:40 PM Kenneth Knowles <k...@apache.org> wrote:

Just grabbing one at random as an example: https://github.com/apache/beam/issues/28635 seems like it was closed as completed but not tagged.

I'm happy to see that the bot reads the version from the repo to find the appropriate milestone, rather than using the nearest open one. Just recording that for the thread, since I first read the description as the latter.

Kenn

On Tue, Oct 24, 2023 at 2:34 PM Danny McCormick via dev <dev@beam.apache.org> wrote:

We do tag issues to milestones when the issue is marked as "completed" (as opposed to "not planned") - https://github.com/apache/beam/blob/master/.github/workflows/assign_milestone.yml.
So I think using issues is probably about as accurate as using commits.

> It looks like we have 820 with no milestone: https://github.com/apache/beam/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed

Most predate the automation, though maybe not all? Some of those may have been closed as "not planned".

> This could (should) be automatically discoverable. A (closed) issue is associated with commits, which are associated with a release.

Today, we just tag issues to the upcoming milestone when they're closed. In theory you could do something more sophisticated using linked commits, but in practice people aren't careful enough about linking commits to issues. Again, this is fixable by automation/enforcement, but I don't think it actually gives us much value beyond what we have today.

On Tue, Oct 24, 2023 at 1:54 PM Robert Bradshaw via dev <dev@beam.apache.org> wrote:

On Tue, Oct 24, 2023 at 10:35 AM Kenneth Knowles <k...@apache.org> wrote:

> Tangentially related:
>
> Long ago, attaching an issue to a release was a mandatory step as part of closing. Now I think it is not. Is it happening automatically? It looks like we have 820 with no milestone: https://github.com/apache/beam/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed

This could (should) be automatically discoverable. A (closed) issue is associated with commits, which are associated with a release.

On Tue, Oct 24, 2023 at 1:25 PM Chamikara Jayalath via dev <dev@beam.apache.org> wrote:

+1 for going by the commits, since this is what matters at the end of the day.
Also, many issues may not get tagged correctly for a given release, due either to the contributor not tagging the issue or to the commits for the issue spanning multiple Beam releases.

For example, for all commits in a given release RC:
* If we find a GitHub issue for the commit: add a notice to the GitHub issue.
* Else: add the notice to a generic issue for the release, including tags for the commit ID, the PR author, and the committer who merged the PR.

Thanks,
Cham

On Mon, Oct 23, 2023 at 11:49 AM Danny McCormick via dev <dev@beam.apache.org> wrote:

I'd probably vote to include both the issue filer and the contributor. It is pretty equally straightforward - one way to do this would be to use all issues related to that release's milestone and extract the issue author and the issue closer.

This does leave out the (unfortunately sizable) set of contributions that don't have an associated issue; if we're worried about that, we could always fall back to anyone with a commit in the last release who doesn't have an associated issue (aka what I thought we were initially proposing, and what I think Airflow does today).

I'm pretty much +1 on any sort of automation here, and it certainly can come in stages :)

On Mon, Oct 23, 2023 at 1:50 PM Johanna Öjeling via dev <dev@beam.apache.org> wrote:

Yes, that's a good point to include also those who created the issue.
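A minimal sketch of the commit-routing loop Cham outlines above, assuming commit metadata has already been collected. The dict keys and the `route_notices` helper are hypothetical, and real issue-linking conventions (e.g. "Fixes #N") vary:

```python
import re

# Matches GitHub issue references like "Fixes #28635" or "Fix coder bug (#29595)".
ISSUE_REF = re.compile(r"#(\d+)")

def route_notices(commits):
    """Split RC commits into per-issue notices and a generic catch-all.

    `commits` is a list of dicts with "sha", "message", "author", and "merger"
    keys (an assumed shape). Returns (per_issue, generic): per_issue maps
    issue number -> commit SHAs to notify on that issue; generic collects
    (sha, author, merger) tuples destined for the release's catch-all issue.
    """
    per_issue, generic = {}, []
    for c in commits:
        refs = ISSUE_REF.findall(c["message"])
        if refs:
            for num in refs:
                per_issue.setdefault(int(num), []).append(c["sha"])
        else:
            generic.append((c["sha"], c["author"], c["merger"]))
    return per_issue, generic
```

In practice a `#N` reference can also point at a PR rather than an issue, so a real implementation would want to verify each reference against the issue tracker before commenting.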
On Mon, Oct 23, 2023, 19:18 Robert Bradshaw via dev <dev@beam.apache.org> wrote:

On Mon, Oct 23, 2023 at 7:26 AM Danny McCormick via dev <dev@beam.apache.org> wrote:

> So to summarize, I think there's broad consensus (or at least lazy consensus) around the following:
>
> - (1) Updating our release email/guidelines to be more specific about what we mean by release validation and how to be helpful during this process. This includes both encouraging validation within each user's own code base and encouraging people to document/share their process of validation and link it in the release spreadsheet.
> - (2) Doing something like what Airflow does (#29424 <https://github.com/apache/airflow/issues/29424>) and creating an issue asking people who have contributed to the current release to help validate their changes.
>
> I'm also +1 on doing both of these. The first bit (updating our guidelines) is relatively easy - it should just require updating https://github.com/apache/beam/blob/master/contributor-docs/release-guide.md#vote-and-validate-the-release-candidate.
>
> I took a look at the second piece (copying what Airflow does) to see if we could just copy their automation, but it looks like it's tied to airflow breeze <https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/provider_issue_TEMPLATE.md.jinja2> (their repo-specific automation tooling), so we'd probably need to build the automation ourselves.
> It shouldn't be terrible: basically, we'd want a GitHub Action that compares the current release tag with the last release tag, grabs all the commits in between, parses them to get the authors, and creates an issue with that data, but it does represent more effort than just updating a markdown file. There might even be an existing Action that can help with this; I haven't looked too hard.

I was thinking along the lines of a script that would scrape the issues resolved in a given release and add a comment to them noting that the change is in release N and explaining (with clear instructions) how this can be validated. Creating a "validate this release" issue with all "contributing" participants could be an interesting way to do this as well. (I think it'd be valuable to get those who filed the issues, not just those who fixed them, to validate.)

> As our next release manager, I'm happy to review PRs for either of these if anyone wants to volunteer to help out. If not, I'm happy to update the guidelines, but I probably won't have time to add the commit inspection tooling (I'm planning on throwing any extra time towards continuing to automate release candidate creation, which is currently a more impactful problem IMO). I would very much like it if both of these things happened, though :)
>
> Thanks,
> Danny

On Mon, Oct 23, 2023 at 10:05 AM XQ Hu <x...@google.com> wrote:

+1. This is a great idea to try.
@Danny McCormick <dannymccorm...@google.com> FYI, as our next release manager.

On Wed, Oct 18, 2023 at 2:30 PM Johanna Öjeling via dev <dev@beam.apache.org> wrote:

When I have contributed to Apache Airflow, they have tagged all contributors concerned in a GitHub issue when the RC is available and asked us to validate it. Example: #29424 <https://github.com/apache/airflow/issues/29424>.

I found that to be an effective way to notify contributors of the RC and nudge them to help out. In the issue description there is a reference to the guidelines on how to test the RC, and a note that people are encouraged to vote on the mailing list (which could admittedly be more highlighted, because I did not pay attention to it until now and was unaware that contributors had a vote).

It might be an idea to consider something similar here to increase participation?

On Tue, Oct 17, 2023 at 7:01 PM Jack McCluskey via dev <dev@beam.apache.org> wrote:

I'm +1 on helping explain what we mean by "validate the RC", since we're really just asking users to see if their existing use cases work, along with our typical slate of tests. I don't know if offloading that work to our active validators is the right approach, though; documentation or a screen share of one person's specific workflow is definitely less useful than a more general outline of how to install the RC and things to look out for when testing.
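The tag-comparison automation Danny sketches earlier in the thread could start from something like the fragment below. It assumes the commits between the two tags have already been listed (e.g. via `git log --format='%h|%an|%s' <prev_tag>..<new_tag>`) and that author names have been mapped to GitHub handles; the `render_validation_issue` helper and its argument shapes are illustrative, not existing tooling:

```python
def render_validation_issue(prev_tag, new_tag, commits):
    """Build the body of a "please help validate this RC" issue.

    `commits` is a list of (sha, github_login, subject) tuples, e.g. derived
    from `git log --format='%h|%an|%s' <prev_tag>..<new_tag>` with author
    names mapped to GitHub logins (that mapping is the fiddly part).
    """
    # Group each contributor's commits so they get a single @-mention.
    by_author = {}
    for sha, author, subject in commits:
        by_author.setdefault(author, []).append((sha, subject))

    lines = [f"Please help validate your changes in {prev_tag}..{new_tag}:", ""]
    for author in sorted(by_author):
        lines.append(f"@{author}:")
        lines.extend(f"- {sha} {subject}" for sha, subject in by_author[author])
        lines.append("")
    return "\n".join(lines)
```

Creating the issue itself would then be one call to the GitHub issues API (or `gh issue create`) with this body.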
On Tue, Oct 17, 2023 at 12:55 PM Austin Bennett <aus...@apache.org> wrote:

Great effort. I'm also interested in streamlining releases - so if there are a lot of manual tests that could be automated, it would be great to discover them and then look to address that.

On Tue, Oct 17, 2023 at 8:47 AM Robert Bradshaw via dev <dev@beam.apache.org> wrote:

+1

I would also strongly suggest that people try out the release against their own codebases. This has the benefit of ensuring the release won't break your own code when it goes out, and it stress-tests the new code against real-world pipelines. (Ideally our own tests are all passing and this validation is automated as much as possible (though ensuring it matches our documentation and works in a clean environment still has value), but there's a lot of code and uses out there that we don't have access to during normal Beam development.)

On Tue, Oct 17, 2023 at 8:21 AM Svetak Sundhar via dev <dev@beam.apache.org> wrote:

Hi all,

I've participated in RC testing for a few releases and have observed a bit of a knowledge gap in how releases can be tested.
Given that Beam encourages contributors to vote on RCs regardless of tenure, and that voting on an RC is a relatively low-effort, high-leverage way to influence the release of the library, I propose the following:

During the vote for the next release, voters can document the process they followed in a separate document and add the link in column G here <https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=437054928>. One step further could be a screencast of running the test, with a link to that attached as well.

We can keep repeating this through releases until we have documentation for many of the different tests. We can then add these docs to the repo.

I'm proposing this because I've gathered the following feedback from colleagues who are tangentially involved with Beam: they are interested in participating in release validation, but don't know how to get started. Happy to hear other suggestions too, if there are any, to address the above.

Thanks,

Svetak Sundhar
Data Engineer
svetaksund...@google.com