Hi all,

Following up on this thread: some tips on validating RCs are now
documented [1]. Please do add any instructions, especially for more
SDK/runner-specific combos.

I'll now take a closer look at the automation discussed earlier in this
thread.

Thanks,
Svetak

[1] https://github.com/apache/beam/pull/29595


On Wed, Oct 25, 2023 at 9:52 AM Danny McCormick via dev <dev@beam.apache.org>
wrote:

> > One easy and standard way to make it more resilient would be to make it
> idempotent instead of counting on uptime or receiving any particular event.
>
> Yep, agreed that this wouldn't be super hard if someone wants to take it
> on. Basically, it would just mean updating the tool to run on a schedule
> and look for issues that have been closed as completed in the last N days
> (more or less this query -
> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aclosed+reason%3Acompleted+created%3A%3E2023-01-01+).
> I have seen some milestones intentionally removed from issues after the bot
> adds them (probably because it's non-obvious that you can mark an issue as
> not planned instead), so we'd probably want to account for that and no-op
> if a milestone was removed post-close.
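As a rough sketch, the scheduled sweep could look something like this in Python. The issue fields (`closed_at`, `milestone`, `milestone_removed_after_close`) are illustrative assumptions, not GitHub's actual schema or the bot's data model; a real version would fetch the issues from the GitHub API.

```python
from datetime import datetime, timedelta, timezone

def issues_needing_milestone(issues, now=None, window_days=7):
    """Return numbers of recently closed-as-completed issues that the
    scheduled sweep should still tag with a milestone.

    `issues` is a list of dicts; the keys used here are illustrative,
    not GitHub's schema.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    todo = []
    for issue in issues:
        if issue["closed_at"] < cutoff:
            continue  # outside the look-back window
        if issue.get("milestone") is not None:
            continue  # already tagged, e.g. by the on-close workflow
        if issue.get("milestone_removed_after_close"):
            continue  # deliberately untagged after close: no-op
        todo.append(issue["number"])
    return todo
```

Running this on a schedule makes the sweep idempotent: re-running it never re-tags an issue that already has a milestone or that someone deliberately untagged.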
>
> One downside of this approach is that you significantly increase the
> chances of an issue getting assigned to the wrong milestone if it comes
> in around the cut; you'd need to either account for this by checking out
> the repo to get the version at the time the issue was closed
> (expensive/non-trivial) or live with this downside. It's probably an ok
> downside to live with.
>
> You could also do a hybrid approach where you run on issue close and run a
> scheduled or manual pre-release step to clean up any stragglers. This would
> be the most robust option.
>
> On Wed, Oct 25, 2023 at 7:43 AM Kenneth Knowles <k...@apache.org> wrote:
>
>> Agree. As long as we are getting enough of them, our records, as well
>> as any automation depending on them, are fine. One easy and standard way
>> to make it more resilient would be to make it idempotent instead of
>> counting on uptime or receiving any particular event.
>>
>> Kenn
>>
>> On Tue, Oct 24, 2023 at 2:58 PM Danny McCormick <
>> dannymccorm...@google.com> wrote:
>>
>>> Looks like for some reason the workflow didn't trigger. This is running
>>> on GitHub's hosted runners, so my best guess is an outage.
>>>
>>> Looking at a more refined query, this year there have been 14 issues
>>> missed by the automation (3 had their milestone manually removed) -
>>> https://github.com/apache/beam/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed+reason%3Acompleted+created%3A%3E2023-01-01
>>> - out of 605 total -
>>> https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aclosed+reason%3Acompleted+created%3A%3E2023-01-01+
>>> - and as best I can tell there were a small number of workflow flakes,
>>> and then GHA didn't correctly trigger a few.
>>>
>>> If we wanted, we could set up some recurring automation to go through
>>> and try to pick up the ones without milestones (or modify our existing
>>> automation to be more tolerant to failures), but it doesn't seem super
>>> urgent to me (feel free to disagree). I don't think this piece needs to be
>>> perfect.
>>>
>>> On Tue, Oct 24, 2023 at 2:40 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> Just grabbing one at random for an example,
>>>> https://github.com/apache/beam/issues/28635 seems like it was closed
>>>> as completed but not tagged.
>>>>
>>>> I'm happy to see that the bot reads the version from the repo to find
>>>> the appropriate milestone, rather than using the nearest open one. Just
>>>> recording that for the thread since I first read the description as the
>>>> latter.
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Oct 24, 2023 at 2:34 PM Danny McCormick via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> We do tag issues to milestones when the issue is marked as "completed"
>>>>> (as opposed to "not planned") -
>>>>> https://github.com/apache/beam/blob/master/.github/workflows/assign_milestone.yml.
>>>>> So I think using issues is probably about as accurate as using commits.
>>>>>
>>>>> > It looks like we have 820 with no milestone
>>>>> https://github.com/apache/beam/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed
>>>>>
>>>>> Most predate the automation, though maybe not all? Some of those may
>>>>> have been closed as "not planned".
>>>>>
>>>>> > This could (should) be automatically discoverable. A (closed) issue
>>>>> > is associated with commits, which are associated with a release.
>>>>>
>>>>> Today, we just tag issues to the upcoming milestone when they're
>>>>> closed. In theory you could do something more sophisticated using linked
>>>>> commits, but in practice people aren't clean enough about linking commits
>>>>> to issues. Again, this is fixable by automation/enforcement, but I don't
>>>>> think it actually gives us much value beyond what we have today.
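For reference, a minimal sketch of the milestone lookup described above (the bot reading the version from the repo). The version-string format and the "X.Y.Z Release" milestone naming are assumptions based on this thread, not verified against the actual assign_milestone workflow.

```python
import re

def milestone_from_version_file(version_py_text):
    """Derive the milestone title from the dev version in the repo.

    The version-string format and the "X.Y.Z Release" milestone naming
    are assumptions, not verified against Beam's actual workflow.
    """
    m = re.search(r'[\'"](\d+\.\d+\.\d+)\.dev[\'"]', version_py_text)
    if not m:
        raise ValueError("no dev version string found")
    return f"{m.group(1)} Release"
```

Reading the version at close time, rather than picking the nearest open milestone, is what keeps issues closed around a release cut from landing in the wrong milestone.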
>>>>>
>>>>> On Tue, Oct 24, 2023 at 1:54 PM Robert Bradshaw via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> On Tue, Oct 24, 2023 at 10:35 AM Kenneth Knowles <k...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Tangentially related:
>>>>>>>
>>>>>>> Long ago, attaching an issue to a release was a mandatory step as
>>>>>>> part of closing. Now I think it is not. Is it happening
>>>>>>> automatically? It looks like we have 820 with no milestone:
>>>>>>> https://github.com/apache/beam/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed
>>>>>>>
>>>>>>
>>>>>> This could (should) be automatically discoverable. A (closed) issue
>>>>>> is associated with commits, which are associated with a release.
>>>>>>
>>>>>>
>>>>>>> On Tue, Oct 24, 2023 at 1:25 PM Chamikara Jayalath via dev <
>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>
>>>>>>>> +1 for going by the commits since this is what matters at the end
>>>>>>>> of the day. Also, many issues may not get tagged correctly for a given
>>>>>>>> release due to either the contributor not tagging the issue or due to
>>>>>>>> commits for the issue spanning multiple Beam releases.
>>>>>>>>
>>>>>>>> For example,
>>>>>>>>
>>>>>>>> For all commits in a given release RC:
>>>>>>>>   * If we find a GitHub issue for the commit: add a notice to the
>>>>>>>> GitHub issue.
>>>>>>>>   * Else: add the notice to a generic issue for the release,
>>>>>>>> including tags for the commit ID, PR author, and the committer who
>>>>>>>> merged the PR.
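A rough sketch of that loop in Python. The commit dicts and the notice wording are made up for illustration; posting the notices to GitHub is left out.

```python
def route_notices(commits, generic_issue_comments):
    """Route a validation notice for each commit: to its linked GitHub
    issue when one exists, otherwise to the release's generic issue.

    Commit dicts and notice wording are illustrative assumptions.
    """
    per_issue = {}  # issue number -> notice text
    for c in commits:
        if c.get("issue") is not None:
            per_issue[c["issue"]] = (
                f"Commit {c['sha']} for this issue is in the RC; "
                "please help validate it."
            )
        else:
            generic_issue_comments.append(
                f"Commit {c['sha']} (author: @{c['author']}, merged by: "
                f"@{c['merger']}) has no linked issue; please help validate."
            )
    return per_issue
```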
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Cham
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 23, 2023 at 11:49 AM Danny McCormick via dev <
>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>
>>>>>>>>> I'd probably vote to include both the issue filer and the
>>>>>>>>> contributor. It is about equally straightforward - one way to do
>>>>>>>>> this would be to use all issues related to that release's milestone
>>>>>>>>> and extract the issue author and the issue closer.
>>>>>>>>>
>>>>>>>>> This does leave out the (unfortunately sizable) set of
>>>>>>>>> contributions that don't have an associated issue; if we're worried 
>>>>>>>>> about
>>>>>>>>> that, we could always fall back to anyone with a commit in the last 
>>>>>>>>> release
>>>>>>>>> who doesn't have an associated issue (aka what I thought we were 
>>>>>>>>> initially
>>>>>>>>> proposing and what I think Airflow does today).
>>>>>>>>>
>>>>>>>>> I'm pretty much +1 on any sort of automation here, and it
>>>>>>>>> certainly can come in stages :)
>>>>>>>>>
>>>>>>>>> On Mon, Oct 23, 2023 at 1:50 PM Johanna Öjeling via dev <
>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, that's a good point to also include those who created the
>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 23, 2023, 19:18 Robert Bradshaw via dev <
>>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 23, 2023 at 7:26 AM Danny McCormick via dev <
>>>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> So to summarize, I think there's broad consensus (or at least
>>>>>>>>>>>> lazy consensus) around the following:
>>>>>>>>>>>>
>>>>>>>>>>>> - (1) Updating our release email/guidelines to be more specific
>>>>>>>>>>>> about what we mean by release validation/how to be helpful during 
>>>>>>>>>>>> this
>>>>>>>>>>>> process. This includes both encouraging validation within each 
>>>>>>>>>>>> user's own
>>>>>>>>>>>> code base and encouraging people to document/share their process of
>>>>>>>>>>>> validation and link it in the release spreadsheet.
>>>>>>>>>>>> - (2) Doing something like what Airflow does (#29424
>>>>>>>>>>>> <https://github.com/apache/airflow/issues/29424>) and creating
>>>>>>>>>>>> an issue asking people who have contributed to the current release 
>>>>>>>>>>>> to help
>>>>>>>>>>>> validate their changes.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm also +1 on doing both of these. The first bit (updating our
>>>>>>>>>>>> guidelines) is relatively easy - it should just require updating
>>>>>>>>>>>> https://github.com/apache/beam/blob/master/contributor-docs/release-guide.md#vote-and-validate-the-release-candidate
>>>>>>>>>>>> .
>>>>>>>>>>>>
>>>>>>>>>>>> I took a look at the second piece (copying what Airflow does)
>>>>>>>>>>>> to see if we could just copy their automation, but it looks like
>>>>>>>>>>>> it's tied to airflow breeze
>>>>>>>>>>>> <https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/provider_issue_TEMPLATE.md.jinja2>
>>>>>>>>>>>> (their repo-specific automation tooling), so we'd probably need
>>>>>>>>>>>> to build the automation ourselves. It shouldn't be terrible:
>>>>>>>>>>>> basically, we'd want a GitHub Action that compares the current
>>>>>>>>>>>> release tag with the last release tag, grabs all the commits in
>>>>>>>>>>>> between, parses them to get the authors, and creates an issue
>>>>>>>>>>>> with that data. But it does represent more effort than just
>>>>>>>>>>>> updating a markdown file. There might even be an existing Action
>>>>>>>>>>>> that can help with this; I haven't looked too hard.
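A sketch of the commit-parsing half of that Action. The commit list would come from something like `git log vPREV..vCURR --format='%an <%ae>'` (a standard git pretty-format); the issue wording here is invented, and creating the issue via the GitHub API is left out.

```python
def contributors_between(log_output):
    """Deduplicate (order-preserving) the authors from `git log` output
    produced with --format='%an <%ae>' between two release tags."""
    seen = []
    for line in log_output.splitlines():
        author = line.strip()
        if author and author not in seen:
            seen.append(author)
    return seen

def issue_body(release, authors):
    """Render a validation-request issue body with one checkbox per
    author (wording is a guess at what such an issue could say)."""
    lines = [
        f"Status of testing of Apache Beam {release}:",
        "",
        "The following contributors have changes in this release candidate;",
        "please help validate your changes:",
    ]
    lines += [f"- [ ] {author}" for author in authors]
    return "\n".join(lines)
```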
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I was thinking along the lines of a script that would scrape the
>>>>>>>>>>> issues resolved in a given release and add a comment to each,
>>>>>>>>>>> noting that the change is in release N, with clear instructions
>>>>>>>>>>> on how it can be validated. Creating a "validate this release"
>>>>>>>>>>> issue with all "contributing" participants could be an
>>>>>>>>>>> interesting way to do this as well. (I think it'd be valuable to
>>>>>>>>>>> get those who filed the issue, not just those who fixed it, to
>>>>>>>>>>> validate.)
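As an illustration of the per-issue comment, a minimal sketch; the wording and the instructions URL are placeholders, not an agreed-on template.

```python
def validation_comment(release, rc_number, issue_author, instructions_url):
    """Compose the per-issue nudge comment. The wording and the
    instructions URL are placeholders, not an agreed-on template."""
    return (
        f"@{issue_author}: the change for this issue is included in the "
        f"Apache Beam {release} release candidate RC{rc_number}. It would "
        f"help the release vote if you could validate it against the RC. "
        f"Validation instructions: {instructions_url}"
    )
```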
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> As our next release manager, I'm happy to review PRs for either
>>>>>>>>>>>> of these if anyone wants to volunteer to help out. If not, I'm 
>>>>>>>>>>>> happy to
>>>>>>>>>>>> update the guidelines, but I probably won't have time to add the 
>>>>>>>>>>>> commit
>>>>>>>>>>>> inspection tooling (I'm planning on throwing any extra time towards
>>>>>>>>>>>> continuing to automate release candidate creation which is 
>>>>>>>>>>>> currently a more
>>>>>>>>>>>> impactful problem IMO). I would very much like it if both of these 
>>>>>>>>>>>> things
>>>>>>>>>>>> happened though :)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Danny
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 23, 2023 at 10:05 AM XQ Hu <x...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1. This is a great idea to try. @Danny McCormick
>>>>>>>>>>>>> <dannymccorm...@google.com> FYI as our next release manager.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 18, 2023 at 2:30 PM Johanna Öjeling via dev <
>>>>>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I have contributed to Apache Airflow, they have tagged
>>>>>>>>>>>>>> all contributors concerned in a GitHub issue when the RC is 
>>>>>>>>>>>>>> available and
>>>>>>>>>>>>>> asked us to validate it. Example: #29424
>>>>>>>>>>>>>> <https://github.com/apache/airflow/issues/29424>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I found that to be an effective way to notify contributors of
>>>>>>>>>>>>>> the RC and nudge them to help out. In the issue description 
>>>>>>>>>>>>>> there is a
>>>>>>>>>>>>>> reference to the guidelines on how to test the RC and a note 
>>>>>>>>>>>>>> that people
>>>>>>>>>>>>>> are encouraged to vote on the mailing list (which could 
>>>>>>>>>>>>>> admittedly be more
>>>>>>>>>>>>>> highlighted because I did not pay attention to it until now and 
>>>>>>>>>>>>>> was unaware
>>>>>>>>>>>>>> that contributors had a vote).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It might be an idea to consider something similar here to
>>>>>>>>>>>>>> increase the participation?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 17, 2023 at 7:01 PM Jack McCluskey via dev <
>>>>>>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm +1 on helping explain what we mean by "validate the RC,"
>>>>>>>>>>>>>>> since we're really just asking users to see if their existing
>>>>>>>>>>>>>>> use cases work, along with our typical slate of tests. I don't
>>>>>>>>>>>>>>> know if offloading that work to our active validators is the
>>>>>>>>>>>>>>> right approach, though; documentation or a screen share of
>>>>>>>>>>>>>>> their specific workflow is definitely less useful than a more
>>>>>>>>>>>>>>> general outline of how to install the RC and things to look
>>>>>>>>>>>>>>> out for when testing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 17, 2023 at 12:55 PM Austin Bennett <
>>>>>>>>>>>>>>> aus...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Great effort.  I'm also interested in streamlining releases
>>>>>>>>>>>>>>>> -- so if there are a lot of manual tests that could be
>>>>>>>>>>>>>>>> automated, it would be great to discover them and then look
>>>>>>>>>>>>>>>> to address that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 17, 2023 at 8:47 AM Robert Bradshaw via dev <
>>>>>>>>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would also strongly suggest that people try out the
>>>>>>>>>>>>>>>>> release against their own codebases. This has the benefit of
>>>>>>>>>>>>>>>>> ensuring the release won't break your own code when it goes
>>>>>>>>>>>>>>>>> out, and it stress-tests the new code against real-world
>>>>>>>>>>>>>>>>> pipelines. (Ideally our own tests are all passing, and this
>>>>>>>>>>>>>>>>> validation is automated as much as possible (though ensuring
>>>>>>>>>>>>>>>>> it matches our documentation and works in a clean
>>>>>>>>>>>>>>>>> environment still has value), but there's a lot of code and
>>>>>>>>>>>>>>>>> uses out there that we don't have access to during normal
>>>>>>>>>>>>>>>>> Beam development.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 17, 2023 at 8:21 AM Svetak Sundhar via dev <
>>>>>>>>>>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I’ve participated in RC testing for a few releases and
>>>>>>>>>>>>>>>>>> have observed a bit of a knowledge gap in how releases can
>>>>>>>>>>>>>>>>>> be tested. Given that Beam encourages contributors to vote
>>>>>>>>>>>>>>>>>> on RCs regardless of tenure, and that voting on an RC is a
>>>>>>>>>>>>>>>>>> relatively low-effort, high-leverage way to influence the
>>>>>>>>>>>>>>>>>> release of the library, I propose the following:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> During the vote for the next release, voters can document
>>>>>>>>>>>>>>>>>> the process they followed in a separate document and add
>>>>>>>>>>>>>>>>>> the link in column G here
>>>>>>>>>>>>>>>>>> <https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=437054928>.
>>>>>>>>>>>>>>>>>> One step further could be a screencast of running the
>>>>>>>>>>>>>>>>>> test, with a link to that attached as well.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We can keep repeating this through releases until we have
>>>>>>>>>>>>>>>>>> documentation for many of the different tests. We can then 
>>>>>>>>>>>>>>>>>> add these docs
>>>>>>>>>>>>>>>>>> into the repo.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I’m proposing this because I’ve gathered the following
>>>>>>>>>>>>>>>>>> feedback from colleagues who are tangentially involved
>>>>>>>>>>>>>>>>>> with Beam: they are interested in participating in release
>>>>>>>>>>>>>>>>>> validation, but don’t know how to get started. Happy to
>>>>>>>>>>>>>>>>>> hear other suggestions too, if there are any that address
>>>>>>>>>>>>>>>>>> the above.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Svetak Sundhar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   Data Engineer
>>>>>>>>>>>>>>>>>> svetaksund...@google.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
