Looks like there are two Slack plugins for Jenkins. They trigger after builds, and if my rusty Jenkins-fu is right, the trunk build can be scheduled to run daily and the plugin can post to Slack when it's done. I'm not an expert and can't poke at the Jenkins instance myself, so I'm not sure what limitations there are.
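
A bare-bones fallback, in case the plugins below turn out to be awkward to wire up, would be a tiny scheduled job that posts straight to a Slack incoming webhook. A minimal sketch in Java, assuming an incoming webhook already exists for the channel; the environment variable name and message text are placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Sketch only: post a one-line CI status message to a Slack incoming webhook.
    // The webhook URL comes from a placeholder environment variable; the daily
    // scheduling would live in Jenkins/cron, not here.
    public class SlackCiPing
    {
        public static void main(String[] args) throws Exception
        {
            String webhookUrl = System.getenv("SLACK_WEBHOOK_URL"); // placeholder
            String payload = "{\"text\": \"Daily trunk CI run finished - see Jenkins for details\"}";

            HttpRequest request = HttpRequest.newBuilder()
                                             .uri(URI.create(webhookUrl))
                                             .header("Content-Type", "application/json")
                                             .POST(HttpRequest.BodyPublishers.ofString(payload))
                                             .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                                                      .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Slack responded with HTTP " + response.statusCode());
        }
    }

Slack incoming webhooks accept a simple JSON body with a "text" field, so the posting step stays a couple of dozen lines; everything else is whatever the CI job wants to say.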
https://plugins.jenkins.io/slack
https://plugins.jenkins.io/global-slack-notifier

Jordan

On Fri, Jan 24, 2020 at 11:27 AM Jeff Jirsa <jji...@gmail.com> wrote:
> Can someone find a circleci or jenkins bot that posts to the #cassandra-dev channel in ASF slack once a day?
>
> On Fri, Jan 24, 2020 at 11:23 AM Jordan West <jorda...@gmail.com> wrote:
> > Keeping trunk green at all times is a great goal to strive for, and I'd love to continue working towards it, but in my experience it's not easy. Flaky tests, for the reasons folks mentioned, are a real challenge. A standard we could use while we work towards the more ambitious one -- and one we are pretty close to using already, as Josh mentioned -- that I've seen work well is multiple successive green runs (ideally on different platforms as well) before a release, plus better visibility/documentation of test runs & flakiness.
> >
> > We can make incremental improvements towards this! Some I've heard on this thread or am personally interested in are below. I think even making one or two of these changes would be an improvement.
> >
> > - A regularly run / on-commit trunk build, visible to the public, should give us more visibility into test health vs. today's status quo of having to search the CI history of different branches.
> >
> > - A process for documenting known flaky tests, such as a JIRA and maybe an annotation (or just a comment) that references that JIRA (not one that runs the test multiple times to mask flakiness). Those JIRAs can be assigned to specific releases in the current cycle like we have been doing for 4.0. This could be paired with making it explicit when in the release cycle it's OK to merge with flaky tests (if they are documented).
> >
> > - Surfacing CI results on JIRA when CI is triggered (manually or automatically) makes it easier for reviewers and for checking history at a later date.
> >
> > - Running CI automatically for contributions the ASF says it's OK for -- as David said, other projects seem to make this work, and it doesn't seem to be an insurmountable problem since the list of signed ICLA users is known & the GitHub API is powerful.
> >
> > - Automatically transitioning JIRAs to Patch Available when the PR method is used to open a ticket (I don't know if this is possible; currently it adds the pull-request-available label).
> >
> > Jordan
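
On the annotation idea in the second bullet above: nothing like this exists in the tree today, but as a rough sketch (the annotation name and ticket number are made up), it could be as small as:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical marker annotation: documents a known-flaky test in the code
    // itself, pointing at the JIRA that tracks the flakiness.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.METHOD, ElementType.TYPE })
    public @interface KnownFlaky
    {
        /** JIRA that tracks the flake, e.g. "CASSANDRA-00000" (placeholder number). */
        String jira();
    }

A flaky test method would then carry @KnownFlaky(jira = "CASSANDRA-00000"), and a run listener (or even grep) could list every documented flake when deciding what has to be fixed before a release.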
> > On Fri, Jan 24, 2020 at 9:30 AM Joshua McKenzie <jmcken...@apache.org> wrote:
> > > > I also don't think it leads to the right behaviour or incentives.
> > >
> > > The gap between when a test is authored and the point at which it's determined to be flaky, as well as the difficulty of responsibility assignment (an "unrelated" change can in some cases make a previously stable test become flaky), makes this a real devil of a problem to fix. Hence its long and rich legacy. ;)
> > >
> > > While I agree with the general sentiment of "if we email the dev list with a failure, or we git blame a test and poke the author to fix it, they'll do the right thing", we still end up in cases where people have rotated off the project and nobody feels a sense of ownership over a test failure for something someone else wrote, or a circumstance in which another change broke something, etc. At least from where I sit, I can't see a solution to this problem that doesn't involve some collective action for things not directly under one's purview.
> > >
> > > Also, fwiw, in my experience "soft" gatekeeping for things like this will just lead to the problem persisting into perpetuity. The problem strikes me as too complex and temporally/unpredictably distributed to be solvable by incentivizing the "right" behavior (proactive prevention of introducing things like this, hygiene and rigor on authorship, etc.), but I'm sure there are ways of approaching this that I'm not thinking of.
> > >
> > > But maybe I'm making a mountain out of a molehill. @bes -- if you think that emailing the dev list when a failure is encountered on rotation would be sufficient to keep this problem under control, with an obviously much lighter touch, I'm +1 for giving it a shot.
> > >
> > > On Fri, Jan 24, 2020 at 10:12 AM Benedict Elliott Smith <bened...@apache.org> wrote:
> > > > > due to oversight on a commit or a delta breaking some test the author thinks is unrelated to their diff but turns out to be a second-order consequence of their change that they didn't expect
> > > >
> > > > In my opinion/experience, this is all a direct consequence of a lack of trust in CI caused by flakiness. We have finite time to dedicate to our jobs, and figuring out whether or not a run is really clean for this patch is genuinely costly when you cannot trust the result. Those costs multiply rapidly across the contributor base.
> > > >
> > > > That does not conflict with what you are saying. I don't, however, think it is reasonable to place the burden on the person trying to commit at that moment, whether by positive sentiment or "computer says no". I also don't think it leads to the right behaviour or incentives.
> > > >
> > > > I further think there's been a degradation of community behaviour, to some extent caused by the bifurcation of CI infrastructure and approach. Ideally we would all use a common platform, and there would be regular trunk runs to compare against, like-for-like.
> > > >
> > > > IMO, we should email dev@ if there are failing runs for trunk, and there should be a rotating role amongst the contributors to figure out who broke it and poke them to fix it (or to just fix it, if easy).
> > > >
> > > > On 24/01/2020, 14:57, "Joshua McKenzie" <jmcken...@apache.org> wrote:
> > > > > > gating PRs on clean runs won't achieve anything other than dealing with folks who straight up ignore the spirit of the policy and knowingly commit code with test breakage
> > > > >
> > > > > I think there's some nuance here. We have a lot of suites (novnode, cdc, etc. etc.) where failures show up because people didn't run those tests or didn't think to check them when they did. Likewise, I'd posit we have a non-trivial number of failures (>= 15%? making up a number here) that are due to oversight on a commit, or a delta breaking some test the author thinks is unrelated to their diff but turns out to be a second-order consequence of their change that they didn't expect. I'm certainly not claiming we have bad actors here merging known test failures because they don't care.
> > > > > This seems to me like an issue of collective ownership, and whether or not we're willing to take it on as a project. If I have a patch, I run CI, and a test fails that's unrelated to my diff (or I think it is, or conclude it's unrelated after inspection, whatever), that's the crucial moment where we can either say "Welp, not my problem. Merge time." or say "Hey, we all live in this neighborhood together, and while this trash on the ground isn't actually mine, it's my neighborhood, so if I clean it up it'll benefit me and all the rest of us."
> > > > >
> > > > > Depending on how much time and energy fixing a flake like that may take, this may prove to be economically unsustainable for some/many participants on the project. A lot of us are paid to work on C* by organizations with specific priorities for the project that are not directly related to "has a green test board". But I do feel comfortable making the case that there's a world in which "don't merge if any tests fail, clean up whatever failures you run into" *could* be a sustainable model, assuming everyone in the ecosystem was willing and able to engage in that collectively beneficial behavior.
> > > > >
> > > > > Does the above make sense?
> > > > >
> > > > > On Fri, Jan 24, 2020 at 7:39 AM Aleksey Yeshchenko <alek...@apple.com.invalid> wrote:
> > > > > > As for GH for code review, I find that it works very well for nits. It's also great for doc changes, given how GH allows you to suggest changes to files in-place and automatically create PRs for those changes. That lowers the barrier for those tiny contributions.
> > > > > >
> > > > > > For anything relatively substantial, I vastly prefer to summarise my feedback (and see others' feedback summarised) in JIRA comments -- an opinion I and other contributors have shared in one or two similar threads over the years.
> > > > > >
> > > > > > On 24 Jan 2020, at 12:21, Aleksey Yeshchenko <alek...@apple.com.INVALID> wrote:
> > > > > > >
> > > > > > > The person introducing flakiness to a test will almost always have run it locally and on CI first with success. It's usually later that the tests first start failing, and it's often tricky to attribute the failure to a particular commit/person.
> > > > > > >
> > > > > > > So long as we have these -- and we've had flaky tests for as long as C* has existed -- the problem will persist, and gating PRs on clean runs won't achieve anything other than dealing with folks who straight up ignore the spirit of the policy and knowingly commit code with test breakage that can be attributed to their change. I'm not aware of such committers in this community, however.
> > > > > > > On 24 Jan 2020, at 09:01, Benedict Elliott Smith <bened...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > I find it only useful for nits, or for coaching-level comments that I would never want propagated to Jira.
> > > > > > > >
> > > > > > > > Actually, I'll go one step further. GitHub encourages comments that are too trivial, poisoning the well for third parties trying to find useful information. If the comment wouldn't be made in Jira, it probably shouldn't be made.
> > > > > > > >
> > > > > > > > On 24/01/2020, 08:56, "Benedict Elliott Smith" <bened...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > The common factor is flaky tests, not people. You get a clean run, you commit. Turns out, a test was flaky. This reduces trust in CI, so people commit without looking as closely at results. Gating on clean tests doesn't help, as you run until you're clean. Rinse and repeat. Breakages accumulate.
> > > > > > > > >
> > > > > > > > > This is what happens leading up to every release -- nobody commits knowing there's a breakage. We have a problem with bad tests, not bad people or process.
> > > > > > > > >
> > > > > > > > > FWIW, I no longer like the GitHub workflow. I find it only useful for nits, or for coaching-level comments that I would never want propagated to Jira. I find a strong patch submission of any size is better managed with human-curated Jira comments, and it provides a better permanent record. When skimming a discussion, Jira is more informative than GitHub. Even with the GitHub UX, the context hinders rather than helps.
> > > > > > > > >
> > > > > > > > > As to the comments propagated to Jira: has anyone here ever read them? I haven't, as they're impenetrable: ugly and almost entirely noise. If anything, I would prefer that we discourage GitHub for review as a project, not move towards it.
> > > > > > > > >
> > > > > > > > > This is without getting into the problem of multiple-branch PRs. Until this is _provably_ painless, we cannot introduce a workflow that requires it and blocks commit on it. Working with multiple branches is difficult enough already, surely?
> > > > > > > > >
> > > > > > > > > On 24/01/2020, 03:16, "Jeff Jirsa" <jji...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > 100% agree
> > > > > > > > > >
> > > > > > > > > > François and team wrote a doc on testing and gating commits.
> > > > > > > > > > Blake wrote a doc on testing and gating commits.
> > > > > > > > > > Every release there's a thread on testing and gating commits.
> > > > > > > > > >
> > > > > > > > > > People are the common factor every time. Nobody wants to hold off merging their patch because someone broke a test elsewhere.
> > > > > > > > > >
> > > > > > > > > > We can't block merging technically with the repo as it exists now, so it's always going to come down to people and peer pressure, until or unless someone starts reverting commits that break tests.
> > > > > > > > > >
> > > > > > > > > > (Of course, someone could write a tool that automatically reverts new commits as long as tests fail....)
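
Purely to illustrate that parenthetical, not a proposal: stripped to its core, such a tool is just a loop around the test command and git, with everything here (commands, branch name) a placeholder:

    import java.io.IOException;

    // Illustration only: if the test suite fails on trunk, revert the newest commit.
    // A real bot would need attribution, notification, and protection against
    // reverting unrelated work; none of that is sketched here.
    public class RevertOnRed
    {
        static int run(String... cmd) throws IOException, InterruptedException
        {
            return new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }

        public static void main(String[] args) throws Exception
        {
            if (run("ant", "test") != 0)                   // placeholder test command
            {
                run("git", "revert", "--no-edit", "HEAD"); // undo the latest commit
                run("git", "push", "origin", "trunk");     // placeholder branch
            }
        }
    }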
> > > > > > > > > > On Jan 23, 2020, at 5:54 PM, Joshua McKenzie <jmcken...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I am reacting to what I currently see happening in the project; tests fail as the norm and this is kinda seen as expected, even though it goes against the policies as I understand it.
> > > > > > > > > > >
> > > > > > > > > > > After over half a decade seeing us all continue to struggle with this problem, I've come around to the school of "apply pain" (I mean that as light-hearted as you can take it) when there's a failure, to incent fixing; specifically, in this case the only idea I can think of is preventing merge w/ any failing tests on a PR. We go through this cycle as we approach each major release: we have the gatekeeper of "we're not going to cut a release with failing tests, obviously", and we clean them up. After the release, the pressure is off, we exhale, relax, and flaky test failures (and others) start to creep back in.
> > > > > > > > > > >
> > > > > > > > > > > If the status quo is the world we want to live in, that's totally fine and no judgement intended -- we can build tooling around test failure history and known flaky tests etc. to optimize engineer workflows around that expectation. But what I keep seeing on threads like this (and have always heard brought up in conversation) is that our collective *moral* stance is that we should have green test boards and not merge code if it introduces failing tests.
> > > > > > > > > > >
> > > > > > > > > > > Not looking to prescribe or recommend anything, just hoping the observation above might be of interest or value to the conversation.
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jan 23, 2020 at 4:17 PM Michael Shuler <mich...@pbandjelly.org> wrote:
> > > > > > > > > > > > On 1/23/20 3:53 PM, David Capwell wrote:
> > > > > > > > > > > > > 2) Nightly build email to dev@?
> > > > > > > > > > > >
> > > > > > > > > > > > Nope. builds@c.a.o is where these go.
> > > > > > > > > > > > https://lists.apache.org/list.html?bui...@cassandra.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > Michael