Can someone find a CircleCI or Jenkins bot that posts to the #cassandra-dev channel in ASF Slack once a day?
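A minimal sketch of what such a bot could look like, assuming the Jenkins JSON API and a Slack incoming webhook; the job URL and webhook URL below are hypothetical placeholders:

    # Daily CI status poster: fetch the latest trunk build result from the
    # Jenkins JSON API and post a one-line summary to Slack via an incoming
    # webhook. The job URL and webhook URL are hypothetical placeholders.
    import json
    import urllib.request

    JENKINS_JOB = "https://builds.apache.org/job/Cassandra-trunk/lastCompletedBuild/api/json"
    SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

    def main():
        with urllib.request.urlopen(JENKINS_JOB) as resp:
            build = json.load(resp)
        text = "cassandra trunk build #%s: %s (%s)" % (
            build["number"], build["result"], build["url"])
        payload = json.dumps({"text": text}).encode("utf-8")
        req = urllib.request.Request(
            SLACK_WEBHOOK, data=payload,
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

    if __name__ == "__main__":
        main()

Run from cron once a day; the same shape would presumably work against CircleCI by swapping in its status API URL.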
On Fri, Jan 24, 2020 at 11:23 AM Jordan West <jorda...@gmail.com> wrote:

Keeping trunk green at all times is a great goal to strive for, and I'd love to continue to work towards it, but in my experience it's not easy. Flaky tests, for the reasons folks mentioned, are a real challenge. A standard we could use while we work towards the more ambitious one (and one we are pretty close to using already, as Josh mentioned) that I've seen work well is multiple successive green runs (ideally on different platforms as well) before a release, plus better visibility/documentation of test runs & flakiness.

We can make incremental improvements towards this! Some I've heard on this thread or am personally interested in are below. I think even making one or two of these changes would be an improvement.

- A regularly run / on-commit trunk build, visible to the public, should give us more visibility into test health vs. today's status quo of having to search the CI history of different branches.

- A process for documenting known flaky tests: a JIRA and maybe an annotation (or just a comment) that references that JIRA (not one that runs the test multiple times to mask flakiness); see the sketch after this message. Those JIRAs can be assigned to specific releases in the current cycle like we have been doing for 4.0. This could be paired w/ making it explicit when in the release cycle it's OK to merge w/ flaky tests (if they are documented).

- Surfacing CI results on JIRA when CI is triggered (manually or automatically) makes it easier for reviewers and for checking history at a later date.

- Running CI automatically for contributions the ASF says it's OK for -- as David said, other projects seem to make this work, and it doesn't seem to be an insurmountable problem since the list of signed ICLA users is known & the GitHub API is powerful (see the allowlist sketch at the end of the thread).

- Automatically transitioning JIRAs to Patch Available when the PR method is used to open a ticket (don't know if this is possible; currently it adds the pull-request-available label).

Jordan
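A minimal sketch of the flaky-test annotation idea above, assuming pytest-based tests (e.g., the dtests); the marker name and the JIRA id in the comment are illustrative, not an agreed convention:

    # conftest.py -- record documented-flaky tests, keyed to a JIRA.
    # A known-flaky test would be tagged like:
    #
    #   @pytest.mark.flaky_jira("CASSANDRA-00000")  # illustrative ticket id
    #   def test_sometimes_times_out():
    #       ...

    def pytest_configure(config):
        config.addinivalue_line(
            "markers",
            "flaky_jira(ticket): test is known-flaky; see the referenced JIRA")

    def pytest_collection_modifyitems(config, items):
        for item in items:
            marker = item.get_closest_marker("flaky_jira")
            if marker is not None:
                # Surface the ticket rather than hide the flakiness: the
                # property lands in the junit XML that CI archives.
                item.user_properties.append(("flaky_jira", marker.args[0]))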
On Fri, Jan 24, 2020 at 9:30 AM Joshua McKenzie <jmcken...@apache.org> wrote:

> I also don't think it leads to the right behaviour or incentives.

The gap between when a test is authored and the point at which it's determined to be flaky, as well as the difficulty of responsibility assignment (an "unrelated" change can in some cases make a previously stable test become flaky), makes this a real devil of a problem to fix. Hence its long and rich legacy. ;)

While I agree with the general sentiment of "if we email the dev list with a failure, or we git blame a test and poke the author to fix it, they'll do the right thing", we still end up in cases where people have rotated off the project and nobody feels a sense of ownership over a test failure for something someone else wrote, or a circumstance in which another change broke something, etc. At least from where I sit, I can't see a solution to this problem that doesn't involve some collective action for things not directly under one's purview.

Also, fwiw, in my experience "soft" gatekeeping for things like this will just lead to the problem persisting into perpetuity. The problem strikes me as too complex and temporally / unpredictably distributed to be solvable by incentivizing the "right" behavior (proactive prevention of introducing things like this, hygiene and rigor in authorship, etc), but I'm sure there are ways of approaching this that I'm not thinking of.

But maybe I'm making a mountain out of a molehill. @bes - if you think that emailing the dev list when a failure is encountered on rotation would be sufficient to keep this problem under control with an obviously much lighter touch, I'm +1 for giving it a shot.

On Fri, Jan 24, 2020 at 10:12 AM Benedict Elliott Smith <bened...@apache.org> wrote:

> due to oversight on a commit or a delta breaking some test the author
> thinks is unrelated to their diff but turns out to be a second-order
> consequence of their change that they didn't expect

In my opinion/experience, this is all a direct consequence of lack of trust in CI caused by flakiness. We have finite time to dedicate to our jobs, and figuring out whether or not a run is really clean for this patch is genuinely costly when you cannot trust the result. Those costs multiply rapidly across the contributor base.

That does not conflict with what you are saying. I don't, however, think it is reasonable to place the burden on the person trying to commit at that moment, whether by positive sentiment or by "computer says no". I also don't think it leads to the right behaviour or incentives.

I further think there's been a degradation of community behaviour, to some extent caused by the bifurcation of CI infrastructure and approach. Ideally we would all use a common platform, and there would be regular trunk runs to compare against, like-for-like.

IMO, we should email dev@ if there are failing runs for trunk, and there should be a rotating role amongst the contributors to figure out who broke it, and poke them to fix it (or to just fix it, if easy).
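The notification half of that proposal could reuse the polling shape of the Slack sketch near the top of the thread; a rough sketch of just the mailing part, assuming a local MTA (the sender address is a placeholder):

    # Given a failing build dict (fetched as in the Slack sketch above),
    # mail a short notice to dev@. The sender address is a placeholder.
    import smtplib
    from email.message import EmailMessage

    def notify_dev(build):
        msg = EmailMessage()
        msg["Subject"] = "trunk build #%s: %s" % (build["number"], build["result"])
        msg["From"] = "ci-bot@example.invalid"  # placeholder
        msg["To"] = "dev@cassandra.apache.org"
        msg.set_content("Failing trunk run: %s" % build["url"])
        with smtplib.SMTP("localhost") as smtp:  # assumes a local MTA
            smtp.send_message(msg)

The rotating "who pokes whom" role would stay a human process; the script only removes the excuse of not noticing.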
Merge time.", or say "hey, we > > all > > > live in this neighborhood together and while this trash on the > ground > > > isn't > > > actually mine, it's my neighborhood so if I clean this up it'll > > > benefit me > > > and all the rest of us." > > > > > > Depending on how much time and energy fixing a flake like that may > > > take, > > > this may prove to be economically unsustainable for some/many > > > participants > > > on the project. A lot of us are paid to work on C* by organizations > > > with > > > specific priorities for the project that are not directly related > to > > > "has > > > green test board". But I do feel comfortable making the case that > > > there's a > > > world in which "don't merge if any tests fail, clean up whatever > > > failures > > > you run into" *could* be a sustainable model assuming everyone in > the > > > ecosystem was willing and able to engage in that collectively > > > benefiting > > > behavior. > > > > > > Does the above make sense? > > > > > > On Fri, Jan 24, 2020 at 7:39 AM Aleksey Yeshchenko > > > <alek...@apple.com.invalid> wrote: > > > > > > > As for GH for code review, I find that it works very well for > nits. > > > It’s > > > > also great for doc changes, given how GH allows you suggest > changes > > > to > > > > files in-place and automatically create PRs for those changes. > That > > > lowers > > > > the barrier for those tiny contributions. > > > > > > > > For anything relatively substantial, I vastly prefer to summarise > > my > > > > feedback (and see others’ feedback summarised) in JIRA comments - > > an > > > > opinion I and other contributors have shared in one or two > similar > > > threads > > > > over the years. > > > > > > > > > > > > > On 24 Jan 2020, at 12:21, Aleksey Yeshchenko > > > <alek...@apple.com.INVALID> > > > > wrote: > > > > > > > > > > The person introducing flakiness to a test will almost always > > have > > > run > > > > it locally and on CI first with success. It’s usually later when > > > they first > > > > start failing, and it’s often tricky to attribute to a particular > > > > commit/person. > > > > > > > > > > So long as we have these - and we’ve had flaky tests for as > long > > > as C* > > > > has existed - the problem will persist, and gating PRs on clean > > runs > > > won’t > > > > achieve anything other than dealing with folks who straight up > > > ignore the > > > > spirit of the policy and knowingly commit code with test breakage > > > that can > > > > be attributed to their change. I’m not aware of such committers > in > > > this > > > > community, however. > > > > > > > > > >> On 24 Jan 2020, at 09:01, Benedict Elliott Smith < > > > bened...@apache.org> > > > > wrote: > > > > >> > > > > >>> I find it only useful for nits, or for coaching-level > comments > > > that I > > > > would never want propagated to Jira. > > > > >> > > > > >> Actually, I'll go one step further. GitHub encourages comments > > > that are > > > > too trivial, poisoning the well for third parties trying to find > > > useful > > > > information. If the comment wouldn't be made in Jira, it > probably > > > > shouldn't be made. > > > > >> > > > > >> > > > > >> > > > > >> On 24/01/2020, 08:56, "Benedict Elliott Smith" < > > > bened...@apache.org> > > > > wrote: > > > > >> > > > > >> The common factor is flaky tests, not people. You get a > clean > > > run, > > > > you commit. Turns out, a test was flaky. This reduces trust in > > CI, > > > so > > > > people commit without looking as closely at results. 
On 24/01/2020, 08:56, "Benedict Elliott Smith" <bened...@apache.org> wrote:

The common factor is flaky tests, not people. You get a clean run, you commit. Turns out, a test was flaky. This reduces trust in CI, so people commit without looking as closely at results. Gating on clean tests doesn't help, as you just re-run until you're clean. Rinse and repeat. Breakages accumulate.

This is what happens leading up to every release - nobody commits knowing there's a breakage. We have a problem with bad tests, not bad people or process.

FWIW, I no longer like the GitHub workflow. I find it only useful for nits, or for coaching-level comments that I would never want propagated to Jira. I find a strong patch submission of any size is better managed with human-curated Jira comments, which provide a better permanent record. When skimming a discussion, Jira is more informative than GitHub. Even with the GitHub UX, the context hinders rather than helps.

As to GitHub comments propagated to Jira: has anyone here ever read them? I haven't, as they're impenetrable; ugly and almost entirely noise. If anything, I would prefer that we discourage GitHub for review as a project, not move towards it.

This is without getting into the problem of multiple-branch PRs. Until this is _provably_ painless, we cannot introduce a workflow that requires it and blocks commit on it. Working with multiple branches is difficult enough already, surely?

On 24/01/2020, 03:16, "Jeff Jirsa" <jji...@gmail.com> wrote:

100% agree.

François and team wrote a doc on testing and gating commits.
Blake wrote a doc on testing and gating commits.
Every release there's a thread on testing and gating commits.

People are the common factor every time. Nobody wants to avoid merging their patch because someone broke a test elsewhere.

We can't block merging technically with the repo as it exists now, so it's always going to come down to people and peer pressure, until or unless someone starts reverting commits that break tests.

(Of course, someone could write a tool that automatically reverts new commits as long as tests fail....)
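The blunt version of that hypothetical tool could look like the sketch below; illustrative only, not a recommendation, and the remote, branch, and test entry point are all placeholders:

    # Auto-revert sketch: if the suite fails at trunk's HEAD, push a revert
    # of the newest commit. Illustrative only; the remote, branch, and test
    # command are placeholders.
    import subprocess

    def run(*cmd):
        return subprocess.run(cmd).returncode

    def main():
        run("git", "fetch", "origin")
        run("git", "checkout", "origin/trunk")
        if run("ant", "test") == 0:  # placeholder test entry point
            return  # suite is green, nothing to do
        head = subprocess.check_output(
            ["git", "rev-parse", "HEAD"]).decode().strip()
        run("git", "revert", "--no-edit", head)
        run("git", "push", "origin", "HEAD:trunk")

    if __name__ == "__main__":
        main()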
On Jan 23, 2020, at 5:54 PM, Joshua McKenzie <jmcken...@apache.org> wrote:

> I am reacting to what I currently see happening in the project; tests
> fail as the norm and this is kinda seen as expected, even though it goes
> against the policies as I understand it.

After over half a decade seeing us all continue to struggle with this problem, I've come around to the school of "apply pain" (I mean that as light-hearted as you can take it) when there's a failure, to incent fixing; specifically, in this case, the only idea I can think of is preventing merge w/ any failing tests on a PR. We go through this cycle as we approach each major release: we have the gatekeeper of "we're not going to cut a release with failing tests, obviously", and we clean them up. After the release, the pressure is off, we exhale, relax, and flaky test failures (and others) start to creep back in.

If the status quo is the world we want to live in, that's totally fine and no judgement intended - we can build tooling around test failure history and known flaky tests etc. to optimize engineer workflows around that expectation. But what I keep seeing on threads like this (and have always heard brought up in conversation) is that our collective *moral* stance is that we should have green test boards and not merge code if it introduces failing tests.

Not looking to prescribe or recommend anything, just hoping the observation above might be of interest or value to the conversation.

On Thu, Jan 23, 2020 at 4:17 PM Michael Shuler <mich...@pbandjelly.org> wrote:

On 1/23/20 3:53 PM, David Capwell wrote:
> 2) Nightly build email to dev@?

Nope. builds@c.a.o is where these go.
https://lists.apache.org/list.html?bui...@cassandra.apache.org

Michael
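Circling back to Jordan's suggestion of auto-running CI for contributions from known ICLA signers: the check itself could be as small as the sketch below, assuming someone maintains a file mapping ICLA signers to GitHub usernames (the file name is a placeholder, and maintaining that list is the real work):

    # Decide whether a PR should get automatic CI, based on an allowlist of
    # GitHub usernames for known ICLA signers. The allowlist file name is a
    # placeholder.
    import json
    import urllib.request

    ALLOWLIST_FILE = "icla_github_users.txt"  # one username per line

    def pr_author(repo, pr_number):
        url = "https://api.github.com/repos/%s/pulls/%d" % (repo, pr_number)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["user"]["login"]

    def should_run_ci(repo, pr_number):
        with open(ALLOWLIST_FILE) as f:
            allowed = {line.strip() for line in f if line.strip()}
        return pr_author(repo, pr_number) in allowed

    if __name__ == "__main__":
        print(should_run_ci("apache/cassandra", 1))  # PR number illustrative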