Would I need a vote on installing this plugin, or can I just open a ticket to infra?
On Wed, Jun 6, 2018, 16:18 Robert Bradshaw <[email protected]> wrote:

> Even if it's not perfect, it seems like it'd surely be a net win (and probably a large one). Also, the build cache should look back at more than just the single previous build, so if any previous job (up to the cache size limit) built/tested artifacts unchanged by the current PR, the results would live in the cache.
>
> I would look at (a) and (b) only if this isn't already good enough.
>
> On Wed, Jun 6, 2018 at 3:50 PM Udi Meiri <[email protected]> wrote:
>
>> To follow up on the Jenkins Job Cacher Plugin:
>>
>> The idea is to use a Jenkins plugin to save and reuse the Gradle cache across successive precommit jobs. The problem with this approach is that the precommit runs a Jenkins server executes are unrelated to each other. Say you have two PRs, A and B, and the precommit job for B reuses the cache left by the job for A. The diff between the two will cause tests affected by either A or B to be rerun (at least). If A modifies Python code, then the job for B must rerun ALL Python tests (since Gradle doesn't do dependency tracking for Python).
>>
>> Proposal:
>> a. The cache plugin is still useful for successive Java precommit jobs, but not for Python. (Go, I have no idea.) We could use it exclusively for Java precommits.
>> b. To avoid running precommit jobs for code not touched by a PR, look at the paths of the files changed. For example, a PR touching only files under sdks/python/... needs to run only the Python precommit tests.
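A minimal sketch of what proposal (b) above could look like, for illustration only: the path-to-suite mapping and suite names are hypothetical, and the changed-file list comes from a git diff against the PR's merge base. Any path the mapping doesn't recognize (build files, infra scripts) conservatively triggers all suites.

    # Sketch: decide which precommit suites a PR needs, based on changed paths.
    # The PRECOMMIT_SUITES mapping and suite names are hypothetical examples.
    import subprocess

    PRECOMMIT_SUITES = {
        "sdks/python/": {"Python"},
        "sdks/go/": {"Go"},
        "sdks/java/": {"Java"},
        "runners/": {"Java"},  # runners are Java code in this sketch
    }

    def changed_files(base="origin/master"):
        """Files touched by the PR, relative to its merge base."""
        out = subprocess.check_output(
            ["git", "diff", "--name-only", base + "...HEAD"], text=True)
        return out.splitlines()

    def suites_to_run(files):
        suites = set()
        for path in files:
            for prefix, needed in PRECOMMIT_SUITES.items():
                if path.startswith(prefix):
                    suites |= needed
                    break
            else:
                # Unrecognized path (build config, infra, docs): run everything.
                return {"Java", "Python", "Go"}
        return suites

    if __name__ == "__main__":
        print(sorted(suites_to_run(changed_files())))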
>> On Tue, Jun 5, 2018 at 7:24 PM Udi Meiri <[email protected]> wrote:
>>
>>> I've been having a separate discussion on the proposal doc, which is ready for another round of reviews. Change summary:
>>> - Changed the fast requirement to be < 30 minutes and simplified the check to an aggregate for each precommit job type.
>>> - Updated slowness notification methods to include automated methods: as a precommit check result type on GitHub, and as a bug.
>>> - Merged in the metrics design doc.
>>> - Added a detailed design section.
>>> - Added a list of deliverables.
>>>
>>> What I would like is consensus regarding:
>>> - How fast we want precommit runs to be. I propose 30m.
>>> - The deadline for fixing a slow test before it is temporarily removed from precommit. I propose 24 hours.
>>>
>>> Replying to the thread:
>>>
>>> 1. I like the idea of using the Jenkins Job Cacher Plugin to skip unaffected tests (BEAM-4400).
>>>
>>> 2. Java precommit tests include integration tests (example: <https://builds.apache.org/view/A-D/view/Beam/job/beam_PreCommit_Java_GradleBuild/lastCompletedBuild/testReport/org.apache.beam.examples/>). We could split these out to get much faster results, i.e., a separate precommit just for basic integration tests (which will still need to run in <30m). Perhaps lint checks for Python could be split out as well.
>>>
>>> I'll add these suggestions to the doc tomorrow.
>>>
>>> On Thu, May 24, 2018 at 9:25 AM Scott Wegner <[email protected]> wrote:
>>>
>>>> So, it sounds like there's agreement that we should improve precommit times by only running necessary tests, and that configuring Jenkins job caching plus the Gradle build cache is a path to get there. I've filed BEAM-4400 [1] to follow up on this.
>>>>
>>>> Getting back to Udi's original proposal [2]: I see value in defining a metric and target for overall pre-commit timing. The proposal for an initial "2 hour" target is helpful as a guardrail: we're already hitting it, but if we drift to a point where we're not, that should trigger some action to get back to a healthy state.
>>>>
>>>> I wouldn't mind separately setting a more aspirational goal of getting the pre-commits even faster (e.g., 15-30 mins), but I suspect that would require a concerted effort to evaluate and improve existing tests across the codebase. One idea would be to ensure the metric reporting can show the trend, and which tests are responsible for the most wall time, so that we know where to invest any effort to improve tests.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/BEAM-4400
>>>> [2] https://docs.google.com/document/d/1udtvggmS2LTMmdwjEtZCcUQy6aQAiYTI3OrTP8CLfJM/edit?usp=sharing
>>>>
>>>> On Wed, May 23, 2018 at 11:46 AM Kenneth Knowles <[email protected]> wrote:
>>>>
>>>>> With regard to the Job Cacher Plugin: I think it is an infra ticket to install? And I guess we need it longer term when we move to containerized builds anyhow? One thing I've experienced with the Travis-CI cache is that the time spent uploading & downloading the remote cache - in that case, all the pip-installed dependencies - negated the benefits. Probably for Beam it will have a greater benefit if we can skip most of the build.
>>>>>
>>>>> Regarding integration tests in precommit: I think it is OK to run maybe one Dataflow job in precommit, but it should be in parallel with the unit tests and just a smoke test that takes 5 minutes, not a suite that takes 35 minutes. So IMO that is low-hanging fruit. If this would make postcommit unstable, then it also means precommit is unstable. Both are troublesome.
>>>>>
>>>>> More short term, some possible hacks:
>>>>>
>>>>> - Point Gradle to a cache outside the git workspace. We already did this for .m2 and it helped a lot.
>>>>> - Intersect touched files with projects. Our nonstandard project names might be a pain here. Not sure if fixing that is on the roadmap.
>>>>>
>>>>> Kenn
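A rough sketch of the first hack above, as a Jenkins-side wrapper. It relies on Gradle keeping its caches under GRADLE_USER_HOME, so pointing that variable outside the workspace lets successive builds on the same node reuse them. The cache path and task name here are illustrative assumptions, not the project's actual configuration.

    # Sketch only: run a Gradle build with its caches relocated outside the
    # per-job git workspace, so they survive workspace wipes between builds.
    # The /home/jenkins/gradle-cache path and the task name are hypothetical.
    import os
    import subprocess

    def run_gradle_with_shared_cache(tasks):
        env = os.environ.copy()
        # GRADLE_USER_HOME holds the dependency and build caches; keep it on
        # the node but outside the workspace that Jenkins cleans per build.
        env["GRADLE_USER_HOME"] = "/home/jenkins/gradle-cache"
        subprocess.check_call(["./gradlew", "--build-cache"] + tasks, env=env)

    if __name__ == "__main__":
        run_gradle_with_shared_cache([":examples:java:test"])  # hypothetical task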
>>>>> On Wed, May 23, 2018 at 9:31 AM Ismaël Mejía <[email protected]> wrote:
>>>>>
>>>>>> I second Robert's idea of 'intelligently' running only the affected tests; probably there is no need to run Java for a Go fix (and eventually, if there is any issue, it can be caught in postcommit), and the same for a dev who just fixed something in KafkaIO and has to wait for the other IO tests to pass. I suppose that languages, IOs, and extensions are 'easy' to isolate, so maybe we can start with those.
>>>>>>
>>>>>> Earlier signals are also definitely great to have, but I am not sure how we can get those with the current infra.
>>>>>>
>>>>>> From a quick look, the biggest time is consumed by the examples module, probably because the examples run on Dataflow with real IOs, no? That module alone takes ~35 minutes, so maybe moving it to postcommit will gain us some quick improvement. On the other hand, we should probably not dismiss the consequences of moving more stuff to postcommit, given that our current postcommit is not the most stable, or the quickest; the Dataflow suite alone takes 1h30!
>>>>>>
>>>>>> On Tue, May 22, 2018 at 12:01 AM Mikhail Gryzykhin <[email protected]> wrote:
>>>>>>
>>>>>>> What we can do here is estimate how much effort we want to put in and set a remote target. Such as:
>>>>>>> Third quarter 2018 -- 1hr SLO
>>>>>>> Fourth quarter 2018 -- 30min SLO
>>>>>>> etc.
>>>>>>>
>>>>>>> Combined with a policy for newly added tests, this can give us some goal to aim for.
>>>>>>>
>>>>>>> --Mikhail
>>>>>>>
>>>>>>> Have feedback?
>>>>>>>
>>>>>>> On Mon, May 21, 2018 at 2:06 PM Scott Wegner <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks for the proposal; I left comments in the doc. Overall I think it's a great idea.
>>>>>>>>
>>>>>>>> I've seen other projects with much faster pre-commits, and it requires strict guidelines on unit test design, keeping tests isolated and in-memory as much as possible. That's not currently the case in Java; we have pre-commits which submit pipelines to the Dataflow service.
>>>>>>>>
>>>>>>>> I don't know if it's feasible to get Java down to 15-20 mins in the short term, but a good starting point would be to document the requirements for a test to run as a pre-commit, and start enforcing them for new tests.
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 3:25 PM Henning Rohde <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Good proposal. I think it should be considered in tandem with the "No commit on red post-commit" proposal and could be far more ambitious than 2 hours. For example, something in the <15-20 mins range, say, would be much less of an inconvenience to the development effort. Go takes ~3 mins, which means that it is practical to wait until a PR is green before asking anyone to look at it. If I need to wait for a Java or Python pre-commit, I task-switch and come back later. If the post-commits are enforced to be green, we could possibly gain a much more productive flow at the cost of the occasional post-commit break, compared to now. Maybe IOs can be less extensively tested pre-commit, for example, or tested only if actually changed?
>>>>>>>>>
>>>>>>>>> I also like Robert's suggestion of splitting up pre-commits into something more fine-grained to get a clear partial signal quicker. If we have an adequate number of Jenkins slots, it might also speed things up overall.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Henning
>>>>>>>>>
>>>>>>>>> On Fri, May 18, 2018 at 12:30 PM Scott Wegner <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> re: intelligently skipping tests for code that doesn't change (e.g., Java tests on a Python PR): this should be possible. We already have build caching enabled in Gradle, but I believe it is local to the git workspace and doesn't persist between Jenkins runs.
>>>>>>>>>>
>>>>>>>>>> With a quick search, I see there is a Jenkins Job Cacher Plugin [1] that hooks into the Gradle build cache and does exactly what we need. Does anybody know whether we could get this enabled on our Jenkins?
>>>>>>>>>>
>>>>>>>>>> [1] https://wiki.jenkins.io/display/JENKINS/Job+Cacher+Plugin
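To illustrate the idea (not the plugin's actual implementation): conceptually, a job-level cache plugin archives chosen cache directories when a build finishes and restores them when the next build starts, so the Gradle build cache survives even when workspaces or agents are thrown away. A toy sketch of that restore/build/save cycle, with hypothetical paths and tar-based storage standing in for whatever the plugin really uses:

    # Toy illustration of the restore/build/save cycle a job-level cache
    # plugin performs; the paths and tar-based storage are hypothetical.
    import subprocess
    import tarfile
    from pathlib import Path

    CACHE_ARCHIVE = Path("/srv/jenkins/job-caches/beam_PreCommit.tar.gz")  # hypothetical
    CACHE_DIR = Path.home() / ".gradle" / "caches" / "build-cache-1"

    def restore_cache():
        """Unpack the saved cache, if any, before the build starts."""
        if CACHE_ARCHIVE.exists():
            CACHE_DIR.mkdir(parents=True, exist_ok=True)
            with tarfile.open(CACHE_ARCHIVE) as tar:
                tar.extractall(CACHE_DIR)

    def save_cache():
        """Archive the cache directory after the build for the next run."""
        CACHE_ARCHIVE.parent.mkdir(parents=True, exist_ok=True)
        with tarfile.open(CACHE_ARCHIVE, "w:gz") as tar:
            tar.add(CACHE_DIR, arcname=".")

    if __name__ == "__main__":
        restore_cache()
        subprocess.check_call(["./gradlew", "--build-cache", "check"])
        save_cache()

As Kenn notes above, whether this wins anything depends on the cost of transferring the archive versus the build time it saves.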
>>>>>>>>>> On Fri, May 18, 2018 at 12:08 PM Robert Bradshaw <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> [somehow my email got garbled...]
>>>>>>>>>>>
>>>>>>>>>>> Now that we're using Gradle, perhaps we could be more intelligent about only running the affected tests? E.g., when you touch Python (or Go) you shouldn't need to run the Java precommit at all, which would reduce the latency for those PRs and also the time spent in the queue. Presumably this could even be applied per-module for the Java tests. (Maybe a large, shared build cache could help here as well...)
>>>>>>>>>>>
>>>>>>>>>>> I also wouldn't be opposed to a quicker immediate signal, plus more extensive tests before actually merging. It's also nice to not have to wait an hour to see that you have a lint error; quick stuff like that could be signaled quickly, before a contributor loses context.
>>>>>>>>>>>
>>>>>>>>>>> - Robert
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 18, 2018 at 5:55 AM Kenneth Knowles <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I like the idea. I think it is a good time for the project to start tracking this and keeping it usable.
>>>>>>>>>>>>
>>>>>>>>>>>> Certainly 2 hours is more than enough, is that not so? The Java precommit seems to take <=40 minutes, while Python takes ~20 and Go is so fast it doesn't matter. Do we have enough stragglers that we don't make it in the 95th percentile? Is the time spent in the Jenkins queue?
>>>>>>>>>>>>
>>>>>>>>>>>> For our current coverage, I'd be willing to go for:
>>>>>>>>>>>>
>>>>>>>>>>>> - a 1 hr hard cap (someone better at stats could choose the %ile)
>>>>>>>>>>>> - rolling back or removing a test from precommit if the fix looks like more than 1 week of work (roll back if it is a perf degradation; remove the test from precommit if it is additional coverage that just doesn't fit in the time)
>>>>>>>>>>>>
>>>>>>>>>>>> There's a longer-term issue: the cost of doing a full build each time is expected to scale linearly with the size of our repo (it is the monorepo problem, but for a minirepo), so no cap is feasible until we have effective cross-build caching. My long-term goal would be <30 minutes. At the latency of opening a pull request and then checking your email, that's not burdensome, but an hour is.
>>>>>>>>>>>>
>>>>>>>>>>>> Kenn
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 17, 2018 at 6:54 PM Udi Meiri <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> I have a proposal to improve the contributor experience by keeping precommit times low.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm looking to get community consensus and approval about:
>>>>>>>>>>>>> 1. How long precommits should take. 2 hours @ the 95th percentile over the past 4 weeks is the current proposal.
>>>>>>>>>>>>> 2. The process for dealing with slowness. Do we fix, roll back, or remove a test from precommit? Rolling back if a fix is estimated to take longer than 2 weeks is the current proposal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://docs.google.com/document/d/1udtvggmS2LTMmdwjEtZCcUQy6aQAiYTI3OrTP8CLfJM/edit?usp=sharing
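A small sketch of how the proposed "2 hours @ 95th percentile over the past 4 weeks" check could be computed, purely illustrative: it assumes the precommit run durations (in minutes) have already been fetched from Jenkins for the trailing 4 weeks, and uses a simple nearest-rank percentile.

    # Sketch: check the 95th-percentile target against a list of run
    # durations. Input data is assumed to come from the Jenkins API;
    # values are minutes. The sample data below is made up.
    import math

    TARGET_MINUTES = 120  # proposed 95th-percentile target

    def percentile(values, pct):
        """Nearest-rank percentile; fine for a guardrail metric."""
        ordered = sorted(values)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]

    def check(durations_min):
        p95 = percentile(durations_min, 95)
        status = "OK" if p95 <= TARGET_MINUTES else "SLOW - take action"
        print(f"95th percentile: {p95:.1f} min (target {TARGET_MINUTES}) -> {status}")

    if __name__ == "__main__":
        check([35, 42, 55, 61, 48, 110, 38, 95, 40, 52])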