Re: Kafka trunk test & build stability

Nick Telford Tue, 02 Jan 2024 07:00:54 -0800

Addendum: I've opened a PR with what I believe are the changes necessary to
enable Remote Build Caching, if you choose to go that route:
https://github.com/apache/kafka/pull/15109


On Tue, 2 Jan 2024 at 14:31, Nick Telford <[email protected]> wrote:

> Hi everyone,
>
> Regarding building a "dependency graph"... Gradle already has this
> information, albeit fairly coarse-grained. You might be able to get some
> considerable improvement by configuring the Gradle Remote Build Cache. It
> looks like it's currently disabled explicitly:
> https://github.com/apache/kafka/blob/trunk/settings.gradle#L46
>
> The trick is to have trunk builds write to the cache, and PR builds only
> read from it. This way, any PR based on trunk should be able to cache not
> only the compilation, but also the tests from dependent modules that
> haven't changed (e.g. for a PR that only touches the connect/streams
> modules).
>
> This would probably be preferable to having to hand-maintain some
> rules/dependency graph in the CI configuration, and it's quite
> straightforward to configure.
>
> Bonus points if the Remote Build Cache is readable publicly, enabling
> contributors to benefit from it locally.
>
> Regards,
> Nick
>
> On Tue, 2 Jan 2024 at 13:00, Lucas Brutschy <[email protected]>
> wrote:
>
>> Thanks for all the work that has already been done on this in the past
>> days!
>>
>> Have we considered running our test suite with
>> -XX:+HeapDumpOnOutOfMemoryError and uploading the heap dumps as
>> Jenkins build artifacts? This could speed up debugging. Even if we
>> store them only for a day and do it only for trunk, I think it could
>> be worth it. The heap dumps shouldn't contain any secrets, and I
>> checked with the ASF infra team, and they are not concerned about the
>> additional disk usage.
>>
>> Cheers,
>> Lucas
>>
>> On Wed, Dec 27, 2023 at 2:25 PM Divij Vaidya <[email protected]>
>> wrote:
>> >
>> > I have started to perform an analysis of the OOM at
>> > https://issues.apache.org/jira/browse/KAFKA-16052. Please feel free to
>> > contribute to the investigation.
>> >
>> > --
>> > Divij Vaidya
>> >
>> >
>> >
>> > On Wed, Dec 27, 2023 at 1:23 AM Justine Olshan <[email protected]>
>> > wrote:
>> >
>> > > I am still seeing quite a few OOM errors in the builds and I was curious
>> > > if folks had any ideas on how to identify the cause and fix the issue. I
>> > > was looking in Gradle Enterprise and found some info about memory usage,
>> > > but nothing detailed enough to help figure the issue out.
>> > >
>> > > OOMs sometimes fail the build immediately and in other cases I see it
>> > > get stuck for 8 hours. (See
>> > > https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2508/pipeline/12
>> > > )
>> > >
>> > > I appreciate all the work folks are doing here and I will continue to try
>> > > to help as best as I can.
>> > >
>> > > Justine
>> > >
>> > > On Tue, Dec 26, 2023 at 1:04 PM David Arthur
>> > > <[email protected]> wrote:
>> > >
>> > > > S2. We’ve looked into this before, and it wasn’t possible at the time
>> > > > with JUnit. We commonly set a timeout on each test class (especially
>> > > > integration tests). It is probably worth looking at this again and
>> > > > seeing if something has changed with JUnit (or our usage of it) that
>> > > > would allow a global timeout.
>> > > >
>> > > > S3. Dedicated infra sounds nice, if we can get it. It would at least
>> > > > remove some variability between the builds, and hopefully eliminate the
>> > > > infra/setup class of failures.
>> > > >
>> > > > S4. Running tests for what has changed sounds nice, but I think it is
>> > > > risky to implement broadly. As Sophie mentioned, there are probably some
>> > > > lines we could draw where we feel confident that only running a subset
>> > > > of tests is safe. As a start, we could probably work towards skipping CI
>> > > > for non-code PRs.
>> > > >
>> > > >
>> > > > ---
>> > > >
>> > > >
>> > > > As an aside, I experimented with build caching and running affected
>> > > > tests a few months ago. I used the opportunity to play with GitHub
>> > > > Actions, and I quite liked it. Here’s the workflow I used:
>> > > > https://github.com/mumrah/kafka/blob/trunk/.github/workflows/push.yml
>> > > > I was trying to see if we could use a build cache to reduce the
>> > > > compilation time on PRs. A nightly/periodic job would build trunk and
>> > > > populate a Gradle build cache. PR builds would read from that cache,
>> > > > which would enable them to only compile changed code. The same idea
>> > > > could be extended to tests, but I didn’t get that far.
>> > > >
>> > > > As for GitHub Actions, the idea there is that ASF would provide generic
>> > > > Action “runners” that would pick up jobs from the GitHub Action build
>> > > > queue and run them. It is also possible to self-host runners to expand
>> > > > the build capacity of the project (i.e., other organizations could
>> > > > donate build capacity). The advantage of this is that we would have more
>> > > > control over our build/reports and not be “stuck” with whatever ASF
>> > > > Jenkins offers. The Actions workflows are very customizable and it would
>> > > > let us create our own custom plugins. There is also a substantial
>> > > > marketplace of plugins. I think it’s worth exploring this more; I just
>> > > > haven’t had time lately.
>> > > >
>> > > > On Tue, Dec 26, 2023 at 3:24 PM Sophie Blee-Goldman
>> > > > <[email protected]> wrote:
>> > > >
>> > > > > Regarding:
>> > > > >
>> > > > > > S-4. Separate tests run depending on what module is changed.
>> > > > > > - This makes sense, although it is tricky to implement successfully,
>> > > > > > as unrelated tests may expose problems in an unrelated change (e.g.
>> > > > > > changing core stuff like clients, the server, etc.)
>> > > > >
>> > > > >
>> > > > > Imo this avenue could provide a massive improvement to dev
>> > > > > productivity with very little effort or investment, and if we do it
>> > > > > right, without even any risk. We should be able to draft a simple
>> > > > > dependency graph between modules and then skip the tests for anything
>> > > > > that is clearly, provably unrelated and/or upstream of the target
>> > > > > changes. This has the potential to substantially speed up and improve
>> > > > > the developer experience in modules at the end of the dependency
>> > > > > graph, which I believe is worth doing even if it unfortunately would
>> > > > > not benefit everyone equally.
>> > > > >
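
A minimal sketch of the module-level idea described in the paragraph above: given a dependency graph, run tests for the changed modules plus everything downstream of them, and skip the rest. The edges in DEPENDS_ON are illustrative assumptions only, not Kafka's actual build graph (which would have to be taken from Gradle), and modules_to_test is a hypothetical helper name.

```python
from collections import deque

# Hypothetical, deliberately incomplete module graph: each key depends on the
# modules listed in its value. Real edges would have to come from the build.
DEPENDS_ON = {
    "clients": [],
    "core": ["clients"],
    "connect": ["clients"],
    "streams": ["clients"],
}


def modules_to_test(changed, depends_on=DEPENDS_ON):
    """Return the changed modules plus all modules that depend on them."""
    # Invert the edges so we can walk from a module to its dependents.
    dependents = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(module)

    # Breadth-first walk over the inverted graph, starting at the changes.
    to_test = set(changed)
    queue = deque(changed)
    while queue:
        for dependent in dependents.get(queue.popleft(), []):
            if dependent not in to_test:
                to_test.add(dependent)
                queue.append(dependent)
    return to_test
```

Under these assumed edges, a change confined to streams selects only streams, while a change to clients selects every module, which matches the observation that the benefit goes mostly to modules at the end of the graph.
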
>> > > > > For example, we can save a lot of grief with just a simple set of
>> > > > > rules that are easy to check. I'll throw out a few to start with:
>> > > > >
>> > > > >    1. A pure docs PR (i.e. one that only touches files under the docs/
>> > > > >    directory) should be allowed to skip the tests of all modules
>> > > > >    2. Connect PRs (that only touch connect/) only need to run the
>> > > > >    Connect tests -- i.e. they can skip the tests for core, clients,
>> > > > >    streams, etc.
>> > > > >    3. Similarly, Streams PRs should only need to run the Streams tests
>> > > > >    -- but again, only if all the changes are contained within streams/
>> > > > >
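
The three rules above could be checked with something like the following sketch. It assumes the CI job can obtain the list of changed file paths relative to the repository root; RULES, ALL and suites_to_run are hypothetical names, and treating a PR that mixes docs/ with connect/ as "run the Connect tests" is an assumption that goes slightly beyond the rules as stated. Anything not covered by a rule falls back to the full test suite.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from a top-level directory to the test suites that a
# change confined to that directory needs. Only the three proposed rules are
# encoded here.
RULES = {
    "docs": set(),            # rule 1: docs-only changes skip all tests
    "connect": {"connect"},   # rule 2: Connect-only changes run Connect tests
    "streams": {"streams"},   # rule 3: Streams-only changes run Streams tests
}

ALL = "ALL"  # sentinel meaning "run every module's tests"


def suites_to_run(changed_files):
    """Return the set of suites to run, or ALL when no rule safely applies."""
    if not changed_files:
        return ALL  # nothing known about the change: stay conservative
    suites = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Files directly in the repository root (e.g. build.gradle) have no
        # top-level directory and are never covered by a rule.
        top = parts[0] if len(parts) > 1 else ""
        if top not in RULES:
            return ALL
        suites |= RULES[top]
    return suites
```

The conservative default is the point of the design: a rule only ever narrows the test run when every changed file is covered by one.
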
>> > > > > I'll let others chime in on how or if we can construct some safe
>> > > > > rules as to which modules can or can't be skipped between the core,
>> > > > > clients, raft, storage, etc.
>> > > > >
>> > > > > And over time we could in theory build up a literal dependency graph
>> > > > > on a more granular level so that, for example, changes to the
>> > > > > core/storage module are allowed to skip any Streams tests that don't
>> > > > > use an embedded broker, i.e. all unit tests and
>> > > > > TopologyTestDriver-based integration tests. The danger here would be
>> > > > > in making sure this graph is kept up to date as tests are added and
>> > > > > changed, but my point is just that there's a way to extend the
>> > > > > benefit of this tactic to those who work primarily on the core module
>> > > > > as well. Personally, I think we should just start out with the
>> > > > > example ruleset listed above, workshop it a bit since there might be
>> > > > > other obvious rules I left out, and try to implement it.
>> > > > >
>> > > > > Thoughts?
>> > > > >
>> > > > > On Tue, Dec 26, 2023 at 2:25 AM Stanislav Kozlovski
>> > > > > <[email protected]> wrote:
>> > > > >
>> > > > > > Great discussion!
>> > > > > >
>> > > > > >
>> > > > > > Greg, that was a good call out regarding the two long-running builds.
>> > > > > > I missed that 90d view.
>> > > > > >
>> > > > > > My takeaway from that is that our average build time for tests is
>> > > > > > between 3-4 hours, which in and of itself seems large.
>> > > > > >
>> > > > > > But then reconciling this with Sophie's statement - is it possible
>> > > > > > that these timed-out 8-hour builds don't get captured in that view?
>> > > > > >
>> > > > > > It is weird that people are reporting these things and Gradle
>> > > > > > Enterprise isn't showing them.
>> > > > > >
>> > > > > > ---
>> > > > > >
>> > > > > > > I think that these particularly nasty builds could be explained
>> > > > > > > by long-tail slowdowns causing arbitrary tests to take an
>> > > > > > > excessive time to execute.
>> > > > > >
>> > > > > > I'm not sure I understood that. If the tests have timeouts, where
>> > > > > > would the slowdown come from? Problems in tearing down the test?
>> > > > > >
>> > > > > > ---
>> > > > > >
>> > > > > > David, thanks for the great work in identifying and even fixing
>> > > > > > those two top offenders! And thank you for cherry-picking to 3.7.
>> > > > > >
>> > > > > > ---
>> > > > > >
>> > > > > > All in all, from this thread I can summarize a few potential
>> > > > > > solutions:
>> > > > > >
>> > > > > > S-1. Dedicated work identifying and fixing some of the issues (e.g.
>> > > > > > what David did).
>> > > > > > - Should help alleviate the issues as it can be speculated that it's
>> > > > > > frequently 1 or 2 tests causing the majority of issues.
>> > > > > > - With regards to that, KAFKA-16045 seems open for taking if there
>> > > > > > are any volunteers
>> > > > > > - Sophie's list also contains good candidates
>> > > > > >
>> > > > > > S-2. Global 10-minute timeout for tests.
>> > > > > > - Should lay the foundation for a strong catch-all for any
>> > > > > > misbehaving tests. I like this idea since it's guaranteed to save
>> > > > > > each contributor many hours of waiting for an 8hr+ timed-out build.
>> > > > > > - Luke already has a PR out for this:
>> > > > > > https://github.com/apache/kafka/pull/15065
>> > > > > >
>> > > > > > S-3. Separate infrastructure for our CI
>> > > > > > - This would help with Greg's comment about the developer machine
>> > > > > > being 2-20 times faster than the CI.
>> > > > > > - Requires volunteer funding from external companies. If every
>> > > > > > contributor would bring up the idea with their employer, we may be
>> > > > > > able to stitch something together.
>> > > > > >
>> > > > > > S-4. Separate tests run depending on what module is changed.
>> > > > > > - This makes sense, although it is tricky to implement successfully,
>> > > > > > as unrelated tests may expose problems in an unrelated change (e.g.
>> > > > > > changing core stuff like clients, the server, etc.)
>> > > > > >
>> > > > > > S-5. Greater committer diligence when merging PRs
>> > > > > > - This should always be there. Unfortunately it is a bit of a
>> > > > > > self-perpetuating effect in that when the builds get worse, people
>> > > > > > are incentivized to be less diligent (slowed down while in a rush to
>> > > > > > merge, recency bias of failed builds, etc.)
>> > > > > >
>> > > > > > On Fri, Dec 22, 2023 at 4:16 PM Justine Olshan
>> > > > > > <[email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Thanks David! I think this should help a lot!
>> > > > > > >
>> > > > > > > While we should include these improvements, I think it is also
>> > > > > > > good to remind folks that a lot of these issues come from merging
>> > > > > > > on builds that regress the CI.
>> > > > > > > I know I'm not perfect at this (and have merged on flaky and
>> > > > > > > failing tests), but let's all be super careful going forward.
>> > > > > > > There were a few times I retried the build 10+ times and thought
>> > > > > > > it was other issues with the CI but the failed builds were
>> > > > > > > actually due to the changes I wrote/was reviewing.
>> > > > > > >
>> > > > > > > We all need to work together on this to ensure the builds stay
>> > > > > > > healthy! Thanks all for being concerned about our builds!
>> > > > > > >
>> > > > > > > Justine
>> > > > > > >
>> > > > > > > On Fri, Dec 22, 2023 at 6:02 AM David Jacot <[email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I just merged both PRs.
>> > > > > > > >
>> > > > > > > > Cheers,
>> > > > > > > > David
>> > > > > > > >
>> > > > > > > > On Fri, 22 Dec 2023 at 14:38, David Jacot <[email protected]>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hey folks,
>> > > > > > > > >
>> > > > > > > > > I believe that my two PRs will fix most of the issues. I have
>> > > > > > > > > also tweaked the configuration of Jenkins to fix the issues
>> > > > > > > > > relating to cloning the repo. There may be other issues but
>> > > > > > > > > the overall situation should be much better when I merge
>> > > > > > > > > those two.
>> > > > > > > > >
>> > > > > > > > > I will update this thread when I merge them.
>> > > > > > > > >
>> > > > > > > > > Cheers,
>> > > > > > > > > David
>> > > > > > > > >
>> > > > > > > > > On Fri, 22 Dec 2023 at 14:22, Divij Vaidya <[email protected]>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >> Hey folks
>> > > > > > > > >>
>> > > > > > > > >> I think David (dajac) has some fixes lined up to improve CI,
>> > > > > > > > >> such as https://github.com/apache/kafka/pull/15063 and
>> > > > > > > > >> https://github.com/apache/kafka/pull/15062.
>> > > > > > > > >>
>> > > > > > > > >> I have some bandwidth for the next two days to work on fixing
>> > > > > > > > >> the CI. Let me start by taking a look at the list that Sophie
>> > > > > > > > >> shared here.
>> > > > > > > > >>
>> > > > > > > > >> --
>> > > > > > > > >> Divij Vaidya
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> On Fri, Dec 22, 2023 at 2:05 PM Luke Chen <[email protected]>
>> > > > > > > > >> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > Hi Sophie and Philip and all,
>> > > > > > > > >> >
>> > > > > > > > >> > I share the same pain as you.
>> > > > > > > > >> > I've been waiting for a CI build result in a PR for days.
>> > > > > > > > >> > Unfortunately, I can only get 1 result each day because it
>> > > > > > > > >> > takes 8 hours for each run, and with failed results. :(
>> > > > > > > > >> >
>> > > > > > > > >> > I've looked into the 8 hour timeout build issue and would
>> > > > > > > > >> > like to propose setting a global test timeout of 10 mins
>> > > > > > > > >> > using the JUnit 5 feature
>> > > > > > > > >> > <https://junit.org/junit5/docs/current/user-guide/#writing-tests-declarative-timeouts-default-timeouts>.
>> > > > > > > > >> > This way, we can fail those long running tests quickly
>> > > > > > > > >> > without impacting other tests.
>> > > > > > > > >> > PR: https://github.com/apache/kafka/pull/15065
>> > > > > > > > >> > I've tested in my local environment and it works as
>> > > > > > > > >> > expected.
>> > > > > > > > >> >
>> > > > > > > > >> > Any feedback is welcome.
>> > > > > > > > >> >
>> > > > > > > > >> > Thanks.
>> > > > > > > > >> > Luke
>> > > > > > > > >> >
>> > > > > > > > >> > On Fri, Dec 22, 2023 at 8:08 AM Philip Nee <[email protected]>
>> > > > > > > > >> > wrote:
>> > > > > > > > >> >
>> > > > > > > > >> > > Hey Sophie - I've gotten 2 inflight PRs each with more than
>> > > > > > > > >> > > 15 retries... Namely:
>> > > > > > > > >> > > https://github.com/apache/kafka/pull/15023 and
>> > > > > > > > >> > > https://github.com/apache/kafka/pull/15035
>> > > > > > > > >> > >
>> > > > > > > > >> > > Justine filed a flaky test report here though:
>> > > > > > > > >> > > https://issues.apache.org/jira/browse/KAFKA-16045
>> > > > > > > > >> > >
>> > > > > > > > >> > > P
>> > > > > > > > >> > >
>> > > > > > > > >> > > On Thu, Dec 21, 2023 at 3:18 PM Sophie Blee-Goldman
>> > > > > > > > >> > > <[email protected]> wrote:
>> > > > > > > > >> > >
>> > > > > > > > >> > > > On a related note, has anyone else had trouble getting even
>> > > > > > > > >> > > > a single run with no build failures lately? I've had
>> > > > > > > > >> > > > multiple pure-docs PRs blocked for days or even weeks
>> > > > > > > > >> > > > because of miscellaneous infra, test, and timeout failures.
>> > > > > > > > >> > > > I know we just had a discussion about whether it's
>> > > > > > > > >> > > > acceptable to ever merge with a failing build, and the
>> > > > > > > > >> > > > consensus (which I agree with) was NO -- but seriously,
>> > > > > > > > >> > > > this is getting ridiculous. The build might be the worst
>> > > > > > > > >> > > > I've ever seen it, and it just makes it really difficult to
>> > > > > > > > >> > > > maintain good will with external contributors.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > Take for example this small docs PR:
>> > > > > > > > >> > > > https://github.com/apache/kafka/pull/14949
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > It's on its 7th replay, with the first 6 runs all having
>> > > > > > > > >> > > > (at least) one build that failed completely. The issues I
>> > > > > > > > >> > > > saw on this one PR are a good summary of what I've been
>> > > > > > > > >> > > > seeing elsewhere, so here's the briefing:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > 1. gradle issue:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > > * What went wrong:
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > Gradle could not start your build.
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > > Cannot create service of type BuildSessionActionExecutor using method
>> > > > > > > > >> > > > >   LauncherServices$ToolingBuildSessionScopeServices.createActionExecutor()
>> > > > > > > > >> > > > >   as there is a problem with parameter #21 of type
>> > > > > > > > >> > > > >   FileSystemWatchingInformation.
>> > > > > > > > >> > > > >    > Cannot create service of type BuildLifecycleAwareVirtualFileSystem
>> > > > > > > > >> > > > >      using method
>> > > > > > > > >> > > > >      VirtualFileSystemServices$GradleUserHomeServices.createVirtualFileSystem()
>> > > > > > > > >> > > > >      as there is a problem with parameter #7 of type GlobalCacheLocations.
>> > > > > > > > >> > > > >       > Cannot create service of type GlobalCacheLocations using method
>> > > > > > > > >> > > > >         GradleUserHomeScopeServices.createGlobalCacheLocations() as there
>> > > > > > > > >> > > > >         is a problem with parameter #1 of type List<GlobalCache>.
>> > > > > > > > >> > > > >          > Could not create service of type FileAccessTimeJournal using
>> > > > > > > > >> > > > >            GradleUserHomeScopeServices.createFileAccessTimeJournal().
>> > > > > > > > >> > > > >             > Timeout waiting to lock journal cache
>> > > > > > > > >> > > > >               (/home/jenkins/.gradle/caches/journal-1). It is currently
>> > > > > > > > >> > > > >               in use by another Gradle instance.
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > 2. git issue:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > > ERROR: Error cloning remote repo 'origin'
>> > > > > > > > >> > > > > hudson.plugins.git.GitException: java.io.IOException: Remote call on
>> > > > > > > > >> > > > > builds43 failed
>> > > > > > > > >> > > >
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > 3. storage test calling System.exit (I think)
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > > * What went wrong:
>> > > > > > > > >> > > > > Execution failed for task ':storage:test'.
>> > > > > > > > >> > > > > > Process 'Gradle Test Executor 73' finished with non-zero exit value 1
>> > > > > > > > >> > > > >   This problem might be caused by incorrect test process configuration.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > 4.  3/4 builds aborted suddenly for no clear reason
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > 5. 1 build was aborted, 1 build failed due to a gradle(?)
>> > > > > > > > >> > > > issue with a storage test:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > > Failed to map supported failure 'org.opentest4j.AssertionFailedError:
>> > > > > > > > >> > > > > Failed to observe commit callback before timeout' with mapper
>> > > > > > > > >> > > > > 'org.gradle.api.internal.tasks.testing.failure.mappers.OpenTestAssertionFailedMapper@38bb78ea':
>> > > > > > > > >> > > > > null
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > * What went wrong:
>> > > > > > > > >> > > > > Execution failed for task ':storage:test'.
>> > > > > > > > >> > > > > > Process 'Gradle Test Executor 73' finished with non-zero exit value 1
>> > > > > > > > >> > > > >   This problem might be caused by incorrect test process configuration.
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > 6.  Unknown issue with a core test:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > > Unexpected exception thrown.
>> > > > > > > > >> > > > > org.gradle.internal.remote.internal.MessageIOException: Could not read message from '/127.0.0.1:46952'.
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
>> > > > > > > > >> > > > >   at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>> > > > > > > > >> > > > >   at org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
>> > > > > > > > >> > > > >   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>> > > > > > > > >> > > > >   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>> > > > > > > > >> > > > >   at java.base/java.lang.Thread.run(Thread.java:1583)
>> > > > > > > > >> > > > > Caused by: java.lang.IllegalArgumentException
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
>> > > > > > > > >> > > > >   ... 6 more
>> > > > > > > > >> > > > > org.gradle.internal.remote.internal.ConnectException: Could not connect to server [1d62bf97-6a3e-441d-93b6-093617cbbea9 port:41289, addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
>> > > > > > > > >> > > > >   at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:103)
>> > > > > > > > >> > > > >   at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
>> > > > > > > > >> > > > >   at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
>> > > > > > > > >> > > > >   at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
>> > > > > > > > >> > > > > Caused by: java.net.ConnectException: Connection refused
>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.Net.pollConnect(Native Method)
>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.SocketChannelImpl.finishTimedConnect(SocketChannelImpl.java:1191)
>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.SocketChannelImpl.blockingConnect(SocketChannelImpl.java:1233)
>> > > > > > > > >> > > > >   at java.base/sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:102)
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
>> > > > > > > > >> > > > >   at org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
>> > > > > > > > >> > > > >   ... 5 more
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > * What went wrong:
>> > > > > > > > >> > > > > Execution failed for task ':core:test'.
>> > > > > > > > >> > > > > > Process 'Gradle Test Executor 104' finished with non-zero exit value 1
>> > > > > > > > >> > > > >   This problem might be caused by incorrect test process configuration.
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > I've seen almost all of the above issues multiple times,
>> > > > > > > > >> > > > so it might be a good list to start with to focus any
>> > > > > > > > >> > > > efforts on improving the build. That said, I'm not sure
>> > > > > > > > >> > > > what we can really do about most of these, and not sure
>> > > > > > > > >> > > > how to narrow down the root cause in the more mysterious
>> > > > > > > > >> > > > cases of aborted builds and the builds that end with
>> > > > > > > > >> > > > "finished with non-zero exit value 1" with no additional
>> > > > > > > > >> > > > context (that I could find)
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > If nothing else, there seems to be something happening in
>> > > > > > > > >> > > > one (or more) of the storage tests, because by far the
>> > > > > > > > >> > > > most common failure I've seen is that in 3 & 5.
>> > > > > > > > >> > > > Unfortunately it's not really clear to me how to tell
>> > > > > > > > >> > > > which is the offending test, so I'm not even sure what to
>> > > > > > > > >> > > > file a ticket for
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > On Tue, Dec 19, 2023 at 11:55 PM David Jacot
>> > > > > > > > >> > > > <[email protected]> wrote:
>> > > > > > > > >> > > >
>> > > > > > > > >> > > > > The slowness of the CI is definitely causing us a lot of
>> > > > > > > > >> > > > > pain. I wonder if we should move to a dedicated CI
>> > > > > > > > >> > > > > infrastructure for Kafka. Our integration tests are quite
>> > > > > > > > >> > > > > heavy and ASF's CI is not really tuned for them. We could
>> > > > > > > > >> > > > > tune it for our needs and this would also allow external
>> > > > > > > > >> > > > > companies to sponsor more workers. I heard that we have a
>> > > > > > > > >> > > > > few cloud providers in the community ;). I think that we
>> > > > > > > > >> > > > > should consider this. What do you think? I already
>> > > > > > > > >> > > > > discussed this with the INFRA team. I could continue if we
>> > > > > > > > >> > > > > believe that it is a way forward.
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > Best,
>> > > > > > > > >> > > > > David
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > On Wed, Dec 20, 2023 at 12:17 AM Stanislav Kozlovski
>> > > > > > > > >> > > > > <[email protected]> wrote:
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > > > > Hey Николай,
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > Apologies about this - I wasn't aware of this behavior.
>> > > > > > > > >> > > > > > I have made all the gists public.
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > On Wed, Dec 20, 2023 at 12:09 AM Greg Harris
>> > > > > > > > >> > > > > > <[email protected]> wrote:
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > > Hey Stan,
>> > > > > > > > >> > > > > > >
>> > > > > > > > >> > > > > > > Thanks for opening the discussion. I haven't been
>> > > > > > > > >> > > > > > > looking at overall build duration recently, so it's
>> > > > > > > > >> > > > > > > good that you are calling it out.
>> > > > > > > > >> > > > > > >
>> > > > > > > > >> > > > > > > I worry about us over-indexing on this one build,
>> > > > > > > > >> > > > > > > which itself appears to be an outlier. I only see one
>> > > > > > > > >> > > > > > > other build [1] above 6h overall in the last 90 days
>> > > > > > > > >> > > > > > > in this view: [2]
>> > > > > > > > >> > > > > > > And I don't see any overlap of failed tests in these
>> > > > > > > > >> > > > > > > two builds, which makes it less likely that these
>> > > > > > > > >> > > > > > > particular failed tests are the causes of long build
>> > > > > > > > >> > > > > > > times.
>> > > > > > > > >> > > > > > >
>> > > > > > > > >> > > > > > > Separately, I've been investigating build environment
>> > > > > > > > >> > > > > > > slowness, and trying to connect it with test failures
>> > > > > > > > >> > > > > > > [3]. I observed that the CI build environment is 2-20
>> > > > > > > > >> > > > > > > times slower than my developer machine (M1 Mac).
>> > > > > > > > >> > > > > > > When I simulate a similar slowdown locally, there are
>> > > > > > > > >> > > > > > > tests which become significantly more flaky, often
>> > > > > > > > >> > > > > > > due to hard-coded timeouts.
>> > > > > > > > >> > > > > > > I think that these particularly nasty builds could be
>> > > > > > > > >> > > > > > > explained by long-tail slowdowns causing arbitrary
>> > > > > > > > >> > > > > > > tests to take an excessive time to execute.
>> > > > > > > > >> > > > > > >
>> > > > > > > > >> > > > > > > Rather than trying to find signals in these rare test
>> > > > > > > > >> > > > > > > failures, I think we should find tests that have these
>> > > > > > > > >> > > > > > > sorts of failures more regularly.
>> > > > > > > > >> > > > > > > There are lots of builds in the 5-6h duration bracket,
>> > > > > > > > >> > > > > > > which is certainly unacceptably long. We should look
>> > > > > > > > >> > > > > > > into these builds to find improvements and
>> > > > > > > > >> > > > > > > optimizations.
>> > > > > > > > >> > > > > > >
>> > > > > > > > >> > > > > > > [1] https://ge.apache.org/s/ygh4gbz4uma6i/
>> > > > > > > > >> > > > > > > [2] https://ge.apache.org/scans?list.sortColumn=buildDuration&search.relativeStartTime=P90D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=America%2FNew_York
>> > > > > > > > >> > > > > > > [3] https://github.com/apache/kafka/pull/15008
>> > > > > > > > >> > > > > > >
>> > > > > > > > >> > > > > > > Thanks for looking into this!
>> > > > > > > > >> > > > > > > Greg
>> > > > > > > > >> > > > > > >
>> > > > > > > > >> > > > > > > On Tue, Dec 19, 2023 at 3:45 PM Николай Ижиков
>> > > > > > > > >> > > > > > > <[email protected]> wrote:
>> > > > > > > > >> > > > > > > >
>> > > > > > > > >> > > > > > > > Hello, Stanislav.
>> > > > > > > > >> > > > > > > >
>> > > > > > > > >> > > > > > > > Can you please make the gist public?
>> > > > > > > > >> > > > > > > > Private gists are not available to some GitHub users
>> > > > > > > > >> > > > > > > > even if the link is known.
>> > > > > > > > >> > > > > > > >
>> > > > > > > > >> > > > > > > > > On 19 Dec 2023, at 17:33, Stanislav Kozlovski
>> > > > > > > > >> > > > > > > > > <[email protected]> wrote:
>> > > > > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > > > > > Hey everybody,
>> > > > > > > > >> > > > > > > > > I've heard various complaints that build times in
>> > > > > > > > >> > > > > > > > > trunk are taking too long, some taking as much as 8
>> > > > > > > > >> > > > > > > > > hours (the timeout) - and this is slowing us down
>> > > > > > > > >> > > > > > > > > from being able to meet the code freeze deadline
>> > > > > > > > >> > > > > > > > > for 3.7.
>> > > > > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > > > > > I took it upon myself to gather up some data in
>> > > > > > > > >> > > > > > > > > Gradle Enterprise to see if there are any outlier
>> > > > > > > > >> > > > > > > > > tests that are causing this slowness. Turns out
>> > > > > > > > >> > > > > > > > > there are a few, in this particular build -
>> > > > > > > > >> > > > > > > > > https://ge.apache.org/s/un2hv7n6j374k/
>> > > > > > > > >> > > > > > > > > - which took 10 hours and 29 minutes in total.
>> > > > > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > > > > > I have compiled the tests that took a
>> > > > > > > > >> > > > > > > > > disproportionately large amount of time (20m+),
>> > > > > > > > >> > > > > > > > > alongside their time, error message and a link to
>> > > > > > > > >> > > > > > > > > their full log output here -
>> > > > > > > > >> > > > > > > > > https://gist.github.com/stanislavkozlovski/8959f7ee59434f774841f4ae2f5228c2
>> > > > > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > > > > > It includes failures from core, streams, storage
>> > > > > > > > >> > > > > > > > > and clients.
>> > > > > > > > >> > > > > > > > > Interestingly, some other tests that don't fail
>> > > > > > > > >> > > > > > > > > also take a long time in what is apparently the
>> > > > > > > > >> > > > > > > > > test harness framework. See the gist for more
>> > > > > > > > >> > > > > > > > > information.
>> > > > > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > > > > > I am starting this thread with the intention of
>> > > > > > > > >> > > > > > > > > getting the discussion started and brainstorming
>> > > > > > > > >> > > > > > > > > what we can do to get the build times back under
>> > > > > > > > >> > > > > > > > > control.
>> > > > > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > > > > >
>> > > > > > > > >> > > > > > > > > --
>> > > > > > > > >> > > > > > > > > Best,
>> > > > > > > > >> > > > > > > > > Stanislav
>> > > > > > > > >> > > > > > > >
>> > > > > > > > >> > > > > > >
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > > > --
>> > > > > > > > >> > > > > > Best,
>> > > > > > > > >> > > > > > Stanislav
>> > > > > > > > >> > > > > >
>> > > > > > > > >> > > > >
>> > > > > > > > >> > > >
>> > > > > > > > >> > >
>> > > > > > > > >> >
>> > > > > > > > >>
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best,
>> > > > > > Stanislav
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > -David
>> > > >
>> > >
>>
>>
