Hi all,

I've noticed the Spark tests getting increasingly flaky: it now seems more
common than not for a PR to need at least one re-run before the tests
pass.  This is both annoying and problematic, because it makes it harder
to tell when a PR is introducing new flakiness.

To try to clean this up, I'd propose filing a JIRA *every time* Jenkins
fails on a PR (for a reason unrelated to the PR).  Just provide a quick
description of the failure (e.g., "Flaky test: DagSchedulerSuite" or
"Tests failed because 250m timeout expired"), a link to the failed build,
and the "Tests" component.  If there's already a JIRA for the issue, just
comment with a link to the latest failure.  I know folks don't always have
time to track down why a test failed, but this is at least helpful to
someone who later tries to figure out when the issue started, in order to
find the problematic code or test.

If this seems like too much overhead, feel free to suggest alternative
ways to make the tests less flaky!

-Kay