Not a specific jira, but I was looking at all the recent jiras with the
"correctness" label and things are definitely being handled inconsistently in
my opinion (https://issues.apache.org/jira/issues/?jql=labels+%3D+correctness).
The inconsistencies are in the things I've mentioned above. Priority is not
set high enough, description is not clear, some backported to 2.2, some not.
Obviously there could be ones without the "correctness" label as well, since
until recently I was also not aware that this label should be applied to this
type of issue.
We have no real guidelines in this area for developers and committers to follow
so I think defining some would help everyone.
I realize everyone's time is important and everyone has different priorities
but I think this sort of issue would be one we as a community should take care
of above everything else. If I'm a business using Apache Spark for business
critical things and I find that there are data loss or corruption issues
consistently in the releases and it's not our highest priority to fix them, I'm
going to be very hesitant to use and stay with Spark.
One specific example of priority is in the 2.4 code freeze/release thread, where
releasing without SPARK-23243 was brought up. And really we have done a
bunch of releases without this, but until recently it wasn't marked as a
blocker either. I'll admit that I missed this jira when it was filed and only
recently became aware of it. I changed the priority on it.
| I share frustration that Somebody should be working on Important Things, but
| don't think the difference between getting those done and not done is reminding
| people that Important Things need doing. What's the cause that leads to
| concrete corrective action?
I'm not really sure what you mean by this. This proposal is to introduce a
process for this type of issue so it's at least brought to people's attention. We
can't do anything to make people work on certain things. If they aren't raised
as important issues then it's really easy to miss these things. If it's a
blocker we should also not be doing any new releases without a fix for it, which
may motivate people to look at it.
I agree it would be good for us to make it more official which branches
are being maintained. I think at this point it's still 2.1.x, 2.2.x, and 2.3.x
since we recently did releases of all of these. Since 2.4 will be coming out
we should definitely think about ending maintenance of 2.1.x. Perhaps we need a
table on our release page about this. But this should be a separate thread.
Tom
On Monday, August 13, 2018, 9:03:42 AM CDT, Sean Owen <[email protected]>
wrote:
I doubt the question is whether people want to take such issues seriously --
all else equal, of course everyone does.
A JIRA label plus place in the release notes sounds like a good concrete step
that isn't happening consistently now. That's a clear flag that at least one
person believes issue X is a blocker.
Is this about specific JIRAs? I think it's more useful to illustrate in the
context of specific issues. For example I haven't been following JIRAs well,
and don't know what is being contested here.
I share frustration that Somebody should be working on Important Things, but
don't think the difference between getting those done and not done is reminding
people that Important Things need doing. What's the cause that leads to
concrete corrective action?
Do we need more committers? Fewer new features? More conservative releases?
Less work on X to work on this?
Lastly you raise an important question as an aside, one we haven't answered:
when does a branch go inactive? I am sure 2.0.x is inactive, de facto, along
with all 1.x. I think 2.1.x is inactive too. Should we put any rough guidance
in place? For example, a branch is maintained for 12-18 months?
On Mon, Aug 13, 2018 at 8:45 AM Tom Graves <[email protected]> wrote:
Hello all,
I've noticed some inconsistencies in the way we are handling data
loss/correctness issues. I think we need to take these very seriously as they
could be costing businesses real money and impacting real decisions and
business logic. I would like to discuss how we can make sure these are
handled consistently and with urgency going forward.
A few things I would like to propose are below. Most of these are up to the
developers and committers to ensure happen, so I want to know what everyone
thinks and whether people have other ideas.
- Label any correctness/data loss jira with "correctness".
- Mark the jira as a blocker by default if someone suspects a corruption/loss issue.
- Make sure the description is clear about when it occurs and the impact to the user.
- Ensure it's backported to all active branches.
- See if we can have a separate section in the release notes for these.
Thanks,
Tom Graves