Which of the unresolved bugs in spark-core do you think will require an API-breaking change to fix? If there are none of those, then we are still essentially on track for a 1.0.0 release.
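
To make "API-breaking" concrete, the kind of change I have in mind looks something like the following sketch (the JobTracker class and its submit method are entirely made up for illustration; they are not real Spark APIs):

    // Hypothetical public class, made up for illustration; imagine it
    // shipped in 1.0.0.
    class JobTracker {
      def submit(jobId: Int): Boolean = {
        // ... fictional submission logic ...
        true
      }
    }

    // A fix that widens the id to Long and deletes the Int version
    // removes the old method from the bytecode, breaks every compiled
    // caller, and under semantic versioning forces a 2.0.0 rather than
    // a 1.0.1. The compatible fix keeps the old signature and forwards
    // to the new one:
    class JobTrackerFixed {
      // Old signature preserved for binary compatibility.
      def submit(jobId: Int): Boolean = submit(jobId.toLong)

      // New, corrected entry point.
      def submit(jobId: Long): Boolean = {
        // ... fictional corrected logic ...
        true
      }
    }

Fixes in the second style can go out in a 1.0.1; anything that genuinely requires the first style is what would force the question of whether 1.0.0 is premature.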
The number of contributions and the pace of change are quite high right now, but I don't think that waiting for the pace to slow before releasing 1.0 is viable. If Spark's short history is any guide to its near future, the pace will not slow by any significant amount for any noteworthy length of time, but rather will continue to increase. What we need to be aiming for, I think, is to have the great majority of those new contributions being made to MLlib, GraphX, SparkSQL and other areas of the code that we have clearly marked as not frozen in 1.x (a sketch of how that marking works follows the quoted thread below). I think we are already seeing that, but if I am just not recognizing breakage of our semantic versioning guarantee that will be forced on us by some pending changes, now would be a good time to set me straight.

On Sat, May 17, 2014 at 4:26 AM, Mridul Muralidharan <mri...@gmail.com> wrote:

> I had echoed similar sentiments a while back when there was a discussion
> around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the API
> changes, add missing functionality, and go through a hardening release
> before 1.0.
>
> But the community preferred a 1.0 :-)
>
> Regards,
> Mridul
>
> On 17-May-2014 3:19 pm, "Sean Owen" <so...@cloudera.com> wrote:
> >
> > On this note, non-binding commentary:
> >
> > Releases happen in local minima of change, usually created by an
> > internally enforced code freeze. Spark is incredibly busy now due to
> > external factors -- recently a TLP, recently discovered by a large new
> > audience, ease of contribution enabled by GitHub. It's getting the
> > equivalent of a first year of mainstream battle-testing in a month.
> > It's been very hard to freeze anything! I see a number of non-trivial
> > issues being reported, and I don't think it has even been possible to
> > triage all of them.
> >
> > Given the high rate of change, my instinct would have been to release
> > 0.10.0 now. But won't it always be very busy? I do think the rate of
> > significant issues will slow down.
> >
> > Version ain't nothing but a number, but if it has any meaning, it's
> > the semantic versioning meaning. 1.0 imposes extra handicaps around
> > striving to maintain backwards compatibility. That may end up being
> > bent to fit in important changes that are going to be required in this
> > continuing period of change. Hadoop does this all the time,
> > unfortunately, and gets away with it, I suppose -- minor version
> > releases are really major. (On the other extreme, HBase is at 0.98 and
> > quite production-ready.)
> >
> > Just consider this a second vote for focusing on fixes in 1.0.x rather
> > than new features in 1.x. I think there are a few steps that could
> > streamline triage of this flood of contributions and make all of this
> > easier, but that's for another thread.
> >
> >
> > On Fri, May 16, 2014 at 8:50 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> > > +1, but just barely. We've got quite a number of outstanding bugs
> > > identified, and many of them have fixes in progress. I'd hate to see
> > > those efforts get lost in a post-1.0.0 flood of new features targeted
> > > at 1.1.0 -- in other words, I'd like to see 1.0.1 retain a high
> > > priority relative to 1.1.0.
> > >
> > > Looking through the unresolved JIRAs, it doesn't look like any of the
> > > identified bugs are show-stoppers or strictly regressions (although I
> > > will note that one that I have in progress, SPARK-1749, is a bug that
> > > we introduced with recent work -- it's not strictly a regression
> > > because we had equally bad but different behavior when the
> > > DAGScheduler exceptions weren't previously being handled at all vs.
> > > being slightly mis-handled now), so I'm not currently seeing a reason
> > > not to release.
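
P.S. To make the "clearly marked as not frozen" point above concrete: what exempts code from the 1.x guarantee are the stability annotations that went in for 1.0 (in org.apache.spark.annotation, if I have the package right). A rough sketch of their use, with an entirely made-up class and trait:

    import org.apache.spark.annotation.{DeveloperApi, Experimental}

    // Made-up class, for illustration only. @Experimental signals that
    // the API may change or be removed in a minor release, so it sits
    // outside the 1.x compatibility freeze even though it ships in 1.x.
    @Experimental
    class StreamingModelTrainer {
      def train(iterations: Int): Unit = { /* ... */ }
    }

    // Made-up trait. @DeveloperApi marks lower-level internals that are
    // exposed for advanced users but carry no cross-release guarantee.
    @DeveloperApi
    trait SchedulerHook {
      def onStageCompleted(stageId: Int): Unit
    }

As long as the flood of new contributions lands behind annotations like these (or in the components documented as alpha), semantic versioning and a fast-moving 1.x can coexist.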