I would make the case for interface stability, not just API stability. Particularly given that we have significantly changed some of our interfaces, I want to ensure developers and users are not seeing red flags.
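To be clear, the add-and-deprecate pattern Matei describes below does handle pure API evolution. A minimal Scala sketch of that pattern (the class and method names here are hypothetical, not real Spark APIs):

    class Context {
      // Old method kept so existing callers still compile; they now get a
      // deprecation warning pointing them at the replacement.
      @deprecated("Use textFiles(paths) instead", "1.0.0")
      def textFile(path: String): Seq[String] = textFiles(Seq(path))

      // New method that supersedes the old one.
      def textFiles(paths: Seq[String]): Seq[String] = paths // stub body
    }

What that pattern cannot cover is behavioral change: the signature stays the same, but the semantics shift underneath existing programs.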
Bugs and code stability can be addressed in minor releases if found, but behavioral changes and/or interface changes would be a much more invasive issue for our users.

Regards,
Mridul

On 18-May-2014 2:19 am, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:

> As others have said, the 1.0 milestone is about API stability, not about saying "we've eliminated all bugs". The sooner you declare 1.0, the sooner users can confidently build on Spark, knowing that the application they build today will still run on Spark 1.9.9 three years from now. This is something that I've seen done badly (and experienced the effects thereof) in other big data projects, such as MapReduce and even YARN. The result is that you annoy users, you end up with a fragmented userbase where everyone is building against a different version, and you drastically slow down development.
>
> With a project as fast-growing as Spark in particular, there will be new bugs discovered and reported continuously, especially in the non-core components. Look at the graph of # of contributors over time for Spark: https://www.ohloh.net/p/apache-spark (bottom-most graph; "commits" changed when we started merging each patch as a single commit). This is not slowing down, and we need to have the culture now that we treat API stability and release numbers at the level expected for a 1.0 project, instead of having people come in and randomly change the API.
>
> I'll also note that the issues marked "blocker" were marked so by their reporters, since the reporter can set the priority. I don't consider stuff like parallelize() not partitioning ranges in the same way as other collections a blocker -- it's a bug, it would be good to fix it, but it only affects a small number of use cases. Of course, if we find a real blocker (in particular a regression from a previous version, or a feature that's just completely broken), we will delay the release for that, but at some point you have to say "okay, this fix will go into the next maintenance release". Maybe we need to write a clear policy for what the issue priorities mean.
>
> Finally, I believe it's much better to have a culture where you can make releases on a regular schedule, and have the option to make a maintenance release in 3-4 days if you find new bugs, than one where you pile up stuff into each release. This is what much larger projects than us, like Linux, do, and it's the only way to avoid indefinite stalling with a large contributor base. In the worst case, if you find a new bug that warrants an immediate release, it goes into 1.0.1 a week after 1.0.0 (we can vote on 1.0.1 in three days with just your bug fix in it). And if you find an API that you'd like to improve, just add a new one and maybe deprecate the old one -- at some point we have to respect our users and let them know that the code they write today will still run tomorrow.
>
> Matei
>
> On May 17, 2014, at 10:32 AM, Kan Zhang <kzh...@apache.org> wrote:
>
> > +1 on the running commentary here, non-binding of course :-)
> >
> > On Sat, May 17, 2014 at 8:44 AM, Andrew Ash <and...@andrewash.com> wrote:
> >
> >> +1 on the next release feeling more like a 0.10 than a 1.0
> >> On May 17, 2014 4:38 AM, "Mridul Muralidharan" <mri...@gmail.com> wrote:
> >>
> >>> I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the API changes, add missing functionality, and go through a hardening release before 1.0.
> >>>
> >>> But the community preferred a 1.0 :-)
> >>>
> >>> Regards,
> >>> Mridul
> >>>
> >>> On 17-May-2014 3:19 pm, "Sean Owen" <so...@cloudera.com> wrote:
> >>>>
> >>>> On this note, non-binding commentary:
> >>>>
> >>>> Releases happen in local minima of change, usually created by an internally enforced code freeze. Spark is incredibly busy now due to external factors -- recently a TLP, recently discovered by a large new audience, ease of contribution enabled by GitHub. It's getting something like a first year of mainstream battle-testing in a month. It's been very hard to freeze anything! I see a number of non-trivial issues being reported, and I don't think it has even been possible to triage all of them.
> >>>>
> >>>> Given the high rate of change, my instinct would have been to release 0.10.0 now. But won't it always be very busy? I do think the rate of significant issues will slow down.
> >>>>
> >>>> Version ain't nothing but a number, but if it has any meaning, it's the semantic versioning meaning. 1.0 imposes extra handicaps around striving to maintain backwards compatibility. That may end up being bent to fit in important changes that are going to be required in this continuing period of change. Hadoop does this all the time, unfortunately, and gets away with it, I suppose -- minor version releases are really major. (On the other extreme, HBase is at 0.98 and quite production-ready.)
> >>>>
> >>>> Just consider this a second vote for a focus on fixes and 1.0.x rather than new features and 1.x. I think there are a few steps that could streamline triage of this flood of contributions, and make all of this easier, but that's for another thread.
> >>>>
> >>>> On Fri, May 16, 2014 at 8:50 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> >>>>
> >>>>> +1, but just barely. We've got quite a number of outstanding bugs identified, and many of them have fixes in progress. I'd hate to see those efforts get lost in a post-1.0.0 flood of new features targeted at 1.1.0 -- in other words, I'd like to see 1.0.1 retain a high priority relative to 1.1.0.
> >>>>>
> >>>>> Looking through the unresolved JIRAs, it doesn't look like any of the identified bugs are show-stoppers or strictly regressions (although I will note that one that I have in progress, SPARK-1749, is a bug that we introduced with recent work -- it's not strictly a regression, because we had equally bad but different behavior when the DAGScheduler exceptions weren't previously being handled at all vs. being slightly mis-handled now), so I'm not currently seeing a reason not to release.