Proposal submission: I think we should keep this as open as possible. If there is a problem with too many open proposals, then we should tackle that as a fix rather than excluding participation. Perhaps it will end up that way, but I think it's worth trying a more open model first.
Majority vs consensus: My rationale is that I don't think we want to consider a proposal approved if it had objections serious enough that committers down-voted (or PMC depending on who gets a vote). If these proposals are like PEPs, then they represent a significant amount of community effort and I wouldn't want to move forward if up to half of the community thinks it's an untenable idea.

rb

On Mon, Oct 10, 2016 at 12:07 PM, Cody Koeninger <c...@koeninger.org> wrote:
> I think this is closer to a procedural issue than a code modification issue, hence why majority. If everyone thinks consensus is better, I don't care. Again, I don't feel strongly about the way we achieve clarity, just that we achieve clarity.
>
> On Mon, Oct 10, 2016 at 2:02 PM, Ryan Blue <rb...@netflix.com> wrote:
> > Sorry, I missed that the proposal includes majority approval. Why majority instead of consensus? I think we want to build consensus around these proposals and it makes sense to discuss until no one would veto.
> >
> > rb
> >
> > On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue <rb...@netflix.com> wrote:
> >> +1 to votes to approve proposals. I agree that proposals should have an official mechanism to be accepted, and a vote is an established means of doing that well. I like that it includes a period to review the proposal and I think proposals should have been discussed enough ahead of a vote to survive the possibility of a veto.
> >>
> >> I also like the names that are short and (mostly) unique, like SEP.
> >>
> >> Where I disagree is with the requirement that a committer must formally propose an enhancement. I don't see the value of restricting this: if someone has the will to write up a proposal then they should be encouraged to do so and start a discussion about it. Even if there is a political reality as Cody says, what is the value of codifying that in our process? I think restricting who can submit proposals would only undermine them by pushing contributors out. Maybe I'm missing something here?
> >>
> >> rb
> >>
> >> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> wrote:
> >>> Yes, users suggesting SIPs is a good thing and is explicitly called out in the linked document under the Who? section. Formally proposing them, not so much, because of the political realities.
> >>>
> >>> Yes, implementation strategy definitely affects goals. There are all kinds of examples of this, I'll pick one that's my fault so as to avoid sounding like I'm blaming:
> >>>
> >>> When I implemented the Kafka DStream, one of my (not explicitly agreed upon by the community) goals was to make sure people could use the DStream in whatever way they were already using Kafka at work. The lack of explicit agreement on that goal led to all kinds of fighting with committers, that could have been avoided. The lack of explicit up-front strategy discussion led to the DStream not really working with compacted topics. I knew about compacted topics, but don't have a use for them, so had a blind spot there. If there was explicit up-front discussion that my strategy was "assume that batches can be defined on the driver solely by beginning and ending offsets", there's a greater chance that a user would have seen that and said, "hey, what about non-contiguous offsets in a compacted topic".
> >>>
> >>> This kind of thing is only going to happen smoothly if we have a lightweight user-visible process with clear outcomes.
> >>>
> >>> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:
> >>> > I agree with most of what Cody said.
> >>> >
> >>> > Two things:
> >>> >
> >>> > First we can always have other people suggest SIPs but mark them as “unreviewed” and have committers basically move them forward. The problem is that writing a good document takes time. This way we can leverage non committers to do some of this work (it is just another way to contribute).
> >>> >
> >>> > As for strategy, in many cases implementation strategy can affect the goals. I will give a small example: In the current structured streaming strategy, we group by the time to achieve a sliding window. This is definitely an implementation decision and not a goal. However, I can think of several aggregation functions which have the time inside their calculation buffer. For example, let’s say we want to return a set of all distinct values. One way to implement this would be to make the set into a map and have the value contain the last time seen. Multiplying it across the groupby would cost a lot in performance. So adding such a strategy would have a great effect on the type of aggregations and their performance which does affect the goal. Without adding the strategy, it is easy for whoever goes to the design document to not think about these cases. Furthermore, it might be decided that these cases are rare enough so that the strategy is still good enough but how would we know it without user feedback?
> >>> >
> >>> > I believe this example is exactly what Cody was talking about. Since many times implementation strategies have a large effect on the goal, we should have it discussed when discussing the goals. In addition, while it is often easy to throw out completely infeasible goals, it is often much harder to figure out that the goals are infeasible without fine tuning.
> >>> >
> >>> > Assaf.
> >>> >
> >>> > From: Cody Koeninger-2 [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> >>> > Sent: Monday, October 10, 2016 2:25 AM
> >>> > To: Mendelson, Assaf
> >>> > Subject: Re: Spark Improvement Proposals
> >>> >
> >>> > Only committers should formally submit SIPs because in an Apache project only committers have explicit political power. If a user can't find a committer willing to sponsor an SIP idea, they have no way to get the idea passed in any case. If I can't find a committer to sponsor this meta-SIP idea, I'm out of luck.
> >>> >
> >>> > I do not believe unrealistic goals can be found solely by inspection. We've managed to ignore unrealistic goals even after implementation! Focusing on APIs can allow people to think they've solved something, when there's really no way of implementing that API while meeting the goals. Rapid iteration is clearly the best way to address this, but we've already talked about why that hasn't really worked. If adding a non-binding API section to the template is important to you, I'm not against it, but I don't think it's sufficient.
> >>> >
> >>> > On your PRD vs design doc spectrum, I'm saying this is closer to a PRD. Clear agreement on goals is the most important thing and that's why it's the thing I want binding agreement on. But I cannot agree to goals unless I have enough minimal technical info to judge whether the goals are likely to actually be accomplished.
> >>> >
> >>> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
> >>> >> Well, I think there are a few things here that don't make sense. First, why should only committers submit SIPs? Development in the project should be open to all contributors, whether they're committers or not. Second, I think unrealistic goals can be found just by inspecting the goals, and I'm not super worried that we'll accept a lot of SIPs that are then infeasible -- we can then submit new ones. But this depends on whether you want this process to be a "design doc lite", where people also agree on implementation strategy, or just a way to agree on goals. This is what I asked earlier about PRDs vs design docs (and I'm open to either one but I'd just like clarity). Finally, both as a user and designer of software, I always want to give feedback on APIs, so I'd really like a culture of having those early. People don't argue about prettiness when they discuss APIs, they argue about the core concepts to expose in order to meet various goals, and then they're stuck maintaining those for a long time.
> >>> >>
> >>> >> Matei
> >>> >>
> >>> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
> >>> >>
> >>> >> Users instead of people, sure. Committers and contributors are (or at least should be) a subset of users.
> >>> >>
> >>> >> Non goals, sure. I don't care what the name is, but we need to clearly say e.g. 'no we are not maintaining compatibility with XYZ right now'.
> >>> >>
> >>> >> API, what I care most about is whether it allows me to accomplish the goals. Arguing about how ugly or pretty it is can be saved for design/implementation imho.
> >>> >>
> >>> >> Strategy, this is necessary because otherwise goals can be out of line with reality. Don't propose goals you don't have at least some idea of how to implement.
> >>> >>
> >>> >> Rejected strategies, given that committers are the only ones I'm saying should formally submit SPARKLIs or SIPs, if they put junk in a required section then slap them down for it and tell them to fix it.
> >>> >>
> >>> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
> >>> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular:
> >>> >>>
> >>> >>> - Goals need to be about user-facing behavior ("people" is broad)
> >>> >>>
> >>> >>> - I'd rename Rejected Goals to Non-Goals. Otherwise someone will dig up one of these and say "Spark's developers have officially rejected X, which our awesome system has".
> >>> >>>
> >>> >>> - For user-facing stuff, I think you need a section on API. Virtually all other *IPs I've seen have that.
> >>> >>>
> >>> >>> - I'm still not sure why the strategy section is needed if the purpose is to define user-facing behavior -- unless this is the strategy for setting the goals or for defining the API. That sounds squarely like a design doc issue. In some sense, who cares whether the proposal is technically feasible right now? If it's infeasible, that will be discovered later during design and implementation. Same thing with rejected strategies -- listing some of those is definitely useful sometimes, but if you make this a *required* section, people are just going to fill it in with bogus stuff (I've seen this happen before).
> >>> >>>
> >>> >>> Matei
> >>> >>>
> >>> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
> >>> >>> >
> >>> >>> > So to focus the discussion on the specific strategy I'm suggesting, documented at
> >>> >>> >
> >>> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >>> >>> >
> >>> >>> > "Goals: What must this allow people to do, that they can't currently?"
> >>> >>> >
> >>> >>> > Is it unclear that this is focusing specifically on people-visible behavior?
> >>> >>> >
> >>> >>> > Rejected goals - are important because otherwise people keep trying to argue about scope. Of course you can change things later with a different SIP and different vote, the point is to focus.
> >>> >>> >
> >>> >>> > Use cases - are something that people are going to bring up in discussion. If they aren't clearly documented as a goal ("This must allow me to connect using SSL"), they should be added.
> >>> >>> >
> >>> >>> > Internal architecture - if the people who need specific behavior are implementers of other parts of the system, that's fine.
> >>> >>> >
> >>> >>> > Rejected strategies - If you have none of these, you have no evidence that the proponent didn't just go with the first thing they had in mind (or have already implemented), which is a big problem currently. Approval isn't binding as to specifics of implementation, so these aren't handcuffs. The goals are the contract, the strategy is evidence that the contract can actually be met.
> >>> >>> >
> >>> >>> > Design docs - I'm not touching design docs. The markdown file I linked specifically says of the strategy section "This is not a full design document." Is this unclear? Design docs can be worked on obviously, but that's not what I'm concerned with here.
> >>> >>> >
> >>> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> wrote:
> >>> >>> >> Hi Cody,
> >>> >>> >>
> >>> >>> >> I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g. are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.
> >>> >>> >>
> >>> >>> >> In particular, here are some things that you may or may not consider in scope for SIPs:
> >>> >>> >>
> >>> >>> >> - Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.
> >>> >>> >>
> >>> >>> >> - Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").
> >>> >>> >>
> >>> >>> >> - Use cases: I usually find this very useful in PRDs to better communicate the goals.
> >>> >>> >>
> >>> >>> >> - Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).
> >>> >>> >>
> >>> >>> >> - Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
> >>> >>> >>
> >>> >>> >> At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs. This can work too but it does require more work from the proposer and it can lead to the same problems you mentioned with people already having a design and implementation in mind.
> >>> >>> >>
> >>> >>> >> Basically, the question is, are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?
> >>> >>> >>
> >>> >>> >> BTW note that in either case, I'd like to have a template for design docs too, which should also include goals. I think that would've avoided some of the issues you brought up.
> >>> >>> >>
> >>> >>> >> Matei
> >>> >>> >>
> >>> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
> >>> >>> >>
> >>> >>> >> Here's my specific proposal (meta-proposal?)
> >>> >>> >>
> >>> >>> >> Spark Improvement Proposals (SIP)
> >>> >>> >>
> >>> >>> >> Background:
> >>> >>> >>
> >>> >>> >> The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.
> >>> >>> >>
> >>> >>> >> When feedback is solicited, it is often as to detailed design specifics, not focused on goals.
> >>> >>> >>
> >>> >>> >> When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.
> >>> >>> >>
> >>> >>> >> This results in commits that don't fully meet user needs.
> >>> >>> >>
> >>> >>> >> Goals:
> >>> >>> >>
> >>> >>> >> - Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.
> >>> >>> >>
> >>> >>> >> - Ensure that a technically feasible strategy is chosen that is likely to meet the goals.
> >>> >>> >>
> >>> >>> >> Rejected Goals:
> >>> >>> >>
> >>> >>> >> - SIPs are not for detailed design. Design by committee doesn't work.
> >>> >>> >>
> >>> >>> >> - SIPs are not for every change. We don't need that much process.
> >>> >>> >>
> >>> >>> >> Strategy:
> >>> >>> >>
> >>> >>> >> My suggestion is outlined as a Spark Improvement Proposal process documented at
> >>> >>> >>
> >>> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
> >>> >>> >>
> >>> >>> >> Specifics of Jira manipulation are an implementation detail we can figure out.
> >>> >>> >>
> >>> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
> >>> >>> >>
> >>> >>> >> Rejected Strategies:
> >>> >>> >>
> >>> >>> >> Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.
> >>> >>> >>
> >>> >>> >> Historically this has been problematic due to pressure to limit public api changes.
> >>> >>> >>
> >>> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
> >>> >>> >>> Alright, looks like there is quite a bit of support. We should wait to hear from more people too.
> >>> >>> >>>
> >>> >>> >>> To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.
> >>> >>> >>>
> >>> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
> >>> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.
> >>> >>> >>>>
> >>> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <[hidden email]> wrote:
> >>> >>> >>>>> +1 to the SIP label as long as it does not slow down things and it targets optimizing efforts, coordination etc. For example, really small features (assuming they don't touch public interfaces) or refactorings should not need to go through this process, and I hope it will be kept this way. So a guideline doc should be provided, like in the KIP case.
> >>> >>> >>>>>
> >>> >>> >>>>> IMHO so far aside from tagging things and linking them elsewhere simply having design docs and prototype implementations in PRs is not something that has not worked so far. What is really a pain in many projects out there is discontinuity in progress of PRs, missing features, slow reviews which is understandable to some extent... it is not only about Spark but things can be improved for sure for this project in particular as already stated.
> >>> >>> >>>>>
> >>> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]> wrote:
> >>> >>> >>>>>> +1 to adding an SIP label and linking it from the website. I think it needs
> >>> >>> >>>>>>
> >>> >>> >>>>>> - template that focuses it towards soliciting user goals / non goals
> >>> >>> >>>>>> - clear resolution as to which strategy was chosen to pursue. I'd recommend a vote.
> >>> >>> >>>>>>
> >>> >>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I think it's directly relevant to the SIP idea so I'll clarify here, and split a thread for the other discussion per Nicholas' request.
> >>> >>> >>>>>>
> >>> >>> >>>>>> I meant changing public user interfaces. I think the first design is unlikely to be right, because it's done at a time when you have the least information. As a user, I find it considerably more frustrating to be unable to use a tool to get my job done, than I do having to make minor changes to my code in order to take advantage of features. I've seen committers be seriously reluctant to allow changes to @experimental code that are needed in order for it to really work right. You need to be able to iterate, and if people on both sides of the fence aren't going to respect that some newer apis are subject to change, then why even mark them as such?
> >>> >>> >>>>>>
> >>> >>> >>>>>> Ideally a finished SIP should give me a checklist of things that an implementation must do, and things that it doesn't need to do. Contributors/committers should be seriously discouraged from putting out a version 0.1 that doesn't have at least a prototype implementation of all those things, especially if they're then going to argue against interface changes necessary to get the rest of the things done in the 0.2 version.
> >>> >>> >>>>>>
> >>> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]> wrote:
> >>> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
> >>> >>> >>>>>>>
> >>> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using wiki to track the list of major changes, but that never really materialized due to the overhead. Adding a SIP label on major JIRAs and then linking to them prominently on the Spark website makes a lot of sense.
> >>> >>> >>>>>>>
> >>> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <[hidden email]> wrote:
> >>> >>> >>>>>>>> For the improvement proposals, I think one major point was to make them really visible to users who are not contributors, so we should do more than sending stuff to dev@. One very lightweight idea is to have a new type of JIRA called a SIP and have a link to a filter that shows all such JIRAs from http://spark.apache.org. I also like the idea of SIP and design doc templates (in fact many projects have them).
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> Matei
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> wrote:
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> I called Cody last night and talked about some of the topics in his email. It became clear to me Cody genuinely cares about the project.
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> Some of the frustrations come from the success of the project itself becoming very "hot", and it is difficult to get clarity from people who don't dedicate all their time to Spark. In fact, it is in some ways similar to scaling an engineering team in a successful startup: old processes that worked well might not work so well when it gets to a certain size, cultures can get diluted, building culture vs building process, etc.
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> I also really like to have a more visible process for larger changes, especially major user facing API changes. Historically we upload design docs for major changes, but it is not always consistent, and it is difficult to ensure the quality of the docs, due to the volunteering nature of the organization.
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a culture to improve clarity:
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA. One thing Cody and I didn't discuss but an idea that just came to me is we should create a design doc template for the project and ask everybody to follow it. The design doc template should also explicitly list goals and non-goals, to make design docs more consistent.
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with some changes, but again very inconsistently. Just posting something on JIRA isn't sufficient, because there are simply too many JIRAs and the signal gets lost in the noise. While this is generally impossible to enforce because we can't force all volunteers to conform to a process (or they might not even be aware of this), those who are more familiar with the project can help by emailing dev@ when they see something that hasn't been.
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A design doc should serve as the base for discussion and is by no means the final design. Of course, this does not mean the author has to accept every piece of feedback. They should also be comfortable accepting / rejecting ideas on technical grounds.
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to have some monthly Google hangouts that are open to the world. I am actually not sure how well this will work, because of the volunteering nature and we need to adjust for timezones for people across the globe, but it seems worth trying.
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> - Culture: Contributors (including committers) should be more direct in setting expectations, including whether they are working on a specific issue, whether they will be working on a specific issue, and whether an issue or pr or jira should be rejected. Most people I know in this community are nice and don't enjoy telling other people no, but it is often more annoying to a contributor to not know anything than getting a no.
> >>> >>> >>>>>>>>
> >>> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <[hidden email]> wrote:
> >>> >>> >>>>>>>>>
> >>> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal" process that solicits user input on new APIs. For what it's worth, I don't think committers are trying to minimize their own work -- every committer cares about making the software useful for users. However, it is always hard to get user input and so it helps to have this kind of process. I've certainly looked at the *IPs a lot in other software I use just to see the biggest things on the roadmap.
> >>> >>> >>>>>>>>>
> >>> >>> >>>>>>>>> When you're talking about "changing interfaces", are you talking about public or internal APIs? I do think many people hate changing public APIs and I actually think that's for the best of the project. That's a technical debate, but basically, the worst thing when you're using a piece of software is that the developers constantly ask you to rewrite your app to update to a new version (and thus benefit from bug fixes, etc).
> >>> >>> >>>>>>>>> Cue anyone who's used Protobuf, or Guava. The "let's get everyone to change their code this release" model works well within a single large company, but doesn't work well for a community, which is why nearly all *very* widely used programming interfaces (I'm talking things like Java standard library, Windows API, etc) almost *never* break backwards compatibility. All this is done within reason though, e.g. we do change things in major releases (2.x, 3.x, etc).

--
Ryan Blue
Software Engineer
Netflix