That seems reasonable to me. I do not want to see lazy consensus used on one of these proposals, though. I want a clear outcome, i.e. call for a vote, wait at least 72 hours, and get three +1s and no vetoes.
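To make that rule concrete, here is a minimal sketch in Python of how the outcome of such a vote could be checked. It is only an illustration: the names `Vote` and `vote_outcome` are made up for this example, and it assumes every vote in the list is binding, since who gets a binding vote (committers or PMC) is still open in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds taken from the rule stated above.
MIN_VOTING_PERIOD = timedelta(hours=72)
MIN_PLUS_ONES = 3


@dataclass(frozen=True)
class Vote:
    """Hypothetical record of one binding vote."""
    voter: str
    value: int  # +1 approve, 0 abstain, -1 veto


def vote_outcome(opened_at: datetime, now: datetime, votes: list) -> str:
    """Return "open", "passed", or "failed" for a proposal vote."""
    # No outcome is declared before the minimum voting period has elapsed.
    if now - opened_at < MIN_VOTING_PERIOD:
        return "open"
    # A single veto blocks the proposal (assumption: vetoes are not overridable).
    if any(v.value == -1 for v in votes):
        return "failed"
    plus_ones = sum(1 for v in votes if v.value == 1)
    return "passed" if plus_ones >= MIN_PLUS_ONES else "failed"
```

The point of the sketch is that there is no "silence means yes" branch: once the 72 hours are up, the result is either passed or failed.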
On Mon, Oct 10, 2016 at 2:15 PM, Ryan Blue <rb...@netflix.com> wrote:
> Proposal submission: I think we should keep this as open as possible. If there is a problem with too many open proposals, then we should tackle that as a fix rather than excluding participation. Perhaps it will end up that way, but I think it's worth trying a more open model first.
>
> Majority vs consensus: My rationale is that I don't think we want to consider a proposal approved if it had objections serious enough that committers down-voted (or PMC depending on who gets a vote). If these proposals are like PEPs, then they represent a significant amount of community effort and I wouldn't want to move forward if up to half of the community thinks it's an untenable idea.
>
> rb
>
> On Mon, Oct 10, 2016 at 12:07 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> I think this is closer to a procedural issue than a code modification issue, hence why majority. If everyone thinks consensus is better, I don't care. Again, I don't feel strongly about the way we achieve clarity, just that we achieve clarity.
>>
>> On Mon, Oct 10, 2016 at 2:02 PM, Ryan Blue <rb...@netflix.com> wrote:
>> > Sorry, I missed that the proposal includes majority approval. Why majority instead of consensus? I think we want to build consensus around these proposals and it makes sense to discuss until no one would veto.
>> >
>> > rb
>> >
>> > On Mon, Oct 10, 2016 at 11:54 AM, Ryan Blue <rb...@netflix.com> wrote:
>> >>
>> >> +1 to votes to approve proposals. I agree that proposals should have an official mechanism to be accepted, and a vote is an established means of doing that well. I like that it includes a period to review the proposal and I think proposals should have been discussed enough ahead of a vote to survive the possibility of a veto.
>> >>
>> >> I also like the names that are short and (mostly) unique, like SEP.
>> >>
>> >> Where I disagree is with the requirement that a committer must formally propose an enhancement. I don't see the value of restricting this: if someone has the will to write up a proposal then they should be encouraged to do so and start a discussion about it. Even if there is a political reality as Cody says, what is the value of codifying that in our process? I think restricting who can submit proposals would only undermine them by pushing contributors out. Maybe I'm missing something here?
>> >>
>> >> rb
>> >>
>> >> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> wrote:
>> >>>
>> >>> Yes, users suggesting SIPs is a good thing and is explicitly called out in the linked document under the Who? section. Formally proposing them, not so much, because of the political realities.
>> >>>
>> >>> Yes, implementation strategy definitely affects goals. There are all kinds of examples of this, I'll pick one that's my fault so as to avoid sounding like I'm blaming:
>> >>>
>> >>> When I implemented the Kafka DStream, one of my (not explicitly agreed upon by the community) goals was to make sure people could use the DStream however they were already using Kafka at work. The lack of explicit agreement on that goal led to all kinds of fighting with committers, that could have been avoided. The lack of explicit up-front strategy discussion led to the DStream not really working with compacted topics. I knew about compacted topics, but don't have a use for them, so had a blind spot there. If there was explicit up-front discussion that my strategy was "assume that batches can be defined on the driver solely by beginning and ending offsets", there's a greater chance that a user would have seen that and said, "hey, what about non-contiguous offsets in a compacted topic".
>> >>>
>> >>> This kind of thing is only going to happen smoothly if we have a lightweight user-visible process with clear outcomes.
>> >>>
>> >>> On Mon, Oct 10, 2016 at 1:34 AM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:
>> >>> > I agree with most of what Cody said.
>> >>> >
>> >>> > Two things:
>> >>> >
>> >>> > First, we can always have other people suggest SIPs but mark them as “unreviewed” and have committers basically move them forward. The problem is that writing a good document takes time. This way we can leverage non-committers to do some of this work (it is just another way to contribute).
>> >>> >
>> >>> > As for strategy, in many cases implementation strategy can affect the goals. I will give a small example: In the current structured streaming strategy, we group by the time to achieve a sliding window. This is definitely an implementation decision and not a goal. However, I can think of several aggregation functions which have the time inside their calculation buffer. For example, let’s say we want to return a set of all distinct values. One way to implement this would be to make the set into a map and have the value contain the last time seen. Multiplying it across the groupby would cost a lot in performance. So adding such a strategy would have a great effect on the type of aggregations and their performance which does affect the goal. Without adding the strategy, it is easy for whoever goes to the design document to not think about these cases.
>> >>> > Furthermore, it might be decided that these cases are rare enough so that the strategy is still good enough but how would we know it without user feedback?
>> >>> >
>> >>> > I believe this example is exactly what Cody was talking about. Since many times implementation strategies have a large effect on the goal, we should discuss them when discussing the goals. In addition, while it is often easy to throw out completely infeasible goals, it is often much harder to figure out that the goals are infeasible without fine tuning.
>> >>> >
>> >>> > Assaf.
>> >>> >
>> >>> > From: Cody Koeninger-2 [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
>> >>> > Sent: Monday, October 10, 2016 2:25 AM
>> >>> > To: Mendelson, Assaf
>> >>> > Subject: Re: Spark Improvement Proposals
>> >>> >
>> >>> > Only committers should formally submit SIPs because in an Apache project only committers have explicit political power. If a user can't find a committer willing to sponsor an SIP idea, they have no way to get the idea passed in any case. If I can't find a committer to sponsor this meta-SIP idea, I'm out of luck.
>> >>> >
>> >>> > I do not believe unrealistic goals can be found solely by inspection. We've managed to ignore unrealistic goals even after implementation! Focusing on APIs can allow people to think they've solved something, when there's really no way of implementing that API while meeting the goals. Rapid iteration is clearly the best way to address this, but we've already talked about why that hasn't really worked.
>> >>> > If adding a non-binding API section to the template is important to you, I'm not against it, but I don't think it's sufficient.
>> >>> >
>> >>> > On your PRD vs design doc spectrum, I'm saying this is closer to a PRD. Clear agreement on goals is the most important thing and that's why it's the thing I want binding agreement on. But I cannot agree to goals unless I have enough minimal technical info to judge whether the goals are likely to actually be accomplished.
>> >>> >
>> >>> > On Sun, Oct 9, 2016 at 5:35 PM, Matei Zaharia <[hidden email]> wrote:
>> >>> >
>> >>> >> Well, I think there are a few things here that don't make sense. First, why should only committers submit SIPs? Development in the project should be open to all contributors, whether they're committers or not. Second, I think unrealistic goals can be found just by inspecting the goals, and I'm not super worried that we'll accept a lot of SIPs that are then infeasible -- we can then submit new ones. But this depends on whether you want this process to be a "design doc lite", where people also agree on implementation strategy, or just a way to agree on goals. This is what I asked earlier about PRDs vs design docs (and I'm open to either one but I'd just like clarity). Finally, both as a user and designer of software, I always want to give feedback on APIs, so I'd really like a culture of having those early.
>> >>> >> People don't argue about prettiness when they discuss APIs, they argue about the core concepts to expose in order to meet various goals, and then they're stuck maintaining those for a long time.
>> >>> >>
>> >>> >> Matei
>> >>> >>
>> >>> >> On Oct 9, 2016, at 3:10 PM, Cody Koeninger <[hidden email]> wrote:
>> >>> >>
>> >>> >> Users instead of people, sure. Committers and contributors are (or at least should be) a subset of users.
>> >>> >>
>> >>> >> Non goals, sure. I don't care what the name is, but we need to clearly say e.g. 'no we are not maintaining compatibility with XYZ right now'.
>> >>> >>
>> >>> >> API, what I care most about is whether it allows me to accomplish the goals. Arguing about how ugly or pretty it is can be saved for design/implementation imho.
>> >>> >>
>> >>> >> Strategy, this is necessary because otherwise goals can be out of line with reality. Don't propose goals you don't have at least some idea of how to implement.
>> >>> >>
>> >>> >> Rejected strategies, given that committers are the only ones I'm saying should formally submit SPARKLIs or SIPs, if they put junk in a required section then slap them down for it and tell them to fix it.
>> >>> >>
>> >>> >> On Oct 9, 2016 4:36 PM, "Matei Zaharia" <[hidden email]> wrote:
>> >>> >>>
>> >>> >>> Yup, this is the stuff that I found unclear. Thanks for clarifying here, but we should also clarify it in the writeup. In particular:
>> >>> >>>
>> >>> >>> - Goals need to be about user-facing behavior ("people" is broad)
>> >>> >>>
>> >>> >>> - I'd rename Rejected Goals to Non-Goals.
>> >>> >>>   Otherwise someone will dig up one of these and say "Spark's developers have officially rejected X, which our awesome system has".
>> >>> >>>
>> >>> >>> - For user-facing stuff, I think you need a section on API. Virtually all other *IPs I've seen have that.
>> >>> >>>
>> >>> >>> - I'm still not sure why the strategy section is needed if the purpose is to define user-facing behavior -- unless this is the strategy for setting the goals or for defining the API. That sounds squarely like a design doc issue. In some sense, who cares whether the proposal is technically feasible right now? If it's infeasible, that will be discovered later during design and implementation. Same thing with rejected strategies -- listing some of those is definitely useful sometimes, but if you make this a *required* section, people are just going to fill it in with bogus stuff (I've seen this happen before).
>> >>> >>>
>> >>> >>> Matei
>> >>> >>>
>> >>> >>> > On Oct 9, 2016, at 2:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >>> >>> >
>> >>> >>> > So to focus the discussion on the specific strategy I'm suggesting, documented at
>> >>> >>> >
>> >>> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>> >>> >
>> >>> >>> > "Goals: What must this allow people to do, that they can't currently?"
>> >>> >>> >
>> >>> >>> > Is it unclear that this is focusing specifically on people-visible behavior?
>> >>> >>> >
>> >>> >>> > Rejected goals - are important because otherwise people keep trying to argue about scope. Of course you can change things later with a different SIP and different vote, the point is to focus.
>> >>> >>> >
>> >>> >>> > Use cases - are something that people are going to bring up in discussion. If they aren't clearly documented as a goal ("This must allow me to connect using SSL"), they should be added.
>> >>> >>> >
>> >>> >>> > Internal architecture - if the people who need specific behavior are implementers of other parts of the system, that's fine.
>> >>> >>> >
>> >>> >>> > Rejected strategies - If you have none of these, you have no evidence that the proponent didn't just go with the first thing they had in mind (or have already implemented), which is a big problem currently. Approval isn't binding as to specifics of implementation, so these aren't handcuffs. The goals are the contract, the strategy is evidence that contract can actually be met.
>> >>> >>> >
>> >>> >>> > Design docs - I'm not touching design docs. The markdown file I linked specifically says of the strategy section "This is not a full design document." Is this unclear? Design docs can be worked on obviously, but that's not what I'm concerned with here.
>> >>> >>> >
>> >>> >>> > On Sun, Oct 9, 2016 at 2:34 PM, Matei Zaharia <[hidden email]> wrote:
>> >>> >>> >> Hi Cody,
>> >>> >>> >>
>> >>> >>> >> I think this would be a lot more concrete if we had a more detailed template for SIPs. Right now, it's not super clear what's in scope -- e.g.
>> >>> >>> >> are they a way to solicit feedback on the user-facing behavior or on the internals? "Goals" can cover both things. I've been thinking of SIPs more as Product Requirements Docs (PRDs), which focus on *what* a code change should do as opposed to how.
>> >>> >>> >>
>> >>> >>> >> In particular, here are some things that you may or may not consider in scope for SIPs:
>> >>> >>> >>
>> >>> >>> >> - Goals and non-goals: This is definitely in scope, and IMO should focus on user-visible behavior (e.g. "system supports SQL window functions" or "system continues working if one node fails"). BTW I wouldn't say "rejected goals" because some of them might become goals later, so we're not definitively rejecting them.
>> >>> >>> >>
>> >>> >>> >> - Public API: Probably should be included in most SIPs unless it's too large to fully specify then (e.g. "let's add an ML library").
>> >>> >>> >>
>> >>> >>> >> - Use cases: I usually find this very useful in PRDs to better communicate the goals.
>> >>> >>> >>
>> >>> >>> >> - Internal architecture: This is usually *not* a thing users can easily comment on and it sounds more like a design doc item. Of course it's important to show that the SIP is feasible to implement. One exception, however, is that I think we'll have some SIPs primarily on internals (e.g. if somebody wants to refactor Spark's query optimizer or something).
>> >>> >>> >>
>> >>> >>> >> - Rejected strategies: I personally wouldn't put this, because what's the point of voting to reject a strategy before you've really begun designing and implementing something? What if you discover that the strategy is actually better when you start doing stuff?
>> >>> >>> >>
>> >>> >>> >> At a super high level, it depends on whether you want the SIPs to be PRDs for getting some quick feedback on the goals of a feature before it is designed, or something more like full-fledged design docs (just a more visible design doc for bigger changes). I looked at Kafka's KIPs, and they actually seem to be more like design docs. This can work too but it does require more work from the proposer and it can lead to the same problems you mentioned with people already having a design and implementation in mind.
>> >>> >>> >>
>> >>> >>> >> Basically, the question is, are you trying to iterate faster on design by adding a step for user feedback earlier? Or are you just trying to make design docs for key features more visible (and their approval more formal)?
>> >>> >>> >>
>> >>> >>> >> BTW note that in either case, I'd like to have a template for design docs too, which should also include goals. I think that would've avoided some of the issues you brought up.
>> >>> >>> >>
>> >>> >>> >> Matei
>> >>> >>> >>
>> >>> >>> >> On Oct 9, 2016, at 10:40 AM, Cody Koeninger <[hidden email]> wrote:
>> >>> >>> >>
>> >>> >>> >> Here's my specific proposal (meta-proposal?)
>> >>> >>> >>
>> >>> >>> >> Spark Improvement Proposals (SIP)
>> >>> >>> >>
>> >>> >>> >> Background:
>> >>> >>> >>
>> >>> >>> >> The current problem is that design and implementation of large features are often done in private, before soliciting user feedback.
>> >>> >>> >>
>> >>> >>> >> When feedback is solicited, it is often as to detailed design specifics, not focused on goals.
>> >>> >>> >>
>> >>> >>> >> When implementation does take place after design, there is often disagreement as to what goals are or are not in scope.
>> >>> >>> >>
>> >>> >>> >> This results in commits that don't fully meet user needs.
>> >>> >>> >>
>> >>> >>> >> Goals:
>> >>> >>> >>
>> >>> >>> >> - Ensure user, contributor, and committer goals are clearly identified and agreed upon, before implementation takes place.
>> >>> >>> >>
>> >>> >>> >> - Ensure that a technically feasible strategy is chosen that is likely to meet the goals.
>> >>> >>> >>
>> >>> >>> >> Rejected Goals:
>> >>> >>> >>
>> >>> >>> >> - SIPs are not for detailed design. Design by committee doesn't work.
>> >>> >>> >>
>> >>> >>> >> - SIPs are not for every change. We don't need that much process.
>> >>> >>> >>
>> >>> >>> >> Strategy:
>> >>> >>> >>
>> >>> >>> >> My suggestion is outlined as a Spark Improvement Proposal process documented at
>> >>> >>> >>
>> >>> >>> >> https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>> >>> >>> >>
>> >>> >>> >> Specifics of Jira manipulation are an implementation detail we can figure out.
>> >>> >>> >>
>> >>> >>> >> I'm suggesting voting; the need here is for a _clear_ outcome.
>> >>> >>> >>
>> >>> >>> >> Rejected Strategies:
>> >>> >>> >>
>> >>> >>> >> Having someone who understands the problem implement it first works, but only if significant iteration after user feedback is allowed.
>> >>> >>> >>
>> >>> >>> >> Historically this has been problematic due to pressure to limit public API changes.
>> >>> >>> >>
>> >>> >>> >> On Fri, Oct 7, 2016 at 5:16 PM, Reynold Xin <[hidden email]> wrote:
>> >>> >>> >>>
>> >>> >>> >>> Alright, looks like there is quite a bit of support. We should wait to hear from more people too.
>> >>> >>> >>>
>> >>> >>> >>> To push this forward, Cody and I will be working together in the next couple of weeks to come up with a concrete, detailed proposal on what this entails, and then we can discuss the specific proposal as well.
>> >>> >>> >>>
>> >>> >>> >>> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <[hidden email]> wrote:
>> >>> >>> >>>>
>> >>> >>> >>>> Yeah, in case it wasn't clear, I was talking about SIPs for major user-facing or cross-cutting changes, not minor feature adds.
>> >>> >>> >>>>
>> >>> >>> >>>> On Fri, Oct 7, 2016 at 3:58 PM, Stavros Kontopoulos <[hidden email]> wrote:
>> >>> >>> >>>>>
>> >>> >>> >>>>> +1 to the SIP label as long as it does not slow down things and it targets optimizing efforts, coordination etc. For example, really small features (assuming they don't touch public interfaces) or refactorings should not need to go through this process, and I hope it will be kept this way. So a guideline doc should be provided, like in the KIP case.
>> >>> >>> >>>>>
>> >>> >>> >>>>> IMHO, aside from tagging things and linking them elsewhere, simply having design docs and prototype implementations in PRs is not something that has failed to work so far. What is really a pain in many projects out there is discontinuity in progress of PRs, missing features, and slow reviews, which is understandable to some extent... it is not only about Spark, but things can be improved for sure for this project in particular, as already stated.
>> >>> >>> >>>>>
>> >>> >>> >>>>> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <[hidden email]> wrote:
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> +1 to adding an SIP label and linking it from the website. I think it needs
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> - template that focuses it towards soliciting user goals / non goals
>> >>> >>> >>>>>> - clear resolution as to which strategy was chosen to pursue. I'd recommend a vote.
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> Matei asked me to clarify what I meant by changing interfaces, I think it's directly relevant to the SIP idea so I'll clarify here, and split a thread for the other discussion per Nicholas' request.
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> I meant changing public user interfaces. I think the first design is unlikely to be right, because it's done at a time when you have the least information. As a user, I find it considerably more frustrating to be unable to use a tool to get my job done, than I do having to make minor changes to my code in order to take advantage of features. I've seen committers be seriously reluctant to allow changes to @experimental code that are needed in order for it to really work right.
>> >>> >>> >>>>>> You need to be able to iterate, and if people on both sides of the fence aren't going to respect that some newer APIs are subject to change, then why even mark them as such?
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> Ideally a finished SIP should give me a checklist of things that an implementation must do, and things that it doesn't need to do. Contributors/committers should be seriously discouraged from putting out a version 0.1 that doesn't have at least a prototype implementation of all those things, especially if they're then going to argue against interface changes necessary to get the rest of the things done in the 0.2 version.
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> On Fri, Oct 7, 2016 at 2:18 PM, Reynold Xin <[hidden email]> wrote:
>> >>> >>> >>>>>>> I like the lightweight proposal to add a SIP label.
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> During Spark 2.0 development, Tom (Graves) and I suggested using the wiki to track the list of major changes, but that never really materialized due to the overhead. Adding a SIP label on major JIRAs and then linking to them prominently on the Spark website makes a lot of sense.
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> On Fri, Oct 7, 2016 at 10:50 AM, Matei Zaharia <[hidden email]> wrote:
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> For the improvement proposals, I think one major point was to make them really visible to users who are not contributors, so we should do more than sending stuff to dev@. One very lightweight idea is to have a new type of JIRA called a SIP and have a link to a filter that shows all such JIRAs from http://spark.apache.org. I also like the idea of SIP and design doc templates (in fact many projects have them).
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> Matei
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> On Oct 7, 2016, at 10:38 AM, Reynold Xin <[hidden email]> wrote:
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> I called Cody last night and talked about some of the topics in his email. It became clear to me Cody genuinely cares about the project.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> Some of the frustrations come from the success of the project itself becoming very "hot", and it is difficult to get clarity from people who don't dedicate all their time to Spark.
>> >>> >>> >>>>>>>> In fact, it is in some ways similar to scaling an engineering team in a successful startup: old processes that worked well might not work so well when it gets to a certain size, cultures can get diluted, building culture vs building process, etc.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> I would also really like to have a more visible process for larger changes, especially major user-facing API changes. Historically we upload design docs for major changes, but this is not always done consistently, and it is difficult to ensure the quality of the docs, due to the volunteer nature of the organization.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> Some of the more concrete ideas we discussed focus on building a culture to improve clarity:
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> - Process: Large changes should have design docs posted on JIRA. One thing Cody and I didn't discuss, but an idea that just came to me, is that we should create a design doc template for the project and ask everybody to follow it. The design doc template should also explicitly list goals and non-goals, to make design docs more consistent.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> - Process: Email dev@ to solicit feedback. We have done this with some changes, but again very inconsistently. Just posting something on JIRA isn't sufficient, because there are simply too many JIRAs and the signal gets lost in the noise. While this is generally impossible to enforce because we can't force all volunteers to conform to a process (or they might not even be aware of this), those who are more familiar with the project can help by emailing dev@ when they see something that hasn't been posted there.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> - Culture: The design doc author(s) should be open to feedback. A design doc should serve as the base for discussion and is by no means the final design. Of course, this does not mean the author has to accept every piece of feedback. They should also be comfortable accepting / rejecting ideas on technical grounds.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> - Process / Culture: For major ongoing projects, it can be useful to have some monthly Google hangouts that are open to the world.
>> >>> >>> >>>>>>>> I am actually not sure how well this will work, because of the volunteering nature and the need to adjust for time zones for people across the globe, but it seems worth trying.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> - Culture: Contributors (including committers) should be more direct in setting expectations, including whether they are working on a specific issue, whether they will be working on a specific issue, and whether an issue or PR or JIRA should be rejected. Most people I know in this community are nice and don't enjoy telling other people no, but it is often more annoying to a contributor to not know anything than to get a no.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> On Fri, Oct 7, 2016 at 10:03 AM, Matei Zaharia <[hidden email]> wrote:
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> Love the idea of a more visible "Spark Improvement Proposal" process that solicits user input on new APIs.
>> >>> >>> >>>>>>>>> For what it's worth, I don't think committers are trying to minimize their own work -- every committer cares about making the software useful for users. However, it is always hard to get user input and so it helps to have this kind of process. I've certainly looked at the *IPs a lot in other software I use just to see the biggest things on the roadmap.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> When you're talking about "changing interfaces", are you talking about public or internal APIs? I do think many people hate changing public APIs and I actually think that's for the best of the project. That's a technical debate, but basically, the worst thing when you're using a piece of software is that the developers constantly ask you to rewrite your app to update to a new version (and thus benefit from bug fixes, etc). Cue anyone who's used Protobuf, or Guava.
>> >>> >>> >>>>>>>>> The "let's get everyone to change their code this release" model works well within a single large company, but doesn't work well for a community, which is why nearly all *very* widely used programming interfaces (I'm talking things like Java standard library, Windows API, etc) almost *never* break backwards compatibility. All this is done within reason though, e.g. we do change things in major releases (2.x, 3.x, etc).
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> ---------------------------------------------------------------------
>> >>> >>> >>>>>> To unsubscribe e-mail: [hidden email]
>> >>> >>> >>>>>
>> >>> >>> >>>>> --
>> >>> >>> >>>>> Stavros Kontopoulos
>> >>> >>> >>>>> Senior Software Engineer
>> >>> >>> >>>>> Lightbend, Inc.
>> >>> >>> >>>>> p: +30 6977967274
>> >>> >>> >>>>> e: [hidden email]
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >>
>> >> --
>> >> Ryan Blue
>> >> Software Engineer
>> >> Netflix