The current proposal seems process-heavy to me. That's not necessarily bad, but there are a couple of areas I haven't seen discussed.

Why is there a shepherd? If the person proposing a change has a good idea, I don't see why a shepherd is either useful or necessary. The result of this requirement is that each SPIP must attract the attention of a PMC member, and that PMC member has then taken on extra responsibility. Why can't the SPIP author simply call a vote when an idea has been sufficiently discussed? I think *this* proposal would have moved faster if Cody had felt empowered to bring it to a vote. More steps out of the author's control will cause fewer ideas to move forward, regardless of quality, so we should make sure this is balanced by a real benefit.

Why are only PMC members allowed a binding vote? I don't have a strong inclination one way or another, but until recently this was an open question. I'd like to hear the argument for restricting voting to PMC members, or I think we should change it to allow all committers. If this decision is left to default, let's be more inclusive.

I would be fine with the proposal overall if there are good reasons behind these choices.

rb

On Thu, Feb 16, 2017 at 8:22 AM, Reynold Xin <r...@databricks.com> wrote:

> Updated. Any feedback from other community members?
>
>
> On Wed, Feb 15, 2017 at 2:53 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Thanks for doing that.
>>
>> Given that there are at least 4 different Apache voting processes, "typical Apache vote process" isn't meaningful to me.
>>
>> I think the intention is that in order to pass, it needs at least 3 +1 votes from PMC members *and no -1 votes from PMC members*. But the document doesn't explicitly say that second part.
>>
>> There's also no mention of the duration a vote should remain open. There's a mention of a month for finding a shepherd, but that's different.
>>
>> Other than that, LGTM.
>>
>> On Mon, Feb 13, 2017 at 9:02 AM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> Here's a new draft that incorporated most of the feedback:
>>> https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#
>>>
>>> I added a specific role for SPIP Author and another one for SPIP Shepherd.
>>>
>>> On Sat, Feb 11, 2017 at 6:13 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>>>
>>>> During the summit, I also had a lot of discussions over similar topics with multiple Committers and active users. I heard many fantastic ideas. I believe Spark improvement proposals are good channels to collect the requirements/designs.
>>>>
>>>>
>>>> IMO, we also need to consider the priority when working on these items. Even if the proposal is accepted, it does not mean it will be implemented and merged immediately. It is not a FIFO queue.
>>>>
>>>>
>>>> Even if some PRs are merged, sometimes we still have to revert them if the design and implementation are not reviewed carefully. We have to ensure our quality. Spark is not application software. It is infrastructure software that is being used by many, many companies. We have to be very careful in the design and implementation, especially when adding/changing the external APIs.
>>>>
>>>>
>>>> When I developed Mainframe infrastructure/middleware software over the past 6 years, I was involved in the discussions with external/internal customers. The to-do feature list was always above 100. Sometimes the customers felt frustrated when we were unable to deliver features on time due to resource limits and other constraints. Even if they paid us billions, we still needed to do it phase by phase, or sometimes they had to accept the workarounds. That is the reality everyone has to face, I think.
>>>> >>>> >>>> Thanks, >>>> >>>> >>>> Xiao Li >>>> >>>> 2017-02-11 7:57 GMT-08:00 Cody Koeninger <c...@koeninger.org>: >>>> >>>>> At the spark summit this week, everyone from PMC members to users I >>>>> had never met before were asking me about the Spark improvement proposals >>>>> idea. It's clear that it's a real community need. >>>>> >>>>> But it's been almost half a year, and nothing visible has been done. >>>>> >>>>> Reynold, are you going to do this? >>>>> >>>>> If so, when? >>>>> >>>>> If not, why? >>>>> >>>>> You already did the right thing by including long-deserved >>>>> committers. Please keep doing the right thing for the community. >>>>> >>>>> On Wed, Jan 11, 2017 at 4:13 AM, Reynold Xin <r...@databricks.com> >>>>> wrote: >>>>> >>>>>> +1 on all counts (consensus, time bound, define roles) >>>>>> >>>>>> I can update the doc in the next few days and share back. Then maybe >>>>>> we can just officially vote on this. As Tim suggested, we might not get >>>>>> it >>>>>> 100% right the first time and would need to re-iterate. But that's fine. >>>>>> >>>>>> >>>>>> On Thu, Jan 5, 2017 at 3:29 PM, Tim Hunter <timhun...@databricks.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Cody, >>>>>>> thank you for bringing up this topic, I agree it is very important >>>>>>> to keep a cohesive community around some common, fluid goals. Here are a >>>>>>> few comments about the current document: >>>>>>> >>>>>>> 1. name: it should not overlap with an existing one such as SIP. Can >>>>>>> you imagine someone trying to discuss a scala spore proposal for spark? >>>>>>> "[Spark] SIP-3 is intended to evolve in tandem with [Scala] SIP-21". >>>>>>> SPIP >>>>>>> sounds great. >>>>>>> >>>>>>> 2. roles: at a high level, SPIPs are meant to reach consensus for >>>>>>> technical decisions with a lasting impact. As such, the template should >>>>>>> emphasize the role of the various parties during this process: >>>>>>> >>>>>>> - the SPIP author is responsible for building consensus. 
She is the >>>>>>> champion driving the process forward and is responsible for ensuring >>>>>>> that >>>>>>> the SPIP follows the general guidelines. The author should be >>>>>>> identified in >>>>>>> the SPIP. The authorship of a SPIP can be transferred if the current >>>>>>> author >>>>>>> is not interested and someone else wants to move the SPIP forward. There >>>>>>> should probably be 2-3 authors at most for each SPIP. >>>>>>> >>>>>>> - someone with voting power should probably shepherd the SPIP (and >>>>>>> be recorded as such): ensuring that the final decision over the SPIP is >>>>>>> recorded (rejected, accepted, etc.), and advising about the technical >>>>>>> quality of the SPIP: this person need not be a champion for the SPIP or >>>>>>> contribute to it, but rather makes sure it stands a chance of being >>>>>>> approved when the vote happens. Also, if the author cannot find anyone >>>>>>> who >>>>>>> would want to take this role, this proposal is likely to be rejected >>>>>>> anyway. >>>>>>> >>>>>>> - users, committers, contributors have the roles already outlined >>>>>>> in the document >>>>>>> >>>>>>> 3. timeline: ideally, once a SPIP has been offered for voting, it >>>>>>> should move swiftly into either being accepted or rejected, so that we >>>>>>> do >>>>>>> not end up with a distracting long tail of half-hearted proposals. >>>>>>> >>>>>>> These rules are meant to be flexible, but the current document >>>>>>> should be clear about who is in charge of a SPIP, and the state it is >>>>>>> currently in. >>>>>>> >>>>>>> We have had long discussions over some very important questions such >>>>>>> as approval. I do not have an opinion on these, but why not make a pick >>>>>>> and >>>>>>> reevaluate this decision later? This is not a binding process at this >>>>>>> point. 
>>>>>>>
>>>>>>> Tim
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 3, 2017 at 3:16 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>>>>
>>>>>>>> I don't have a concern about voting vs consensus.
>>>>>>>>
>>>>>>>> I have a concern that whatever the decision making process is, it is explicitly announced on the ticket for the given proposal, with an explicit deadline, and an explicit outcome.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 3, 2017 at 4:08 PM, Imran Rashid <iras...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> I'm also in favor of this. Thanks for your persistence Cody.
>>>>>>>>>
>>>>>>>>> My take on the specific issues Joseph mentioned:
>>>>>>>>>
>>>>>>>>> 1) voting vs. consensus -- I agree with the argument Ryan Blue made earlier for consensus:
>>>>>>>>>
>>>>>>>>> > Majority vs consensus: My rationale is that I don't think we want to consider a proposal approved if it had objections serious enough that committers down-voted (or PMC depending on who gets a vote). If these proposals are like PEPs, then they represent a significant amount of community effort and I wouldn't want to move forward if up to half of the community thinks it's an untenable idea.
>>>>>>>>>
>>>>>>>>> 2) Design doc template -- agree this would be useful, but also seems totally orthogonal to moving forward on the SIP proposal.
>>>>>>>>>
>>>>>>>>> 3) agree w/ Joseph's proposal for updating the template.
>>>>>>>>>
>>>>>>>>> One small addition:
>>>>>>>>>
>>>>>>>>> 4) Deciding on a name -- minor, but I think it's worth disambiguating from Scala's SIPs, and the best proposal I've heard is "SPIP". At least, no one has objected. (don't care enough that I'd object to anything else, though.)
>>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Jan 3, 2017 at 3:30 PM, Joseph Bradley < >>>>>>>>> jos...@databricks.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Cody, >>>>>>>>>> >>>>>>>>>> Thanks for being persistent about this. I too would like to see >>>>>>>>>> this happen. Reviewing the thread, it sounds like the main things >>>>>>>>>> remaining are: >>>>>>>>>> * Decide about a few issues >>>>>>>>>> * Finalize the doc(s) >>>>>>>>>> * Vote on this proposal >>>>>>>>>> >>>>>>>>>> Issues & TODOs: >>>>>>>>>> >>>>>>>>>> (1) The main issue I see above is voting vs. consensus. I have >>>>>>>>>> little preference here. It sounds like something which could be >>>>>>>>>> tailored >>>>>>>>>> based on whether we see too many or too few SIPs being approved. >>>>>>>>>> >>>>>>>>>> (2) Design doc template (This would be great to have for Spark >>>>>>>>>> regardless of this SIP discussion.) >>>>>>>>>> * Reynold, are you still putting this together? >>>>>>>>>> >>>>>>>>>> (3) Template cleanups. Listing some items mentioned above + a >>>>>>>>>> new one w.r.t. Reynold's draft >>>>>>>>>> <https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#> >>>>>>>>>> : >>>>>>>>>> * Reinstate the "Where" section with links to current and past >>>>>>>>>> SIPs >>>>>>>>>> * Add field for stating explicit deadlines for approval >>>>>>>>>> * Add field for stating Author & Committer shepherd >>>>>>>>>> >>>>>>>>>> Thanks all! >>>>>>>>>> Joseph >>>>>>>>>> >>>>>>>>>> On Mon, Jan 2, 2017 at 7:45 AM, Cody Koeninger < >>>>>>>>>> c...@koeninger.org> wrote: >>>>>>>>>> >>>>>>>>>>> I'm bumping this one more time for the new year, and then I'm >>>>>>>>>>> giving up. >>>>>>>>>>> >>>>>>>>>>> Please, fix your process, even if it isn't exactly the way I >>>>>>>>>>> suggested. >>>>>>>>>>> >>>>>>>>>>> On Tue, Nov 8, 2016 at 11:14 AM, Ryan Blue <rb...@netflix.com> >>>>>>>>>>> wrote: >>>>>>>>>>> > On lazy consensus as opposed to voting: >>>>>>>>>>> > >>>>>>>>>>> > First, why lazy consensus? 
The proposal was for consensus, >>>>>>>>>>> which is at least >>>>>>>>>>> > three +1 votes and no vetos. Consensus has no losing side, it >>>>>>>>>>> requires >>>>>>>>>>> > getting to a point where there is agreement. Isn't that >>>>>>>>>>> agreement what we >>>>>>>>>>> > want to achieve with these proposals? >>>>>>>>>>> > >>>>>>>>>>> > Second, lazy consensus only removes the requirement for three >>>>>>>>>>> +1 votes. Why >>>>>>>>>>> > would we not want at least three committers to think something >>>>>>>>>>> is a good >>>>>>>>>>> > idea before adopting the proposal? >>>>>>>>>>> > >>>>>>>>>>> > rb >>>>>>>>>>> > >>>>>>>>>>> > On Tue, Nov 8, 2016 at 8:13 AM, Cody Koeninger < >>>>>>>>>>> c...@koeninger.org> wrote: >>>>>>>>>>> >> >>>>>>>>>>> >> So there are some minor things (the Where section heading >>>>>>>>>>> appears to >>>>>>>>>>> >> be dropped; wherever this document is posted it needs to >>>>>>>>>>> actually link >>>>>>>>>>> >> to a jira filter showing current / past SIPs) but it doesn't >>>>>>>>>>> look like >>>>>>>>>>> >> I can comment on the google doc. >>>>>>>>>>> >> >>>>>>>>>>> >> The major substantive issue that I have is that this version >>>>>>>>>>> is >>>>>>>>>>> >> significantly less clear as to the outcome of an SIP. >>>>>>>>>>> >> >>>>>>>>>>> >> The apache example of lazy consensus at >>>>>>>>>>> >> http://apache.org/foundation/voting.html#LazyConsensus >>>>>>>>>>> involves an >>>>>>>>>>> >> explicit announcement of an explicit deadline, which I think >>>>>>>>>>> are >>>>>>>>>>> >> necessary for clarity. >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> On Mon, Nov 7, 2016 at 1:55 PM, Reynold Xin < >>>>>>>>>>> r...@databricks.com> wrote: >>>>>>>>>>> >> > It turned out suggested edits (trackable) don't show up for >>>>>>>>>>> non-owners, >>>>>>>>>>> >> > so >>>>>>>>>>> >> > I've just merged all the edits in place. It should be >>>>>>>>>>> visible now. 
>>>>>>>>>>> >> > >>>>>>>>>>> >> > On Mon, Nov 7, 2016 at 10:10 AM, Reynold Xin < >>>>>>>>>>> r...@databricks.com> >>>>>>>>>>> >> > wrote: >>>>>>>>>>> >> >> >>>>>>>>>>> >> >> Oops. Let me try figure that out. >>>>>>>>>>> >> >> >>>>>>>>>>> >> >> >>>>>>>>>>> >> >> On Monday, November 7, 2016, Cody Koeninger < >>>>>>>>>>> c...@koeninger.org> wrote: >>>>>>>>>>> >> >>> >>>>>>>>>>> >> >>> Thanks for picking up on this. >>>>>>>>>>> >> >>> >>>>>>>>>>> >> >>> Maybe I fail at google docs, but I can't see any edits on >>>>>>>>>>> the document >>>>>>>>>>> >> >>> you linked. >>>>>>>>>>> >> >>> >>>>>>>>>>> >> >>> Regarding lazy consensus, if the board in general has >>>>>>>>>>> less of an issue >>>>>>>>>>> >> >>> with that, sure. As long as it is clearly announced, >>>>>>>>>>> lasts at least >>>>>>>>>>> >> >>> 72 hours, and has a clear outcome. >>>>>>>>>>> >> >>> >>>>>>>>>>> >> >>> The other points are hard to comment on without being >>>>>>>>>>> able to see the >>>>>>>>>>> >> >>> text in question. >>>>>>>>>>> >> >>> >>>>>>>>>>> >> >>> >>>>>>>>>>> >> >>> On Mon, Nov 7, 2016 at 3:11 AM, Reynold Xin < >>>>>>>>>>> r...@databricks.com> >>>>>>>>>>> >> >>> wrote: >>>>>>>>>>> >> >>> > I just looked through the entire thread again tonight - >>>>>>>>>>> there are a >>>>>>>>>>> >> >>> > lot >>>>>>>>>>> >> >>> > of >>>>>>>>>>> >> >>> > great ideas being discussed. Thanks Cody for taking the >>>>>>>>>>> first crack >>>>>>>>>>> >> >>> > at >>>>>>>>>>> >> >>> > the >>>>>>>>>>> >> >>> > proposal. >>>>>>>>>>> >> >>> > >>>>>>>>>>> >> >>> > I want to first comment on the context. Spark is one of >>>>>>>>>>> the most >>>>>>>>>>> >> >>> > innovative >>>>>>>>>>> >> >>> > and important projects in (big) data -- overall >>>>>>>>>>> technical decisions >>>>>>>>>>> >> >>> > made in >>>>>>>>>>> >> >>> > Apache Spark are sound. 
But of course, a project as >>>>>>>>>>> large and active >>>>>>>>>>> >> >>> > as >>>>>>>>>>> >> >>> > Spark always have room for improvement, and we as a >>>>>>>>>>> community should >>>>>>>>>>> >> >>> > strive >>>>>>>>>>> >> >>> > to take it to the next level. >>>>>>>>>>> >> >>> > >>>>>>>>>>> >> >>> > To that end, the two biggest areas for improvements in >>>>>>>>>>> my opinion >>>>>>>>>>> >> >>> > are: >>>>>>>>>>> >> >>> > >>>>>>>>>>> >> >>> > 1. Visibility: There are so much happening that it is >>>>>>>>>>> difficult to >>>>>>>>>>> >> >>> > know >>>>>>>>>>> >> >>> > what >>>>>>>>>>> >> >>> > really is going on. For people that don't follow >>>>>>>>>>> closely, it is >>>>>>>>>>> >> >>> > difficult to >>>>>>>>>>> >> >>> > know what the important initiatives are. Even for >>>>>>>>>>> people that do >>>>>>>>>>> >> >>> > follow, it >>>>>>>>>>> >> >>> > is difficult to know what specific things require their >>>>>>>>>>> attention, >>>>>>>>>>> >> >>> > since the >>>>>>>>>>> >> >>> > number of pull requests and JIRA tickets are high and >>>>>>>>>>> it's difficult >>>>>>>>>>> >> >>> > to >>>>>>>>>>> >> >>> > extract signal from noise. >>>>>>>>>>> >> >>> > >>>>>>>>>>> >> >>> > 2. Solicit user (broadly defined, including developers >>>>>>>>>>> themselves) >>>>>>>>>>> >> >>> > input >>>>>>>>>>> >> >>> > more proactively: At the end of the day the project >>>>>>>>>>> provides value >>>>>>>>>>> >> >>> > because >>>>>>>>>>> >> >>> > users use it. Users can't tell us exactly what to >>>>>>>>>>> build, but it is >>>>>>>>>>> >> >>> > important >>>>>>>>>>> >> >>> > to get their inputs. 
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > I've taken Cody's doc and edited it:
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#heading=h.36ut37zh7w2b
>>>>>>>>>>> >> >>> > (I've made all my modifications trackable)
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > There are a couple of high-level changes I made:
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > 1. I've consulted a board member and he recommended lazy consensus as opposed to voting. The reason being that in voting there can easily be a "loser" that gets outvoted.
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > 2. I made it lighter weight, and renamed "strategy" to "optional design sketch". Echoing one of the earlier emails: "IMHO so far aside from tagging things and linking them elsewhere simply having design docs and prototypes implementations in PRs is not something that has not worked so far".
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > 3. I made some language tweaks to focus more on visibility. For example, "The purpose of an SIP is to inform and involve", rather than just "involve". SIPs should also have at least two emails that go to dev@.
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > While I was editing this, I thought we really needed a suggested template for design doc too. I will get to that too ...
>>>>>>>>>>> >> >>> > >>>>>>>>>>> >> >>> > >>>>>>>>>>> >> >>> > On Tue, Nov 1, 2016 at 12:09 AM, Reynold Xin < >>>>>>>>>>> r...@databricks.com> >>>>>>>>>>> >> >>> > wrote: >>>>>>>>>>> >> >>> >> >>>>>>>>>>> >> >>> >> Most things looked OK to me too, although I do plan to >>>>>>>>>>> take a >>>>>>>>>>> >> >>> >> closer >>>>>>>>>>> >> >>> >> look >>>>>>>>>>> >> >>> >> after Nov 1st when we cut the release branch for 2.1. >>>>>>>>>>> >> >>> >> >>>>>>>>>>> >> >>> >> >>>>>>>>>>> >> >>> >> On Mon, Oct 31, 2016 at 3:12 PM, Marcelo Vanzin >>>>>>>>>>> >> >>> >> <van...@cloudera.com> >>>>>>>>>>> >> >>> >> wrote: >>>>>>>>>>> >> >>> >>> >>>>>>>>>>> >> >>> >>> The proposal looks OK to me. I assume, even though >>>>>>>>>>> it's not >>>>>>>>>>> >> >>> >>> explicitly >>>>>>>>>>> >> >>> >>> called, that voting would happen by e-mail? A >>>>>>>>>>> template for the >>>>>>>>>>> >> >>> >>> proposal document (instead of just a bullet nice) >>>>>>>>>>> would also be >>>>>>>>>>> >> >>> >>> nice, >>>>>>>>>>> >> >>> >>> but that can be done at any time. >>>>>>>>>>> >> >>> >>> >>>>>>>>>>> >> >>> >>> BTW, shameless plug: I filed SPARK-18085 which I >>>>>>>>>>> consider a >>>>>>>>>>> >> >>> >>> candidate >>>>>>>>>>> >> >>> >>> for a SIP, given the scope of the work. The document >>>>>>>>>>> attached even >>>>>>>>>>> >> >>> >>> somewhat matches the proposed format. So if anyone >>>>>>>>>>> wants to try >>>>>>>>>>> >> >>> >>> out >>>>>>>>>>> >> >>> >>> the process... >>>>>>>>>>> >> >>> >>> >>>>>>>>>>> >> >>> >>> On Mon, Oct 31, 2016 at 10:34 AM, Cody Koeninger >>>>>>>>>>> >> >>> >>> <c...@koeninger.org> >>>>>>>>>>> >> >>> >>> wrote: >>>>>>>>>>> >> >>> >>> > Now that spark summit europe is over, are any >>>>>>>>>>> committers >>>>>>>>>>> >> >>> >>> > interested >>>>>>>>>>> >> >>> >>> > in >>>>>>>>>>> >> >>> >>> > moving forward with this? 
>>>>>>>>>>> >> >>> >>> >
>>>>>>>>>>> >> >>> >>> >
>>>>>>>>>>> >> >>> >>> >
>>>>>>>>>>> >> >>> >>> >
>>>>>>>>>>> >> >>> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>>>>>>>>>> >> >>> >>> >
>>>>>>>>>>> >> >>> >>> > Or are we going to let this discussion die on the vine?
>>>>>>>>>>> >> >>> >>> >
>>>>>>>>>>> >> >>> >>> > On Mon, Oct 17, 2016 at 10:05 AM, Tomasz Gawęda <tomasz.gaw...@outlook.com> wrote:
>>>>>>>>>>> >> >>> >>> >> Maybe my mail was not clear enough.
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> I didn't want to write "let's focus on Flink" or any other framework. The idea with benchmarks was to show two things:
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> - why some people are doing bad PR for Spark
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> - how - in an easy way - we can change it and show that Spark is still on top
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> No more, no less. Benchmarks will be helpful, but I don't think they're the most important thing in Spark :) On the Spark main page there is still the chart "Spark vs Hadoop". It is important to show that the framework is not the same Spark with another API, but much faster and optimized, comparable to or even faster than other frameworks.
>>>>>>>>>>> >> >>> >>> >> >>>>>>>>>>> >> >>> >>> >> >>>>>>>>>>> >> >>> >>> >> About real-time streaming, I think it would be >>>>>>>>>>> just good to see >>>>>>>>>>> >> >>> >>> >> it >>>>>>>>>>> >> >>> >>> >> in >>>>>>>>>>> >> >>> >>> >> Spark. >>>>>>>>>>> >> >>> >>> >> I very like current Spark model, but many voices >>>>>>>>>>> that says "we >>>>>>>>>>> >> >>> >>> >> need >>>>>>>>>>> >> >>> >>> >> more" - >>>>>>>>>>> >> >>> >>> >> community should listen also them and try to help >>>>>>>>>>> them. With >>>>>>>>>>> >> >>> >>> >> SIPs >>>>>>>>>>> >> >>> >>> >> it >>>>>>>>>>> >> >>> >>> >> would >>>>>>>>>>> >> >>> >>> >> be easier, I've just posted this example as "thing >>>>>>>>>>> that may be >>>>>>>>>>> >> >>> >>> >> changed >>>>>>>>>>> >> >>> >>> >> with >>>>>>>>>>> >> >>> >>> >> SIP". >>>>>>>>>>> >> >>> >>> >> >>>>>>>>>>> >> >>> >>> >> >>>>>>>>>>> >> >>> >>> >> I very like unification via Datasets, but there is >>>>>>>>>>> a lot of >>>>>>>>>>> >> >>> >>> >> algorithms >>>>>>>>>>> >> >>> >>> >> inside - let's make easy API, but with strong >>>>>>>>>>> background >>>>>>>>>>> >> >>> >>> >> (articles, >>>>>>>>>>> >> >>> >>> >> benchmarks, descriptions, etc) that shows that >>>>>>>>>>> Spark is still >>>>>>>>>>> >> >>> >>> >> modern >>>>>>>>>>> >> >>> >>> >> framework. 
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> Maybe now my intention will be clearer :) As I said, organizational ideas were already mentioned and I agree with them. My mail was just to show some aspects from my side, that is, from the side of a developer and a person who is trying to help others with Spark (via StackOverflow or other ways)
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> Best regards,
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> Tomasz
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> ________________________________
>>>>>>>>>>> >> >>> >>> >> From: Cody Koeninger <c...@koeninger.org>
>>>>>>>>>>> >> >>> >>> >> Sent: 17 October 2016 16:46
>>>>>>>>>>> >> >>> >>> >> To: Debasish Das
>>>>>>>>>>> >> >>> >>> >> Cc: Tomasz Gawęda; dev@spark.apache.org
>>>>>>>>>>> >> >>> >>> >> Subject: Re: Spark Improvement Proposals
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> I think narrowly focusing on Flink or benchmarks is missing my point.
>>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>>> >> >>> >>> >> My point is evolve or die. Spark's governance and organization is hampering its ability to evolve technologically, and it needs to change.
>>>>>>>>>>> >> >>> >>> >> >>>>>>>>>>> >> >>> >>> >> On Sun, Oct 16, 2016 at 9:21 PM, Debasish Das >>>>>>>>>>> >> >>> >>> >> <debasish.da...@gmail.com> >>>>>>>>>>> >> >>> >>> >> wrote: >>>>>>>>>>> >> >>> >>> >>> Thanks Cody for bringing up a valid point...I >>>>>>>>>>> picked up Spark >>>>>>>>>>> >> >>> >>> >>> in >>>>>>>>>>> >> >>> >>> >>> 2014 >>>>>>>>>>> >> >>> >>> >>> as >>>>>>>>>>> >> >>> >>> >>> soon as I looked into it since compared to >>>>>>>>>>> writing Java >>>>>>>>>>> >> >>> >>> >>> map-reduce >>>>>>>>>>> >> >>> >>> >>> and >>>>>>>>>>> >> >>> >>> >>> Cascading code, Spark made writing distributed >>>>>>>>>>> code fun...But >>>>>>>>>>> >> >>> >>> >>> now >>>>>>>>>>> >> >>> >>> >>> as >>>>>>>>>>> >> >>> >>> >>> we >>>>>>>>>>> >> >>> >>> >>> went >>>>>>>>>>> >> >>> >>> >>> deeper with Spark and real-time streaming >>>>>>>>>>> use-case gets more >>>>>>>>>>> >> >>> >>> >>> prominent, I >>>>>>>>>>> >> >>> >>> >>> think it is time to bring a messaging model in >>>>>>>>>>> conjunction >>>>>>>>>>> >> >>> >>> >>> with >>>>>>>>>>> >> >>> >>> >>> the >>>>>>>>>>> >> >>> >>> >>> batch/micro-batch API that Spark is good >>>>>>>>>>> at....akka-streams >>>>>>>>>>> >> >>> >>> >>> close >>>>>>>>>>> >> >>> >>> >>> integration with spark micro-batching APIs looks >>>>>>>>>>> like a great >>>>>>>>>>> >> >>> >>> >>> direction to >>>>>>>>>>> >> >>> >>> >>> stay in the game with Apache Flink...Spark 2.0 >>>>>>>>>>> integrated >>>>>>>>>>> >> >>> >>> >>> streaming >>>>>>>>>>> >> >>> >>> >>> with >>>>>>>>>>> >> >>> >>> >>> batch with the assumption is that micro-batching >>>>>>>>>>> is sufficient >>>>>>>>>>> >> >>> >>> >>> to >>>>>>>>>>> >> >>> >>> >>> run >>>>>>>>>>> >> >>> >>> >>> SQL >>>>>>>>>>> >> >>> >>> >>> commands on stream but do we really have time to >>>>>>>>>>> do SQL >>>>>>>>>>> >> >>> >>> >>> processing at >>>>>>>>>>> >> >>> >>> >>> streaming data within 1-2 seconds ? 
>>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>>> >> >>> >>> >>> After reading the email chain, I started to look into Flink documentation and if you compare it with Spark documentation, I think we have major work to do detailing out Spark internals so that more people from community start to take active role in improving the issues so that Spark stays strong compared to Flink.
>>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>>> >> >>> >>> >>> https://cwiki.apache.org/confluence/display/SPARK/Spark+Internals
>>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>>> >> >>> >>> >>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>>> >> >>> >>> >>> Spark is no longer an engine that works for micro-batch and batch...We (and I am sure many others) are pushing spark as an engine for stream and query processing.....we need to make it a state-of-the-art engine for high speed streaming data and user queries as well !
>>>>>>>>>>> >> >>> >>> >>> >>>>>>>>>>> >> >>> >>> >>> On Sun, Oct 16, 2016 at 1:30 PM, Tomasz Gawęda >>>>>>>>>>> >> >>> >>> >>> <tomasz.gaw...@outlook.com> >>>>>>>>>>> >> >>> >>> >>> wrote: >>>>>>>>>>> >> >>> >>> >>>> >>>>>>>>>>> >> >>> >>> >>>> Hi everyone, >>>>>>>>>>> >> >>> >>> >>>> >>>>>>>>>>> >> >>> >>> >>>> I'm quite late with my answer, but I think my >>>>>>>>>>> suggestions may >>>>>>>>>>> >> >>> >>> >>>> help a >>>>>>>>>>> >> >>> >>> >>>> little bit. :) Many technical and organizational >>>>>>>>>>> topics were >>>>>>>>>>> >> >>> >>> >>>> mentioned, >>>>>>>>>>> >> >>> >>> >>>> but I want to focus on these negative posts >>>>>>>>>>> about Spark and >>>>>>>>>>> >> >>> >>> >>>> about >>>>>>>>>>> >> >>> >>> >>>> "haters" >>>>>>>>>>> >> >>> >>> >>>> >>>>>>>>>>> >> >>> >>> >>>> I really like Spark. Easy of use, speed, very >>>>>>>>>>> good community >>>>>>>>>>> >> >>> >>> >>>> - >>>>>>>>>>> >> >>> >>> >>>> it's >>>>>>>>>>> >> >>> >>> >>>> everything here. But Every project has to >>>>>>>>>>> "flight" on >>>>>>>>>>> >> >>> >>> >>>> "framework >>>>>>>>>>> >> >>> >>> >>>> market" >>>>>>>>>>> >> >>> >>> >>>> to be still no 1. I'm following many Spark and >>>>>>>>>>> Big Data >>>>>>>>>>> >> >>> >>> >>>> communities, >>>>>>>>>>> >> >>> >>> >>>> maybe my mail will inspire someone :) >>>>>>>>>>> >> >>> >>> >>>> >>>>>>>>>>> >> >>> >>> >>>> You (every Spark developer; so far I didn't have >>>>>>>>>>> enough time >>>>>>>>>>> >> >>> >>> >>>> to >>>>>>>>>>> >> >>> >>> >>>> join >>>>>>>>>>> >> >>> >>> >>>> contributing to Spark) has done excellent job. >>>>>>>>>>> So why are >>>>>>>>>>> >> >>> >>> >>>> some >>>>>>>>>>> >> >>> >>> >>>> people >>>>>>>>>>> >> >>> >>> >>>> saying that Flink (or other framework) is >>>>>>>>>>> better, like it was >>>>>>>>>>> >> >>> >>> >>>> posted >>>>>>>>>>> >> >>> >>> >>>> in >>>>>>>>>>> >> >>> >>> >>>> this mailing list? 
No, not because that >>>>>>>>>>> framework is better >>>>>>>>>>> >> >>> >>> >>>> in >>>>>>>>>>> >> >>> >>> >>>> all >>>>>>>>>>> >> >>> >>> >>>> cases.. In my opinion, many of these discussions >>>>>>>>>>> where >>>>>>>>>>> >> >>> >>> >>>> started >>>>>>>>>>> >> >>> >>> >>>> after >>>>>>>>>>> >> >>> >>> >>>> Flink marketing-like posts. Please look at >>>>>>>>>>> StackOverflow >>>>>>>>>>> >> >>> >>> >>>> "Flink >>>>>>>>>>> >> >>> >>> >>>> vs >>>>>>>>>>> >> >>> >>> >>>> ...." >>>>>>>>>>> >> >>> >>> >>>> posts, almost every post in "winned" by Flink. >>>>>>>>>>> Answers are >>>>>>>>>>> >> >>> >>> >>>> sometimes >>>>>>>>>>> >> >>> >>> >>>> saying nothing about other frameworks, Flink's >>>>>>>>>>> users (often >>>>>>>>>>> >> >>> >>> >>>> PMC's) >>>>>>>>>>> >> >>> >>> >>>> are >>>>>>>>>>> >> >>> >>> >>>> just posting same information about real-time >>>>>>>>>>> streaming, >>>>>>>>>>> >> >>> >>> >>>> about >>>>>>>>>>> >> >>> >>> >>>> delta >>>>>>>>>>> >> >>> >>> >>>> iterations, etc. It look smart and very often it >>>>>>>>>>> is marked as >>>>>>>>>>> >> >>> >>> >>>> an >>>>>>>>>>> >> >>> >>> >>>> aswer, >>>>>>>>>>> >> >>> >>> >>>> even if - in my opinion - there wasn't told all >>>>>>>>>>> the truth. >>>>>>>>>>> >> >>> >>> >>>> >>>>>>>>>>> >> >>> >>> >>>> >>>>>>>>>>> >> >>> >>> >>>> My suggestion: I don't have enough money and >>>>>>>>>>> knowledgle to >>>>>>>>>>> >> >>> >>> >>>> perform >>>>>>>>>>> >> >>> >>> >>>> huge >>>>>>>>>>> >> >>> >>> >>>> performance test. Maybe some company, that >>>>>>>>>>> supports Spark >>>>>>>>>>> >> >>> >>> >>>> (Databricks, >>>>>>>>>>> >> >>> >>> >>>> Cloudera? 
>>>>>>>>>>> >> >>> >>> >>>> - just saying you're the most visible in the community :) ) could run a performance test of:
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> - streaming engine - Spark will probably lose because of its mini-batch model; however, the difference should now be much smaller than in previous versions
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> - Machine Learning models
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> - batch jobs
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> - Graph jobs
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> - SQL queries
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> People will see that Spark is evolving and is also a modern framework, because after reading the posts mentioned above people may think "it is outdated, the future is in framework X".
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> Matei Zaharia posted an excellent blog post about how Spark Structured Streaming beats every other framework in terms of ease of use and reliability.
>>>>>>>>>>> >> >>> >>> >>>> Performance tests, done in various environments (for example: laptop, small 2-node cluster, 10-node cluster, 20-node cluster), could also be very good marketing material to say "hey, you're saying that you're better, but Spark is still faster and is still getting even faster!". This would be based on facts (just numbers), not opinions. It would be good for companies, for marketing purposes, and for every Spark developer.
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> Second: real-time streaming. I wrote some time ago about real-time streaming support in Spark Structured Streaming. Some work should be done to make SSS more low-latency, but I think it's possible. Maybe Spark could look at Gearpump, which is also built on top of Akka? I don't know yet; it is a good topic for a SIP. However, I think that Spark should have real-time streaming support. Currently I see many posts/comments saying that "Spark has too big latency".
>>>>>>>>>>> >> >>> >>> >>>> Spark Streaming is doing a very good job with micro-batches; however, I think it is possible to also add more real-time processing.
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> Other people said much more, and I agree with the SIP proposal. I'm also happy that the PMC members are not saying that they will not listen to users, but that they really want to make Spark better for every user.
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> What do you think about these two topics? I'm looking especially at Cody (who started this topic) and the PMC members :)
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> Pozdrawiam / Best regards,
>>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>>> >> >>> >>> >>>> Tomasz
>>>>>>>>>>> >>
>>>>>>>>>>> >> ---------------------------------------------------------------------
>>>>>>>>>>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>>>> >
>>>>>>>>>>> > --
>>>>>>>>>>> > Ryan Blue
>>>>>>>>>>> > Software Engineer
>>>>>>>>>>> > Netflix
>>>>>>>>>> --
>>>>>>>>>> Joseph Bradley
>>>>>>>>>> Software Engineer - Machine Learning
>>>>>>>>>> Databricks, Inc.

--
Ryan Blue
Software Engineer
Netflix