On 2 December 2015 at 23:04, Greg Stein <gst...@gmail.com> wrote:

> On Wed, Dec 2, 2015 at 8:50 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > Thanks, Roman. For the record, I don’t plan to contribute to Impala or
> > Kudu, and I don’t like strict commit policies such as RTC. But I wanted
> > to stand up for “states' rights”, the right of podlings and projects to
> > determine their own processes and cultures.
>
> LOL ... being a Texan, I can certainly get on board with the notion of
> states' rights :-P
>
> But I caution: as I said else-thread, we use the Incubation process because
> we believe the podling needs to *learn* how we like communities to operate.
> Peer respect, inclusivity, open dialog, consensus, etc. By definition, the
> podling is unable to make these decisions within the guides and desires of
> the Foundation. If we trusted them to do so, then we'd just make them a TLP
> and skip incubation.
>
> Josh puts it well:
>
> On Thu, Dec 3, 2015 at 12:26 AM, Josh Elser <els...@apache.org> wrote:
> > ...
> > +1 I'm not entirely sold on saying they have no explicit policy up
> > front (I'd be worried about that causing confusion -- the project will
> > operate how they're comfortable operating), but I'd definitely want to
> > see _real_ discussion had after the podling gets on its feet and grows
> > beyond the initial membership.
>
> I'd like to see podlings have enough diversity and independence from the
> initial PPMC to have such a discussion. My fear is that RTC holds back
> growing the diversity of opinion, and that the status quo will not allow
> for moving away from Gerrit.
>
> ...
>
> I will also note that one of the primary reasons explained for RTC is "the
> code is too complex to allow for unreviewed changes to be applied". Has
> that basis been justified for Impala? Are we talking data loss? Nope. It's
> a layer over the data substrate. Synchronization across a cluster? Nah.
> Where is the complexity?
I'm happy to field technical questions about Impala. You seem to be conflating 'complexity' with 'severity of potential bugs' - I see the two as separate.

Under the 'severity' heading: Impala both writes data to and reads data from a variety of data stores. So if there's a bug in Impala's write path, data can be lost. But because Impala also returns results to client applications, there's a significant risk of business impact if the *wrong* results are returned. I know, because I have dealt with situations where this has happened, and no-one is very happy about it. Our customers typically run business-critical analytic workloads through Impala; if it stops working correctly, that's usually a big problem.

As far as 'complexity' goes, I make no comparative claims about Impala's complexity vs. any other project. But to give some indication of the moving parts inside Impala: there's a component that compiles highly optimised versions of each query operator at run time; there's a query planner that parses and plans a large portion of the SQL standard; there's the complexity of being a 'massively' distributed system (with many deployments in the high hundreds of nodes), with all the coordination and consistency guarantees that brings; and there's the further complexity of running highly concurrent workloads in a single process, with all the concurrency headaches that can bring. That's not to mention implementations of 'standard' SQL operators like joins, sorts and so on that are still the subject of active research in academia and industry.

All this is in the context of Impala's main differentiator, which is that it is amongst the very fastest SQL engines for data stored in HDFS and friends. That means that small changes can have large, unexpected consequences, since efficiency is a subtle and capricious thing.
It has always, therefore, helped us to have more than one set of eyes on every change, to reduce the probability of introducing subtle performance and functional regressions. Automated testing plays a huge role here as well, but for us it's been most effective in concert with code review. (There are other reasons I vastly prefer RTC as well, but I'm addressing your specific points here so as not to kick off another RTC-vs-CTR thread :)).

> In this case, the RTC seems to stem from the choice of Gerrit, rather
> than some innate complexity.

Gerrit does not mandate RTC, since you can just push to refs/heads/<branch> and bypass the review creation step. Historically, the Impala team at Cloudera has used at least three different review tools (including Review Board, which is used elsewhere at the ASF). The choice of review tool stems completely from pragmatism - we really did not like Review Board, and briefly used Rietveld before moving to Gerrit, which we have preferred. At every step, we used RTC.

Henry

> I *do* note that possibly committers could choose to commit directly, or
> choose to use Gerrit when they are unsure. Will the (P)PMC allow those
> direct commits? Or mandate Gerrit for every commit?
>
> Cheers,
> -g
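For concreteness, the two Gerrit push paths Henry mentions look roughly like this (a sketch only: the remote name `origin` and branch `master` are illustrative, and whether the direct push actually succeeds depends on the project's Gerrit access controls):

```shell
#!/bin/sh
# Sketch of Gerrit's two push paths. 'origin' and 'master' are
# illustrative; direct pushes require the appropriate Gerrit ACLs.

BRANCH="master"

# RTC path: pushing to refs/for/<branch> creates a change in Gerrit's
# review queue rather than updating the branch itself.
echo "git push origin HEAD:refs/for/${BRANCH}"

# Bypass path: pushing to refs/heads/<branch> updates the branch
# directly, skipping the review-creation step entirely.
echo "git push origin HEAD:refs/heads/${BRANCH}"
```

In other words, review-then-commit under Gerrit is a policy choice enforced by permissions, not something the tool forces on a project.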