I want to clarify that my plea here is just that we acknowledge that once we
adopt drivers (especially if all of them), the "project" becomes quite big.

All sane big projects have a minimum of organization, so let's make sure we
have enough of it that we don't make our future lives harder than they need
to be. And there is a clear and natural separation between the server and
each driver, so that's an obvious point of organization/separation.

Again, at a "high" level, I'm in favor of the Cassandra project being both
server and drivers (not saying it's not debatable). So a single _user_ ML
makes sense, as well as a single web site, documentation and CEP process (I
do see CEP as being somewhat high-ish level).

My concern is more for the day-to-day maintenance work. Here, I think there
are going to be 3 types of people:
1. some will _primarily_ focus on (a) driver development.
2. some will _primarily_ focus on server development.
3. some may be interested in both, but won't be able to focus too much on
   either (because again, the sum is too big, and in a way too unrelated).

And I actually expect 1 and 2 to predominantly drive the day-to-day
maintenance. So I'd like to keep things easy for those populations (but
obviously, with the goal of not hindering collaboration and consistency
overall).

Concretely, my initial thoughts (though I haven't thought some of them
through much) are:
- as said above, user list, web site, documentation and CEP would be global.
- new specific JIRA projects for drivers, and JIRA notifications going to
  separate 'commits' mailing lists. To me, that one point is a no-brainer,
  I don't see why we wouldn't do that, and I'll fight for that one.
- dev mailing lists: I'm conflicted. I see a few "dev" discussions gaining
  from being common, but I think most won't be (common). My gut reaction was
  to suggest separate lists but I'm warming up to the idea of experimenting
  with one and splitting later if it's unmanageable.
- source repository: I don't have a super strong opinion so far. I'm
  not a fan of abusing mono-repo, and I think it would be overall cleaner to
  have separate repos with separate histories. But I reckon there are pros to
  mono-repo as well so this might boil down to personal preference.
- committers and PMC members pool: I believe that if we keep the organization
  of a single project in the Apache sense (which again, is debatable but I'm
  in favor at this point), then that implies a single pool of committers/PMC
  members. Which is fine by me, outside of the fact that it imo makes it
  even more urgent to have the PMC conclude some ongoing and never
  concluded discussions (around more objective criteria for committers/PMC
  members nominations).
- other: there are actually a bunch of other things we'll need to discuss in
  that scenario. For instance, DataStax drivers currently have their
  independent release cycles and versioning. Especially if we go the
  mono-repo route, then it would make sense to move towards releasing
  everything together as Stephen mentions TinkerPop is doing, but that
  in turn may require a non-trivial amount of build-tools setup.

Lastly, and to Stephen's previous email, it might be more manageable to
accept one driver first and figure out all the details/issues/questions that
are bound to arise before accepting more. It's worth discussing at least.

> In the Venn diagram of overlap vs. non between the two projects, I see
> there being more overlap than not.

I'll address this, because it's an important point. If we're talking about
day-to-day maintenance, so the bulk of the work really, then I feel rather
confident saying that you are wrong: the vast majority of that work is
unrelated.

Which is important, because that's really why I said that no-one can
effectively focus on both sides. You can only focus on one and only dabble
in the other(s), because the overlap is not that big.

--
Sylvain


On Mon, Apr 27, 2020 at 11:34 PM Nate McCall <zznat...@gmail.com> wrote:

> Thanks, Stephen, this is really helpful!
>
> On Tue, Apr 28, 2020 at 6:24 AM Stephen Mallette <spmalle...@gmail.com>
> wrote:
>
> > >
> > > To step out of the weeds a bit - other than the Zookeeper / Curator
> > > example, does anyone know of any other apache projects that have either
> > > subprojects or complementary sideprojects they're interdependent upon
> in
> > > their ecosystems?
> >
> >
> > Every Apache project is different, so it's quite possible that the
> > experience I have in this area doesn't apply much here, but I'll offer
> some
> > words on the matter in the event that some of it is helpful.
> >
> > For many years even prior to joining Apache, TinkerPop was quite against
> > bringing in driver-style sub-projects. Our main concern was one that I
> > think was voiced here in this thread in some fashion, where core
> developers
> > would have to be knowledgeable of the incoming body of work and maintain
> > that going forward. For core contributors who were primarily Java
> > developers it was difficult to think that we'd suddenly be responsible
> for
> > reviews/VOTEs on Python code, for example.  It was with a bit of
> > trepidation that we eventually decided it a good idea and opened the
> > project to them. For our purposes we brought all such projects directly
> > into our core repository as the thinking was that we wanted to keep all
> > aspects of the project unified (testing, release, etc) to ensure that
> for a
> > particular release tag you could be sure that everything worked together.
> > We initially started with just Python and developed that as our model for
> > how new drivers would arrive (there were already other disparate projects
> > out there in other languages).
> >
> > We wanted a model that ensured a reasonably high bar for acceptance and
> > created a rough set of minimum criteria we wanted to have for adding a
> new
> > driver to our release lines. The core of that criteria was a common
> > language agnostic test suite that needed to pass for us to consider it
> > "ready" in any sense and the project needed to build, test and release
> > using Maven (which is our build tool for the project). The former ensured
> > that we had a reasonable level of common tested functionality among
> drivers
> > and the latter ensured an easy and consistent way to manage build/release
> > practices (which fed nicely into our Docker infrastructure for both full
> > builds and for giving non-JVM developers a nice way to develop drivers
> > against the latest code without having to be Java experts). Once we
> > established this approach with Python, we successfully brought in .NET
> and
> > Javascript.
> >
> > I think there were a number of nice upsides to deciding to bring in
> drivers
> > in the first place and then in the model for acceptance that we chose:
> >
> > + We saw a greater diversity of folks contributing in general as the
> > ecosystem opened up beyond just the JVM.
> > + We saw that the general community coalesced around the "official"
> > drivers, contributing as one to them, rather than going off and creating
> > one-off projects. I'm not really aware of any third-party drivers right
> now
> > for the languages we support, but if you look at something like Go, there
> > are three or more choices. I suppose Go would be our next target for
> > official inclusion.
> > + Release day was pretty simple despite the complexity of the environment
> > with that mixed ecosystem because of our unified build model using Maven
> > and there wasn't a lot of disparate tooling exposed to the release
> manager
> > directly.
> > + I can't say that we really saw problems with core project developers
> > (who mostly knew Java) having to review Python/C#/JavaScript. For the most
> > part, the contribution quality was high and we managed and became more
> > knowledgeable as we went.
> > + As we released drivers and core together, we no longer had situations
> > where some third-party driver lagged behind some feature in core - if you
> > wanted to use the latest core functionality you just used the latest
> > release of core and driver and you could be assured they worked together
> > and we felt confident saying so.
> >
> > Doing it over again, I think I would still consider going single repo for
> > this situation but I think I might not place the requirement that the
> > projects build with Maven. I think Maven has turned-off some contributors
> > from those language ecosystems who don't know the JVM. They would have
> been
> > much more comfortable just working more directly with the tool systems
> that
> > they were familiar with. Of course, to get rid of local maven builds
> > completely we would have to build a "latest" Docker image so that folks
> > didn't need to do that themselves like they do now (also with Maven).
> >
> > Aside from TinkerPop experiences I will offer that, while I'm not
> > completely sure, I think that for a contribution like this one where the
> > bulk of the code has been developed outside of the ASF, the DS drivers
> > would need to go through an IP Clearance process:
> >
> > https://incubator.apache.org/ip-clearance/
> >
> >
> >
> > On Mon, Apr 27, 2020 at 12:57 PM Joshua McKenzie <jmcken...@apache.org>
> > wrote:
> >
> > > To step out of the weeds a bit - other than the Zookeeper / Curator
> > > example, does anyone know of any other apache projects that have either
> > > subprojects or complementary sideprojects they're interdependent upon
> in
> > > their ecosystems? I'd like to reach out to some other pmc's for advice
> > and
> > > feedback on this topic since there's no sense in reinventing the wheel
> if
> > > other projects have wisdom to share on this.
> > >
> > > On Mon, Apr 27, 2020 at 12:42 PM Joshua McKenzie <jmcken...@apache.org
> >
> > > wrote:
> > >
> > > > re: ML noise, how hard would it be to filter out JIRA updates
> > w/component
> > > > "Drivers"? Or from JIRA queries?
> > > >
> > > > For governance, I see it cutting both ways. If we have two separate
> > > > projects and ML's for drivers and C*, how do we keep a coherent view
> of
> > > new
> > > > features and roadmap stuff? Do we have CEP's for both projects and
> tie
> > > them
> > > > together? Do we drive changes in the driver feature ecosystem via
> CEP's
> > > in
> > > > C*?
> > > >
> > > > In the Venn diagram of overlap vs. non between the two projects, I
> see
> > > > there being more overlap than not.
> > > >
> > > > On Mon, Apr 27, 2020 at 12:34 PM Dinesh Joshi <djo...@apache.org>
> > wrote:
> > > >
> > > >>
> > > >>
> > > >> > On Apr 27, 2020, at 2:50 AM, Sylvain Lebresne <lebre...@gmail.com
> >
> > > >> wrote:
> > > >> >
> > > >> > Fwiw, I agree with the concerns raised by Benedict, and think we should
> > > >> > carefully think about how this is handled. Which isn't a rejection of
> > > >> > the donation in any way.
> > > >> >
> > > >> > Drivers are not small projects, and the majority of their day to
> day
> > > >> > maintenance is unrelated to the server (and the reverse is true).
> > > >> >
> > > >> > From the user point of view, I think it would be fabulous that
> > > Cassandra
> > > >> > appears like one project with a server and some official drivers,
> > with
> > > >> one
> > > >> > coherent website and documentation for all. I'm all for striving
> for
> > > >> that.
> > > >>
> > > >> +1
> > > >>
> > > >> > Behind the scenes however, I feel things should be set up so that some
> > > >> > amount of
> > > >> > separation remains between server and whichever drivers are donated and
> > > >> > accepted (or I'm fairly sure things would get messy very quickly[1]).
> > > >> > In my
> > > >>
> > > >> Can you say more about what "getting messy very quickly" means here?
> > > >>
> > > >> > mind that means *at a minimum*:
> > > >> > - separate JIRA projects.
> > > >> > - dedicated _dev_ (and commits) mailing lists.
> > > >>
> > > >> If we're thinking through how this would be setup, initially we had
> > the
> > > >> same Jira project for sidecar but now there is a separate one to
> track
> > > >> sidecar specific jiras. At the moment we do not have a separate
> > mailing
> > > >> list. I think Cassandra dev mailing list's volume is low enough to
> > keep
> > > >> using the same ML. There is an added value that it gives visibility
> > and
> > > >> developers don't need to go between multiple mailing lists.
> > > >>
> > > >> > But it's also worth thinking whether a single pool of
> committers/PMC
> > > >> > members is
> > > >> > desirable.
> > > >> >
> > > >> > Tbc, I'm not sure what is the best way to achieve this within the
> > > >> > constraint of
> > > >> > the Apache foundation, and maybe I'm just stating the obvious here.
> > > >> >
> > > >> >
> > > >> > [1] fwiw, I say this as someone that at some points in time was
> > > >> > simultaneously
> > > >> > somewhat actively involved in both Cassandra and the DataStax Java
> > > >> driver.
> > > >> >
> > > >> > --
> > > >> > Sylvain
> > > >> >
> > > >> >
> > > >> > On Fri, Apr 24, 2020 at 12:54 AM Benedict Elliott Smith <
> > > >> bened...@apache.org>
> > > >> > wrote:
> > > >> >
> > > >> >> Do you have some examples of issues?
> > > >> >>
> > > >> >> So, to explain my thinking: I believe there is value in most
> > > >> contributors
> > > >> >> being able to know and understand a majority of what the project
> > > >> >> undertakes.  Many people track a wide variety of activity on the
> > > >> project,
> > > >> >> and whether they express an opinion they probably form one and
> will
> > > >> involve
> > > >> >> themselves if they consider it important to do so.  I worry that
> > > >> importing
> > > >> >> several distinct and only loosely related projects to the same
> > > >> governance
> > > >> >> and communication structures has a strong potential to undermine
> > that
> > > >> >> capability, as people begin to assume that activity and
> > > >> decision-making is
> > > >> >> unrelated to them - and if that happens I think something
> important
> > > is
> > > >> lost.
> > > >> >>
> > > >> >> The sidecar challenges this already but seems hopefully
> manageable:
> > > it
> > > >> is
> > > >> >> a logical extension of Cassandra, existing primarily to plug gaps
> > in
> > > >> >> Cassandra's own functionality, and features may migrate to
> > Cassandra
> > > >> over
> > > >> >> time.  It is likely to have releases closely tied to Cassandra
> > > itself.
> > > >> >> Other subprojects are so far exclusively for consumption by the
> > > >> Cassandra
> > > >> >> project itself, and are all naturally coupled.
> > > >> >>
> > > >> >> Drivers however are inherently arms-length endeavours: we
> publish a
> > > >> >> protocol specification, and driver maintainers implement it.
> They
> > > are
> > > >> >> otherwise fairly independent, and while a dialogue is helpful it
> > does
> > > >> not
> > > >> >> need to be controlled by a single entity.  Many drivers will
> > continue
> > > >> to be
> > > >> >> controlled by others, as they have been until now.  We're of
> course
> > > >> able to
> > > >> >> ensure there's a strong overlap of governance, which I think
> would
> > be
> > > >> very
> > > >> >> helpful, and something Curator and Zookeeper seem not to have
> > > managed.
> > > >> >>
> > > >> >> Looking at the Curator website, it also seems to pitch itself as
> a
> > > >> >> relatively opinionated product, and much more than a driver.  I
> > hope
> > > >> the
> > > >> >> recipe for conflict in our case is much more limited given the
> > > >> functional
> > > >> >> scope of a driver - and anyway better avoided with more
> integrated,
> > > but
> > > >> >> still distinct governance.
> > > >> >>
> > > >> >> That's not to say I don't see some value in the project
> controlling
> > > the
> > > >> >> driver directly, I just worry about the above.
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On 22/04/2020, 21:25, "Nate McCall" <zznat...@gmail.com> wrote:
> > > >> >>
> > > >> >>    On Thu, Apr 23, 2020 at 5:37 AM Benedict Elliott Smith <
> > > >> >> bened...@apache.org>
> > > >> >>    wrote:
> > > >> >>
> > > >> >>> I welcome the donation, and hope we are able to accept all of
> the
> > > >> >>> drivers.  This is really great news IMO.
> > > >> >>>
> > > >> >>> I do however wonder if the project may be accumulating too many
> > > >> >>> sub-projects?  I wonder if it's time to think about splitting,
> and
> > > >> >> perhaps
> > > >> >>> incubating a project for the drivers?
> > > >> >>>
> > > >> >>
> > > >> >>    This is a legit concern and good question, but I think this is
> > > more
> > > >> a
> > > >> >>    natural evolution of growing a project. There is precedent for
> > > this
> > > >> in
> > > >> >>    Spark, Beam, Hadoop and others who have a number of different
> > > >> >> repositories
> > > >> >>    under the general project umbrella.
> > > >> >>
> > > >> >>    What I would like to avoid is a situation like with Apache
> > Curator
> > > >> and
> > > >> >>    Apache Zookeeper. The former being a zookeeper client donation
> > > from
> > > >> >> Netflix
> > > >> >>    that came in as a top level project. From the peanut gallery,
> it
> > > >> seems
> > > >> >> like
> > > >> >>    that has been less than ideal a couple of times in the past
> > > >> >> coordinating
> > > >> >>    releases, trademarks and such with separate project
> management.
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >>
> > ---------------------------------------------------------------------
> > > >> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > >> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >> >>
> > > >> >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > >
> >
>
