Re: Additions to Cassandra ecosystem page?

2021-06-29 Thread Benjamin Lerer
I feel that we are heading in too restrictive a direction. I believe that
we have more to gain by being open and welcoming.
-1 for the strict approach and for the licences.

On Tue, 29 Jun 2021 at 00:40, Ben Bromhead wrote:

> On Thu, Jun 24, 2021 at 2:38 AM Joshua McKenzie wrote:
>
> >
> > The obvious core responsibility of the website should be to ASLv2
> > permissively licensed Apache Cassandra and secondarily to CQL as a
> > protocol IMO. I don't think we as a project should be tracking derivative
> > works, forks, or other things built on top of the code-base and certainly
> > not things with wildly varied licensing (AGPL, proprietary closed, etc).
> >
> > To go that route we either become fully inclusive of everything or become
> > Kingmakers, and either way there's the consequence of inconsistent levels
> > of vetting, maintenance, and dilution of what it means to "be Cassandra".
> > There's plenty of other websites for other projects and everyone has
> > access to search engines.
> >
>
> This makes sense to me as a line in the sand to draw if we are going down a
> strict path.
>
> It would be up to whoever wants to be added to the list to demonstrate this
> is the case.
>
> There would still be some degree of honesty required as well on the service
> providers' part.
>


Re: Additions to Cassandra ecosystem page?

2021-06-29 Thread bened...@apache.org
I don’t think it is intractable to come up with a definition that we use for 
inclusion.

1. List no alternative offerings at all.
2. List only those offerings that deploy precisely a released version of 
Cassandra.
3. List only those offerings that deploy precisely a released version of 
Cassandra with modifications that extend a list of public APIs.
4. List only those offerings that deploy precisely a released version of 
Cassandra with modifications that extend a list of public APIs, or are 
themselves published under ASL v2.

Listing a product on our website implicitly endorses that offering, and we 
should absolutely be restrictive about what we endorse. I’m -1 unconditionally 
endorsing competing products, and products that are not themselves clearly some 
derivative work that is accessible to the community under similar terms.

If we cannot agree on a set of conditions, options (1) and (2) are simple, easy 
to administer, adequately restrictive and not inconsistently permissive.

I don’t think this website is going to drive a lot of traffic to any of these 
businesses, so I doubt any of them should be upset at any loss of revenue. The 
question is simply one of principle for us as a project.





Re: Additions to Cassandra ecosystem page?

2021-06-29 Thread Benjamin Lerer
If I have to choose between the four choices that you proposed I would then
choose (1) List no alternative offerings at all.



Re: Additions to Cassandra ecosystem page?

2021-06-29 Thread Paulo Motta
> Listing a product on our website implicitly endorses that offering, and
> we should absolutely be restrictive about what we endorse. I’m -1
> unconditionally endorsing

I don't think listing a product on the website means implicitly endorsing
it if it's explicitly mentioned with a visible disclaimer that we're not
endorsing the listed products.

From my experience, an ecosystem page is an open wiki editable by anyone
with the sole objective of allowing external users to easily find anything
related to the project, and not a list of "unconditionally endorsed"
offerings.

Why not take a community-driven laissez-faire approach and just let people
list whatever they want if they feel part of the community, with the
explicit disclaimer that being on the list is not a project endorsement of
the offering? For instance, Apache Kafka uses very simple wording to convey
this [1]: "Here is a list of tools *we have been* told about that integrate
with Kafka outside the main distribution. *We haven't tried them all, so
they may not work*!" [1]

I think it's fine to bikeshed how to categorize offerings, present the
list, word the disclaimer and even remove clear violations of good faith,
but I don't think we should be overly presumptuous and prescribe what is
allowed and forbidden on a public wiki of an open source project.

Two objective suggestions I'd like to make are:
- Give more visibility/prominence to
auxiliary/complementary/supplementary/non-competing/open-source
projects/products by listing them at the top of the page, and list
closed-source / SaaS / API-compatible offerings under their own category at
the bottom of the page with maybe an additional disclaimer that not all
features may be available on these offerings.
- There are 3 sub-offerings from a single vendor in the "Professional
Services" category, but I think it's sufficient to list each service
provider once per category, since the sub-offerings can be easily found by
visiting the service provider website.

Paulo
-

[1] https://spark.apache.org/third-party-projects.html


Re: [DISCUSS] CEP-9: Make SSLContext creation pluggable

2021-06-29 Thread Maulin Vasavada
^^^ bumping up ^^^ this thread since people might have more time for reviewing
post-4.0 work. Specifically for this section in the CEP, I have coded one
option (here) and will now do another option very soon.

On Wed, Jun 2, 2021 at 5:11 PM Maulin Vasavada 
wrote:

> Thank you Dinesh and everybody. Will keep calm and wait for the feedback.
> Meanwhile I am experimenting with various implementation options for what I
> put as "will seek community's input" on the CEP document and learning a
> little bit more about CircleCI.
>
> On Wed, Jun 2, 2021 at 4:08 PM Dinesh Joshi 
> wrote:
>
>> Hi Maulin,
>>
>> Thank you for the CEP & Patch. I’ve been following along albeit silently.
>> Will take a look. It’s just that we’re currently busy so bear with us.
>>
>> Thanks,
>>
>> Dinesh
>>
>> > On Jun 2, 2021, at 3:28 PM, Maulin Vasavada wrote:
>> >
>> > Hi all
>> >
>> > ^^^ bump ^^^ I've raised the PR and am waiting for the review. Once I see
>> > that the suggested changes are directionally right I'll start a VOTE thread
>> > on the CEP (unless I am recommended to follow another process).
>> >
>> > Thanks
>> > Maulin
>> >
>> >> On Thu, May 27, 2021 at 1:29 PM Maulin Vasavada <maulin.vasav...@gmail.com>
>> >> wrote:
>> >>
>> >> Hi all
>> >>
>> >> I've raised the PR with the changes. Specifically I would appreciate the
>> >> community's input on this section of the CEP
>> >> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-9%3A+Make+SSLContext+creation+pluggable#CEP9:MakeSSLContextcreationpluggable-ImportantnoteaboutcommonSSLconfigurations>.
>> >>
>> >> Once we get some consensus on the PR (except minor code improvement
>> >> suggestions) I'll start a VOTE thread for the CEP.
>> >>
>> >> I thank all the reviewers of the CEP and the PR in advance and am
>> >> completely excited to contribute to Apache Cassandra.
>> >>
>> >> Thanks
>> >> Maulin
>> >>
>> >> On Thu, May 27, 2021 at 11:04 AM Maulin Vasavada <
>> >> maulin.vasav...@gmail.com> wrote:
>> >>
>> >>> Sounds good Brandon. I'll raise the PR in a couple of hours from now.
>> >>> Thanks.
>> >>>
>> >>> On Thu, May 27, 2021 at 10:14 AM Brandon Williams 
>> >>> wrote:
>> >>>
>> >>>> You can raise a PR in any state, but review will be a different
>> >>>> matter.  I would go ahead and raise it and the testing can be sorted
>> >>>> out from there.
>> >>>>
>> >>>> On Thu, May 27, 2021 at 12:12 PM Maulin Vasavada wrote:
>> >>>> >
>> >>>> > Hi all
>> >>>> >
>> >>>> > I think I am close to raising a PR now but my CircleCI job
>> >>>> > <https://app.circleci.com/pipelines/github/maulin-vasavada/cassandra>
>> >>>> > doesn't make progress beyond key tasks success like unit tests, dtests,
>> >>>> > cqlshlibtests. Any recommendation on if we need to see the whole CircleCI
>> >>>> > job green before raising the PR?
>> >>>> >
>> >>>> > Thanks
>> >>>> > Maulin
>> >>>> >
>> >>>> > On Fri, May 21, 2021 at 8:54 PM Maulin Vasavada <maulin.vasav...@gmail.com>
>> >>>> > wrote:
>> >>>> >
>> >>>> >> I am almost done with all changes except the code snippet in the
>> >>>> >> EncryptionOptions.java which determines 'enabled' and 'optional' encryption
>> >>>> >> flags. Will raise the PR soon once I see my CircleCI getting green.
>> >>>> >>
>> >>>> >> On Fri, May 21, 2021 at 9:24 AM Maulin Vasavada <maulin.vasav...@gmail.com>
>> >>>> >> wrote:
>> >>>> >>
>> >>>> >>> FYI - I am working on PR. I made some changes and trying to run tests.
>> >>>> >>>
>> >>>> >>> On Tue, May 18, 2021 at 10:14 PM Maulin Vasavada <
>> >>>> >>> maulin.vasav...@gmail.com> wrote:
>> >>>> >>>
>> >>>> >>>> Thanks Nate for reviewing the CEP. Yes for change #3 in the CEP, I mean
>> >>>> >>>> to have only single Default Impl and that would be a final class, not
>> >>>> >>>> overridable. It will be basically an internal implementation. I've updated
>> >>>> >>>> the CEP to reflect this.
>> >>>> >>>>
>> >>>> >>>> On Tue, May 18, 2021 at 7:21 PM Nate McCall wrote:
>> >>>> >>>>
>> >>>> >>>> > Hi Maulin,
>> >>>> >>>> > Thanks for putting this together!
>> >>>> >>>> >
>> >>>> >>>> > Took a quick glance, and I can't think of a compelling reason on why
>> >>>> >>>> > SSLContext should be final and your point about organization/compliance
>> >>>> >>>> > issues requiring different implementations is a good one.
>> >>>> >>>> >
>> >>>> >>>> > Per #3 on your proposed changes, I'm keen to only support a single default
>> >>>> >>>> > impl in-tree. I don't think we should be i

[DISCUSS] CASSANDRA-16767, CASSANDRA-16768, and CASSANDRA-16769 for 3.11.x

2021-06-29 Thread Scott Carey
I'd like to discuss the inclusion of the above tickets in a 3.11.x
release.  These are not pure bug fixes, so I'll need a waiver to get them
into 3.11.x (and implicitly, 4.0.x).

The first two are straightforward oversights: neither *nodetool
garbagecollect* nor *nodetool scrub* currently accepts a *--user-defined*
parameter list of SSTables in the same way that *nodetool compact* does.

This is an operational problem for large tables.

I often need to scrub just one file that is corrupted for some reason, and
not scrub an entire 1TB+ of data for a table on a node.  This renders
'nodetool scrub' operationally useless for large tables.

For *garbagecollect* it is often operationally easy to identify which
tables are likely to be full of bloat -- and operationally useful to do this
task in small increments.  The existing order that garbagecollect processes
SSTables prevents it from being useful in any incremental fashion -- if you
stop it and later restart, it will first process the SSTables you just
garbage collected.

The third ticket adds an option for *nodetool garbagecollect*,
*--oldest-fraction*, that can select a fraction of the oldest table data in
bytes, and garbagecollect only the SSTables that 'cover' that percentage of
data.  Operationally, this lends itself to easy automation -- for example
running this once a week on 10% of a table's data would imply that there is
no data on disk that has been overwritten within the last 10 weeks.  This
caps data bloat in ways neither LCS nor STCS can currently achieve without
regular major compactions or full-pass garbagecollect.
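
As a rough illustration of the selection described above, here is a minimal
sketch in Python. It is not the code from the ticket: the SSTable class, the
select_oldest_fraction name, and the rule that the SSTable crossing the
threshold is included in the selection are all assumptions made for the
example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    # Hypothetical stand-in for an SSTable: only the fields the sketch needs.
    generation: int   # lower generation number = older file
    size_bytes: int


def select_oldest_fraction(sstables: List[SSTable], fraction: float) -> List[SSTable]:
    """Return the oldest SSTables whose sizes together cover `fraction` of total bytes."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    total = sum(t.size_bytes for t in sstables)
    target = total * fraction
    selected: List[SSTable] = []
    covered = 0
    # Walk from oldest to newest and stop once the byte target is covered.
    for table in sorted(sstables, key=lambda t: t.generation):
        if covered >= target:
            break
        selected.append(table)
        covered += table.size_bytes
    return selected


if __name__ == "__main__":
    tables = [SSTable(5, 1000), SSTable(1, 100), SSTable(3, 300),
              SSTable(2, 200), SSTable(4, 400)]
    # 10% of 2000 bytes is 200 bytes, so the two oldest files are picked.
    print([t.generation for t in select_oldest_fraction(tables, 0.1)])
```

Because the target is measured in bytes rather than file count, a few large
old files can satisfy the fraction on their own.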

I have a large LCS table that has existed in steady state for about two
years. Its oldest SSTable files were about 20 months old.  These old tables
were 95% bloated by that time -- 'garbagecollect' was able to shrink those
to 5% of their original size.
Being able to automate garbagecollect on a small fraction of the older data
would be a big disk space and performance win, without the downsides of a
major compaction.

The overall risk of these additions is low:

   - They do not modify any existing behavior, only add new options.
   - They re-use existing machinery for most of the work, and only add
   logic in areas that are already well tested.  The areas that need the most
   scrutiny in review have good test coverage.
   - Scripts that worked with nodetool before should continue to work
   except for the case where a keyspace is named --user-defined or
   --oldest-fraction, but this flaw already exists with other nodetool
   commands.
   - There is no modification to sensitive areas like the read, write, or
   autocompaction path.  This merely does the same thing that is already done,
   just on a subset of SSTables rather than all of them.



Thanks for considering this proposal,

-Scott Carey


P.S.
You might wonder why the --oldest-fraction is necessary when one can use
--user-defined and some OS level scripting.

   1. --oldest-fraction calculates the SSTable fraction based on the total
   data size, not file count.
   2. nodetool can avoid race conditions with autocompaction on SSTable
   selection.
   3. nodetool has access to the current state of active SSTables; a script
   just sees files on disk, including files that might be scheduled for
   deletion or files that are actively being written to.
   4. Even if used at a 100% fraction, it processes from oldest to newest
   by the SSTable generation number, meaning that if it is interrupted
   halfway through and then restarted, it won't immediately work on the files
   that were just processed, as those will have the largest generation numbers.
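
The restart behaviour in point 4 can be sketched with a small model. This is
only an illustration under stated assumptions: gc_pass is a hypothetical
helper, SSTables are reduced to a generation-to-size mapping, sizes are left
unchanged by the rewrite, and each rewritten file is assumed to receive the
next unused generation number.

```python
from typing import Dict


def gc_pass(sstables: Dict[int, int], next_generation: int, max_files: int) -> int:
    """Rewrite up to `max_files` SSTables, oldest generation first.

    `sstables` maps generation number -> size in bytes and is updated in place.
    Returns the next unused generation number.
    """
    # sorted() builds a separate list, so mutating the dict in the loop is safe.
    for generation in sorted(sstables)[:max_files]:
        size = sstables.pop(generation)
        # The rewritten output is assumed to get a new, higher generation.
        sstables[next_generation] = size
        next_generation += 1
    return next_generation


if __name__ == "__main__":
    live = {1: 10, 2: 20, 3: 30, 4: 40}
    # A first run is interrupted after rewriting two files (generations 1 and 2).
    nxt = gc_pass(live, next_generation=5, max_files=2)
    print(sorted(live))
    # After a restart, the oldest remaining generations (3 and 4) are handled
    # first, not the files that were just rewritten.
    gc_pass(live, next_generation=nxt, max_files=2)
    print(sorted(live))
```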