Re: [VOTE] Release Apache Cassandra 4.1.0 GA
> > Given that we're talking about two one-liners, just changing time units, > what are folks thoughts about skipping rc2 and just re-cutting 4.1.0 (GA) ? It sounds reasonable to me. On Tue, 6 Dec 2022 at 22:36, Aleksey Yeshchenko wrote: > Sure. > > On 6 Dec 2022, at 18:14, Mick Semb Wever wrote: > > Let’s get these fixes in and roll a pure RC2. >> > > > Given that we're talking about two one-liners, just changing time units, > what are folks thoughts about skipping rc2 and just re-cutting 4.1.0 (GA) ? > > (I agree with the rules, and am in favour of making the exception.)
Re: Aggregate functions on collections, collection functions and MAXWRITETIME
The comparator for collections is the lexicographical compare on the collection items. That might not be the most useful thing, but it's not impossible to imagine cases where that ordering can be useful. To give an arbitrary example, you can use a list column to store the name and surnames of a person, considering that some people can have multiple surnames. You can then sort rows based on that list column, to get the names in alphabetical order, or to get the first or last person according to that order. I'm sure we can think of more cases where the lexicographical order of a list can be useful, although I agree that it's not the greatest feature ever. It's worth mentioning that collections are not the only data types where the MIN/MAX functions are of dubious utility. For example, blob columns can also be used with MIN/MAX. Same as with collections, the min/max blobs are selected according to the comparator for the data type. That comparator is the lexicographic compare on the unsigned values of the byte contents. The utility of MIN/MAX on inet and boolean columns isn't very clear either, although one can always imagine use cases. For example, MAX of a boolean column can be used as a logical disjunction. If we were to special-case MIN/MAX functions to reject collections, we should also reject other data types such as, at least, blobs. That would require a deprecation plan. Also, it's not that the comparator used by MIN/MAX is an obscure internal thing. The action of that comparator is very visible when any of those data types is used in a clustering column, and it's used as the basis for "ORDER BY" clauses. Should we also reject blobs, collections, tuples and UDTs on "ORDER BY"? I don't think so. I rather think that basing MIN/MAX on the regular order of the column data type is consistent, easy to do and easy to understand. 
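To make the orderings described above concrete, here is a minimal sketch in Python. It is an analogy only: Python's built-in ordering of lists, bytes and booleans stands in for the CQL comparators, and the sample values are made up for illustration. It does not call Cassandra or reproduce its implementation.

```python
# Illustration only: Python's built-in orderings are used as a stand-in
# for the comparators discussed above; all sample data is hypothetical.

# List columns: compared item by item (lexicographic order on the items).
names = [
    ["Maria", "Garcia", "Lopez"],
    ["Maria", "Garcia"],
    ["Ana", "Perez"],
]
first_person = min(names)
last_person = max(names)

# Blob columns: lexicographic compare on the unsigned byte values,
# so 0x80 sorts after 0x7F (it is not treated as a negative number).
blobs = [bytes([0x00, 0xFF]), bytes([0x7F]), bytes([0x80])]
max_blob = max(blobs)

# Boolean columns: MAX behaves like a logical disjunction (OR),
# and MIN like a logical conjunction (AND).
flags = [False, True, False]
any_set = max(flags)
all_set = min(flags)

print(first_person, last_person, max_blob.hex(), any_set, all_set)
```

In this sketch the same ordering drives both MIN/MAX and sorting, which mirrors the argument that MIN/MAX simply reuse the regular order of the column data type.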
I don't see the need to add rules explicitly forbidding some data types on MIN/MAX functions just because we can't easily figure out a use case for their ordering. Especially when we are exposing that same ordering on clusterings and "ORDER BY". On Tue, 6 Dec 2022 at 18:56, J. D. Jordan wrote: > If the functionality truly has never actually worked, then throwing an > error that MAX is not supported for collections seems reasonable. > > But we should throw an error, I do not think we should have functions that > aggregate across rows and functions that operate within a row use the same > name. > > My expectation as a user would be that MAX either always aggregates across > rows, so results in a single row of output or always operates within a row, > so returns the full set of rows matching the query. > > So if we want a max that aggregates across rows that works for collections > we could change it to return the aggregated max across all rows. Or we just > leave it as an error and if someone wants the max across all rows they > would ask for MAX(COLLECTION_MAX(column)). Yes I still agree COLLECTION_MAX > may be a bad name. > > > On Dec 6, 2022, at 11:55 AM, Benedict wrote: > > > > As far as I am aware it has never worked in a release, and so > deprecating it is probably not as challenging as you think. Only folk that > have been able to parse the raw bytes of the collection in storage format > would be affected - which we can probably treat as zero. > > > > > >> On 6 Dec 2022, at 17:31, Jeremiah D Jordan > wrote: > >> > >> > >>> > >>> 1. I think it is a mistake to offer a function MAX that operates over > rows containing collections, returning the collection with the most > elements. This is just a nonsensical operation to support IMO. We should > decide as a community whether we “fix” this aggregation, or remove it. > >> > >> The current MAX function does not work this way afaik? 
It returns the > row with the column that has the highest value in clustering order sense, > like if the collection was used as a clustering key. While that also may > have limited use, I don’t think it worth while to deprecate such use and > all the headache that comes with doing so. > >> > >>> 2. I think “collection_" prefixed methods are non-intuitive for > discovery, and all-else equal it would be better to use MAX,MIN, etc, same > as for aggregations. > >> > >> If we actually wanted to move towards using the existing names with new > meanings, then I think that would take us multiple major releases. First > deprecate existing use in current releases. Then make it an error in the > next major release X. Then change the behavior in major release X+1. Just > switching the behavior without having a major where such queries error out > would make a bunch of user queries start returning “wrong” data. > >> Also I don’t think those functions being cross row aggregations for > some column types, but within row collection operations for other types, is > any more intuitive, and actually would be more confusing. So I am -1 on > using the same names. > >> > >>> 3. I think it is peculiar to permit met
Re: [DISCUSS] API modifications and when to raise a thread on the dev ML
> > I think it makes sense to look into improving visibility of API changes, > so people can more easily review a summary of API changes versus reading > through the whole changelog (perhaps we need a summarized API change log?). > Agree Paulo. Observers should be able to see all API changes early. We can do better than telling downstream users/devs "you have to listen to all jira tickets" or "you have to watch the code and pick up changes". Watching CHANGES.txt or NEWS.txt or CEPs doesn't solve the need either. Observing such changes as early as possible can save a significant amount of effort and headache later on, and should be encouraged. If done correctly I can imagine it will help welcome more contributors. I can also see that we can improve at, and have a better shared understanding of, categorising the types of API changes: addition/change/deprecation/removal, signature/output/behavioural, API/SPI. So I can see value here for both observers and for ourselves.
Re: [DISCUSSION] Cassandra's code style and source code analysis
Dear community, I have created an epic with code-style activities to track the progress: https://issues.apache.org/jira/browse/CASSANDRA-18090 In my understanding, there is no need to format the whole code base at once according to the code style described on the page [1], and the best strategy here is to go forward with small evolutionary changes. Thus eventually we will come up with a set of rules convenient for all members of the community. In my mind, having one commit per added code style rule should be easy on the reviewer, on the git commit history, and on rebasing/merging other pull requests that may be affected by the new rules. I want to raise one more question, related to class imports and the class import order, for a wider discussion. The import order is well described on the code style page [1], but using wildcard imports is not mentioned at all. Wildcard imports and their drawbacks have already been raised in the JIRA issue [2], which didn't get enough attention. Checkstyle has the rules we are interested in for import control, and they must be considered together. We can implement them in a single pull request or one by one, or use only the last one: - AvoidStarImport - CustomImportOrder But still, I think that wildcard imports have more disadvantages (class name conflicts, e.g. between java.util.* and java.sql.*, or name clashes introduced by a new version of a library) than advantages, and such problems will be found in later CI cycles. Currently, I've implemented the AvoidStarImport checkstyle rule in a dedicated pull request [3][4], so you will be able to see the full extent of the changes from removing wildcard imports. The changes are made to the checkstyle configuration as well as to the code style configurations for the different IDEs we support. So, the open questions here are: - Should the source code obey the AvoidStarImport rule [3]? 
(I think yes); - Should we implement AvoidStarImport and CustomImportOrder in a single pull request or do it one by one? Anyway, I will record the agreed outcome for the AvoidStarImport rule on the documentation page [1]. [1] https://cassandra.apache.org/_/development/code_style.html [2] https://issues.apache.org/jira/browse/CASSANDRA-17925 [3] https://issues.apache.org/jira/browse/CASSANDRA-18089 [4] https://github.com/apache/cassandra/pull/2041 On Thu, 1 Dec 2022 at 11:55, Claude Warren, Jr via dev wrote: > > The last time I worked on a project that tried to implement a coding style > across the project it was "an education". The short story is that trying to > "mitigate" the code base, with respect to style, is either a massive change > or a long slow process. > > Arguments here have stated that earlier attempts to have the tooling reformat > the code did not go well. What we ended up doing was turned on the style > checker and looked at the number of issues across the project. When new code > was accepted the number of issues could not rise. Eventually most of the > code was clean, with a few well coded legacy bits still not up to standard. > We could do something similar here. Much like code coverage, you can't > perform a merge unless the number of style errors remains the same or > decreases. > > As with all software rules, this is a strong recommendation as I am certain > that there are edge/corner case exceptions to be found. > > > > > On Wed, Nov 30, 2022 at 3:30 PM Patrick McFadin wrote: >> >> Why are we still debating build tooling? I think you’re wrong, but I’ve >> conceded - on the assumption that we can get enough volunteers willing to >> adopt responsibility for the new world order. >> >> Not debating. I am just throwing in my support since I have been in the Camp >> of Ant. >> >> On Wed, Nov 30, 2022 at 1:29 AM Benedict wrote: >>> >>> Why are we still debating build tooling? 
I think you’re wrong, but I’ve >>> conceded - on the assumption that we can get enough volunteers willing to >>> adopt responsibility for the new world order. >>> >>> I suggest five long term contributors nominate themselves as the build file >>> maintainers, and collectively manage a safe and painless migration for the >>> rest of us - and agree to maintain and develop the new build file going >>> forwards, and support the community as they adopt it. >>> >>> On the topic of over-exuberant linting I will continue to push back. I >>> think linting our brace rules could make sense since they are atypical, but >>> more formatting rules than this likely just leads to atrophying style. >>> Authorship involves thinking about how to present your code; I don’t want >>> to either encourage lazy authorship or prevent experimentation with >>> presentation. Both would be bad, and I expect we would struggle to evolve >>> our style guide again in future as the language evolves. Our brace rules >>> are a good example everyone unilaterally ignored when lambdas arrived, as >>> we all recognised they materially harmed the brevity
[RESULT][VOTE] Release Apache Cassandra 4.1.0 GA
> The vote will be open for 72 hours (longer if needed). Everyone who has > tested the build is invited to vote. Votes by PMC members are considered > binding. A vote passes if there are at least three binding +1s and no -1's. > This vote failed due to CASSANDRA-18086, which has now been committed. As agreed in this thread, I will cut 4.1.0 again and open a new vote.
Re: [RESULT][VOTE] Release Apache Cassandra 4.1.0 GA
Can we give Marianne and Matt a chance to confirm their performance numbers? I got an indicative message suggesting it looked good, but nothing firm yet. > On 7 Dec 2022, at 20:37, Mick Semb Wever wrote: > > > >> The vote will be open for 72 hours (longer if needed). Everyone who has >> tested the build is invited to vote. Votes by PMC members are considered >> binding. A vote passes if there are at least three binding +1s and no -1's. > > > This vote failed due to CASSANDRA-18086, which has now been committed. > As agreed in this thread, I will cut 4.1.0 again and open a new vote.
Re: [RESULT][VOTE] Release Apache Cassandra 4.1.0 GA
> Can we give Marianne and Matt a chance to confirm their performance > numbers? I got an indicative message suggesting it looked good, but nothing > firm yet. > I am presuming that will (and must) happen before the new vote closes (which will be next Monday anyway). I don't see much point in delaying opening the vote, it's just me that would have to do the cut again. Ok?
Re: [RESULT][VOTE] Release Apache Cassandra 4.1.0 GA
Sure > On 7 Dec 2022, at 20:47, Mick Semb Wever wrote: > > > >> Can we give Marianne and Matt a chance to confirm their performance numbers? >> I got an indicative message suggesting it looked good, but nothing firm yet. > > > > I am presuming that will (and must) happen before the new vote closes (which > will be next Monday anyway). I don't see much point in delaying opening the > vote, it's just me that would have to do the cut again. Ok? >
[VOTE] Release Apache Cassandra 4.1.0 (take2)
Proposing the (second) test build of Cassandra 4.1.0 for release. sha1: f9e033f519c14596da4dc954875756a69aea4e78 Git: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.1.0-tentative Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1282/org/apache/cassandra/cassandra-all/4.1.0/ The Source and Build Artifacts, and the Debian and RPM packages and repositories, are available here: https://dist.apache.org/repos/dist/dev/cassandra/4.1.0/ The vote will be open for 96 hours (longer if needed). Everyone who has tested the build is invited to vote. Votes by PMC members are considered binding. A vote passes if there are at least three binding +1s and no -1's. [1]: CHANGES.txt: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.1.0-tentative [2]: NEWS.txt: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.1.0-tentative