Hello everyone,
First of all, thank you all for this awesome project which I have often been inspired by. My name is Maxim Muzafarov I'm a Committer and PMC of Apache Ignite hence you most likely don't know me as I come from another part of the ASF. Perhaps, I did almost the same things with B-Trees, off-heap memory management, rebalancing, checkpointing, snapshotting, and IEPs (you are calling it CEPs) but on a slightly different distributed database architecture. Nevertheless, I was chasing down for my first issue to get experience with Cassandra and found a bunch of opened JIRAs related to the source code analysis (static analysis as well as the code style). These issues still appear in JIRA from time to time [1][2][3][4]. It seems to me there not enough attention has been paid to this topic and all possible options for this analysis and code style haven't been widely discussed before. I'd like to summarize everything that I have found and offer my skills and my experience for solving some of such issues we'll agree on. = Motivation = The goal is to make small contributions easier and safer to apply with GitHub PRs for both a contributor and a committer by adding automation code checks for each new Cassandra contribution. This also will help to reduce the time required for reviewing and applying such PRs by an experienced developer. As you may know, the infrastructure team has disabled public sign-up to ASF JIRA (the GitHub issues are recommended instead). Thus the following things become more important if we are still interested in attracting new contributions as it was discussed earlier [6]. I do not want to add extra steps to the existing workflow with code review or make GitHub pull requests as default for patches as it also was discussed already [7], just to improve the automation checks in it and make checks more convenient. = Proposed Solution = == 1. Make the checkstyle config a single point of truth for the source code style. == The checkstyle is already used and included in the Cassandra project build lifecycle (ant command line, Jenkins, CircleCI). There is no need to maintain code style configurations for different types of IDEs (e.g. IntelliJ inspections configuration) since the checkstyle.xml file can be directly imported to IDE used by a developer. This is fair for Intellij Idea, NetBeans, and Eclipse. So, I propose to focus on the checks themselves and checking pull requests with automation scripts, rather than maintaining these integrations. The benefits here are avoiding all issues with maintaining configurations for different IDEs. Another good advantage of this approach would be the ability to add new checkstyle rules without touching IDE configuration - and such tickets will be LFH and easy to commit. The actions points here are: - create an umbrella JIRA ticket for all checkstyle issues e.g. [8] (or label checkstyle); - add checkstyle to GitHub pull requests using GitHub actions (execute ant command); - include checkstyle to the build (already done); - remove redundant code style configurations related to IDEs from the source code e.g. [9]; == 2. Add additional tooling for code analysis to the build and GitHub pull requests. == The source code static analysis and automated checks have been discussed recently in the "SpotBugs to the build" topic [10]. I'd like to add my 50 cents here. Before discussing the pros and cons of each solution, let's focus on the criteria that such solutions must meet. You can find the most complete list of such tooling here [11]. >From my point of view, the crucial criteria are: - free for open-source (at least licenses should allow such usages); - popularity in the ASF ecosystems; - convenient integration and/or plugins for IDEs and GitHub; - we must be able to integrate with CirleCI, and Jenkins as well as launch from a command line; === Sonar === pros + this tool is free for open-source and recommended by the ASF infrastructure team [12]; + was already used for the Cassandra project some time ago at sonarcloud.io [13]; + it has GitHub pull requests analysis [14]; cons - run locally requires additional configuration and TOKEN_ID due to the analysis results stored in the ext database (infra will not provide it for local usage); === SpotBugs (FindBugs) === pros + license is allowed to use it and run it as a library (should be legal for us); + it analyses the bytecode that differs from how the checkstyle works; + can be executed from the command line as well as integrated into the build; cons - requires compiled source code; === PMD === pros + BSD licenses more permissive than LGPL (SpotBugs); + analyses the source code like the checkstyle does; + have an extended rule sets for source code analyses; cons - the checkstyle is already used in the project, and should be enough for now; === IntelliJ IDEA === pros + free for open-source and can be used from the command line [15]; + local checks are performed with the same code style configuration as the IDE does during development; cons - inconvenient for different IDEs (Eclipse, NetBeans); As a summary for this section: There is no need to limit ourselves to the choice of such tools and pick up only one from this list. For instance, it seems it will be better to configure sonarcloud.io reports on each push commit to the trunk. Sonar already has built-in rules (no need to maintain them) and this is easy to add with the ASF infra team install guides using GitHub actions or Jenkins. We will be able to see immediately the source code trends as well as new good candidates for LFH issues to fix. In addition to Sonar reports, we can add SpotBugs (PMD is more like an 'improved' checkstyle, but for now it is more than enough and there is no need to maintain another set of rules) to the build and GitHub pull requests with limited configuration for the first time. Thus we will be able to add new rules one by one. The actions points here are: - configure Sonar reports for the trunk (using GA or Jenkins); - add SpotBugs to the build; - add integration SpotBugs with GitHub pull requests; == 3. Enable pushing backwards build results for both Jenkins and CircleCI to GitHub pull requests. == The goal here is to have a "green checkbox" for those GitHub pull requests that have corresponding Jenkins/CircleCI runs. According to the following links, it is completely possible to have those. https://github.com/jenkinsci/github-branch-source-plugin https://circleci.com/docs/enable-checks/ Moreover, the GitHub Branch Source Plugin is already enabled for the Cassandra project [16]. The same seems should work the same way for CirleCI, but I have faced the infrastructure team comment [17] that describes admin permissions are required for that, so the question is still open here. I could dig a bit more once we've agreed on it. The actions points here are: - enable Jenkins integration for GitHub pull requests; - enable CircleCI integration for GitHub pull requests; == 4. Review IDE configs that are not used by the community. == Sorry for bumping up such a holy topic with IDEs that are being used by developers, so please do not judge me strictly against it. My main point here is "IDE should serve the source code" rather than "the source code adapts to IDE", thus it will probably become easy to use the project sources in an IDE once the migration from Ant to Maven/Gradle occurs [5], but it is still possible to make things a bit simpler today. Some of the configuration IDE scripts still consume some time for their maintenance and require to be tested on each new release occurs. These files must be always up to date if a configuration file structure changed in a new IDE version and this is inconvenient. I don't propose removing such scripts, just a review of what IDEs are exactly using. For instance, Cassandra + Eclipse doesn't seem to appear so often on StackOverflow. It could mean only two things: everything is working fine or no one is using it (hope the first one). https://stackoverflow.com/questions/tagged/cassandra%20eclipse?sort=Newest Cassandra + NetBeans appear even less frequently and the same questions occur. https://stackoverflow.com/questions/tagged/cassandra%20netbeans?sort=Newest The actions points here are: - initiate a wide survey for user and dev lists, to get to know about the usages; - remove those configurations that are not used anymore; - force migration from Ant to Gradle/Maven; Summarizing everything proposed above I think it is possible to simplify adding small contributions easier to the codebase as well as save a bunch of committer's time. So, WDYT about the things described above? Should I create a CEP for this? [1] https://issues.apache.org/jira/browse/CASSANDRA-9209 (Add static analysis to report any AutoCloseable objects) [2] https://issues.apache.org/jira/browse/CASSANDRA-17400 (Fail build or warn when closeable reference is not closed in tests) [3] https://issues.apache.org/jira/browse/CASSANDRA-9919 (Cleanup closeable iterator usage) [4] https://issues.apache.org/jira/browse/CASSANDRA-17925 (Java source code should have sorted imports as defined in the codestyle) [5] https://issues.apache.org/jira/browse/CASSANDRA-17015 (Migrate build from Ant to Maven/Gradle) [6] https://lists.apache.org/thread/1t3hr1pylhpgc5c9d2382xtnw3wgkg06 (Attracting new contributors) [7] https://lists.apache.org/thread/b3nowtrlpjwr46vftsb3ob38xqnrnxl1 (Switch to using GitHub pull requests?) [8] https://issues.apache.org/jira/browse/CASSANDRA-17925 (Java source code should have sorted imports as defined in the codestyle) [9] https://github.com/apache/cassandra/tree/trunk/ide/idea/inspectionProfiles [10] https://lists.apache.org/thread/1ro1mvkpvt4vr24nw7dbpdlxo82mq3hz (Add SpotBugs to the build) [11] https://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis [12] https://cwiki.apache.org/confluence/display/INFRA/SonarCloud+for+ASF+projects [13] https://sonarcloud.io/summary/overall?id=cassandra_180410 [14] https://docs.sonarqube.org/latest/analysis/pull-request/ [15] https://www.jetbrains.com/help/idea/command-line-code-inspector.html [16] https://cwiki.apache.org/confluence/display/INFRA/ci-cassandra.apache.org [17] https://issues.apache.org/jira/browse/INFRA-22367?focusedCommentId=17421230