Hello everyone,

First of all, thank you all for this awesome project which I have
often been inspired by. My name is Maxim Muzafarov I'm a Committer and
PMC of Apache Ignite hence you most likely don't know me as I come
from another part of the ASF. Perhaps, I did almost the same things
with B-Trees, off-heap memory management, rebalancing, checkpointing,
snapshotting, and IEPs (you are calling it CEPs) but on a slightly
different distributed database architecture.

Nevertheless,

I was chasing down for my first issue to get experience with Cassandra
and found a bunch of opened JIRAs related to the source code analysis
(static analysis as well as the code style). These issues still appear
in JIRA from time to time [1][2][3][4]. It seems to me there not
enough attention has been paid to this topic and all possible options
for this analysis and code style haven't been widely discussed before.
I'd like to summarize everything that I have found and offer my skills
and my experience for solving some of such issues we'll agree on.


= Motivation =

The goal is to make small contributions easier and safer to apply with
GitHub PRs for both a contributor and a committer by adding automation
code checks for each new Cassandra contribution. This also will help
to reduce the time required for reviewing and applying such PRs by an
experienced developer.

As you may know, the infrastructure team has disabled public sign-up
to ASF JIRA (the GitHub issues are recommended instead). Thus the
following things become more important if we are still interested in
attracting new contributions as it was discussed earlier [6].

I do not want to add extra steps to the existing workflow with code
review or make GitHub pull requests as default for patches as it also
was discussed already [7], just to improve the automation checks in it
and make checks more convenient.


= Proposed Solution =

== 1. Make the checkstyle config a single point of truth for the
source code style. ==

The checkstyle is already used and included in the Cassandra project
build lifecycle (ant command line, Jenkins, CircleCI). There is no
need to maintain code style configurations for different types of IDEs
(e.g. IntelliJ inspections configuration) since the checkstyle.xml
file can be directly imported to IDE used by a developer. This is fair
for Intellij Idea, NetBeans, and Eclipse.

So, I propose to focus on the checks themselves and checking pull
requests with automation scripts, rather than maintaining these
integrations. The benefits here are avoiding all issues with
maintaining configurations for different IDEs. Another good advantage
of this approach would be the ability to add new checkstyle rules
without touching IDE configuration - and such tickets will be LFH and
easy to commit.

The actions points here are:

- create an umbrella JIRA ticket for all checkstyle issues e.g. [8]
(or label checkstyle);
- add checkstyle to GitHub pull requests using GitHub actions (execute
ant command);
- include checkstyle to the build (already done);
- remove redundant code style configurations related to IDEs from the
source code e.g. [9];


== 2. Add additional tooling for code analysis to the build and GitHub
pull requests. ==

The source code static analysis and automated checks have been
discussed recently in the "SpotBugs to the build" topic [10]. I'd like
to add my 50 cents here.

Before discussing the pros and cons of each solution, let's focus on
the criteria that such solutions must meet. You can find the most
complete list of such tooling here [11].

>From my point of view, the crucial criteria are:
- free for open-source (at least licenses should allow such usages);
- popularity in the ASF ecosystems;
- convenient integration and/or plugins for IDEs and GitHub;
- we must be able to integrate with CirleCI, and Jenkins as well as
launch from a command line;


=== Sonar ===

pros
+ this tool is free for open-source and recommended by the ASF
infrastructure team [12];
+ was already used for the Cassandra project some time ago at
sonarcloud.io [13];
+ it has GitHub pull requests analysis [14];

cons
- run locally requires additional configuration and TOKEN_ID due to
the analysis results stored in the ext database (infra will not
provide it for local usage);

=== SpotBugs (FindBugs) ===

pros
+ license is allowed to use it and run it as a library (should be legal for us);
+ it analyses the bytecode that differs from how the checkstyle works;
+ can be executed from the command line as well as integrated into the build;

cons
- requires compiled source code;

=== PMD ===

pros
+ BSD licenses more permissive than LGPL (SpotBugs);
+ analyses the source code like the checkstyle does;
+ have an extended rule sets for source code analyses;

cons
- the checkstyle is already used in the project, and should be enough for now;

=== IntelliJ IDEA ===

pros
+ free for open-source and can be used from the command line [15];
+ local checks are performed with the same code style configuration as
the IDE does during development;

cons
- inconvenient for different IDEs (Eclipse, NetBeans);


As a summary for this section:

There is no need to limit ourselves to the choice of such tools and
pick up only one from this list.

For instance, it seems it will be better to configure sonarcloud.io
reports on each push commit to the trunk. Sonar already has built-in
rules (no need to maintain them) and this is easy to add with the ASF
infra team install guides using GitHub actions or Jenkins. We will be
able to see immediately the source code trends as well as new good
candidates for LFH issues to fix.

In addition to Sonar reports, we can add SpotBugs (PMD is more like an
'improved' checkstyle, but for now it is more than enough and there is
no need to maintain another set of rules) to the build and GitHub pull
requests with limited configuration for the first time. Thus we will
be able to add new rules one by one.


The actions points here are:

- configure Sonar reports for the trunk (using GA or Jenkins);
- add SpotBugs to the build;
- add integration SpotBugs with GitHub pull requests;


== 3. Enable pushing backwards build results for both Jenkins and
CircleCI to GitHub pull requests. ==

The goal here is to have a "green checkbox" for those GitHub pull
requests that have corresponding Jenkins/CircleCI runs. According to
the following links, it is completely possible to have those.

https://github.com/jenkinsci/github-branch-source-plugin
https://circleci.com/docs/enable-checks/

Moreover, the GitHub Branch Source Plugin is already enabled for the
Cassandra project [16]. The same seems should work the same way for
CirleCI, but I have faced the infrastructure team comment [17] that
describes admin permissions are required for that, so the question is
still open here. I could dig a bit more once we've agreed on it.

The actions points here are:
- enable Jenkins integration for GitHub pull requests;
- enable CircleCI integration for GitHub pull requests;


== 4. Review IDE configs that are not used by the community. ==

Sorry for bumping up such a holy topic with IDEs that are being used
by developers, so please do not judge me strictly against it. My main
point here is "IDE should serve the source code" rather than "the
source code adapts to IDE", thus it will probably become easy to use
the project sources in an IDE once the migration from Ant to
Maven/Gradle occurs [5], but it is still possible to make things a bit
simpler today.

Some of the configuration IDE scripts still consume some time for
their maintenance and require to be tested on each new release occurs.
These files must be always up to date if a configuration file
structure changed in a new IDE version and this is inconvenient. I
don't propose removing such scripts, just a review of what IDEs are
exactly using.

For instance,

Cassandra + Eclipse doesn't seem to appear so often on StackOverflow.
It could mean only two things: everything is working fine or no one is
using it (hope the first one).
https://stackoverflow.com/questions/tagged/cassandra%20eclipse?sort=Newest

Cassandra + NetBeans appear even less frequently and the same questions occur.
https://stackoverflow.com/questions/tagged/cassandra%20netbeans?sort=Newest

The actions points here are:

- initiate a wide survey for user and dev lists, to get to know about
the usages;
- remove those configurations that are not used anymore;
- force migration from Ant to Gradle/Maven;



Summarizing everything proposed above I think it is possible to
simplify adding small contributions easier to the codebase as well as
save a bunch of committer's time.

So,
WDYT about the things described above?
Should I create a CEP for this?



[1] https://issues.apache.org/jira/browse/CASSANDRA-9209 (Add static
analysis to report any AutoCloseable objects)
[2] https://issues.apache.org/jira/browse/CASSANDRA-17400 (Fail build
or warn when closeable reference is not closed in tests)
[3] https://issues.apache.org/jira/browse/CASSANDRA-9919 (Cleanup
closeable iterator usage)
[4] https://issues.apache.org/jira/browse/CASSANDRA-17925 (Java source
code should have sorted imports as defined in the codestyle)
[5] https://issues.apache.org/jira/browse/CASSANDRA-17015 (Migrate
build from Ant to Maven/Gradle)
[6] https://lists.apache.org/thread/1t3hr1pylhpgc5c9d2382xtnw3wgkg06
(Attracting new contributors)
[7] https://lists.apache.org/thread/b3nowtrlpjwr46vftsb3ob38xqnrnxl1
(Switch to using GitHub pull requests?)
[8] https://issues.apache.org/jira/browse/CASSANDRA-17925 (Java source
code should have sorted imports as defined in the codestyle)
[9] https://github.com/apache/cassandra/tree/trunk/ide/idea/inspectionProfiles
[10] https://lists.apache.org/thread/1ro1mvkpvt4vr24nw7dbpdlxo82mq3hz
(Add SpotBugs to the build)
[11] https://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis
[12] 
https://cwiki.apache.org/confluence/display/INFRA/SonarCloud+for+ASF+projects
[13] https://sonarcloud.io/summary/overall?id=cassandra_180410
[14] https://docs.sonarqube.org/latest/analysis/pull-request/
[15] https://www.jetbrains.com/help/idea/command-line-code-inspector.html
[16] https://cwiki.apache.org/confluence/display/INFRA/ci-cassandra.apache.org
[17] https://issues.apache.org/jira/browse/INFRA-22367?focusedCommentId=17421230

Reply via email to