A Roadmap to Cassandra Analytics 1.0

Doug Rohrer Tue, 22 Apr 2025 10:54:09 -0700

Hello folks,

As many of you on the ASF Slack may have noticed, I’ve been creating a bunch of 
new tickets for the Cassandra Analytics project related to a 1.0 release. Since 
it was initially contributed, there have been many enhancements and fixes to 
the library, but there are still some gaps that need to be addressed. We’re 
putting together a plan to close those gaps, and would love to enlist more 
folks from the community in making the analytics library more useful. The gaps 
we see today include:
- vnode support (and optimizations to the existing code, if necessary, to make
  it work more efficiently with clusters using vnodes) (CASSANALYTICS-10
  <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
- Cassandra 5.0 support (this is an epic with lots of subtasks, some of which
  are already being worked on by a variety of folks) (CASSANALYTICS-23
  <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
- Documentation, including both docs on cassandra.apache.org
  <http://cassandra.apache.org/> and updated/improved developer docs in the
  repository itself (CASSANALYTICS-6
  <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
- Build scripts for release (CASSANALYTICS-22
  <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
- Miscellaneous bug fixes for known issues, and improvements:
  - Analytics writer should support all valid partition/clustering key types
    (CASSANALYTICS-35 <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
  - CassandraDataLayer uses configuration list of IPs instead of the full
    ring/datacenter (CASSANALYTICS-20
    <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
  - Bulk Reader should dynamically calculate number of cores to use to better
    utilize resources for smaller tables (CASSANALYTICS-36
    <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)


Beyond 1.0, there are a lot of improvements and enhancements on the roadmap to 
date:
- Cassandra 6.0 support (CASSANALYTICS-37
  <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
- Spark 4.0 support (CASSANALYTICS-34
  <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
- JDK Support Matrix (CASSANALYTICS-38
  <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
- Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39
  <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
- Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40
  <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
- Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41
  <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
- Bulk reads via S3 (CASSANALYTICS-42
  <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)

We’re also looking for input on what others think should be in the 1.0 release, 
or the long-term roadmap. If you’ve got ideas, don’t hesitate to respond to 
this thread. I’ll also be checking the existing JIRAs to make sure they are 
incorporated into the plan; I believe most already are.

I want to thank the folks who have, so far, contributed most of the code for 
the Analytics library, and those in the community who have already started to 
use and improve it. We’re looking forward to getting more community members 
involved. If any of these items sounds interesting, please feel free to reach 
out to folks on Slack or reply on the dev list.

Thanks,

Doug Rohrer
