Hi Doug,

I would love to help you with some of that. Spark 4.0 support seems appealing to me. Let me check with my "backend" whether there is any capacity for it, and then we can connect privately to hash out the details.
Regards

On Tue, Apr 22, 2025 at 7:53 PM Doug Rohrer <droh...@apple.com> wrote:

> Hello folks,
>
> As many of you on the ASF Slack may have noticed, I’ve been creating a
> bunch of new tickets for the Cassandra Analytics project related to a 1.0
> release. Since it was initially contributed, there have been many
> enhancements and fixes to the library, but there are still some gaps that
> need to be addressed. We’re putting together a plan to close those gaps,
> and would love to enlist more folks from the community in making the
> analytics library more useful. The gaps we see today include:
>
>    - vnode support (and optimizations to the existing code if necessary to
>    make it work more efficiently with clusters using vnodes)
>    (CASSANALYTICS-10
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>    - Cassandra 5.0 support (this is an epic with lots of subtasks, some
>    of which are already being worked on by a variety of folks)
>    (CASSANALYTICS-23
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>    - Documentation, including both docs on cassandra.apache.org and
>    updated/improved developer docs in the repository itself
>    (CASSANALYTICS-6
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>    - Build scripts for release (CASSANALYTICS-22
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>    - Miscellaneous bug fixes of known issues/improvements
>       - Analytics writer should support all valid partition/clustering
>       key types (CASSANALYTICS-35
>       <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>       - CassandraDataLayer uses configuration list of IPs instead of the
>       full ring/datacenter (CASSANALYTICS-20
>       <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
>       - Bulk Reader should dynamically calculate number of cores to use
>       to better utilize resources for smaller tables (CASSANALYTICS-36
>       <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)
>
> Beyond 1.0, there’s a lot of improvements and enhancements on the roadmap
> to date:
>
>    - Cassandra 6.0 Support (CASSANALYTICS-37
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
>    - Spark 4.0 support (CASSANALYTICS-34
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
>    - JDK Support Matrix (CASSANALYTICS-38
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
>    - Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
>    - Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
>    - Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
>    - Bulk reads via S3 (CASSANALYTICS-42
>    <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)
>
> We’re also looking for input on what others think should be in the 1.0
> release, or the long-term roadmap. If you’ve got ideas, don’t hesitate to
> respond to this thread. I’ll also be checking the existing JIRAs and making
> sure they are incorporated into the plan, which I believe most are already.
>
> I want to thank the folks who have, so far, contributed most of the code
> for the Analytics library, and those in the community who have already
> started to use and improve it. We’re looking forward to getting more
> community members involved. If any of these items sounds interesting,
> please feel free to reach out to folks on Slack or reply on the dev list.
>
> Thanks,
>
> Doug Rohrer
>