Re: A Roadmap to Cassandra Analytics 1.0

Patrick McFadin Wed, 30 Apr 2025 08:18:54 -0700

I'm not thinking that the Confluence page would be a status page or try to
get too close to being a tracker.


My motivation here is for the millions of users who are not watching the
project intently and are completely missing that this is happening. Case in
point: I was recently in a Reddit thread with a guy trying to build his own
CDC mechanism for Kafka topics. I pointed out that not only did Sidecar
exist, but maybe he would like to contribute? It's this kind of non-coding
activity that has an awesome downstream effect on our project codebase by
finding more contributors/users. What I have in mind for this page in
Confluence is a semi-dynamic page that explains what the project does,
what's being worked on, and potential areas of contribution, the latter
being the most dynamic. If you have time, I can get on a Zoom with you,
take some notes, and put it up. It doesn't have to be a big effort.

Patrick

On Wed, Apr 23, 2025 at 6:52 AM Doug Rohrer <droh...@apple.com> wrote:

> I put everything into Jira directly - there are two epics, one for the
> “Analytics 1.0” release
> <https://issues.apache.org/jira/browse/CASSANALYTICS-21> and one for
> “Cassandra 5.0 support”
> <https://issues.apache.org/jira/browse/CASSANALYTICS-23>, figuring that
> once we started work on these things (which some folks actually have) a
> Confluence page would quickly become out of date.
>
> If folks feel like there’s some value in putting something up there we
> could do that, but I think epics in Jira capture the plan fairly well.
>
> Thanks,
>
> Doug
>
> On Apr 22, 2025, at 6:15 PM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> Is the current roadmap published somewhere? I went to Confluence and
> couldn't find anything.
>
> Patrick
>
> On Tue, Apr 22, 2025 at 10:53 AM Doug Rohrer <droh...@apple.com> wrote:
>
>> Hello folks,
>>
>> As many of you on the ASF Slack may have noticed, I’ve been creating a
>> bunch of new tickets for the Cassandra Analytics project related to a 1.0
>> release. Since it was initially contributed, there have been many
>> enhancements and fixes to the library, but there are still some gaps that
>> need to be addressed. We’re putting together a plan to close those gaps,
>> and would love to enlist more folks from the community in making the
>> analytics library more useful. The gaps we see today include:
>>
>>    - vnode support (and optimizations to the existing code if necessary
>>    to make it work more efficiently with clusters using vnodes) (
>>    CASSANALYTICS-10
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>>    - Cassandra 5.0 support (this is an epic with lots of subtasks, some
>>    of which are already being worked on by a variety of folks) (
>>    CASSANALYTICS-23
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>>    - Documentation, including both docs on cassandra.apache.org and
>>    updated/improved developer docs in the repository itself (
>>    CASSANALYTICS-6
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>>    - Build scripts for release (CASSANALYTICS-22
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>>    - Miscellaneous bug fixes of known issues/improvements
>>       - Analytics writer should support all valid partition/clustering
>>       key types (CASSANALYTICS-35
>>       <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>>       - CassandraDataLayer uses configuration list of IPs instead of the
>>       full ring/datacenter (CASSANALYTICS-20
>>       <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
>>       - Bulk Reader should dynamically calculate number of cores to use
>>       to better utilize resources for smaller tables (CASSANALYTICS-36
>>       <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)
>>
>>
>> Beyond 1.0, there are a lot of improvements and enhancements on the roadmap
>> to date:
>>
>>    - Cassandra 6.0 Support (CASSANALYTICS-37
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
>>    - Spark 4.0 support (CASSANALYTICS-34
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
>>    - JDK Support Matrix (CASSANALYTICS-38
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
>>    - Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
>>    - Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
>>    - Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
>>    - Bulk reads via S3 (CASSANALYTICS-42
>>    <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)
>>
>>
>> We’re also looking for input on what others think should be in the 1.0
>> release, or the long-term roadmap. If you’ve got ideas, don’t hesitate to
>> respond to this thread. I’ll also be checking the existing JIRAs and making
>> sure they are incorporated into the plan, which I believe most are already.
>>
>> I want to thank the folks who have, so far, contributed most of the code
>> for the Analytics library, and those in the community who have already
>> started to use and improve it. We’re looking forward to getting more
>> community members involved. If any of these items sounds interesting,
>> please feel free to reach out to folks on Slack or reply on the dev list.
>>
>> Thanks,
>>
>> Doug Rohrer
>>
>
>
