I want to break out at least one or two shared library projects. Both accord 
and in-jvm-dtest-api should share code with the Cassandra main project, 
particularly executors/futures/collections/concurrency utilities. This is 
something that has caused me some recurring friction over the past few years, 
so if there’s appetite I may try to pursue it in the near future.

I also like the idea of defining our public APIs in a separate 
jar/folder/source tree. This would also helpfully settle the never-ending 
discussion of how we define what our public APIs are. I don’t have any cycles 
for this, but I doubt it would be controversial.

I am less sure about how we might go about breaking up the internals of 
Cassandra itself, but the accord project is perhaps a step in this direction.

That all said, plugin dependencies are a much easier problem than this. We 
don’t need to run the plugins on their own threads; they just need their own 
class loader, which is probably a good idea anyway. We can perhaps even reuse 
the logic we already have for loading UDFs, but relax some of the restrictions.
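
As a rough illustration of the class loader isolation described above, here
is a minimal sketch of a child-first loader. It is not the existing UDF
loading logic; the class name PluginClassLoader, the sharedPrefixes
parameter and the isShared helper are all hypothetical. The idea is that a
plugin's own jars win for its transitive dependencies, while the JDK and
the interfaces the plugin codes against still come from the parent loader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical sketch, not Cassandra code: a child-first class loader so a
// plugin's transitive dependencies do not collide with the host's versions.
final class PluginClassLoader extends URLClassLoader {
    // Package prefixes that must always resolve through the parent loader,
    // e.g. the interfaces the plugin implements (assumed, caller-supplied).
    private final List<String> sharedPrefixes;

    PluginClassLoader(URL[] pluginJars, ClassLoader parent, List<String> sharedPrefixes) {
        super(pluginJars, parent);
        this.sharedPrefixes = List.copyOf(sharedPrefixes);
    }

    // True when the class must come from the parent so that host and plugin
    // agree on its identity (JDK classes and the shared plugin interfaces).
    boolean isShared(String className) {
        if (className.startsWith("java.")) return true;
        for (String prefix : sharedPrefixes)
            if (className.startsWith(prefix)) return true;
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (isShared(name)) {
                    // Shared types: normal parent-first delegation.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // Plugin-private types: look in the plugin's jars first.
                        loaded = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in the plugin's jars: fall back to the parent.
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) resolveClass(loaded);
            return loaded;
        }
    }
}
```

With this arrangement a plugin that bundles its own snakeyaml would see its
own copy, while anything under a shared prefix (say
"org.apache.cassandra.auth.", used here purely as an example) would still
be the host's class. No separate thread is involved.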


> On 6 Mar 2025, at 21:27, Josh McKenzie <jmcken...@apache.org> wrote:
> 
>> I've gotten the impression that there's not a lot of enthusiasm for breaking 
>> apart the main Cassandra module, but I have wondered if it'd be worth making 
>> an exception for the interfaces plugins are supposed to code against
> Oh, there's plenty of enthusiasm. There's been a shortage of consensus 
> however. For now. :D
> 
> I think breaking out the interfaces first makes a lot of sense as that'd 
> allow us to focus almost purely on build dependency and environmental factors 
> w/out having to reason through implementation code movements and 
> encapsulation breakage. I believe there's folks working on exploring the 
> current build system through the lens of requirements to break out shared 
> deps; I'll see if I can't rustle them up.
> 
> On Thu, Mar 6, 2025, at 4:06 PM, Joel Shepherd wrote:
>> Splitting this out from the CEP-36 thread.
>> 
>> I agree: dependency collisions at run-time are a problem. It's made even 
>> worse by the possibility of users using multiple plugins (authn, authz, 
>> compression, storage, etc.).
>> 
>> It also cuts two ways. E.g. the interfaces that plugin authenticators need 
>> to implement are defined in org.apache.cassandra.auth, so as far as I know 
>> the plugin has to take a build-time dependency on the main Cassandra module 
>> itself, and pull in all of its dependencies. (I'd love to be told that I'm 
>> mistaken.) In addition to the risk of version conflicts, it increases the 
>> risk of a change to Cassandra's own dependencies inadvertently breaking a 
>> plugin that's taken a transitive dependency. Might be bad form on the 
>> plugin's part, but certainly possible.
>> 
>> I've gotten the impression that there's not a lot of enthusiasm for breaking 
>> apart the main Cassandra module, but I have wondered if it'd be worth making 
>> an exception for the interfaces plugins are supposed to code against. It'd 
>> be nice to depend on those without pulling in the rest of the project, and 
>> it'd be another step towards reducing the risk of plugins breaking because 
>> of dependency changes in the main project.
>> 
>> -- Joel.
>> 
>> On 3/6/2025 10:52 AM, Jon Haddad wrote:
>>> Hey Joel, thanks for chiming in!
>>> 
>>> Regarding dependencies - while it's possible to provide pluggable 
>>> interfaces, the issue I'm concerned about is conflicting versions of 
>>> transitive dependencies at runtime.  For example, I used a java agent that 
>>> had a different version of snakeyaml, and it ended up breaking C*'s startup 
>>> sequence [1].  I suggest putting external modules on separate threads with 
>>> their own classpath to avoid this issue. 
>>> 
>>> I think there's quite a bit of overlap between the two desires expressed in 
>>> this thread, even though they achieve very different results.  I personally 
>>> can't see myself using something that treats an object store as cold 
>>> storage where SSTables are moved (implying they weren't there before), and 
>>> I've expressed my concerns with this, but other folks seem to want it and 
>>> that's OK.  I feel very strongly that treating local storage as a cache 
>>> with the full dataset on object store is a better approach, but ultimately 
>>> different people have different priorities.  Either way, stuff is moved to 
>>> object store at some point, and pulled to the local disk on demand. 
>>> 
>>> I am *firmly* of the position that this CEP should not exclude the local 
>>> storage as cache option, and should be accounted for in the design.
>>> 
>>> Jon
>>> 
>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-19663
>>> 
>>> 
>>> On Thu, Mar 6, 2025 at 10:31 AM Joel Shepherd <sheph...@amazon.com 
>>> <mailto:sheph...@amazon.com>> wrote:
>>> On 3/6/2025 7:16 AM, Jon Haddad wrote:
>>>> Assuming everything else is identical, might not matter for S3. However, 
>>>> not every object store has a filesystem mount. 
>>>> 
>>>> Regarding sprawling dependencies, we can always make the provider specific 
>>>> libraries available as a separate download and put them on their own 
>>>> thread with a separate class path. I think in-JVM dtest does this already. 
>>>> Someone just started asking about IAM for login; it sounds like a similar 
>>>> problem.
>>> That was me. :-) Cassandra's auth already has fairly well defined 
>>> interfaces and a plug-in mechanism, so it's easy to vend alternative auth 
>>> solutions without polluting the main project's dependency graph, at 
>>> build-time anyway. A similar approach could be beneficial for CEP-36, 
>>> particularly (IMO) for cold-storage purposes. I suspect decoupling 
>>> pluggable alternate channel proxies for cold storage from configurable 
>>> alternate channel proxies for redirecting data locally to free up space, 
>>> migrate to a different storage device, etc., would make both easier. The 
>>> CEP seems to be trying to do both, but they smell like pretty different 
>>> goals to me.
>>> 
>>> Thanks -- Joel.
>>> 
>>>> 
>>>> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org 
>>>> <mailto:bened...@apache.org>> wrote:
>>>> I think another way of saying what Stefan may be getting at is what does a 
>>>> library give us that an appropriately configured mount dir doesn’t?
>>>> 
>>>> We don’t want to treat S3 the same as local disk, but this can be achieved 
>>>> easily with config. Is there some other benefit of direct integration? 
>>>> Well defined exceptions if we need to distinguish cases is one that maybe 
>>>> springs to mind but perhaps there are others?
>>>> 
>>>> 
>>>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org 
>>>>> <mailto:smikloso...@apache.org>> wrote:
>>>>> 
>>>> 
>>>>> That is cool but this still does not show / explain what it would look 
>>>>> like when it comes to the dependencies needed for actually talking to 
>>>>> storage like s3. 
>>>>> 
>>>>> Maybe I am missing something here, so please explain where I am mistaken, 
>>>>> but if I understand correctly, for talking to s3 we would need to 
>>>>> use a library like this, right? (1). So that would be added among 
>>>>> Cassandra's dependencies? Hence Cassandra starts to be biased toward s3? 
>>>>> Why s3? Every time somebody comes up with a new remote storage support, 
>>>>> that would be added to classpath as well? How are these dependencies 
>>>>> going to play with each other and with Cassandra in general? Will all 
>>>>> these storage provider libraries for arbitrary clouds be even compatible 
>>>>> with Cassandra licence-wise?
>>>>> 
>>>>> I am sorry I keep repeating these questions but this is the part that I 
>>>>> just don't get at all. 
>>>>> 
>>>>> We can indeed add an API for this, sure sure, why not. But for people who 
>>>>> do not want to deal with this at all and are just OK with a mounted FS, 
>>>>> why would we block them from doing that?
>>>>> 
>>>>> (1) 
>>>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>>> 
>>>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org 
>>>>> <mailto:m...@apache.org>> wrote:
>>>>>    .
>>>>>   
>>>>> 
>>>>> It’s not an area where I can currently dedicate engineering effort. But 
>>>>> if others are interested in contributing a feature like this, I’d see it 
>>>>> as valuable for the project and would be happy to collaborate on 
>>>>> design/architecture/goals.
>>>>> 
>>>>> 
>>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could offer.
>>>>> 
>>>>> None of us at DataStax has gotten around to providing that, but to 
>>>>> quickly throw something over the wall this is it:
>>>>>  
>>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>>  
>>>>>   (with a few friend classes under o.a.c.io.util)
>>>>> 
>>>>> We then have a RemoteStorageProvider, private in another repo, that 
>>>>> implements that and also provides the RemoteFileSystemProvider that Jake 
>>>>> refers to.
>>>>> Hopefully that's a start to get people thinking about CEP level details, 
>>>>> while we get a cleaned abstract of RemoteStorageProvider and friends to 
>>>>> offer.
