This has been on my mind for a while, and I think it's a great idea. Someone shouldn't need all the C* internals just to implement an interface.
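For what it's worth, here is a minimal sketch of the "own classpath" idea Jon raises in the quoted thread, assuming isolation is done with a plain `URLClassLoader` whose parent is the JDK platform loader. The names `PluginIsolationSketch`, `HostInternal`, `isolatedLoader` and `canSee` are hypothetical and only for illustration; none of this is existing Cassandra code.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch, not Cassandra code: shows that a loader parented on the
// platform loader can see JDK classes but not the host application's classes.
class PluginIsolationSketch {

    /** Stand-in for a class that lives on the host application's classpath. */
    static class HostInternal {}

    /**
     * Builds a loader for the given plugin jars. Using the platform loader as
     * parent (Java 9+) keeps the host's own dependencies out of the plugin's view.
     */
    public static URLClassLoader isolatedLoader(URL[] pluginJars) {
        return new URLClassLoader(pluginJars, ClassLoader.getPlatformClassLoader());
    }

    /** Returns true if the loader can resolve the class name, without initializing it. */
    public static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // No plugin jars here; a real setup would pass the plugin's jar URLs.
        try (URLClassLoader plugin = isolatedLoader(new URL[0])) {
            System.out.println("sees JDK classes:  " + canSee(plugin, "java.lang.String"));
            System.out.println("sees host classes: " + canSee(plugin, HostInternal.class.getName()));
        }
    }
}
```

The catch is that the host and the plugin still need to share the interface types themselves, so those would have to come from a loader both sides can see. That is where a small interfaces-only artifact would help.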
On Thu, Mar 6, 2025 at 1:08 PM Joel Shepherd <sheph...@amazon.com> wrote:

> Splitting this out from the CEP-36 thread.
>
> I agree: dependency collisions at run-time are a problem. It's made even
> worse by the possibility of users using multiple plugins (authn, authz,
> compression, storage, etc.).
>
> It also cuts two ways. E.g. the interfaces that plugin authenticators need
> to implement are defined in org.apache.cassandra.auth, so as far as I know
> the plugin has to take a build-time dependency on the main Cassandra module
> itself, and pull in all of its dependencies. (I'd love to be told that I'm
> mistaken.) In addition to the risk of version conflicts, it increases the
> risk of a change to Cassandra's own dependencies inadvertently breaking a
> plugin that's taken a transitive dependency. Might be bad form on the
> plugin's part, but certainly possible.
>
> I've gotten the impression that there's not a lot of enthusiasm for
> breaking apart the main Cassandra module, but I have wondered if it'd be
> worth making an exception for the interfaces plugins are supposed to code
> against. It'd be nice to depend on those without pulling in the rest of the
> project, and it'd be another step towards reducing the risk of plugins
> breaking because of dependency changes in the main project.
>
> -- Joel.
>
> On 3/6/2025 10:52 AM, Jon Haddad wrote:
>
> Hey Joel, thanks for chiming in!
>
> Regarding dependencies - while it's possible to provide pluggable
> interfaces, the issue I'm concerned about is conflicting versions of
> transitive dependencies at runtime. For example, I used a java agent that
> had a different version of snakeyaml, and it ended up breaking C*'s startup
> sequence [1]. I suggest putting external modules on separate threads with
> their own classpath to avoid this issue.
>
> I think there's quite a bit of overlap between the two desires expressed
> in this thread, even though they achieve very different results. I
> personally can't see myself using something that treats an object store as
> cold storage where SSTables are moved (implying they weren't there before),
> and I've expressed my concerns with this, but other folks seem to want it
> and that's OK. I feel very strongly that treating local storage as a cache
> with the full dataset on object store is a better approach, but ultimately
> different people have different priorities. Either way, stuff is moved to
> object store at some point, and pulled to the local disk on demand.
>
> I am *firmly* of the position that this CEP should not exclude the local
> storage as cache option, and should be accounted for in the design.
>
> Jon
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-19663
>
> On Thu, Mar 6, 2025 at 10:31 AM Joel Shepherd <sheph...@amazon.com> wrote:
>
>> On 3/6/2025 7:16 AM, Jon Haddad wrote:
>>
>> Assuming everything else is identical, might not matter for S3. However,
>> not every object store has a filesystem mount.
>>
>> Regarding sprawling dependencies, we can always make the provider
>> specific libraries available as a separate download and put them on their
>> own thread with a separate class path. I think in JVM dtest does this
>> already. Someone just started asking about IAM for login, it sounds like a
>> similar problem.
>>
>> That was me. :-) Cassandra's auth already has fairly well defined
>> interfaces and a plug-in mechanism, so it's easy to vend alternative auth
>> solutions without polluting the main project's dependency graph, at
>> build-time anyway. A similar approach could be beneficial for CEP-36,
>> particularly (IMO) for cold-storage purposes. I suspect decoupling
>> pluggable alternate channel proxies for cold storage from configurable
>> alternate channel proxies for redirecting data locally to free up space,
>> migrate to a different storage device, etc., would make both easier. The
>> CEP seems to be trying to do both, but they smell like pretty different
>> goals to me.
>>
>> Thanks -- Joel.
>>
>> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>>
>>> I think another way of saying what Stefan may be getting at is what does
>>> a library give us that an appropriately configured mount dir doesn’t?
>>>
>>> We don’t want to treat S3 the same as local disk, but this can be
>>> achieved easily with config. Is there some other benefit of direct
>>> integration? Well defined exceptions if we need to distinguish cases is one
>>> that maybe springs to mind but perhaps there are others?
>>>
>>> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org>
>>> wrote:
>>>
>>> That is cool but this still does not show / explain how it would look
>>> like when it comes to dependencies needed for actually talking to storages
>>> like s3.
>>>
>>> Maybe I am missing something here and please explain when I am mistaken
>>> but If I understand that correctly, for talking to s3 we would need to use
>>> a library like this, right? (1). So that would be added among Cassandra
>>> dependencies? Hence Cassandra starts to be biased against s3? Why s3? Every
>>> time somebody comes up with a new remote storage support, that would be
>>> added to classpath as well? How are these dependencies going to play with
>>> each other and with Cassandra in general? Will all these storage
>>> provider libraries for arbitrary clouds be even compatible with Cassandra
>>> licence-wise?
>>>
>>> I am sorry I keep repeating these questions but this part of that I just
>>> don't get at all.
>>>
>>> We can indeed add an API for this, sure sure, why not. But for people
>>> who do not want to deal with this at all and just be OK with a FS mounted,
>>> why would we block them doing that?
>>>
>>> (1)
>>> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>>>
>>> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org> wrote:
>>>
>>>>> It’s not an area where I can currently dedicate engineering effort. But
>>>>> if others are interested in contributing a feature like this, I’d see it
>>>>> as valuable for the project and would be happy to collaborate on
>>>>> design/architecture/goals.
>>>>
>>>> Jake mentioned 17 months ago a custom FileSystemProvider we could offer.
>>>>
>>>> None of us at DataStax has gotten around to providing that, but to
>>>> quickly throw something over the wall this is it:
>>>>
>>>> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>>>>
>>>> (with a few friend classes under o.a.c.io.util)
>>>>
>>>> We then have a RemoteStorageProvider, private in another repo, that
>>>> implements that and also provides the RemoteFileSystemProvider that Jake
>>>> refers to.
>>>>
>>>> Hopefully that's a start to get people thinking about CEP level
>>>> details, while we get a cleaned abstract of RemoteStorageProvider and
>>>> friends to offer.