I'm definitely supportive of modularizing the code, but from a developer
productivity standpoint we should discuss the overhead of managing changes
across multiple repos.

On Sun, Mar 16, 2025 at 4:26 AM Benedict Elliott Smith <bened...@apache.org>
wrote:

> I want to break out at least one or two shared library projects. Both
> accord and in-jvm-dtest-api should share code with the Cassandra main
> project, particularly executors/futures/collections/concurrency utilities.
> This is something that has caused me some recurring friction over the past
> few years, so if there’s appetite I may try to pursue it in the near future.
>
> I also like the idea of defining our public APIs in a separate
> jar/folder/source tree. This helpfully also solves the never-ending
> discussion topic of how we define what our public APIs are. I don’t have
> any cycles for this, but I doubt it would be controversial.
>
> I am less sure about how we might go about breaking up the internals of
> Cassandra itself, but the accord project is perhaps a step in this
> direction.
>
> That all said, plugin dependencies are a much easier problem than this. We
> don’t need to run the plugins on their own threads; they just need their
> own class loader - which is anyway probably a good idea. We can perhaps
> even reuse the logic we already have for loading UDFs, but relax some of
> the restrictions.
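
A minimal Java sketch of the class-loader isolation suggested above. Everything
in it is an assumption for illustration: `PluginClassLoader`, the
`sharedPrefix` parameter, and the child-first policy are hypothetical and are
not Cassandra's existing UDF loading logic. The idea is that a plugin's own
jars win for its private dependencies (for example its own snakeyaml), while
the interfaces it implements are still resolved from the host so that both
sides agree on the types.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical child-first class loader for one plugin; not part of Cassandra. */
public class PluginClassLoader extends URLClassLoader {
    // Package prefix of the interfaces shared between host and plugin (assumed).
    private final String sharedPrefix;

    public PluginClassLoader(URL[] pluginJars, ClassLoader parent, String sharedPrefix) {
        super(pluginJars, parent);
        this.sharedPrefix = sharedPrefix;
    }

    /** True when the class must come from the parent so host and plugin share one definition. */
    public boolean isShared(String name) {
        return name.startsWith("java.") || name.startsWith("javax.") || name.startsWith(sharedPrefix);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isShared(name)) {
                try {
                    // Child-first: the plugin's own jars win for its private dependencies.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the plugin jars; fall through to the parent.
                }
            }
            if (c == null) {
                // Standard parent-first delegation for shared and unresolved classes.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

A host could construct one such loader per plugin, passing the plugin's jar
URLs and a prefix such as `org.apache.cassandra.auth.` for the shared
interfaces.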
>
>
> On 6 Mar 2025, at 21:27, Josh McKenzie <jmcken...@apache.org> wrote:
>
> I've gotten the impression that there's not a lot of enthusiasm for
> breaking apart the main Cassandra module, but I have wondered if it'd be
> worth making an exception for the interfaces plugins are supposed to code
> against
>
> Oh, there's *plenty* of enthusiasm. There's been a shortage of consensus
> however. *For now. *:D
>
> I think breaking out the interfaces first makes a lot of sense as that'd
> allow us to focus almost purely on build dependency and environmental
> factors w/out having to reason through implementation code movements and
> encapsulation breakage. I believe there's folks working on exploring the
> current build system through the lens of requirements to break out shared
> deps; I'll see if I can't rustle them up.
>
> On Thu, Mar 6, 2025, at 4:06 PM, Joel Shepherd wrote:
>
> Splitting this out from the CEP-36 thread.
>
> I agree: dependency collisions at run-time are a problem. It's made even
> worse by the possibility of users using multiple plugins (authn, authz,
> compression, storage, etc.).
>
> It also cuts two ways. E.g. the interfaces that plugin authenticators need
> to implement are defined in org.apache.cassandra.auth, so as far as I know
> the plugin has to take a build-time dependency on the main Cassandra module
> itself, and pull in all of its dependencies. (I'd love to be told that I'm
> mistaken.) In addition to the risk of version conflicts, it increases the
> risk of a change to Cassandra's own dependencies inadvertently breaking a
> plugin that's taken a transitive dependency. Might be bad form on the
> plugin's part, but certainly possible.
>
> I've gotten the impression that there's not a lot of enthusiasm for
> breaking apart the main Cassandra module, but I have wondered if it'd be
> worth making an exception for the interfaces plugins are supposed to code
> against. It'd be nice to depend on those without pulling in the rest of the
> project, and it'd be another step towards reducing the risk of plugins
> breaking because of dependency changes in the main project.
>
> -- Joel.
> On 3/6/2025 10:52 AM, Jon Haddad wrote:
>
> Hey Joel, thanks for chiming in!
>
> Regarding dependencies - while it's possible to provide pluggable
> interfaces, the issue I'm concerned about is conflicting versions of
> transitive dependencies at runtime.  For example, I used a java agent that
> had a different version of snakeyaml, and it ended up breaking C*'s startup
> sequence [1].  I suggest putting external modules on separate threads with
> their own classpath to avoid this issue.
>
> I think there's quite a bit of overlap between the two desires expressed
> in this thread, even though they achieve very different results.  I
> personally can't see myself using something that treats an object store as
> cold storage where SSTables are moved (implying they weren't there before),
> and I've expressed my concerns with this, but other folks seem to want it
> and that's OK.  I feel very strongly that treating local storage as a cache
> with the full dataset on object store is a better approach, but ultimately
> different people have different priorities.  Either way, stuff is moved to
> object store at some point, and pulled to the local disk on demand.
>
> I am *firmly* of the position that this CEP should not exclude the
> local-storage-as-cache option, and that the option should be accounted for
> in the design.
>
> Jon
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-19663
>
>
> On Thu, Mar 6, 2025 at 10:31 AM Joel Shepherd <sheph...@amazon.com> wrote:
>
> On 3/6/2025 7:16 AM, Jon Haddad wrote:
>
> Assuming everything else is identical, might not matter for S3. However,
> not every object store has a filesystem mount.
>
> Regarding sprawling dependencies, we can always make the provider-specific
> libraries available as a separate download and put them on their own thread
> with a separate class path. I think the in-JVM dtest does this already.
> Someone just started asking about IAM for login, it sounds like a similar
> problem.
>
> That was me. :-) Cassandra's auth already has fairly well defined
> interfaces and a plug-in mechanism, so it's easy to vend alternative auth
> solutions without polluting the main project's dependency graph, at
> build-time anyway. A similar approach could be beneficial for CEP-36,
> particularly (IMO) for cold-storage purposes. I suspect decoupling
> pluggable alternate channel proxies for cold storage from configurable
> alternate channel proxies for redirecting data locally to free up space,
> migrate to a different storage device, etc., would make both easier. The
> CEP seems to be trying to do both, but they smell like pretty different
> goals to me.
>
> Thanks -- Joel.
>
>
> On Thu, Mar 6, 2025 at 12:53 AM Benedict <bened...@apache.org> wrote:
>
> I think another way of saying what Stefan may be getting at is what does a
> library give us that an appropriately configured mount dir doesn’t?
>
> We don’t want to treat S3 the same as local disk, but this can be achieved
> easily with config. Is there some other benefit of direct integration? Well
> defined exceptions if we need to distinguish cases is one that maybe
> springs to mind but perhaps there are others?
>
>
> On 6 Mar 2025, at 08:39, Štefan Miklošovič <smikloso...@apache.org> wrote:
>
>
> That is cool, but this still does not show or explain what it would look
> like when it comes to the dependencies needed for actually talking to
> storage such as S3.
>
> Maybe I am missing something here, so please correct me if I am mistaken,
> but if I understand correctly, for talking to S3 we would need to use a
> library like this, right? (1) So that would be added to Cassandra's
> dependencies? Hence Cassandra starts to be biased towards S3? Why S3? Every
> time somebody comes up with support for a new remote storage, that would be
> added to the classpath as well? How are these dependencies going to play
> with each other and with Cassandra in general? Will all these storage
> provider libraries for arbitrary clouds even be compatible with Cassandra
> licence-wise?
>
> I am sorry I keep repeating these questions, but this is the part that I
> just don't get at all.
>
> We can indeed add an API for this, sure, why not. But for people who do not
> want to deal with this at all and are OK with just a mounted FS, why would
> we block them from doing that?
>
> (1)
> https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/pom.xml
>
> On Wed, Mar 5, 2025 at 3:28 PM Mick Semb Wever <m...@apache.org> wrote:
>
>    .
>
>
> It’s not an area where I can currently dedicate engineering effort. But if
> others are interested in contributing a feature like this, I’d see it as
> valuable for the project and would be happy to collaborate on
> design/architecture/goals.
>
>
>
> Jake mentioned 17 months ago a custom FileSystemProvider we could offer.
>
> None of us at DataStax has gotten around to providing that, but to quickly
> throw something over the wall this is it:
>
> https://github.com/datastax/cassandra/blob/main/src/java/org/apache/cassandra/io/storage/StorageProvider.java
>
>   (with a few friend classes under o.a.c.io.util)
>
> We then have a RemoteStorageProvider, private in another repo, that
> implements that and also provides the RemoteFileSystemProvider that Jake
> refers to.
> Hopefully that's a start to get people thinking about CEP-level details,
> while we get a cleaned-up abstraction of RemoteStorageProvider and friends
> ready to offer.
>
>
>
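
To make the FileSystemProvider idea from Mick's message a little more
concrete, here is a small Java sketch. It does not use the DataStax
`StorageProvider` or any remote provider: `ProviderAgnosticIo` is a
hypothetical name, and the JDK's built-in zip file system merely stands in for
a non-default provider. It only shows that code written against
`java.nio.file.Path` runs unchanged whichever provider backs the path, which is
the property a custom remote provider would rely on.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical demo class; the zip provider is only a stand-in for a remote one. */
public class ProviderAgnosticIo {

    /** Writes a file and reads it back using only the Path API, unaware of the backing provider. */
    public static String roundTrip(Path dir, String name, String content) throws IOException {
        Path file = dir.resolve(name);
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Runs the same code against the default file system and against the JDK's zip provider. */
    public static boolean demo() throws IOException {
        Path tmp = Files.createTempDirectory("provider-demo");
        String local = roundTrip(tmp, "a.txt", "hello");

        // "jar:" URIs are served by the JDK's zip file system provider.
        URI zipUri = URI.create("jar:" + tmp.resolve("store.zip").toUri());
        try (FileSystem zip = FileSystems.newFileSystem(zipUri, Map.of("create", "true"))) {
            String other = roundTrip(zip.getPath("/"), "a.txt", "hello");
            return local.equals(other);
        }
    }
}
```

This says nothing about dependency or licensing concerns raised earlier in the
thread; a real remote provider would still bring its own client libraries.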
