Ryan Daum <r...@thimbleware.com> writes:

> This is very discouraging; I've looked several times at this code and could
> not believe my eyes in regard to the wanton use of global statics. In
> addition to smelling bad, it makes it difficult to embed Cassandra. Is there
> no will at all to fix this?

I experienced all manner of problems when trying to embed Cassandra myself. The primary reason I wanted to embed Cassandra was for unit testing.

I was using the @Rule annotation in JUnit to let JUnit create a unique temporary directory for each Cassandra instance. Once I had a temp dir I created the needed directories and used the Apache Velocity templating engine to produce a storage-conf.xml with absolute paths to the various directories for commit logs, data etc. Once the tests are done the framework takes care of cleaning up the files. This also ensures that if I run several tests in parallel I get separate unique temp directories for each instance.

(I saw Ran Tavory had contributed a DataCleaner class (or whatever it was named) to do something similar, but I didn't want to use that since JUnit already has the needed mechanisms for doing this. Besides, I didn't like relying on a single testing directory.)

Of course, reality came crashing in when I had more than one test and thus more than one embedded Cassandra instance. I tried to look for quick solutions to this, but eventually flushed an entire week's work down the toilet and left for vacation. Now we plan to take an inferior approach to the testing simply because we've run out of time to get this done properly. (In an ideal world I would be able to sit down with the Cassandra code, rewrite the parts that are "misbehaving" and work with someone to get the code reviewed.)

Okay, so what I would have wanted to do if I had the time:

- Go through the Cassandra code and remove singletons.
- Make Cassandra easier to embed by making starting and stopping work properly. (For some reason that I have forgotten, I had shutdown and/or timing issues.) For servers to be embeddable, the start() and stop()/shutdown() methods need to block until some known state is reached.
  (If shutdown() has to be slow because of work that needs to be done before a safe shutdown, it may be an idea to implement kill() for unsafe shutdown, for instance when you know you will nuke the data anyway.)
- Remove dependence on config files. It should be possible to just instantiate an embedded Cassandra server, pass it a config object and then start it without having to touch the filesystem or access any resource files for config. Depending on files or resources for config is bad. (However, there is nothing wrong with having a trivial API for reading files to produce a config object you can then pass into Cassandra.) The detour I made into rendering an Apache Velocity template to produce a storage-conf.xml, only to have my embedded Cassandra instance read it again, was just silly.

There are other valid reasons for wanting to embed Cassandra besides unit testing. For instance, if you are writing an application that depends on Cassandra, you may want the option of packaging it as a single binary for single-node experimentation, development and demo purposes.

As an example, I am currently working on a project where I have a server that will be talking to a Cassandra cluster of half a dozen nodes. But other development projects depend on this server, so they need some quick way of getting it up and running on their own workstations and laptops, so they can start the server with a command line option that says "use an embedded Cassandra server". Of course, in unit tests they also want to be able to embed my server and, with it, Cassandra.

I've done this a few times with Apache Derby: users get the option of running with an embedded SQL server if they don't want the hassle of setting up a MySQL instance, or of firing up the application and having it talk to a MySQL instance.

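
As a rough illustration of the wish list above, a minimal sketch of such an API might look like the following. Everything in it is hypothetical: EmbeddedServerSketch, Config, Server and State are made-up names that do not exist in Cassandra, and the worker thread is only a stand-in for real server work. It shows a config object passed in directly, start() and stop() that block until a known state is reached, kill() as the unsafe shortcut, and two instances living side by side in one JVM because nothing is static.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch only: none of these types exist in Cassandra.
class EmbeddedServerSketch {

    /** Plain config object passed to the server; no storage-conf.xml involved. */
    static final class Config {
        final Path dataDir;
        final Path commitLogDir;

        Config(Path dataDir, Path commitLogDir) {
            this.dataDir = dataDir;
            this.commitLogDir = commitLogDir;
        }

        /** Convenience factory: a fresh, unique temp directory per instance. */
        static Config inTempDir() throws IOException {
            Path root = Files.createTempDirectory("embedded-");
            return new Config(Files.createDirectories(root.resolve("data")),
                              Files.createDirectories(root.resolve("commitlog")));
        }
    }

    enum State { NEW, RUNNING, STOPPED }

    /** Stand-in for a server. All state is per instance; there are no statics. */
    static final class Server {
        private final Config config;
        private final CountDownLatch ready = new CountDownLatch(1);
        private final CountDownLatch stopSignal = new CountDownLatch(1);
        private volatile State state = State.NEW;
        private Thread worker;

        Server(Config config) { this.config = config; }

        Config config() { return config; }
        State state() { return state; }

        /** Blocks until the server has reached RUNNING. */
        synchronized void start() throws InterruptedException {
            if (state != State.NEW) throw new IllegalStateException("already started");
            worker = new Thread(() -> {
                state = State.RUNNING;
                ready.countDown();
                try {
                    stopSignal.await(); // a real server would serve requests here
                    // ...and flush pending writes here before exiting
                } catch (InterruptedException e) {
                    // kill() was called: skip the flush
                }
                state = State.STOPPED;
            });
            worker.start();
            ready.await();
        }

        /** Safe shutdown: blocks until the worker has exited normally. */
        synchronized void stop() throws InterruptedException {
            if (worker == null) return;
            stopSignal.countDown();
            worker.join();
        }

        /** Unsafe shutdown: interrupts the worker so it skips the flush step. */
        synchronized void kill() throws InterruptedException {
            if (worker == null) return;
            worker.interrupt();
            worker.join();
        }
    }

    public static void main(String[] args) throws Exception {
        Server a = new Server(Config.inTempDir());
        Server b = new Server(Config.inTempDir());
        a.start();
        b.start();
        System.out.println(a.state() + " " + b.state());
        a.stop();  // data matters: wait for a clean shutdown
        b.kill();  // data will be nuked anyway
        System.out.println(a.state() + " " + b.state());
    }
}
```

The sketch leaves the temp directories behind; in a unit test, something like JUnit's TemporaryFolder rule could supply the directories and clean them up afterwards.
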
So in short: yes, I am very, very interested in Cassandra being embeddable, I am very interested in being able to have more than one Cassandra instance in the same JVM, and I am very interested in being able to configure Cassandra programmatically rather than messing with config files. :-)

Sorry for not having more time to actually go and do these things rather than whine about them.

-Bjørn