I'm afraid I didn't think too hard about your overall problem, but since you haven't gotten other responses I can at least say something:

> Theory 1: use EHCache or something like it.

> Theory 2: having it in memory in the Cassandra server is nearly as
> good as having it in memory in my jvm, since thrift is thrifty.

I would not expect to get comparable performance to just grabbing data from a hash map on your local heap. To what extent it compares to EHCache I don't know since I don't know how EHCache is implemented and I haven't used it; but a plain hash table get vs. thrift + validation of the RPC call + going through some stages in Cassandra (typically forced context switching) and back is definitely going to be significantly slower.

If you want to get a feel for the basic performance on a simple workload on in-memory data, maybe have a look at the stress.py tool that is part of Cassandra. But you won't get performance comparable to a local in-memory HashMap, I can tell you that right off the bat.

> Theory 3: I've seen some blogs from a while back about embedding
> Cassandra. I'm not clear on the current viability of this, or of the
> efficiency thereof.

I've never done it, but I've seen people talk about doing it for some purposes (testing is one). But my sense is that overall, don't consider that a supported feature and don't rely on it for serious production use unless you are willing to invest time in the code base and figure out issues that result. (Someone correct me if I'm painting a too gloomy picture here.)

--
/ Peter Schuller