I'm afraid I haven't thought too hard about your overall problem, but
since you haven't gotten other responses I can at least say something:

> Theory 1: use EHCache or something like it.
> Theory 2: having it in memory in the Cassandra server is nearly as
> good as having it in memory in my jvm, since thrift is thrifty.

I would not expect performance comparable to just grabbing data from a
hash map on your local heap. How it compares to EHCache I can't say,
since I haven't used it and don't know how it is implemented. But a
plain hash table get is definitely going to be significantly faster
than a request that has to go through thrift, validation of the RPC
call, and several stages in Cassandra (which typically force context
switches), and then back again.
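
To put a rough number on that gap, here is a minimal Python sketch. It
is not Cassandra-specific, and the function names are made up for
illustration. It times a lookup in a local dict against a 1-byte round
trip to an echo server over a loopback TCP socket. That leaves out
thrift, RPC validation and Cassandra's internal stages entirely, so I
would treat the round-trip figure as a floor for a real request rather
than an estimate of one.

```python
import socket
import threading
import time


def time_local_gets(n):
    """Average seconds per lookup in a plain in-process dict."""
    table = {i: str(i) for i in range(1000)}
    start = time.perf_counter()
    for i in range(n):
        table[i % 1000]
    return (time.perf_counter() - start) / n


def _echo_once(server):
    """Accept one connection and echo bytes back until the client closes."""
    conn, _ = server.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def time_loopback_round_trips(n):
    """Average seconds per 1-byte request/response over loopback TCP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    worker = threading.Thread(target=_echo_once, args=(server,), daemon=True)
    worker.start()
    client = socket.create_connection(server.getsockname())
    # Disable Nagle so each tiny request is sent immediately.
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        start = time.perf_counter()
        for _ in range(n):
            client.sendall(b"x")
            client.recv(1)  # block until the echoed byte comes back
        elapsed = time.perf_counter() - start
    finally:
        client.close()
        worker.join(timeout=1)
        server.close()
    return elapsed / n


if __name__ == "__main__":
    print("local get:  %.3f us" % (time_local_gets(1_000_000) * 1e6))
    print("round trip: %.3f us" % (time_loopback_round_trips(10_000) * 1e6))
```

The absolute numbers will vary by machine and say nothing about the
JVM, but the ratio between the two lines is the thing to look at.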

If you want to get a feel for the basic performance on a simple
workload on in-memory data, maybe have a look at the stress.py tool
that is part of Cassandra. But you won't get performance comparable to
a local in-memory HashMap, I can tell you that right off the bat.

> Theory 3: I've seen some blogs from a while back about embedding
> Cassandra. I'm not clear on the current viability of this, or of the
> efficiency thereof.

I've never done it, but I've seen people talk about doing it for some
purposes (testing is one). My sense, though, is that you should not
consider it a supported feature, and should not rely on it for serious
production use unless you are willing to invest time in the code base
and figure out whatever issues result. (Someone correct me if I'm
painting too gloomy a picture here.)

-- 
/ Peter Schuller
