On Wed, Jul 28, 2010 at 9:13 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
> Have you considered Redis http://code.google.com/p/redis/?
>
> It may be more suited to the master-slave configuration you are after.
>
> - You can have a master to write to, then a slave master that slaves from it, then your
> web heads run a local redis and slave from the slave master.
> - Backup at the master or the slave master
> - Writes to the write master would make their way to the web head slaves.
> - Web heads only read from their local slave.
> - Reads will be all in memory and faster than disk
> - Redis can store a lot of data in memory and also use disk
> (http://blogzawodny.com/2010/07/24/200000000-keys-in-redis-2-0-0-rc3/)
> - Web heads would have to write to the master, not locally
>
> It sounds like you're thinking of running a cassandra node on each web head
> with full replication and only reading locally; I'm not sure if this is the
> best use case. Would like to know what others think. I would imagine you
> would get better overall uptime and performance from running cassandra as
> a cluster separate from the web heads, with less than full replication.
>
Thanks for this, Aaron. It does actually look like Redis may be better suited to our needs. I had originally discounted Redis because I had the impression that it had volatile storage only, but now I see that is not the case. Thanks again!

> Aaron
>
>
> On 29 Jul, 2010, at 11:11 AM, Russ Brown <pickscr...@gmail.com> wrote:
>
> Hi,
>
> I'm currently looking at NoSQL solutions to replace a bespoke system
> that we currently have in place. Currently I think the best fit is
> Cassandra, but I would like to get some feedback from those who know
> it better before spending more time on it.
>
> Our current system is geared towards allowing our web servers to operate
> very quickly and completely independently (for most pages) of other
> servers. This is accomplished by keeping chunks of data about "things"
> on each machine's disk with a file per entity. The key in this is
> effectively the filename, with the value being the file's content. A
> central server handles the initial generation (and subsequent updates)
> of these files, and distribution to the web servers is carried out by
> a combination of network share mounting and shell scripts.
>
> The system *does* work: the servers are very fast and they do work
> fine when the servers behind them disappear. However, the storage and
> transport mechanisms are cumbersome, and we would like to see if there
> are suitable alternatives available.
>
> The idea is to replace the disk-based storage on each server with a
> NoSQL solution using replication to handle the transport automatically
> for us.
> What we need is:
>
> * One "master", though being able to have a backup for it that we
> could quickly bring into play would be advantageous
> * Each "slave" must have a full copy of the data
> * It does not matter if the slaves do not get updates immediately or
> at exactly the same time, as long as they get there quickly
> * Reads must be fast (though understandably it will probably be
> slower than reading a system-cached file direct from disk)
> * It would be a bonus if the slaves could be written to too, with the
> writes making their way to the other nodes. This is probably a given,
> but I thought I'd mention it anyway.
>
> Now, I have read a few things about Cassandra's read performance, which
> is what has got me a bit worried. However, I have also read quite a
> bit about its flexibility in terms of topology, and that the read
> performance is very much dependent on how things are set up. For
> example, a lot of what I've read describes how, when you query a node, it
> will ask other nodes for information, which it then collates and
> returns. Is it possible to configure Cassandra in such a way that a
> node only ever asks itself for the data, and if so what sort of
> effect will that have on read performance? Our current solution is
> designed to avoid having to hit the network, so doing the same here
> would be advantageous.
>
> I have also read that Cassandra will distribute data between different
> nodes, while we want all to have a full copy of all data. Is it
> possible to configure Cassandra in this way?
>
> If this will work, it will be a heck of a lot cleaner and easier to
> maintain than the current solution, so we're quite hopeful. :)
>
> Feedback appreciated,
>
> --
> Russ

--
Russ
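For readers weighing the two options, the chained topology Aaron describes (writes go only to a master, a slave master replicates from it, and each web head keeps a full local copy that it reads without a network hop) can be modelled as a toy sketch. This is a hypothetical in-memory illustration in Python, not Redis's API or its replication protocol: the `Node` class, the `build_chain` helper, and the explicit `replicate()` calls standing in for background propagation are all assumptions made for the example.

```python
from collections import deque


class Node:
    """Hypothetical node holding a full in-memory copy of the key/value data."""

    def __init__(self, name, is_master=False):
        self.name = name
        self.is_master = is_master
        self.store = {}        # full local copy: key -> value
        self.replicas = []     # downstream nodes that slave from this one
        self.outbox = deque()  # updates applied here but not yet shipped downstream

    def add_replica(self, node):
        self.replicas.append(node)

    def write(self, key, value):
        # Client writes are only accepted by the master, as in Aaron's layout.
        if not self.is_master:
            raise PermissionError(f"{self.name} is a read-only slave; write to the master")
        self._apply(key, value)

    def read(self, key):
        # Reads are served from the local copy only: no network hop.
        return self.store.get(key)

    def _apply(self, key, value):
        # Store the update locally and queue it for this node's own replicas.
        self.store[key] = value
        self.outbox.append((key, value))

    def replicate(self):
        # Ship queued updates one hop downstream (done in the background in a real system).
        while self.outbox:
            key, value = self.outbox.popleft()
            for replica in self.replicas:
                replica._apply(key, value)


def build_chain(num_web_heads):
    """Wire up master -> slave master -> web heads (hypothetical helper)."""
    master = Node("master", is_master=True)
    slave_master = Node("slave-master")
    master.add_replica(slave_master)
    web_heads = [Node(f"web{i}") for i in range(num_web_heads)]
    for head in web_heads:
        slave_master.add_replica(head)
    return master, slave_master, web_heads


if __name__ == "__main__":
    master, slave_master, web_heads = build_chain(3)
    master.write("thing:42", "<rendered chunk>")
    print(web_heads[0].read("thing:42"))  # None: not propagated yet
    master.replicate()        # master -> slave master
    slave_master.replicate()  # slave master -> every web head
    print(web_heads[0].read("thing:42"))  # <rendered chunk>
```

Driving `replicate()` by hand makes visible the lag Russ says is acceptable: a web head keeps returning `None` until both hops have run, while its reads never touch the network.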