Hello, Mike,

Thank you for your advice. I'll close this thread with this mail. (I've been afraid I was interrupting the community developers with vague questions.)

I'm happy to know that there is no known limitation that restricts a cluster to a couple of hundred nodes. If our project starts with Cassandra and encounters any issues or interesting things, I'll report them here.

Regards,
Takayuki Tsunakawa

From: Mike Malone

Hey Takayuki,

I don't think you're going to find anyone willing to promise that Cassandra will fit your petabyte-scale data analysis problem. That's a lot of data, and there's not a ton of operational experience at that scale within the community. And the people who do work on that sort of problem tend to be busy ;).

If your problem is that big, you're probably going to need to do some experimentation and see if the system will scale for you. I'm sure someone here can answer any specific questions that may come up if you do that sort of work.

As you mentioned, the first concern I'd have with a cluster that big is whether gossip will scale. I'd suggest taking a look at the gossip code. Cassandra nodes are "omniscient" in the sense that they all try to maintain full ring state for the entire cluster. At a certain cluster size that no longer works.

My best guess is that a cluster of 1000 machines would be fine. Maybe even an order of magnitude bigger than that. I could be completely wrong, but given the low overhead that I've observed, that estimate seems reasonable. If you do find that gossip won't work in your situation, it would be interesting to hear why.

You may even consider modifying or updating gossip to work for you. The code isn't as scary as it may seem. At that scale it's likely you'll encounter bugs and corner cases that other people haven't, so it's probably worth familiarizing yourself with the code anyway if you decide to use Cassandra.

Mike