li wei <liwei_6v <at> yahoo.com> writes:

> Thank you very much, Per!
>
> ----- Original Message ----
> From: Per Olesen <pol <at> trifork.com>
> To: "user <at> cassandra.apache.org" <user <at> cassandra.apache.org>
> Sent: Wed, June 9, 2010 4:02:52 PM
> Subject: Re: Quick help on Cassandra please: cluster access and performance
>
> On Jun 9, 2010, at 9:47 PM, li wei wrote:
>
> > Thanks a lot.
> > We are set to READ ONE, WRITE ANY. Is this better than QUORUM for performance?
>
> Yes, but less consistency-safe.
>
> > Do you think a Cassandra cluster (with 2 or more nodes) should always be faster than a single node, in theory and in practice?
> > Or does it depend?
>
> It depends.
>
> I think the idea with Cassandra is that it scales linearly. So, if you have obtained some read-performance number X, and you then get lots of new users and data, you can keep getting X simply by adding new nodes.
>
> But I think there are others on this list with much more insight into this than I have!
>
> /Per

We have done a lot of work trying to get performance to scale as we enlarge our cluster, and we found that there is a single-server bottleneck if all of your clients talk to one server, no matter how many nodes you add to the cluster. The best scaling we experienced (quite linear, actually) came from having our clients use a round-robin scheme to distribute their requests evenly across all the server nodes, which avoids that bottleneck. This is interesting because, for most writes and reads, the node being contacted will most likely have to ship the row off to another node anyway.

In our testing we actually ran x clients against x servers (going from x = 1 to 2, 4, 8, and 16), with each client talking to one particular server (for example, client1 contacts server1, client2 contacts server2, and so on), and we saw excellent performance scaling that way. A round-robin approach is probably the practical way to do this for a real system. We tried MANY things but did not see good scaling until we started distributing our communications evenly among all the servers in the cluster.
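For what it's worth, here is a minimal sketch of the kind of client-side round-robin we mean. The host names are made up for illustration, and there is nothing Cassandra-specific in it; in a real client you would open your Thrift (or other) connection against whichever host the selector hands back.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Minimal client-side round-robin host selector: each call to nextHost()
 * returns the next node in the rotation, so requests are spread over every
 * node instead of being funnelled through a single one.
 */
public class RoundRobinHosts {

    private final List<String> hosts;               // e.g. "node1:9160", "node2:9160", ...
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinHosts(List<String> hosts) {
        this.hosts = hosts;
    }

    /** Returns the next host in the rotation (thread-safe). */
    public String nextHost() {
        // Modulo first, then abs, so counter overflow still yields a valid index.
        int i = Math.abs(next.getAndIncrement() % hosts.size());
        return hosts.get(i);
    }

    public static void main(String[] args) {
        // Hypothetical node list -- substitute your own cluster addresses.
        RoundRobinHosts ring = new RoundRobinHosts(
                Arrays.asList("node1:9160", "node2:9160", "node3:9160", "node4:9160"));

        // Each "request" picks a different node; connect to the returned host
        // with whatever client library you are using.
        for (int request = 0; request < 8; request++) {
            System.out.println("request " + request + " -> " + ring.nextHost());
        }
    }
}

In practice you would also want to skip a node that is down and retry against the next one, but the key point is simply that the connection load ends up evenly distributed across the cluster rather than concentrated on one server.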