Hi Maki and Adrian,

Thank you very much for the prompt replies. It's the weekend, after all :).
I realized I had forgotten part of my question until Adrian mentioned the replication factor. Is it also possible to control where the replicas are stored? Thanks.

This is a research experiment we're exploring with socially related data. If we want to pull the data for A and B out of Cassandra (i.e., LastNameColumn['A'] and LastNameColumn['B']), it should be faster if these values are stored on the same box than if one is stored on a box in NY and the other in Tokyo, no?

Regards,

-k

On Sun, Jun 5, 2011 at 2:07 AM, Adrian Cockcroft <adrian.cockcr...@gmail.com> wrote:
> Sounds like Khanh thinks he can do joins... :-)
>
> User oriented data is easy, key by facebook id, let cassandra handle
> location. Set replication factor=3 so you don't lose data and can do
> consistent but slower read after write when you need to using quorum.
> If you are running on AWS you should distribute your replicas over
> availability zones.
>
> Then you can do read A, read B join them in your app code. Single
> digit milliseconds for each read or write.
>
> If you want to do bulk operations over many users, use Brisk with a Hadoop
> job.
>
> HTH
> Adrian
>
> On Sat, Jun 4, 2011 at 9:32 PM, Maki Watanabe <watanabe.m...@gmail.com> wrote:
>> You may be able to do it with the Order Preserving Partitioner with
>> making key to node mapping before storing data, or you may need your
>> custom Partitioner. Please note that you are responsible to distribute
>> load between nodes in this case.
>> From application design perspective, it is not clear for me why you
>> need to store user A and his friends into same box....
>>
>> maki
>>
>>
>> 2011/6/5 Khanh Nguyen <nguyen.h.kh...@gmail.com>:
>>> Hi everyone,
>>>
>>> Is it possible to have direct control over where objects are stored in
>>> Cassandra? For example, I have a Cassandra cluster of 4 machines and 4
>>> objects A, B, C, D; I want to store A at machine 1, B at machine 2, C
>>> at machine 3 and D at machine 4. My guess is that I need to intervene
>>> in the way Cassandra hashes an object into the keyspace? If so, how
>>> complicated will the task be?
>>>
>>> I'm new to the list and Cassandra. The reason I am asking is that my
>>> current project is related to social locality of data: if A and B are
>>> Facebook friends, I want to store their data as close as possible,
>>> preferably in the same machine in a cluster.
>>>
>>> Thank you.
>>>
>>> Regards,
>>>
>>> -k
>>>
>>
>>
>>
>> --
>> w3m
>>
>
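
P.S. To make the placement discussion above concrete, here is a rough Python sketch of how a hash-based partitioner spreads keys over a ring of nodes and how a replication factor of 3 picks the next nodes clockwise. It is only an illustration under assumptions, not Cassandra's actual code: the node names, the evenly spaced tokens, and the helpers token_for, build_ring, replicas_for and read_pair are all hypothetical, and the MD5-modulo hash only approximates what RandomPartitioner does.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # assumed token range, modeled on RandomPartitioner


def token_for(key):
    """Hash a row key to a ring position (MD5-based stand-in, an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def build_ring(nodes):
    """Give each node an evenly spaced token; returns a sorted (token, node) list."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def replicas_for(key, ring, replication_factor=3):
    """Pick the first node whose token >= the key's token, then walk clockwise."""
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def read_pair(store, key_a, key_b):
    """App-side 'join': two independent reads combined in application code.

    `store` is any dict-like object standing in for a column family.
    """
    return {key_a: store.get(key_a), key_b: store.get(key_b)}


if __name__ == "__main__":
    ring = build_ring(["node1", "node2", "node3", "node4"])
    for key in ("A", "B", "C", "D"):
        print(key, replicas_for(key, ring))
```

One thing the sketch suggests: with 4 nodes and replication factor 3, any two keys share at least two replica nodes, so in a cluster this small A and B always end up together on some box without any custom placement. Whether a read is actually served from that box is a separate question that the sketch does not model.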