> I am currently benchmarking Cassandra with three machines, and on each machine I am seeing an unbalanced distribution of data among the data directories (1 per disk).
> I am concerned that this affects my write performance. Is there anything I can do to make the distribution more even? Would RAID0 be my best option?
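Before changing anything, it may help to put a number on the imbalance. The following is only a minimal sketch, not a Cassandra tool: it walks each data directory and compares the largest one to the mean. The function names and the example paths are hypothetical; substitute the entries listed under data_file_directories in your cassandra.yaml.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files can vanish mid-walk, e.g. when compaction removes SSTables.
                pass
    return total


def imbalance_report(data_dirs):
    """Return (sizes, ratio), where ratio is largest directory size / mean size."""
    sizes = {d: dir_size_bytes(d) for d in data_dirs}
    mean = sum(sizes.values()) / len(sizes) if sizes else 0
    ratio = max(sizes.values()) / mean if mean else 0.0
    return sizes, ratio


if __name__ == "__main__":
    # Hypothetical paths: replace with your own data directories (one per disk).
    dirs = ["/data1/cassandra", "/data2/cassandra", "/data3/cassandra"]
    sizes, ratio = imbalance_report(dirs)
    for d, s in sizes.items():
        print(f"{d}: {s / 1e9:.1f} GB")
    print(f"largest/mean = {ratio:.2f}")
```

A ratio close to 1.0 means the directories are roughly even; the further above 1.0 it is, the more one disk is carrying.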

Using LeveledCompactionStrategy should provide a much better balance. However, depending on your use case, it may not be the right choice for your workload, in which case RAID0 with a single data directory will be the best option.

> Total size of data is about 2TB, 14B records, all unique. Replication factor of 1.

RF=1 means *no* redundancy, which is a bad idea in production (and sort of defeats the purpose of a system like Cassandra). It also will not give an accurate picture in a load test, because it eliminates a lot of the cross-node traffic you would see with a higher replication factor.

--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com