I am currently evaluating Cassandra 1.0.0-rc2 with a super column family that has one super column per row and a variable number (7 to 18) of sub columns inside that super column. The column family has about 20 million rows (keys) and about 200 million sub columns (records) in total. I use a 3-node Linux cluster with a replication factor of three and the Hector client's default read/write consistency level (QUORUM). The column family uses SnappyCompressor and has read repair disabled.
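For reference, the client-side setup looks roughly like the following. This is only a minimal sketch: the cluster name, host address and keyspace name are placeholders, and I pin the consistency level explicitly here only to make the QUORUM defaults visible.

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    // Placeholder cluster name and node address.
    Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "node1:9160");

    // Pin both reads and writes to QUORUM explicitly
    // (this is Hector's default policy anyway).
    ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
    ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
    ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

    // Placeholder keyspace name; these two objects are reused in the snippets below.
    Keyspace keyspaceOperator = HFactory.createKeyspace("MyKeyspace", cluster, ccl);
    StringSerializer stringSerializer = StringSerializer.get();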
The issue I face is that when I query the super column for a particular key using hector-core-0.8.0-2, the data retrieved is inconsistent. For the key I use to fetch data there are actually 7 sub columns, but the query returns 1 or 3 sub columns depending on which nodes respond to it (I tested this by bringing down each of the three nodes in turn). When I fetch the data for the same key using the cassandra-cli tool, I get all 7 sub columns at both consistency levels, ONE and QUORUM. So is this a consistency issue or a compatibility problem between Cassandra 1.0.0-rc2 and hector-core-0.8.0-2? I tested the same column family with the same code on Cassandra 1.0.0-beta1 and it works fine there.

Because I am new to Cassandra and have been working with it for only two months, here is the code I use to fetch the data:

    SuperColumnQuery<String, String, String, String> superColumnQuery =
            HFactory.createSuperColumnQuery(keyspaceOperator,
                    stringSerializer, stringSerializer, stringSerializer, stringSerializer);
    superColumnQuery.setColumnFamily(cfName).setKey(key).setSuperName(scName);
    QueryResult<HSuperColumn<String, String, String>> result = superColumnQuery.execute();
    HSuperColumn<String, String, String> superColumn = result.get();
    List<HColumn<String, String>> columnList = superColumn.getColumns();

Please suggest what the problem might be and how to solve it.

I would also appreciate suggestions on the data model I use and on the performance (especially read performance) I can expect. The behaviour of our system is that we receive 200 million records per day in total, with a maximum allowed rate of 50,000 records per second. A read request for a particular record can arrive as soon as two minutes after the record has been written. The 200 million records are generated for at most 20 million keys, so at one extreme it would be 20 million keys each holding 10 records, and at the other extreme 2 million keys holding 100 records each.

We use a 3-node Linux cluster with 32 GB RAM per node. I initially tested with Cassandra 0.8.5 and Hector 0.8.0-1, and the Cassandra process took the default heap of 16 GB. The replication factor is 3 and the read/write consistency is the Hector default, QUORUM.

I used day-wise column families, that is, a column family is created for each day, thereby limiting the maximum number of records stored in one column family to 200 million and the maximum number of keys to 20 million. I did that because I wanted to limit the size of the sstables (during and after major compaction). For every key in the column family, I dynamically create a column for every record (data) inserted against that key.

With this setup I could get a write throughput of about 64,000 records per second using 5 load threads on each node (of course by changing some of the parameters in cassandra.yaml, but I don't want to go into that detail because write performance is not my concern right now, and the parameters I changed should not impact read performance, except compaction_throughput_in_mb).

The read performance, however, was very slow. I should mention that we expect a maximum of 100 read requests per second, with an expected response time of less than a second for most requests (preferably 0.75 seconds, because we need to process the records after they are fetched from the column family). A read request for a key is at least, and generally, for a 7-day range, with the seventh day being the current day. This means that internally, for every read request, we query 7 column families, and the seventh (latest) day's column family is being written to at the same time; a rough sketch of that multi-day read loop follows below.
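To make the read pattern concrete, here is roughly how a single request fans out over the day-wise column families. This is only a sketch: the "records_YYYYMMDD" naming scheme and the dayStamp() helper are assumptions for illustration, not our actual code; keyspaceOperator, stringSerializer, key and scName are the same as in the snippets above.

    import java.util.ArrayList;
    import java.util.List;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.beans.HSuperColumn;
    import me.prettyprint.hector.api.query.SuperColumnQuery;

    // Assumed naming scheme: one column family per day, e.g. "records_20111016".
    List<HColumn<String, String>> allColumns = new ArrayList<HColumn<String, String>>();
    for (int daysBack = 6; daysBack >= 0; daysBack--) {
        String dayCf = "records_" + dayStamp(daysBack);   // dayStamp() is a hypothetical helper
        SuperColumnQuery<String, String, String, String> q =
                HFactory.createSuperColumnQuery(keyspaceOperator,
                        stringSerializer, stringSerializer, stringSerializer, stringSerializer);
        q.setColumnFamily(dayCf).setKey(key).setSuperName(scName);
        // result.get() is null when the key has no data for that day
        HSuperColumn<String, String, String> sc = q.execute().get();
        if (sc != null) {
            allColumns.addAll(sc.getColumns());
        }
    }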
With that write load, compaction has to run intensely and it runs into I/O contention with the read requests. If I minimise compactions, the number of sstables for the current day grows so large that the read requests are still very slow. I tried limiting the write load to 24,000 records per second so that compaction could catch up and reduce the number of sstables for the current day, but read performance was still very bad: it could not handle even 25 requests per second.

Then I tried Cassandra 1.0.0-beta1 because it came with the compression option. Compressing on write should reduce the I/O contention between compaction and read threads because the data to be accessed takes less space; I used SnappyCompressor. I also used 7 data storage locations (data directories) instead of one, and I reduced the heap size for the Cassandra process to 8 GB instead of the default 16 GB (half of a node's RAM). I disabled read repair because I thought it might improve performance and should in any case not be related to the inconsistency (correct me if I am wrong!). I used Hector 0.8.0-2 to write and read the data.

I also changed the data model: instead of creating a separate column for each piece of data inserted against a key, there is now only one super column per key in the column family, and every piece of data is inserted as a sub column of that super column. I did this because I thought a single super column fetch should be faster than a slice query that scans all the columns of a key (correct me if I am wrong!); a rough sketch of this write path is at the end of the post. I did not see any improvement in write performance, in fact it became slightly slower, but the read performance improved a lot. Still, response times went up to about 2-3 seconds when the number of requests reached the maximum of 100 per second.

Then I saw that Cassandra 1.0.0-rc2 was released and tried that. I noticed that the memtable flush happens at a larger threshold size than in 1.0.0-beta1, and the sstables created are larger than in 1.0.0-beta1. That means the number of sstables is smaller in 1.0.0-rc2, so read performance should improve further (I am not sure whether this improvement in the release candidate is related to the 'Fix counting CFMetadata towards Memtable liveRatio' change). In fact the memtable switch count dropped by a factor of 4 compared to 1.0.0-beta1, and that is why I want to evaluate read performance on 1.0.0-rc2. But during that evaluation I ran into the inconsistent values in the read responses, which triggered the very first question in this post.

Sorry if the description has been too lengthy, but we are at a crucial stage of evaluating open source databases against our current database, and if Cassandra turns out to be feasible it might lead to a major change in our production system at a large scale. Suggestions on the inconsistency problem, the data model we are using and ways to improve read performance are greatly welcomed and appreciated.

We have already evaluated a 3-node HBase cluster for the same requirements, and it gives almost equal write performance (say 80% of Cassandra's) and much better read performance: in hbase-0.90.3, at 100 requests per second, at least 75% of the requests completed within a second and the average response time was 400 milliseconds.
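For reference, the write path with the single-super-column model looks roughly like this. Again it is only a sketch of what I described above: recordId and recordPayload are placeholder names for whatever identifies and carries a record, and cfName, key and scName are the same variables as in the read snippets.

    import java.util.Collections;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.mutation.Mutator;

    // Reuses keyspaceOperator and stringSerializer from the setup snippet above.
    Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, stringSerializer);

    // Each incoming record becomes one sub column inside the row's single super column.
    HColumn<String, String> subColumn =
            HFactory.createColumn(recordId, recordPayload, stringSerializer, stringSerializer);
    mutator.insert(key, cfName,
            HFactory.createSuperColumn(scName,
                    Collections.singletonList(subColumn),
                    stringSerializer, stringSerializer, stringSerializer));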