I am currently evaluating Cassandra 1.0.0-rc2 with a super column family that
has one super column per key and a variable number (from 7 to 18) of sub
columns inside that super column. The column family has about 20 million rows
(keys) and about 200 million sub columns (records) in total. I use a 3-node
Linux cluster with a replication factor of three and the read/write
consistency level left at the Hector client default (QUORUM). The column
family uses SnappyCompressor and has read repair disabled.
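
For completeness, the keyspace operator is created with Hector defaults;
pinning QUORUM explicitly would look roughly like the sketch below (the
cluster name, host and keyspace name are placeholders, not my actual values):

    // Sketch only: pin read/write consistency to QUORUM instead of relying on
    // Hector's default consistency level policy. Classes used here come from
    // me.prettyprint.hector.api.*, me.prettyprint.hector.api.factory.HFactory,
    // me.prettyprint.cassandra.model.ConfigurableConsistencyLevel and
    // me.prettyprint.cassandra.serializers.StringSerializer.
    Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "node1:9160");

    ConfigurableConsistencyLevel consistencyPolicy = new ConfigurableConsistencyLevel();
    consistencyPolicy.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
    consistencyPolicy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

    Keyspace keyspaceOperator = HFactory.createKeyspace("MyKeyspace", cluster, consistencyPolicy);
    StringSerializer stringSerializer = StringSerializer.get();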

The issue I face is that when I query the super column for a particular key
using hector-core-0.8.0-2, the data retrieved is inconsistent. For the key
that I use to fetch data, there are actually 7 sub columns, but the query
returns 1 or 3 sub columns depending on which nodes respond to it (I tested
this by bringing down each of the three nodes in turn).

When I tried to fetch the data for the same key using the cassandra-cli tool,
I get all 7 sub columns at both consistency levels ONE and QUORUM. So is this
an inconsistency issue or a compatibility problem between Cassandra 1.0.0-rc2
and hector-core-0.8.0-2? I tested the same column family and the same code
against Cassandra 1.0.0-beta1 and it works fine there.

Because I am new to Cassandra and have been working with it for only two
months, I paste the code I used to fetch the data below:

    SuperColumnQuery<String, String, String, String> superColumnQuery =
            HFactory.createSuperColumnQuery(keyspaceOperator,
                    stringSerializer, stringSerializer, stringSerializer, stringSerializer);
    superColumnQuery.setColumnFamily(cfName).setKey(key).setSuperName(scName);

    QueryResult<HSuperColumn<String, String, String>> result = superColumnQuery.execute();
    HSuperColumn<String, String, String> superColumn = result.get();
    List<HColumn<String, String>> columnList = superColumn.getColumns();
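
A sub slice query over the same super column should return the same sub
columns, so it could serve as a cross-check; a minimal sketch (same
serializers and variables as above, open-ended range, and the cap of 100 sub
columns is just an assumption):

    // Cross-check sketch: read the sub columns of the same super column with
    // a sub slice query instead of a whole-super-column fetch.
    SubSliceQuery<String, String, String, String> subSliceQuery =
            HFactory.createSubSliceQuery(keyspaceOperator,
                    stringSerializer, stringSerializer, stringSerializer, stringSerializer);
    subSliceQuery.setColumnFamily(cfName).setKey(key).setSuperColumn(scName);
    subSliceQuery.setRange("", "", false, 100);   // open range, at most 100 sub columns

    List<HColumn<String, String>> subColumns = subSliceQuery.execute().get().getColumns();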

Please advise on the problem and a possible solution.


I also need suggestions on the data model I use and on the performance
(especially read performance) that I can expect.

The behaviour of our system is that we would receive 200 million records per
day in total, with a maximum allowed rate of 50,000 records per second. A read
request for a particular record could arrive just two minutes after the record
has been written into the system. The 200 million records could be generated
for at most 20 million keys, so at one extreme it would be 20 million keys
each holding 10 records, and at the other extreme just 2 million keys holding
100 records each.

We use a 3-node Linux cluster with each node having 32 GB of RAM.

I initially tested with Cassandra 0.8.5 and Hector 0.8.0-1, and the Cassandra
process took the default heap of 16 GB.

The replication factor is 3 and the read/write consistency level is the Hector
default, which is QUORUM.

I used a day-wise column family, that is, a column family is created for each
day, thereby limiting the maximum number of records stored in one column
family to 200 million and the maximum number of keys to 20 million. I did that
because I wanted to limit the size of the SSTables (during and after major
compaction). For every key in the column family, I dynamically create a column
for every record (data item) inserted against that key.
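
In Hector terms that write path looks roughly like the sketch below (the
day-wise column family name, column name and payload are made-up placeholders;
keyspaceOperator, stringSerializer and key are reused from the setup above):

    // Minimal write sketch for the day-wise, column-per-record model.
    String dayCfName  = "Records_20111017";   // hypothetical day-wise column family name
    String recordId   = "record-0001";        // hypothetical per-record column name
    String recordData = "payload";            // hypothetical record payload

    Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, stringSerializer);
    mutator.insert(key, dayCfName,
            HFactory.createColumn(recordId, recordData, stringSerializer, stringSerializer));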

Here I could get a write throughput of about 64,000 records per second using 5
load threads on each node (of course by changing some of the parameters in
cassandra.yaml, but I don't want to go into that detail because write
performance is not my concern right now, and the parameters I changed should
not impact read performance, except compaction_throughput_in_mb).

But the read performance was very slow. I should mention that we would get at
most 100 read requests per second, with an expected response time for most
requests of less than a second (preferably 0.75 seconds, because we want to
process the records after they are fetched from the column family). A read
request for a key is at least, and generally, for a 7-day range, with the
seventh day being the current day. This means that internally, for every read
request, we query 7 column families, and the seventh (latest) day's column
family is still being written to at that time. So compaction would be running
intensively and runs into I/O contention with the read requests. If I minimise
compactions, the number of SSTables for the current day becomes so huge that
the read requests are still very slow.
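
For clarity, that per-request read path is roughly the loop below (the
day-wise CF naming scheme, the date format and the cap of 1000 columns are my
own assumptions; keyspaceOperator, stringSerializer and key are reused from
the setup above):

    // Sketch of the 7-day read path with the column-per-record model: slice
    // all columns for the key from each of the last 7 day-wise column families.
    SimpleDateFormat dayFormat = new SimpleDateFormat("yyyyMMdd");
    List<HColumn<String, String>> allColumns = new ArrayList<HColumn<String, String>>();
    Calendar day = Calendar.getInstance();
    for (int i = 0; i < 7; i++) {
        String dayCf = "Records_" + dayFormat.format(day.getTime());   // assumed CF naming
        SliceQuery<String, String, String> sliceQuery =
                HFactory.createSliceQuery(keyspaceOperator,
                        stringSerializer, stringSerializer, stringSerializer);
        sliceQuery.setColumnFamily(dayCf).setKey(key);
        sliceQuery.setRange("", "", false, 1000);   // open range, assumed cap of 1000 columns
        allColumns.addAll(sliceQuery.execute().get().getColumns());
        day.add(Calendar.DAY_OF_MONTH, -1);         // step back one day
    }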

I tried limiting the write load to only 24,000 records per second so that
compaction could catch up and reduce the number of SSTables for the current
day, but the read performance was still very bad; it could not handle even 25
requests per second.

Then I tried the Cassandra 1.0.0-beta1 version because it came with a
compression option. Compression at write time should reduce the I/O contention
between compaction and read threads because the data to be accessed takes less
space, and I used SnappyCompressor. I also used 7 different data storage
locations (data directories) instead of one. I reduced the heap size for the
Cassandra process to 8 GB instead of the default 16 GB (half of a node's RAM).
I disabled read repair because I thought it might improve performance and
should in any case not be related to the inconsistency (correct me if I am
wrong!). I used Hector 0.8.0-2 to write/read the data.

I changed the data model so that, instead of creating a separate column for
each data item inserted against a key, there is only one super column per key
in the column family and every data item is inserted as a sub column in that
super column. I did this because I thought a single super column fetch should
be faster than a slice query that scans all columns for a key (correct me if I
am wrong!).
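
With this model a write looks roughly like the sketch below (the sub column
name and value are placeholders; keyspaceOperator, stringSerializer, cfName,
scName and key are reused from the setup above):

    // Minimal write sketch for the single-super-column model: each record
    // becomes a sub column inside the one super column for the key.
    HColumn<String, String> subColumn =
            HFactory.createColumn("record-0001", "payload", stringSerializer, stringSerializer);
    HSuperColumn<String, String, String> superColumnToWrite =
            HFactory.createSuperColumn(scName, Collections.singletonList(subColumn),
                    stringSerializer, stringSerializer, stringSerializer);

    Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, stringSerializer);
    mutator.insert(key, cfName, superColumnToWrite);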

I did not see any improvement in write performance; in fact it became slightly
slower. The read performance improved a lot, but the response time still rose
to about 2-3 seconds when the number of requests was at its maximum of 100 per
second.

Then I saw that Cassandra 1.0.0-rc2 was released and tried that. Somehow I saw
that the memtable flush was happening at a larger threshold size than in the
1.0.0-beta1 version, and the SSTables created were larger than in 1.0.0-beta1.
That means the number of SSTables will be smaller in 1.0.0-rc2, which should
further improve read performance (I am not sure whether this improvement in
the release candidate is related to the 'Fix counting CFMetadata towards
Memtable liveRatio' change).

In fact the memtable switch count dropped to about a quarter of what it was in
1.0.0-beta1, and that is why I want to evaluate read performance in 1.0.0-rc2.
But during that evaluation I got inconsistent values in the read responses,
which triggered the very first question in this post.


Sorry if the description has been too lengthy, but we are right now at a
crucial stage of evaluating open source databases against our current
database, and if Cassandra turns out to be feasible, that might lead to a
major change in our production system at a large scale. Suggestions on the
inconsistency problem, on the data model we are using, and on ways to improve
read performance are greatly welcomed and appreciated.

We have already evaluated a 3-node HBase cluster for the same requirements,
and it was found to give almost equal write performance (say 80% of
Cassandra's) and much better read performance.

In hbase-0.90.3, for 100 requests per second, at least 75% of the requests
completed within a second and the average response time was 400 milliseconds.




