Hi, I am looking at Cassandra for a logging application. We currently log to a PostgreSQL database.
I set up two Cassandra servers for testing. I did a benchmark where I read 100 hashes representing log entries from a JSON file, then looped over them to do 10,000 log inserts. I repeated the same thing writing to a PostgreSQL instance running on one of the Cassandra servers. The script is attached below. The Cassandra writes appear to perform a lot worse. Is this expected?

jeff@transcoder01:~$ ruby cassandra-bm.rb
cassandra
  3.170000   0.480000   3.650000 ( 12.032212)
jeff@transcoder01:~$ ruby cassandra-bm.rb
postgres
  2.140000   0.330000   2.470000 (  7.002601)

Regards,
Jeff
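For reference, after JSON.load each entry in data.json is just a flat hash of column name to value. A record of the shape the script expects would look roughly like this (the field names below are invented for illustration; the only one the script actually relies on is 'time', which it strips out for the PostgreSQL inserts):

# Hypothetical example of one decoded data.json entry; the real column
# names will differ, but every value ends up inserted as a quoted string.
row = {
  'time'     => '2013-03-14 10:22:31',
  'severity' => 'INFO',
  'message'  => 'transcode job 1234 started'
}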
cassandra-bm.rb:

require 'rubygems'
require 'cassandra-cql'
require 'simple_uuid'
require 'benchmark'
require 'json'
require 'active_record'

# Toggle between the two back ends before running.
type = 'postgres'
#type = 'cassandra'
puts type

ActiveRecord::Base.establish_connection(
  #:adapter => "jdbcpostgresql",
  :adapter  => "postgresql",
  :host     => "meta01",
  :username => "postgres",
  :database => "test")

db = nil
if type == 'postgres'
  db = ActiveRecord::Base.connection
else
  db = CassandraCQL::Database.new('meta01:9160', {:keyspace => 'PlayLog'})
end

# Build a CQL INSERT using a UUID as the row key and the hash keys as columns.
def cql_insert(table, key, key_value)
  cql = "INSERT INTO #{table} (KEY, "
  cql << key_value.keys.join(', ')
  cql << ") VALUES ('#{key}', "
  cql << (key_value.values.map {|x| "'#{x}'" }).join(', ')
  cql << ")"
  cql
end

def quote_value(x, type=nil)
  if x.nil?
    return 'NULL'
  else
    return "'#{x}'"
  end
end

# Build a plain SQL INSERT; the 'time' key is dropped for PostgreSQL.
def sql_insert(table, key_value)
  key_value.delete('time')
  cql = "INSERT INTO #{table} ("
  cql << key_value.keys.join(', ')
  cql << ") VALUES ("
  cql << (key_value.values.map {|x| quote_value(x) }).join(', ')
  cql << ")"
  cql
end

# load 100 hashes of log details
rows = []
File.open('data.json') do |f|
  rows = JSON.load(f)
end

# Time 10,000 single-row inserts, cycling through the 100 sample rows.
bm = Benchmark.measure do
  (1..10000).each do |i|
    row = rows[i % 100]
    if type == 'postgres'
      fred = sql_insert('playlog', row)
    else
      fred = cql_insert('playlog', SimpleUUID::UUID.new.to_guid, row)
    end
    db.execute(fred)
  end
end
puts bm
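To show what actually gets executed, appending something like this to the script prints the statements the two helpers build for a made-up row (field names invented for illustration; the UUID in the comment is just a placeholder):

# Illustration only: run the two statement builders on a hypothetical row.
row = {
  'time'     => '2013-03-14 10:22:31',
  'severity' => 'INFO',
  'message'  => 'transcode job 1234 started'
}

cql_insert('playlog', SimpleUUID::UUID.new.to_guid, row)
# => "INSERT INTO playlog (KEY, time, severity, message)
#     VALUES ('<generated uuid>', '2013-03-14 10:22:31', 'INFO', 'transcode job 1234 started')"

sql_insert('playlog', row)
# => "INSERT INTO playlog (severity, message) VALUES ('INFO', 'transcode job 1234 started')"
# Note: sql_insert deletes the 'time' key, so the PostgreSQL statement omits that column.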