This is the cfstats output. Right now I use three threads to read 200k records, and I only use Keyspace1 and the column family Standard2. For the other, unused column families, do I need to comment them out in the storage configuration file? The read latency is 0.2576 ms per record; is this a typical number? (We are reading from an SSD, which should be much faster than a normal hard drive.)
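For what it's worth, this internal number can be compared with the end-to-end figure from the original post quoted below (more than 400 s for 200k reads). A small sketch of that arithmetic, using only numbers from this thread:

```java
public class LatencySplit {
    public static void main(String[] args) {
        // Figures from this thread: ~400 s client-side for 200k reads,
        // vs. a 0.2576 ms mean internal read latency from cfstats.
        double clientMsPerRead = 400_000.0 / 200_000;   // 2.0 ms end-to-end per read
        double serverMsPerRead = 0.2576;                // internal to Cassandra
        double outsideFraction = 1.0 - serverMsPerRead / clientMsPerRead;
        System.out.printf("end-to-end: %.4f ms, internal: %.4f ms, outside Cassandra: %.0f%%%n",
                clientMsPerRead, serverMsPerRead, outsideFraction * 100);
    }
}
```

If those figures are right, roughly 87% of the per-read time is spent outside Cassandra (client, Thrift, network), which is consistent with Jonathan's point below about separating internal latency from what the client introduces.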
Keyspace: Keyspace1
    Read Count: 600000
    Read Latency: 0.25760798333333335 ms.
    Write Count: 200000
    Write Latency: 0.015756365 ms.
    Pending Tasks: 0
        Column Family: StandardByUUID1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Super1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Standard2
        SSTable count: 4
        Space used (live): 279466127
        Space used (total): 279466127
        Memtable Columns Count: 2615
        Memtable Data Size: 2667300
        Memtable Switch Count: 3
        Read Count: 600000
        Read Latency: NaN ms.
        Write Count: 200000
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 1
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Standard1
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: Super2
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache capacity: 200000
        Row cache size: 0
        Row cache hit rate: NaN
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0
----------------
Keyspace: system
    Read Count: 1
    Read Latency: 13.205 ms.
    Write Count: 2
    Write Latency: 0.062 ms.
    Pending Tasks: 0
        Column Family: HintsColumnFamily
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: LocationInfo
        SSTable count: 3
        Space used (live): 3853
        Space used (total): 3853
        Memtable Columns Count: 2
        Memtable Data Size: 46
        Memtable Switch Count: 0
        Read Count: 1
        Read Latency: NaN ms.
        Write Count: 2
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 3
        Key cache size: 3
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0
----------------

On Fri, Jun 11, 2010 at 10:50 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> you need to look at cfstats to see what the latency is internal to
> cassandra, vs what your client is introducing
>
> then you should probably read the comments in the configuration file
> about caching
>
> On Fri, Jun 11, 2010 at 9:38 AM, Caribbean410 <caribbean...@gmail.com> wrote:
> >
> > Thanks Riyad.
> >
> > Right now I am just testing Cassandra on a single node. The server and
> > client are running on the same machine. I tried the read test again on two
> > machines; on one machine the CPU usage is around 30% most of the time, and
> > on the other it is 90%.
> >
> > Pelops is one way to access Cassandra, but there are also other Java
> > clients, like Hector and jassandra. Will these Java clients have
> > significantly different performance?
> >
> > Also, I once tried to change the storage configuration file: changing
> > CommitLogDirectory and DataFileDirectory to different disks, changing
> > DiskAccessMode to mmap for a 64-bit machine, and changing ConcurrentReads
> > from 8 to 2. None of these changed performance much.
> >
> > For other users who use a different access client, e.g. PHP, C++,
> > Python, etc.: if you have any experience in boosting read performance,
> > you are more than welcome to share it with me. Thanks,
> >
> > On Fri, Jun 11, 2010 at 8:19 AM, Riyad Kalla <rka...@gmail.com> wrote:
> >>
> >> Caribbean410,
> >>
> >> This comes up on the Redis list a lot as well -- what you are actually
> >> measuring is the client sending a network connection to the Cassandra
> >> server and it replying -- so the performance numbers you are getting can
> >> easily be 70% network wait time and not necessarily hardcore read/write
> >> server performance.
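The configuration knobs mentioned above (CommitLogDirectory and DataFileDirectory on separate disks, DiskAccessMode, ConcurrentReads, and the per-ColumnFamily caching Jonathan points at) all live in storage-conf.xml in 0.6-era Cassandra. A hedged sketch, where the paths and cache percentages are illustrative assumptions rather than values from this thread, and unrelated keyspace settings are omitted:

```xml
<Storage>
  <!-- Put the commit log and data files on different physical disks
       (these paths are placeholders, not from the thread). -->
  <CommitLogDirectory>/disk1/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
  </DataFileDirectories>

  <!-- Worth trying on a 64-bit JVM, as discussed above. -->
  <DiskAccessMode>mmap</DiskAccessMode>

  <!-- The shipped file's comments suggest sizing this to the number
       of cores; 8 was the default the poster started from. -->
  <ConcurrentReads>8</ConcurrentReads>

  <Keyspaces>
    <Keyspace Name="Keyspace1">
      <!-- Cache keys/rows for the one CF actually being read;
           "100%"/"10%" are illustrative values, not recommendations.
           ReplicationFactor etc. omitted for brevity. -->
      <ColumnFamily Name="Standard2" CompareWith="BytesType"
                    KeysCached="100%" RowsCached="10%"/>
    </Keyspace>
  </Keyspaces>
</Storage>
```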
> >> One way to see if this is the case: run your read test, then watch the CPU
> >> on the server for the Cassandra process and see if it's pegging the CPU --
> >> if it's just sitting there banging between 0-10%, then you are spending most
> >> of your time waiting on network I/O (open/close sockets, etc.).
> >>
> >> If you can parallelize your test to spawn, say, 5 threads that all do the
> >> same thing, see if the performance for each thread increases linearly --
> >> which would indicate Cassandra is plenty fast in your setup and you just
> >> need to utilize more client threads over the network.
> >>
> >> That new Java library, Pelops by Dominic
> >> (http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/)
> >> has a nice intrinsic node-balancing design that could be handy IF you are
> >> using multiple nodes. If you are just testing against 1 node, then spawn
> >> multiple threads of your code above and see how each thread's performance
> >> scales.
> >>
> >> -R
> >>
> >> On Thu, Jun 10, 2010 at 2:39 PM, Caribbean410 <caribbean...@gmail.com>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I am testing the performance of Cassandra. We write 200k records to the
> >>> database, and each record is 1k in size. Then we read these 200k records.
> >>> It takes more than 400s to finish the read, which is much slower than
> >>> MySQL (around 20s). I read some discussion online, and someone suggested
> >>> making multiple connections to make it faster. But I am not sure how to
> >>> do it: do I need to change my storage setting file, or just change the
> >>> Java client code?
> >>>
> >>> Here is my read code:
> >>>
> >>> Properties info = new Properties();
> >>> info.put(DriverManager.CONSISTENCY_LEVEL,
> >>>         ConsistencyLevel.ONE.toString());
> >>>
> >>> IConnection connection = DriverManager.getConnection(
> >>>         "thrift://localhost:9160", info);
> >>>
> >>> // 2. Get a KeySpace by name
> >>> IKeySpace keySpace = connection.getKeySpace("Keyspace1");
> >>>
> >>> // 3. Get a ColumnFamily by name
> >>> IColumnFamily cf = keySpace.getColumnFamily("Standard2");
> >>>
> >>> ByteArray nameFirst = ByteArray.ofASCII("first");
> >>> ICriteria criteria = cf.createCriteria();
> >>> long readBytes = 0;
> >>> long start = System.currentTimeMillis();
> >>> for (int i = 0; i < numOfRecords; i++) {
> >>>     int n = random.nextInt(numOfRecords);
> >>>     userName = keySet[n];
> >>>     criteria.keyList(Lists.newArrayList(userName))
> >>>             .columnRange(nameFirst, nameFirst, 10);
> >>>     Map<String, List<IColumn>> map = criteria.select();
> >>>     List<IColumn> list = map.get(userName);
> >>>     ByteArray bloc = list.get(0).getValue();
> >>>     byte[] byteArrayloc = bloc.toByteArray();
> >>>     loc = new String(byteArrayloc);
> >>>     // System.out.println(userName + " " + loc);
> >>>     readBytes = readBytes + loc.length();
> >>> }
> >>>
> >>> long finish = System.currentTimeMillis();
> >>>
> >>> I once commented out these lines:
> >>>
> >>>     ByteArray bloc = list.get(0).getValue();
> >>>     byte[] byteArrayloc = bloc.toByteArray();
> >>>     loc = new String(byteArrayloc);
> >>>     // System.out.println(userName + " " + loc);
> >>>     readBytes = readBytes + loc.length();
> >>>
> >>> and the performance didn't improve much.
> >>>
> >>> Any suggestion is welcome. Thanks,
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
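Riyad's suggestion to spawn several client threads could be sketched as below. This is a minimal harness, not a tested Cassandra client: readOne is a placeholder for the criteria.keyList(...).select() call from the quoted code, and the 1 KB record size comes from the thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelReadTest {

    // Stub for a single Cassandra read; swap in the
    // criteria.keyList(...).select() call from the code above.
    static int readOne(int key) {
        return 1024; // pretend every record is 1 KB, as in the thread
    }

    // Splits numOfRecords reads across numThreads client threads
    // and returns the total bytes "read".
    static long runTest(final int numOfRecords, int numThreads) throws InterruptedException {
        final AtomicLong bytesRead = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int t = 0; t < numThreads; t++) {
            final int offset = t;
            final int step = numThreads;
            pool.submit(new Runnable() {
                public void run() {
                    // Each thread takes an interleaved slice of the key space,
                    // so together the threads cover every key exactly once.
                    for (int i = offset; i < numOfRecords; i += step) {
                        bytesRead.addAndGet(readOne(i));
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return bytesRead.get();
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        long bytes = runTest(200_000, 5); // 5 threads, as Riyad suggests
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("read " + bytes / 1024 + " records in " + elapsed + " ms");
    }
}
```

If Cassandra is the bottleneck, total wall-clock time barely improves as threads are added; if the time drops roughly linearly with the thread count, the single-threaded test was mostly waiting on the network, which is what the cfstats numbers above suggest.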