Jassandra is used here:

Map<String, List<IColumn>> map = criteria.select();

The select here is basically a call to the Thrift API: get_range_slices.
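In case it helps to see that mapping, here is a rough, self-contained sketch of what a single-key criteria.select() like the one above boils down to at the Thrift layer. It assumes the Cassandra 0.6-era generated bindings (org.apache.cassandra.thrift) and the default unframed transport; the keyspace, column family, column name, and port are taken from the code quoted later in this thread, and the key "user1" is just a placeholder.

// Rough equivalent of a single-key criteria.select() expressed as a raw
// Thrift get_range_slices call. Assumes the Cassandra 0.6-era generated
// bindings (org.apache.cassandra.thrift.*); "user1" is a placeholder key,
// the other names come from the code quoted later in this thread.
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class RawThriftRead {
    public static void main(String[] args) throws Exception {
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        // Column slice "first".."first", at most 10 columns per row, which is
        // what columnRange(nameFirst, nameFirst, 10) asks for.
        SliceRange range = new SliceRange();
        range.setStart("first".getBytes("UTF-8"));
        range.setFinish("first".getBytes("UTF-8"));
        range.setReversed(false);
        range.setCount(10);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);

        // A key range covering exactly one key.
        KeyRange keys = new KeyRange();
        keys.setStart_key("user1");
        keys.setEnd_key("user1");
        keys.setCount(1);

        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Standard2");

        List<KeySlice> rows = client.get_range_slices(
                "Keyspace1", parent, predicate, keys, ConsistencyLevel.ONE);
        System.out.println("rows returned: " + rows.size());

        socket.close();
    }
}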
From: Caribbean410 [mailto:caribbean...@gmail.com]
Sent: Saturday, June 12, 2010 8:00 AM
To: user@cassandra.apache.org
Subject: Re: read operation is slow

I removed some unnecessary column families and changed the sizes of the row
cache and key cache; the read latency has now dropped from 0.25 ms to 0.09 ms.
In essence, 0.09 ms * 200k = 18 s, so I don't know why the whole run still
takes more than 400 s total. Here are the client code and cfstats. There are
not many operations here, so why is the extra time so large?

long start = System.currentTimeMillis();
for (int j = 0; j < 1; j++) {
    for (int i = 0; i < numOfRecords; i++) {
        int n = random.nextInt(numOfRecords);
        ICriteria criteria = cf.createCriteria();
        userName = keySet[n];
        criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
        Map<String, List<IColumn>> map = criteria.select();
        List<IColumn> list = map.get(userName);
        // ByteArray bloc = list.get(0).getValue();
        // byte[] byteArrayloc = bloc.toByteArray();
        // loc = new String(byteArrayloc);
        // readBytes = readBytes + loc.length();
        readBytes = readBytes + blobSize;
    }
}
long finish = System.currentTimeMillis();
float totalTime = (finish - start) / 1000f;   // 1000f, otherwise integer division drops the fractional seconds

Keyspace: Keyspace1
        Read Count: 600000
        Read Latency: 0.09053006666666667 ms.
        Write Count: 200000
        Write Latency: 0.01504989 ms.
        Pending Tasks: 0
                Column Family: Standard2
                SSTable count: 3
                Space used (live): 265990358
                Space used (total): 265990358
                Memtable Columns Count: 2615
                Memtable Data Size: 2667300
                Memtable Switch Count: 3
                Read Count: 600000
                Read Latency: 0.091 ms.
                Write Count: 200000
                Write Latency: 0.015 ms.
                Pending Tasks: 0
                Key cache capacity: 10000000
                Key cache size: 187465
                Key cache hit rate: 0.0
                Row cache capacity: 10000000
                Row cache size: 189990
                Row cache hit rate: 0.68335
                Compacted row minimum size: 0
                Compacted row maximum size: 0
                Compacted row mean size: 0
----------------
Keyspace: system
        Read Count: 1
        Read Latency: 10.954 ms.
        Write Count: 4
        Write Latency: 0.28075 ms.
        Pending Tasks: 0
                Column Family: HintsColumnFamily
                SSTable count: 0
                Space used (live): 0
                Space used (total): 0
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 1
                Key cache size: 0
                Key cache hit rate: NaN
                Row cache: disabled
                Compacted row minimum size: 0
                Compacted row maximum size: 0
                Compacted row mean size: 0

                Column Family: LocationInfo
                SSTable count: 2
                Space used (live): 3232
                Space used (total): 3232
                Memtable Columns Count: 2
                Memtable Data Size: 46
                Memtable Switch Count: 1
                Read Count: 1
                Read Latency: 10.954 ms.
                Write Count: 4
                Write Latency: 0.281 ms.
                Pending Tasks: 0
                Key cache capacity: 1
                Key cache size: 1
                Key cache hit rate: 0.0
                Row cache: disabled
                Compacted row minimum size: 0
                Compacted row maximum size: 0
                Compacted row mean size: 0
----------------
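For what it's worth, the gap being asked about works out directly from the numbers quoted above (200k reads, roughly 400 s of wall-clock time, and the ~0.09 ms read latency cfstats reports). A back-of-the-envelope sketch, nothing more:

// Back-of-the-envelope check of the numbers quoted in this message.
public class OverheadEstimate {
    public static void main(String[] args) {
        int reads = 200000;                 // records read in the test
        double serverMsPerRead = 0.09;      // cfstats read latency, in ms
        double measuredSeconds = 400.0;     // observed wall-clock time for the run

        double serverSeconds = reads * serverMsPerRead / 1000.0;   // ~18 s inside Cassandra
        double overheadMsPerRead = measuredSeconds * 1000.0 / reads // ~2 ms end to end per read
                - serverMsPerRead;                                  // leaves ~1.9 ms/op of client + network overhead

        System.out.printf("server-side total: %.1f s%n", serverSeconds);
        System.out.printf("overhead per read: %.2f ms%n", overheadMsPerRead);
    }
}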
On Fri, Jun 11, 2010 at 1:50 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

You need to look at cfstats to see what the latency is internal to Cassandra
vs. what your client is introducing.

Then you should probably read the comments in the configuration file about
caching.

On Fri, Jun 11, 2010 at 9:38 AM, Caribbean410 <caribbean...@gmail.com> wrote:
>
> Thanks Riyad.
>
> Right now I am just testing Cassandra on a single node. The server and
> client are running on the same machine. I tried the read test again on two
> machines; on one machine the CPU usage is around 30% most of the time and
> on the other it is 90%.
>
> Pelops is one way to access Cassandra, but there are also other Java
> clients like Hector and Jassandra. Will these Java clients have
> significantly different performance?
>
> Also, I once tried changing the storage configuration file, e.g. pointing
> CommitLogDirectory and DataFileDirectory at different disks, changing
> DiskAccessMode to mmap for a 64-bit machine, and changing ConcurrentReads
> from 8 to 2. None of these changed performance much.
>
> For other users who use a different access client, such as PHP, C++,
> Python, etc.: if you have any experience in boosting read performance, you
> are more than welcome to share it with me. Thanks,
>
> On Fri, Jun 11, 2010 at 8:19 AM, Riyad Kalla <rka...@gmail.com> wrote:
>>
>> Caribbean410,
>>
>> This comes up on the Redis list a lot as well -- what you are actually
>> measuring is the client sending a network request to the Cassandra server
>> and it replying -- so the performance numbers you are getting can easily
>> be 70% network wait time and not necessarily hardcore read/write server
>> performance.
>>
>> One way to see if this is the case: run your read test, then watch the
>> CPU on the server for the Cassandra process and see if it's pegging the
>> CPU -- if it's just sitting there banging between 0-10%, then you are
>> spending most of your time waiting on network i/o (open/close sockets,
>> etc.).
>>
>> If you can parallelize your test to spawn, say, 5 threads that all do the
>> same thing, see if the performance for each thread increases linearly --
>> which would indicate Cassandra is plenty fast in your setup; you just need
>> to utilize more client threads over the network.
>>
>> That new Java library, Pelops by Dominic
>> (http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/)
>> has a nice intrinsic node-balancing design that could be handy IF you are
>> using multiple nodes. If you are just testing against 1 node, then spawn
>> multiple threads of your code above and see how each thread's performance
>> scales.
>>
>> -R
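To make Riyad's suggestion concrete, here is a rough sketch of the same read loop split across five client threads, each opening its own connection. It is only illustrative: it reuses the Jassandra-style names from the benchmark code in this thread (keySet, numOfRecords, nameFirst, blobSize, DriverManager and the I* interfaces), the java.util.concurrent classes are the only additions, and connection.close() is assumed to exist on IConnection.

// Sketch only: the same per-read logic as the benchmark loop, run from five
// threads, each with its own connection. keySet, numOfRecords, nameFirst,
// blobSize and the Jassandra types are the ones used earlier in this thread;
// connection.close() is assumed. Uses java.util.concurrent.
final int THREADS = 5;
ExecutorService pool = Executors.newFixedThreadPool(THREADS);
List<Future<Long>> results = new ArrayList<Future<Long>>();

long start = System.currentTimeMillis();
for (int t = 0; t < THREADS; t++) {
    results.add(pool.submit(new Callable<Long>() {
        public Long call() throws Exception {
            // One connection and one Random per thread.
            Properties info = new Properties();
            info.put(DriverManager.CONSISTENCY_LEVEL, ConsistencyLevel.ONE.toString());
            IConnection connection = DriverManager.getConnection("thrift://localhost:9160", info);
            IColumnFamily cf = connection.getKeySpace("Keyspace1").getColumnFamily("Standard2");
            Random random = new Random();
            long bytes = 0;
            for (int i = 0; i < numOfRecords / THREADS; i++) {
                String userName = keySet[random.nextInt(numOfRecords)];
                ICriteria criteria = cf.createCriteria();
                criteria.keyList(Lists.newArrayList(userName))
                        .columnRange(nameFirst, nameFirst, 10);
                Map<String, List<IColumn>> map = criteria.select();
                List<IColumn> list = map.get(userName);
                bytes += blobSize;
            }
            connection.close();            // assumed; release the per-thread connection
            return bytes;
        }
    }));
}

long readBytes = 0;
for (Future<Long> f : results) {
    readBytes += f.get();                  // wait for every thread to finish
}
pool.shutdown();
float totalTime = (System.currentTimeMillis() - start) / 1000f;

Per Riyad's note, roughly linear scaling across the threads would indicate the server has headroom and the single-threaded test was mostly waiting on the per-request round trip.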
>> On Thu, Jun 10, 2010 at 2:39 PM, Caribbean410 <caribbean...@gmail.com>
>> wrote:
>>>
>>> Hello,
>>>
>>> I am testing the performance of Cassandra. We write 200k records to the
>>> database and each record is 1k in size. Then we read these 200k records
>>> back. It takes more than 400 s to finish the read, which is much slower
>>> than MySQL (around 20 s). I read some discussion online and someone
>>> suggested making multiple connections to make it faster. But I am not
>>> sure how to do it: do I need to change my storage setting file or just
>>> change the Java client code?
>>>
>>> Here is my read code:
>>>
>>> Properties info = new Properties();
>>> info.put(DriverManager.CONSISTENCY_LEVEL, ConsistencyLevel.ONE.toString());
>>>
>>> IConnection connection = DriverManager.getConnection(
>>>         "thrift://localhost:9160", info);
>>>
>>> // 2. Get a KeySpace by name
>>> IKeySpace keySpace = connection.getKeySpace("Keyspace1");
>>>
>>> // 3. Get a ColumnFamily by name
>>> IColumnFamily cf = keySpace.getColumnFamily("Standard2");
>>>
>>> ByteArray nameFirst = ByteArray.ofASCII("first");
>>> ICriteria criteria = cf.createCriteria();
>>> long readBytes = 0;
>>> long start = System.currentTimeMillis();
>>> for (int i = 0; i < numOfRecords; i++) {
>>>     int n = random.nextInt(numOfRecords);
>>>     userName = keySet[n];
>>>     criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
>>>     Map<String, List<IColumn>> map = criteria.select();
>>>     List<IColumn> list = map.get(userName);
>>>     ByteArray bloc = list.get(0).getValue();
>>>     byte[] byteArrayloc = bloc.toByteArray();
>>>     loc = new String(byteArrayloc);
>>>     // System.out.println(userName + " " + loc);
>>>     readBytes = readBytes + loc.length();
>>> }
>>>
>>> long finish = System.currentTimeMillis();
>>>
>>> I once commented out these lines:
>>>
>>> ByteArray bloc = list.get(0).getValue();
>>> byte[] byteArrayloc = bloc.toByteArray();
>>> loc = new String(byteArrayloc);
>>> // System.out.println(userName + " " + loc);
>>> readBytes = readBytes + loc.length();
>>>
>>> and the performance doesn't improve much.
>>>
>>> Any suggestion is welcome. Thanks,
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
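Picking up Jonathan's point about separating the latency internal to Cassandra from what the client introduces: since commenting out the value handling barely changes anything, one simple client-side check is to time nothing but the select() call and compare the per-call average with the roughly 0.09 ms read latency that cfstats reports. A rough sketch, reusing the variable names from the read code above:

// Sketch: time only criteria.select() to see the client-observed latency per
// call; everything else (key lookup, byte handling) stays outside the timed
// region. Variable names are the ones from the read code above.
long selectNanos = 0;
for (int i = 0; i < numOfRecords; i++) {
    int n = random.nextInt(numOfRecords);
    userName = keySet[n];
    criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);

    long t0 = System.nanoTime();
    Map<String, List<IColumn>> map = criteria.select();   // the round trip itself
    selectNanos += System.nanoTime() - t0;

    readBytes = readBytes + map.get(userName).get(0).getValue().toByteArray().length;
}
System.out.println("avg select(): " + (selectNanos / 1e6 / numOfRecords)
        + " ms vs ~0.09 ms reported by cfstats");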