Now I read 100 records per call, and the total time to read 200k records (1k each) is reduced to 10s. Looks good. But I am still curious how to handle the case where users read one record at a time.
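For reference, here is a rough sketch of the batching idea (many keys per call instead of one). It uses only the JDK; the `multiKeyRead` function is a hypothetical stand-in for whatever multi-key call the client offers. With Jassandra it might wrap `criteria.keyList(batch).columnRange(...).select()` from the code quoted below, but I have not verified that, so treat it as an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BatchedReads {

    /** Splits keys into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }

    /**
     * Issues one call per batch instead of one call per key.
     * multiKeyRead is a hypothetical placeholder for the client's
     * multi-key read; it is not a real Jassandra or Thrift signature.
     */
    static <V> Map<String, V> readAll(List<String> keys, int batchSize,
            Function<List<String>, Map<String, V>> multiKeyRead) {
        Map<String, V> result = new LinkedHashMap<>();
        for (List<String> batch : partition(keys, batchSize)) {
            result.putAll(multiKeyRead.apply(batch));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("user" + i);
        }
        final int[] calls = {0};
        // Fake in-memory "server" that only counts round trips.
        Map<String, Integer> values = readAll(keys, 100, batch -> {
            calls[0]++;
            Map<String, Integer> m = new HashMap<>();
            for (String k : batch) {
                m.put(k, k.length());
            }
            return m;
        });
        System.out.println(values.size() + " records in " + calls[0] + " calls");
    }
}
```

With a batch size of 100, 200k keys need 2,000 round trips instead of 200,000, which is where the saving comes from.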
On Fri, Jun 11, 2010 at 6:05 PM, Dop Sun <su...@dopsun.com> wrote:

> And also, you are only select *1* key and *10* columns?
>
> criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst,
> nameFirst, 10);
>
> Then, if you have 200k keys, you have 200k Thrift calls. If this is the
> case, you may need to optimize the way you do the query (to combine
> multiple keys into a single query), and to reduce the number of calls.
>
> *From:* Dop Sun [mailto:su...@dopsun.com]
> *Sent:* Saturday, June 12, 2010 8:57 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: read operation is slow
>
> You mean after you “I remove some unnecessary column family and change
> the size of rowcache and keycache, now the latency changes from 0.25ms to
> 0.09ms. In essence 0.09ms*200k=18s.”, it still takes 400 seconds to
> returning?
>
> *From:* Caribbean410 [mailto:caribbean...@gmail.com]
> *Sent:* Saturday, June 12, 2010 8:48 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: read operation is slow
>
> Hi, do you mean this one should not introduce much extra delay? To read a
> record, I need select here, not sure where the extra delay comes from.
>
> On Fri, Jun 11, 2010 at 5:29 PM, Dop Sun <su...@dopsun.com> wrote:
>
> Jassandra is used here:
>
> Map<String, List<IColumn>> map = criteria.select();
>
> The select here basically is a call to Thrift API: get_range_slices
>
> *From:* Caribbean410 [mailto:caribbean...@gmail.com]
> *Sent:* Saturday, June 12, 2010 8:00 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: read operation is slow
>
> I remove some unnecessary column family and change the size of rowcache
> and keycache, now the latency changes from 0.25ms to 0.09ms. In essence
> 0.09ms*200k=18s. I don't know why it takes more than 400s total. Here is
> the client code and cfstats. There are not many operations here, why is
> the extra time so large?
>
> long start = System.currentTimeMillis();
> for (int j = 0; j < 1; j++) {
>     for (int i = 0; i < numOfRecords; i++) {
>         int n = random.nextInt(numOfRecords);
>         ICriteria criteria = cf.createCriteria();
>         userName = keySet[n];
>
>         criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst,
>                 nameFirst, 10);
>         Map<String, List<IColumn>> map = criteria.select();
>         List<IColumn> list = map.get(userName);
>         // ByteArray bloc = list.get(0).getValue();
>         // byte[] byteArrayloc = bloc.toByteArray();
>         // loc = new String(byteArrayloc);
>
>         // readBytes = readBytes + loc.length();
>         readBytes = readBytes + blobSize;
>     }
> }
>
> long finish=System.currentTimeMillis();
>
> float totalTime=(finish-start)/1000;
>
> Keyspace: Keyspace1
>     Read Count: 600000
>     Read Latency: 0.09053006666666667 ms.
>     Write Count: 200000
>     Write Latency: 0.01504989 ms.
>     Pending Tasks: 0
>         Column Family: Standard2
>         SSTable count: 3
>         Space used (live): 265990358
>         Space used (total): 265990358
>         Memtable Columns Count: 2615
>         Memtable Data Size: 2667300
>         Memtable Switch Count: 3
>         Read Count: 600000
>         Read Latency: 0.091 ms.
>         Write Count: 200000
>         Write Latency: 0.015 ms.
>         Pending Tasks: 0
>         Key cache capacity: 10000000
>         Key cache size: 187465
>         Key cache hit rate: 0.0
>         Row cache capacity: 10000000
>         Row cache size: 189990
>         Row cache hit rate: 0.68335
>         Compacted row minimum size: 0
>         Compacted row maximum size: 0
>         Compacted row mean size: 0
>
> ----------------
> Keyspace: system
>     Read Count: 1
>     Read Latency: 10.954 ms.
>     Write Count: 4
>     Write Latency: 0.28075 ms.
>     Pending Tasks: 0
>         Column Family: HintsColumnFamily
>         SSTable count: 0
>         Space used (live): 0
>         Space used (total): 0
>         Memtable Columns Count: 0
>         Memtable Data Size: 0
>         Memtable Switch Count: 0
>         Read Count: 0
>         Read Latency: NaN ms.
>         Write Count: 0
>         Write Latency: NaN ms.
>         Pending Tasks: 0
>         Key cache capacity: 1
>         Key cache size: 0
>         Key cache hit rate: NaN
>         Row cache: disabled
>         Compacted row minimum size: 0
>         Compacted row maximum size: 0
>         Compacted row mean size: 0
>
>         Column Family: LocationInfo
>         SSTable count: 2
>         Space used (live): 3232
>         Space used (total): 3232
>         Memtable Columns Count: 2
>         Memtable Data Size: 46
>         Memtable Switch Count: 1
>         Read Count: 1
>         Read Latency: 10.954 ms.
>         Write Count: 4
>         Write Latency: 0.281 ms.
>         Pending Tasks: 0
>         Key cache capacity: 1
>         Key cache size: 1
>         Key cache hit rate: 0.0
>         Row cache: disabled
>         Compacted row minimum size: 0
>         Compacted row maximum size: 0
>         Compacted row mean size: 0
>
> ----------------
>
> On Fri, Jun 11, 2010 at 1:50 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
> you need to look at cfstats to see what the latency is internal to
> cassandra, vs what your client is introducing
>
> then you should probably read the comments in the configuration file
> about caching
>
> On Fri, Jun 11, 2010 at 9:38 AM, Caribbean410 <caribbean...@gmail.com>
> wrote:
> >
> > Thanks Riyad.
> >
> > Right now I am just testing Cassandra on single node. The server and
> > client are running on the same machine. I tried the read test again on
> > two machines, on one machine the cpu usage is around 30% most of the
> > time and another is 90%.
> >
> > Pelops is one way to access Cassandra, there are also other java client
> > like hector and jassandra, will these java clients have significant
> > different performance?
> >
> > Also I once tried to change the storage configure file, like change
> > CommitLogDirectory and DataFileDirectory to different disks, change
> > DiskAccessMode to mmap for a 64bit machine, and change ConcurrentReads
> > from 8 to 2. All of these do not change performance much.
> >
> > For other users who use different access client, like using php, c++,
> > python, etc, if you have any experience in boosting the read
> > performance, you are more than welcome to share with me. Thanks,
> >
> > On Fri, Jun 11, 2010 at 8:19 AM, Riyad Kalla <rka...@gmail.com> wrote:
> >>
> >> Caribbean410,
> >>
> >> This comes up on the Redis list alot as well -- what you are actually
> >> measuring is the client sending a network connection to the Cas server
> >> and it replying -- so the performance numbers you are getting can
> >> easily be 70% network wait time and not necessarily hardcore read/write
> >> server performance.
> >>
> >> One way to see if this is the case, run your read test, then watch the
> >> CPU on the server for the Cassandra process and see if it's pegging the
> >> CPU -- if it's just sitting there banging between 0-10%, the you are
> >> spending most of your time waiting on network i/o (open/close sockets,
> >> etc.)
> >>
> >> If you can parallelize your test to spawn say 5 threads that all do the
> >> same thing, see if the performance for each thread increases linearly --
> >> which would indicate Cassandra is plenty fast in your setup, you just
> >> need to utilize more client threads over the network.
> >>
> >> That new Java library, Pelops by Dominic
> >> (http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/)
> >> has a nice intrinsic node-balancing design that could be handy IF you
> >> are using multiple nodes. If you are just testing against 1 node, then
> >> spawn multiple threads of your code above and see how each thread's
> >> performance scales.
> >>
> >> -R
> >>
> >> On Thu, Jun 10, 2010 at 2:39 PM, Caribbean410 <caribbean...@gmail.com>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I am testing the performance of cassandra. We write 200k records to
> >>> database and each record is 1k size. Then we read these 200k records.
> >>> It takes more than 400s to finish the read which is much slower than
> >>> mysql (20s around). I read some discussion online and someone suggest
> >>> to make multiple connections to make it faster. But I am not sure how
> >>> to do it, do I need to change my storage setting file or just change
> >>> the java client code?
> >>>
> >>> Here is my read code,
> >>>
> >>> Properties info = new Properties();
> >>> info.put(DriverManager.CONSISTENCY_LEVEL,
> >>>         ConsistencyLevel.ONE.toString());
> >>>
> >>> IConnection connection = DriverManager.getConnection(
> >>>         "thrift://localhost:9160", info);
> >>>
> >>> // 2. Get a KeySpace by name
> >>> IKeySpace keySpace = connection.getKeySpace("Keyspace1");
> >>>
> >>> // 3. Get a ColumnFamily by name
> >>> IColumnFamily cf = keySpace.getColumnFamily("Standard2");
> >>>
> >>> ByteArray nameFirst = ByteArray.ofASCII("first");
> >>> ICriteria criteria = cf.createCriteria();
> >>> long readBytes = 0;
> >>> long start = System.currentTimeMillis();
> >>> for (int i = 0; i < numOfRecords; i++) {
> >>>     int n = random.nextInt(numOfRecords);
> >>>     userName = keySet[n];
> >>>
> >>>     criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst,
> >>>             nameFirst, 10);
> >>>     Map<String, List<IColumn>> map = criteria.select();
> >>>     List<IColumn> list = map.get(userName);
> >>>     ByteArray bloc = list.get(0).getValue();
> >>>     byte[] byteArrayloc = bloc.toByteArray();
> >>>     loc = new String(byteArrayloc);
> >>>     // System.out.println(userName+" "+loc);
> >>>     readBytes = readBytes + loc.length();
> >>> }
> >>>
> >>> long finish=System.currentTimeMillis();
> >>>
> >>> I once commented these lines
> >>>
> >>>     ByteArray bloc = list.get(0).getValue();
> >>>     byte[] byteArrayloc = bloc.toByteArray();
> >>>     loc = new String(byteArrayloc);
> >>>     // System.out.println(userName+" "+loc);
> >>>     readBytes = readBytes + loc.length();
> >>>
> >>> And the performance doesn't improve much.
> >>>
> >>> Any suggestion is welcome. Thanks,
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
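For the one-record-per-call case, Riyad's suggestion in the thread above (spawn about 5 client threads and see how throughput scales) could be sketched roughly as follows. This is JDK-only; `readerFactory` is a hypothetical hook that would open one connection per worker and return a single-key read function. I am assuming each thread should get its own connection, since Thrift-based clients are generally not safe to share across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Supplier;

class ParallelSingleReads {

    /**
     * Reads every key with one call per key, spread over several threads.
     * readerFactory is a hypothetical hook: it is called once per worker
     * and should return that worker's own single-key read function.
     * Readers must not return null (ConcurrentHashMap rejects null values).
     */
    static <V> Map<String, V> readAll(List<String> keys, int threads,
            Supplier<Function<String, V>> readerFactory) {
        if (threads <= 0) {
            throw new IllegalArgumentException("threads must be positive");
        }
        Map<String, V> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int worker = t;
                pending.add(pool.submit(() -> {
                    Function<String, V> reader = readerFactory.get();
                    // Worker w handles keys w, w + threads, w + 2 * threads, ...
                    for (int i = worker; i < keys.size(); i += threads) {
                        String key = keys.get(i);
                        results.put(key, reader.apply(key));
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the worker and surface its failure, if any
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a read worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

If the per-call time is mostly network wait, as Riyad suspects, total time should fall roughly in proportion to the thread count until the server's CPU becomes the limit. Timing runs with 1, 2, and 5 threads would show whether that holds here.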