Re: Read Performance

2010-03-30 Thread James Golick
Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and NaN. Seems like that stat is a little broken. Still seeing around 35ms to multiget 20 rows. - James On Tue, Mar 30, 2010 at 9:22 PM, Ryan King wrote: > On Tue, Mar 30, 2010 at 9:11 PM, James Golick > wrote: > > No change ob

Re: Read Performance

2010-03-30 Thread Ryan King
On Tue, Mar 30, 2010 at 9:11 PM, James Golick wrote: > No change observed. The hit rate fluctuates between 0.0, 0.3, and NaN every > time I run cfstats. > I just increased it by 10x. Hopefully that'll help. You should turn the caches up until you either run out of heap, or the hitrate stops going

Re: Read Performance

2010-03-30 Thread James Golick
No change observed. The hit rate fluctuates between 0.0, 0.3, and NaN every time I run cfstats. I just increased it by 10x. Hopefully that'll help. On Tue, Mar 30, 2010 at 8:59 PM, Jonathan Ellis wrote: > What is your row cache hit rate? > > By "still slow" do you mean "no change observed" or "

Re: Read Performance

2010-03-30 Thread Jonathan Ellis
What is your row cache hit rate? By "still slow" do you mean "no change observed" or "faster but not fast enough?" On Tue, Mar 30, 2010 at 10:47 PM, James Golick wrote: > We are starting to use cassandra to power our activity feed. The way we > organize our data is simple. "Event"s live in a CF

Read Performance

2010-03-30 Thread James Golick
We are starting to use cassandra to power our activity feed. The way we organize our data is simple. "Event"s live in a CF called Events and are keyed by a UUID. The timelines themselves live in a CF called Timelines, which is keyed by user id (i.e. "1229") and contains event uuids as column name
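A minimal sketch of the Events/Timelines layout described above. Plain Python dictionaries stand in for the two column families (an assumption for illustration; no Cassandra client is involved). Reading a feed means slicing the user's Timelines row and then doing a multiget on the matching Events rows, which is the 20-row multiget whose latency this thread discusses. The names `publish` and `read_feed` are hypothetical. Sorting by the UUID's embedded timestamp stands in for a time-ordered column comparator.

```python
import uuid
from typing import Dict, List

# Hypothetical stand-in for the Events CF: row key = event UUID string.
events: Dict[str, dict] = {}
# Hypothetical stand-in for the Timelines CF: row key = user id (e.g. "1229"),
# column names = event UUIDs, column values unused (empty).
timelines: Dict[str, Dict[uuid.UUID, bytes]] = {}


def publish(user_ids: List[str], payload: dict) -> uuid.UUID:
    """Store one event, then fan its UUID out to each listed user's timeline row."""
    event_id = uuid.uuid1()  # time-based UUID, so its .time field orders events
    events[str(event_id)] = payload
    for uid in user_ids:
        timelines.setdefault(uid, {})[event_id] = b""
    return event_id


def read_feed(user_id: str, count: int = 20) -> List[dict]:
    """Take the newest `count` column names, then 'multiget' those Events rows."""
    row = timelines.get(user_id, {})
    newest = sorted(row, key=lambda u: u.time, reverse=True)[:count]
    return [events[str(u)] for u in newest]
```

Each feed read in this layout costs two round trips: one slice and one multiget. That is why the multiget step dominates when its latency is high.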

Re: Insertion time question

2010-03-30 Thread Jonathan Ellis
Hard to say without busting out the profiler. "supercolumns are slower" is not a surprise to anyone at this point, I'm afraid. On Tue, Mar 30, 2010 at 6:16 PM, Carlos Sanchez wrote: > I was wondering if I could have a bit more insight as why we are seeing > different insertion times between reg

Re: Replicating data over the wan?

2010-03-30 Thread Erik Holstad
Thanks David and Jonathan for the info. Those two links were pretty much the only thing that I did find about this issue, but I wasn't sure that only because it works for different zones it would also work for different regions. -- Regards Erik

Re: Replicating data over the wan?

2010-03-30 Thread Avinash Lakshman
How far apart are the data centers? Technically there will be an increase in latency for the writes if you are waiting for acks from the replicas. How long does it take for simple pings between machines in these data centers? If inconsistency is not an issue you can mitigate this by doing asynchronous r

Re: Replicating data over the wan?

2010-03-30 Thread Jonathan Ellis
http://permalink.gmane.org/gmane.comp.db.cassandra.user/3462 http://permalink.gmane.org/gmane.comp.db.cassandra.user/3483 On Tue, Mar 30, 2010 at 7:49 PM, Erik Holstad wrote: > Is anyone using datacenter aware replication where the replication takes > place over the wan > and not over super fast

Re: Replicating data over the wan?

2010-03-30 Thread David Strauss
On 2010-03-31 01:42, Erik Holstad wrote: > I'm not too worried about inconsistency in data too much more if things > like the gossip protocol would saturate the wan and things like that. I haven't tried inter-DC replication, but I would be surprised if gossip saturated a line with any decent bandw

Re: Replicating data over the wan?

2010-03-30 Thread David Timothy Strauss
Your ConsistencyLevel will change the effect. If CL is low, inconsistency will temporarily occur between the DCs. If CL is high, writes will have noticeably high latency. -----Original Message----- From: Erik Holstad Date: Tue, 30 Mar 2010 17:49:17 To: Subject: Replicating data over the wan?
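A rough model of the trade-off described above, under simplifying assumptions. Every replica receives the write, and the coordinator answers the client once the required number of replicas have acknowledged. Write latency is therefore roughly the k-th fastest acknowledgement. The level names follow Cassandra's ONE/QUORUM/ALL. The round-trip times below are made up for illustration.

```python
from typing import List


def acks_required(level: str, replicas: int) -> int:
    """Number of replica acknowledgements the coordinator waits for."""
    table = {"ONE": 1, "QUORUM": replicas // 2 + 1, "ALL": replicas}
    return table[level]


def write_latency_ms(replica_rtts_ms: List[float], level: str) -> float:
    """Approximate write latency: the k-th fastest replica acknowledgement."""
    k = acks_required(level, len(replica_rtts_ms))
    return sorted(replica_rtts_ms)[k - 1]


# Hypothetical replica set: two replicas in the local DC, one across the WAN.
rtts = [1.0, 2.0, 80.0]
for level in ("ONE", "QUORUM", "ALL"):
    print(level, write_latency_ms(rtts, level))
```

With these numbers, ONE and QUORUM return at local speed while ALL waits on the WAN link. At the lower levels the remote replica may briefly lag behind, which is the temporary inconsistency between DCs.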

Replicating data over the wan?

2010-03-30 Thread Erik Holstad
Is anyone using datacenter aware replication where the replication takes place over the wan and not over super fast optical cable between the centers? Tried to look at all posts related to the topic but haven't really found too much, only some things about not doing that if using ZooKeeper and som

Insertion time question

2010-03-30 Thread Carlos Sanchez
I was wondering if I could have a bit more insight as to why we are seeing different insertion times between regular column families and super columns. We have a group object (with its name) that may have a series of attributes (name/value). There can be up to a million group objects and different grou

Re: Poor performance; PHP & Thrift to blame

2010-03-30 Thread Julian Simon
Well, the app is written in PHP, and in order to use Cassandra for the (small) aspect of the app which could make use of its benefits, the client code will need to be in PHP and run fairly speedily. Hence my testing with PHP. I suppose another question for me is: Are there any alternative interf

Re: Large data files and no "edit in place"?

2010-03-30 Thread Julian Simon
Thanks for the detailed explanation David. I had a feeling it was to do with random vs sequential IO, and now I'm comfortable with that concept w.r.t. Cassandra. On Tue, Mar 30, 2010 at 11:59 PM, David Strauss wrote: > On 2010-03-30 05:54, Julian Simon wrote: >> My understanding is that Cassand

CfP with Extended Deadline 5th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'10)

2010-03-30 Thread Michael Alexander
Apologies if you received multiple copies of this message. CALL FOR PAPERS: 5th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'10), as part of Euro-Par 2010, Island of Ischia-Naples, Italy

Re: Large data files and no "edit in place"?

2010-03-30 Thread Jonathan Ellis
Cassandra does "minor" compactions with a minimum of 4 sstables in the same "bucket," with buckets doubling in size as you compact. So you only ever rewrite all data in your weekly-ish major compaction for tombstone cleanup and anti-entropy. -Jonathan On Tue, Mar 30, 2010 at 12:54 AM, Julian Sim
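A simplified sketch of size-bucketed minor compaction as described above. Two assumptions here are not stated in the message. First, an sstable joins a bucket when its size falls strictly between half and one-and-a-half times the bucket's current average. Second, merging never drops data (no overwrites or tombstones), so a merged sstable's size is the sum of its inputs. Only buckets that reach the minimum of four sstables get rewritten, so most data is not touched on every compaction.

```python
from typing import List

MIN_THRESHOLD = 4  # "a minimum of 4 sstables in the same bucket"


def bucket_sstables(sizes: List[int]) -> List[List[int]]:
    """Group sstable sizes into buckets of similar size (assumed 0.5x-1.5x of the bucket average)."""
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg / 2 < size < avg * 3 / 2:
                bucket.append(size)
                break
        else:
            # No bucket of similar size exists yet, so start a new one.
            buckets.append([size])
    return buckets


def minor_compaction(sizes: List[int]) -> List[int]:
    """Merge every bucket holding at least MIN_THRESHOLD sstables into one sstable.

    Assumes no overwrites or tombstones, so the merged size is the sum of the inputs.
    """
    result: List[int] = []
    for bucket in bucket_sstables(sizes):
        if len(bucket) >= MIN_THRESHOLD:
            result.append(sum(bucket))
        else:
            result.extend(bucket)
    return sorted(result)
```

Take four 10-unit sstables and three 40-unit sstables. One pass merges only the small bucket, leaving four 40-unit sstables. Those then form a full bucket that the next pass merges into one 160-unit sstable.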

Re: How reliable is cassandra?

2010-03-30 Thread Tatu Saloranta
On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump wrote: > Am I crazy to want to switch our server's primary data store from postgres to > cassandra?  This is a system used by banks and governments to store crypto > keys which absolutely can not be lost. Back to original question: in my completel

Re: How reliable is cassandra?

2010-03-30 Thread Ted Zlatanov
On Mon, 29 Mar 2010 10:31:06 -0700 Matthew Stump wrote: MS> Am I crazy to want to switch our server's primary data store from MS> postgres to cassandra? This is a system used by banks and MS> governments to store crypto keys which absolutely can not be lost. Run a test pilot for N months (depe

Re: Poor performance; PHP & Thrift to blame

2010-03-30 Thread David Strauss
On 2010-03-30 12:51, yaw wrote: > I have seen your guide at > https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP. > > I use Cassandra with a PHP client... > Until now, I am using Thrift PHP classes that I found in the Pandra > project (high level PHP client) as I was unable to instal

Re: Large data files and no "edit in place"?

2010-03-30 Thread David Strauss
On 2010-03-30 05:54, Julian Simon wrote: > My understanding is that Cassandra never updates data "in place" on > disk - instead it completely re-creates the data files during a > "flush". Stop me if I'm wrong already ;-) You're correct that existing SSTables are immutable; they are retired follow
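A toy illustration of the point above that existing SSTables are never modified. Writes go to an in-memory table, and a flush always produces a new, separate table. A read consults every table and keeps the value with the highest timestamp. `ToyStore` is hypothetical and omits the commit log, indexes, bloom filters and compaction.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


class ToyStore:
    """Toy log-structured store: flushed tables are never modified afterwards."""

    def __init__(self) -> None:
        self.memtable: Dict[str, Cell] = {}
        self.sstables: List[Dict[str, Cell]] = []  # treated as immutable once flushed

    def write(self, key: str, value: str, timestamp: int) -> None:
        # Only the in-memory table is updated; the newer timestamp wins within it.
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)

    def flush(self) -> None:
        # A flush writes a brand-new sorted table; older tables stay untouched.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        # The same key may live in several tables; the highest timestamp wins.
        candidates = [table[key] for table in self.sstables if key in table]
        if key in self.memtable:
            candidates.append(self.memtable[key])
        return max(candidates)[1] if candidates else None
```

An overwrite therefore costs only a sequential write of a new table. The stale copy stays on disk until a compaction merges the tables.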

Re: Poor performance; PHP & Thrift to blame

2010-03-30 Thread yaw
Hi David, I have seen your guide at https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP. I use Cassandra with a PHP client... Until now, I am using Thrift PHP classes that I found in the Pandra project (high level PHP client) as I was unable to install or build the Thrift compiler on my o

Re: Poor performance; PHP & Thrift to blame

2010-03-30 Thread David Timothy Strauss
This sounds like the sort of analysis that shouldn't be done in PHP. Have you tried Hadoop + Cassandra 0.6? -----Original Message----- From: Julian Simon Date: Tue, 30 Mar 2010 22:21:22 To: Subject: Re: Poor performance; PHP & Thrift to blame Yes I tested it with and without APC - it had a ne

Re: Poor performance; PHP & Thrift to blame

2010-03-30 Thread Julian Simon
Yes, I tested it with and without APC - it had a negligible impact on performance. This didn't surprise me - most of the optimization that APC offers is in the parsing of PHP code; seeing as the benchmark is a single PHP process, the code parsing overhead occurs outside the benchmark loop. Does any

Re: Poor performance; PHP & Thrift to blame

2010-03-30 Thread David Timothy Strauss
Without APC, there should be even more of an improvement with the Thrift PHP extension. - "Rauan Maemirov" wrote: > What about APC? Did you turn it on? > > 2010/3/30 Julian Simon : > > Hi, > > > > I've been trying to benchmark Cassandra for our use case and have > been > > seeing poor perf

Re: Poor performance; PHP & Thrift to blame

2010-03-30 Thread Rauan Maemirov
What about APC? Did you turn it on? 2010/3/30 Julian Simon : > Hi, > > I've been trying to benchmark Cassandra for our use case and have been > seeing poor performance on both writes and (extremely) poor > performance on reads. > > Using Cassandra 0.51 stable & thrift-0.2.0. > > It turns out all t