Network latency on Cassandra 0.7 (TFramedTransport)

2010-09-17 Thread Michal Augustýn
Hello, I'm experiencing high network latency when using TFramedTransport. The latency is about 200 ms on every request when I'm connected to another computer. On localhost, all goes well. I can currently work around this issue by changing "thrift_framed_transport_size_in_mb" to 0 (so disable framed transport

Re: busy thread on IncomingStreamReader

2010-09-17 Thread Jonathan Ellis
Are you on the most recent version of the JVM? There have been bugs fixed in FileChannel over the 1.6 lifespan. On Thu, Sep 16, 2010 at 4:03 AM, Joseph Mermelstein wrote: > Hi - has anyone made any progress with this issue? We are having the same > problem with our Cassandra nodes in production.

Re: Secondary Index Null Pointer Error

2010-09-17 Thread Jonathan Ellis
Indexed columns don't have to exist. Try this after I post a fix for http://issues.apache.org/jira/browse/CASSANDRA-1415. On Thu, Sep 16, 2010 at 12:53 PM, Colin Britton wrote: > Hi, > > I am using Cassandra 0.7 trunk (r997357) and am having issues with a > secondary index. > > I have a ColumnFam

Re: Getting client only example to work

2010-09-17 Thread Jonathan Ellis
You can run them both on the same machine, but it's always been the case that multiple instances of StorageProxy need to be on different IPs. So you'll have to override ListenAddress. On Thu, Sep 16, 2010 at 4:20 PM, Asif Jan wrote: > ok, did something about the message service changed in the in

Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Jedd Rashbrooke
Howdi, I've just landed in an experiment to get Cassandra going, and fed by PHP via Thrift via Hadoop, all running on EC2. I've been lurking a bit on the list for a couple of weeks, mostly reading any threads with the word 'performance' in them. Few people have anything polite to say about

Re: Network latency on Cassandra 0.7 (TFramedTransport)

2010-09-17 Thread Peter Schuller
> The latency is about 200 ms on every request when I'm connected to another My first thought here was that maybe you're seeing the effects of Nagle[1] + delayed ACKs on the other side. On Unix, normally something like a Thrift client would set TCP_NODELAY on its socket to avoid the problem. I'm n

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Dave Viner
Hi Jedd, I'm using Cassandra on EC2 as well - so I'm quite interested. Just to clarify your post - it sounds like you have 4 questions/issues: 1. Writes have slowed down significantly. What's the logical explanation? And what are the logical solutions/options to solve it? 2. You grew from 2 nodes

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Robert Coli
On 9/17/10 7:41 AM, Jedd Rashbrooke wrote: Happy times. This was when the cluster was modestly sized - 20-50GB. It's now about 200GB, and performance has dropped by an order of magnitude - perhaps 5-6 hours to do the same amount of work, using the same codebase and the same input dat

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Jedd Rashbrooke
Hi Dave, Thank you for your response. I can clarify a couple of things here: > 2. You grew from 2 nodes to 4, but the original 2 nodes have 200GB and the 2 > new ones have 40 GB.  What's the recommended practice for rebalancing (i.e., > when should you do it), what's the actual procedure, and

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Jedd Rashbrooke
Hi Rob, Thanks for your suggestions. I should have been a bit more verbose in my platform description -- I'm using 64-bit instances, which I think in a Ben Black video I saw led to a sensible default usage of mmap when left at auto. Should I look at forcing this setting? > You don't mentio

Re: Network latency on Cassandra 0.7 (TFramedTransport)

2010-09-17 Thread Michael Greene
This is the correct cause. Reproducing your test gives 38-45ms in each of 10 runs. If you run a profiler against it, you can see that the time is entirely spent blocking on receive in TStreamTransport.Read. Your test can be modified with the following line: coreTransport.TcpClient.NoDelay = true
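
For readers on other client stacks, the same fix (disabling Nagle's algorithm via TCP_NODELAY) can be sketched in Python using only the standard socket module. This is a minimal illustration, not the Thrift client code from the thread: the helper names enable_nodelay and nodelay_enabled are hypothetical, and how you reach the underlying socket depends on your Thrift binding.

```python
import socket


def enable_nodelay(sock):
    """Disable Nagle's algorithm on a TCP socket (hypothetical helper).

    With TCP_NODELAY set, small writes are sent immediately instead of
    being held back while waiting for an ACK from the peer.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # The option can be set before connecting; no server is needed here.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_nodelay(s)
    print("TCP_NODELAY enabled:", nodelay_enabled(s))
    s.close()
```

The option trades a few more small packets on the wire for lower per-request latency, which usually suits request/response protocols such as Thrift RPC.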

Re: questions on cassandra (repair and multi-datacenter)

2010-09-17 Thread Gurpreet Singh
Hi Benjamin, I reverted to the old RF of 2, by restarting all nodes with RF 2, and then running cleanup. It came down to 2. This time, I changed the RF to 3 for all machines and restarted all the nodes. I started running repair one by one on all machines, tracking through jconsole that co

Re: Build an index to for join query

2010-09-17 Thread Alvin UW
Thanks Paul, If we make a CF Name_Address(name, address) rather than an index, we have to maintain it, once any change happens in ID_Address(*Id*, address) , Name_ID(*name*, id). Besides, it also occupies some space. In contrast, if Name_Address(name, address) is just an index, we can redirect th

Re: Cassandra performance

2010-09-17 Thread Zhong Li
This is my personal experience. MySQL is faster than Cassandra in most normal use cases. You should understand why you are choosing Cassandra instead of MySQL. If one central MySQL can handle your workload, MySQL is better than Cassandra. BUT if you are overloading one MySQL and want multiple boxes

Re: Cassandra performance

2010-09-17 Thread Jeremy Hanna
http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures On Sep 17, 2010, at 4:35 PM, Zhong Li wrote: > This is my personal experiences. MySQL is faster than Cassandra on most > normal use cases. > > You should understand why you choose Cassandra instead of MySQL. If one >

Re: Cassandra performance

2010-09-17 Thread Peter Schuller
> durable and rich data model. It will not provide your high performance, > especially reading  performance is poor. Note that for several realistic work-loads, the above claim is most definitely wrong. For example, for large databases with a mix of insertions/deletions (so that the MySQL case doe

Re: Cassandra performance

2010-09-17 Thread Benjamin Black
It appears you are doing several things that assure terrible performance, so I am not surprised you are getting it. On Tue, Sep 14, 2010 at 3:40 PM, Kamil Gorlo wrote: > My main tool was stress.py for benchmarks (or equivalent written in > C++ to deal with python2.5 lack of multiprocessing). I wi

Cassandra Cache Mbean values; bytes or number of elements ?

2010-09-17 Thread kannan chandrasekaran
I am using 0.6.5 and my key cache for the CF is set to "100%" ... What do the values in the MBean interfaces indicate? Bytes or number of elements? Specifically, these are the numbers that I observe for one of the column families... capacity = 36826888 (if this is number of elements, how doe