Re: Dazed and confused with Cassandra on EC2 ...

2010-10-09 Thread Peter Schuller
> Heap *shrinkage* on the other hand is another matter and for both CMS
> and G1, shrinkage never happens except on Full GC, unfortunately. I'm
And to be clear, the implication here is that shrinkage normally doesn't happen. The implication is *not* that you see fallbacks to full GC for the purpos

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-09 Thread Peter Schuller
> The main reason to set Xms=Xmx is so mlockall can tag the entire heap
> as "don't swap this out" on startup.  Secondarily whenever the heap
> resizes upwards the JVM does a stop-the-world gc, but no, not really a
> big deal when your uptime is in days or weeks.
I'm not sure where this is coming

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-08 Thread Jonathan Ellis
On Fri, Oct 8, 2010 at 4:54 AM, Jedd Rashbrooke wrote: > On 8 October 2010 02:05, Matthew Dennis wrote: >> Also, in general, you probably want to set Xms = Xmx (regardless of the >> value you eventually decide on for that). > >  Matthew - we'd just about reached that conclusion!  Is it as big an

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-08 Thread Jedd Rashbrooke
On 7 October 2010 20:49, Peter Schuller wrote: > ... if you "waste" 10-15 gigs of RAM on the JVM heap for a > Cassandra instances which could live with e.g. 1 GB, you're actively > taking away those 10-15 gigs of RAM from the operating system to use > for the buffer cache. Particularly if you're I

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-07 Thread Matthew Dennis
Also, in general, you probably want to set Xms = Xmx (regardless of the value you eventually decide on for that). If you set them equal, the JVM will just go ahead and allocate that amount on startup. If they're different, then when you grow above Xms it has to allocate more and move a bunch of s

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-07 Thread Peter Schuller
>  There are some words on the 'Net - the recent pages on >  Riptano's site in fact - that strongly encourage scaling left >  and right, rather than beefing up the boxes - and certainly >  we're seeing far less bother from GC using a much smaller >  heap - previously we'd been going up to 16GB,

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-04 Thread Jedd Rashbrooke
Hi Peter, Thanks again for your time and thoughts on this problem. We think we've got a bit ahead of the problem by just scaling back (quite savagely) on the rate that we try to hit the cluster. Previously, with a surplus of optimism, we were throwing very big Hadoop jobs at Cassandra, in

Re: Dazed and confused with Cassandra on EC2 ...

2010-10-02 Thread Peter Schuller
(sorry for the delay in following up on this thread) >  Actually, there's a question - is it 'acceptable' do you think >  for GC to take out a small number of your nodes at a time, >  so long as the bulk (or at least where RF is > nodes gone >  on STW GC) of the nodes are okay?  I suspect this is

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-28 Thread Jedd Rashbrooke
Peter - my apologies for the slow response - we had to divert down a 'Plan B' approach last week involving MySQL, memcache, redis and various other uglies. On 20 September 2010 23:11, Peter Schuller wrote: > Are you running an old JVM by any chance? (Just grasping for straws.) JVM is Sun's 1

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Peter Schuller
>  We think we might have cracked the underlying problem >  though, and it might be similar to the 'behind the scenes >  swap thing' (sadly I suspect that such things might actually >  be happening -- plus I thought that memory overcommit wasn't >  possible with Xen - only with VMware - but I guess

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Jedd Rashbrooke
Hi Peter, We were logging the GC output as per this before, have since taken it out, but will put it back in I think. Apropos logging - I've found that with RMI to our boxes at EC2 I've had to do the ugly thing with this: -Djava.rmi.server.hostname= .. which then renders nodetool useless,

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Peter Schuller
> Nope - no swap enabled.
Something is seriously weird, unless the system clock is broken... Given:
INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) GC for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is 25907560448
INFO [GC inspection] 2010-09-20 15:
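A minimal sketch of how the GCInspector lines quoted in this message could be scanned for long pauses. It assumes only the message format visible in the log extract ("GC for <collector>: N ms, N reclaimed leaving N used; max is N") and that the size figures are bytes; the names `parse_gc_line` and `long_pauses` and the 1000 ms threshold are hypothetical choices for illustration, not anything Cassandra provides. For the first quoted line it gives a ParNew pause of about 325 seconds while the heap is only a few percent of its maximum.

```python
import re

# Matches the GCInspector message format seen in the log extract in this thread.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<pause_ms>\d+) ms, "
    r"(?P<reclaimed>\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max_heap>\d+)"
)


def parse_gc_line(line):
    """Return the fields of one GCInspector line, or None if it does not match."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    collector = fields.pop("collector")
    parsed = {name: int(value) for name, value in fields.items()}
    parsed["collector"] = collector
    return parsed


def long_pauses(lines, threshold_ms=1000):
    """Yield parsed GC events whose pause exceeds threshold_ms (hypothetical cutoff)."""
    for line in lines:
        event = parse_gc_line(line)
        if event is not None and event["pause_ms"] > threshold_ms:
            yield event


# The first log line quoted in the message above.
sample = (
    "INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) "
    "GC for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; "
    "max is 25907560448"
)

event = parse_gc_line(sample)
heap_fraction = event["used"] / event["max_heap"]
print(
    f"{event['collector']}: {event['pause_ms'] / 1000:.1f} s pause, "
    f"heap {heap_fraction:.1%} of max"
)
```

Feeding a whole system.log through `long_pauses(open(path))` (with `path` being wherever the node writes its log) would list every pause above the chosen threshold.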

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Dave Gardner
One other question for the list: I gather GMFD is "gossip stage" - but what does this actually mean? Is it an issue to have 203 pending operations?
Thanks
Dave
INFO [GC inspection] 2010-09-20 16:56:12,792 GCInspector.java (line 129) GC for ParNew: 127970 ms, 570382800 reclaimed leaving 4606885

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Dave Gardner
Nope - no swap enabled.
top - 16:53:14 up 12 days, 6:11, 3 users, load average: 1.99, 2.63, 5.03
Tasks: 133 total, 1 running, 132 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 35840228k total, 33077580k used, 2762648k f

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Peter Schuller
> Can anyone help shed any light on why this might be happening? We've tried a
> variety of JVM settings to alleviate this; currently with no luck.
Extremely long ParNew (young generation) pause times are almost always due to swapping. Are you swapping?
--
/ Peter Schuller
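One way to answer the "are you swapping?" question on a Linux node is to look at the SwapTotal and SwapFree fields of /proc/meminfo. The sketch below is only an illustration: `swap_status` is a hypothetical helper name, and it reports configured and occupied swap inside the guest, not live swap-in/swap-out activity and not anything a hypervisor might be doing behind the scenes.

```python
def swap_status(meminfo_text):
    """Summarise swap from the text of /proc/meminfo (values are in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Key: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return {
        "swap_enabled": total > 0,
        "swap_total_kb": total,
        "swap_used_kb": total - free,
    }
```

On a node this could be called as `swap_status(open("/proc/meminfo").read())`. A result with `swap_enabled` false corresponds to the "no swap enabled" case; a non-zero `swap_used_kb` means pages have been swapped out at some point, which is worth following up with a tool that shows ongoing swap activity.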

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-20 Thread Dave Gardner
As a follow-up to this conversation: we are still having issues with our Cassandra cluster on EC2. It *looks* to be related to garbage collection; however, we aren't sure what the root cause of the problem is. Here is an extract from the logs:
INFO [GMFD:1] 2010-09-20 15:22:00,242 Gossiper.java (line

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Jedd Rashbrooke
Hi Rob, Thanks for your suggestions. I should have been a bit more verbose in my platform description -- I'm using 64-bit instances, which I think in a Ben Black video I saw led to a sensible default usage of mmap when left at auto. Should I look at forcing this setting? > You don't mentio

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Jedd Rashbrooke
Hi Dave, Thank you for your response. I can clarify a couple of things here: > 2. You grew from 2 nodes to 4, but the original 2 nodes have 200GB and the 2 > new ones have 40 GB.  What's the recommended practice for rebalancing (i.e., > when should you do it), what's the actual procedure, and

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Robert Coli
On 9/17/10 7:41 AM, Jedd Rashbrooke wrote: Happy times. This was when the cluster was modestly sized - 20-50GB. It's now about 200GB, and performance has dropped by an order of magnitude - perhaps 5-6 hours to do the same amount of work, using the same codebase and the same input dat

Re: Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Dave Viner
Hi Jedd, I'm using Cassandra on EC2 as well - so I'm quite interested. Just to clarify your post - it sounds like you have 4 questions/issues:
1. Writes have slowed down significantly. What's the logical explanation? And what are the logical solutions or options to solve it?
2. You grew from 2 nodes

Dazed and confused with Cassandra on EC2 ...

2010-09-17 Thread Jedd Rashbrooke
Howdi, I've just landed in an experiment to get Cassandra going, and fed by PHP via Thrift via Hadoop, all running on EC2. I've been lurking a bit on the list for a couple of weeks, mostly reading any threads with the word 'performance' in them. Few people have anything polite to say about