Re: frequent node up/downs

aaron morton Fri, 06 Jul 2012 12:09:54 -0700

> It looks like this happens when there is a promotion failure. 

Java Heap is full. 
Memory is fragmented. 
Use C for web scale.


> Also is it normal to see the "Heap is xx full.  You may need to reduce 
> memtable and/or cache sizes" message quite often? I haven't turned on row 
> caches or changed any default memtable size settings so I am wondering why 
> the old gen fills up.

It's odd to get that out of the box with an 8GB heap on a 1.1.X install. 

What sort of work load ? Is it under heavy inserts ?
Do you have a lot of CF's ? A lot of secondary indexes ?
After the messages is it able to reduce heap usage ?
Does it seem to correlate to compactions ?
Is the node able to get back to a healthy state ?
If this is testing, are you able to pull back to a workload where the issues do 
not appear ? 
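
To check whether the long pauses really line up with promotion failures, it can 
help to count them in the GC log. The following is only a minimal sketch in 
Python, not a tested tool. It assumes the log was written with 
-XX:+PrintGCDetails, so that each collection ends with a "real=N secs" time, and 
that each collection is logged on a single line, which is not guaranteed when 
entries interleave. The function names are made up for illustration.

```python
import re

# Matches the wall-clock pause reported at the end of a HotSpot GC log entry,
# e.g. "real=12.90 secs" (assumes -XX:+PrintGCDetails is enabled).
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def promotion_failures(lines):
    """Return the pause in seconds for each line mentioning a promotion failure.

    Lines with no parsable "real=" time are reported as None.
    """
    pauses = []
    for line in lines:
        if "promotion failed" not in line:
            continue
        match = REAL_TIME.search(line)
        pauses.append(float(match.group(1)) if match else None)
    return pauses


def summarize(path):
    """Print the number of promotion failures in a GC log and the longest pause."""
    with open(path, errors="replace") as handle:
        pauses = promotion_failures(handle)
    timed = [p for p in pauses if p is not None]
    print(f"promotion failures: {len(pauses)}")
    if timed:
        print(f"longest pause: {max(timed):.2f}s")
```

Calling summarize() with the path of the GC log configured in cassandra-env.sh 
gives a rough count to compare against the "is now dead" messages in the 
Cassandra log.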


Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/07/2012, at 4:33 AM, feedly team wrote:

> I reduced the load and the problem hasn't been happening as much. After 
> enabling gc logging, I see messages mentioning promotion failed when the 
> pauses happen. It looks like this happens when there is a promotion failure. 
> From reading on the web it looks like I could try reducing the 
> CMSInitiatingOccupancyFraction value and/or decreasing the young gen size to 
> try to avoid this scenario.
> 
> Also is it normal to see the "Heap is xx full.  You may need to reduce 
> memtable and/or cache sizes" message quite often? I haven't turned on row 
> caches or changed any default memtable size settings so I am wondering why 
> the old gen fills up.
> 
> 
> On Wed, Jul 4, 2012 at 6:28 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> What accounts for the much larger virtual number? some kind of off-heap 
>> memory? 
> http://wiki.apache.org/cassandra/FAQ#mmap
> 
>> I'm a little puzzled as to why I would get such long pauses without 
>> swapping. 
> The two are not related. On startup the JVM memory is locked so it will not 
> swap, from then on memory management is pretty much up to the JVM. 
> 
> Getting a lot of ParNew activity does not mean the JVM is low on memory, it 
> means there is a lot of activity in the new heap. 
> 
> If you have a lot of insert activity (typically in a load test) you can 
> generate a lot of GC activity. Try reducing the load to a point where it does 
> not hurt GC and then increase to find the cause. Also if you can connect 
> JConsole to the JVM you may get a better view of the heap usage.
> 
> Hope that helps. 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 3/07/2012, at 3:41 PM, feedly team wrote:
> 
>> Couple more details. I confirmed that swap space is not being used (free -m 
>> shows 0 swap) and cassandra.log has a message like "JNA mlockall 
>> successful". top shows the process having 9g in resident memory but 21.6g in 
>> virtual...What accounts for the much larger virtual number? some kind of 
>> off-heap memory? 
>> 
>> I'm a little puzzled as to why I would get such long pauses without 
>> swapping. I uncommented all the gc logging options in cassandra-env.sh to 
>> try to see what is going on when the node freezes.
>> 
>> Thanks
>> Kireet
>> 
>> On Mon, Jul 2, 2012 at 9:51 PM, feedly team <feedly...@gmail.com> wrote:
>> Yeah I noticed the leap second problem and ran the suggested fix, but I have 
>> been facing these problems before Saturday and still see the occasional 
>> failures after running the fix. 
>> 
>> Thanks.
>> 
>> 
>> On Mon, Jul 2, 2012 at 11:17 AM, Marcus Both <mb...@terra.com.br> wrote:
>> Yeah! Look at that.
>> http://arstechnica.com/business/2012/07/one-day-later-the-leap-second-v-the-internet-scorecard/
>> I had the same problem. The solution was rebooting.
>> 
>> On Mon, 2 Jul 2012 11:08:57 -0400
>> feedly team <feedly...@gmail.com> wrote:
>> 
>> > Hello,
>> >    I recently set up a 2 node cassandra cluster on dedicated hardware. In
>> > the logs there have been a lot of 'InetAddress xxx is now dead' or UP
>> > messages. Comparing the log messages between the 2 nodes, they seem to
>> > coincide with extremely long ParNew collections. I have seen some of up to
>> > 50 seconds. The installation is pretty vanilla, I didn't change any
>> > settings and the machines don't seem particularly busy - cassandra is the
>> > only thing running on the machine with an 8GB heap. The machine has 64GB of
>> > RAM and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx
>> > full. You may need to reduce memtable and/or cache sizes' messages. Would
>> > this help with the long ParNew collections? That message seems to be
>> > triggered on a full collection.
>> 
>> --
>> Marcus Both
>> 
>> 
>> 
> 
>
