responses below. thanks!

On Fri, Jul 6, 2012 at 3:09 PM, aaron morton <aa...@thelastpickle.com> wrote:
> It looks like this happens when there is a promotion failure.
>
> Java Heap is full.
> Memory is fragmented.
> Use C for web scale.

unfortunately i became too dumb to use C around 2004. camping accident.

> Also is it normal to see the "Heap is xx full. You may need to reduce
> memtable and/or cache sizes" message quite often? I haven't turned on row
> caches or changed any default memtable size settings so I am wondering why
> the old gen fills up.
>
> It's odd to get that out of the box with an 8GB heap on a 1.1.X install.
>
> What sort of work load ? Is it under heavy inserts ?

opscenter shows between 60-120 writes/sec and between 80-150 reads/sec
total for both machines. i am not sure if that is considered heavy or not.
the machines don't seem particularly busy. load seems pretty even across
both.

> Do you have a lot of CF's ? A lot of secondary indexes ?

i have 15 column families with maybe 4 that are larger and active. there
are a couple of secondary indexes. opscenter uses 8 CFs and system 7. total
data is ~100GB.

> After the messages is it able to reduce heap usage ?

seems like it, they occur every few minutes for a while and then stop.

> Does it seem to correlate to compactions ?

no.

> Is the node able to get back to a healthy state ?

yes. after the gc finishes it rejoins the cluster.

> If this is testing are you able to pull back to a workload where the
> issues do not appear ?

i am guessing so. i am running a data-heavy background processing job. when
i reduced the thread count from 20 to 15 the problem has happened only once
in the past 2 days vs 2-3 times a day. we are just starting to use
cassandra so i am more worried about when more critical web traffic hits.

> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/07/2012, at 4:33 AM, feedly team wrote:
>
> I reduced the load and the problem hasn't been happening as much.
> After enabling gc logging, I see messages mentioning promotion failed when
> the pauses happen. It looks like this happens when there is a promotion
> failure. From reading on the web it looks like I could try reducing the
> CMSInitiatingOccupancyFraction value and/or decreasing the young gen size
> to try to avoid this scenario.
>
> Also is it normal to see the "Heap is xx full. You may need to reduce
> memtable and/or cache sizes" message quite often? I haven't turned on row
> caches or changed any default memtable size settings so I am wondering why
> the old gen fills up.
>
> On Wed, Jul 4, 2012 at 6:28 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> What accounts for the much larger virtual number? some kind of off-heap
>> memory?
>>
>> http://wiki.apache.org/cassandra/FAQ#mmap
>>
>> I'm a little puzzled as to why I would get such long pauses without
>> swapping.
>>
>> The two are not related. On startup the JVM memory is locked so it will
>> not swap; from then on memory management is pretty much up to the JVM.
>>
>> Getting a lot of ParNew activity does not mean the JVM is low on memory,
>> it means there is a lot of activity in the new heap.
>>
>> If you have a lot of insert activity (typically in a load test) you can
>> generate a lot of GC activity. Try reducing the load to a point where it
>> does not hit GC and then increase it to find the cause. Also, if you can
>> connect JConsole to the JVM you may get a better view of the heap usage.
>>
>> Hope that helps.
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 3/07/2012, at 3:41 PM, feedly team wrote:
>>
>> Couple more details. I confirmed that swap space is not being used (free
>> -m shows 0 swap) and cassandra.log has a message like "JNA mlockall
>> successful". top shows the process having 9g in resident memory but 21.6g
>> in virtual... What accounts for the much larger virtual number? some kind
>> of off-heap memory?
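The resident-vs-virtual gap asked about above is what the mmap FAQ entry describes: memory-mapped SSTable files count toward VIRT in top but are not Java heap. A rough way to eyeball it on Linux is to compare VmSize and VmRSS in /proc. A minimal sketch; it inspects the current shell's own pid ($$) only so the snippet runs anywhere, and the `pgrep` pattern is an assumption about the Cassandra process name:

```shell
# Compare virtual vs resident size for a process on Linux. For a
# Cassandra node using mmap'd disk access, the gap is largely
# memory-mapped SSTable files (plus shared libraries), not Java heap.
# $$ is this shell's pid; in practice substitute the Cassandra pid,
# e.g. from: pgrep -f CassandraDaemon
vm=$(awk '/^VmSize:/ {print $2}' /proc/$$/status)    # virtual size, KB
rss=$(awk '/^VmRSS:/ {print $2}' /proc/$$/status)    # resident size, KB
echo "virtual: ${vm} KB, resident: ${rss} KB, gap: $((vm - rss)) KB"
```

`pmap -x <pid>` breaks the same total down per mapping, which makes the mapped data files easy to spot by path.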
>> I'm a little puzzled as to why I would get such long pauses without
>> swapping. I uncommented all the gc logging options in cassandra-env.sh to
>> try to see what is going on when the node freezes.
>>
>> Thanks
>> Kireet
>>
>> On Mon, Jul 2, 2012 at 9:51 PM, feedly team <feedly...@gmail.com> wrote:
>>
>>> Yeah I noticed the leap second problem and ran the suggested fix, but I
>>> have been facing these problems since before Saturday and still see the
>>> occasional failures after running the fix.
>>>
>>> Thanks.
>>>
>>> On Mon, Jul 2, 2012 at 11:17 AM, Marcus Both <mb...@terra.com.br> wrote:
>>>
>>>> Yeah! Look at that.
>>>> http://arstechnica.com/business/2012/07/one-day-later-the-leap-second-v-the-internet-scorecard/
>>>> I had the same problem. The solution was rebooting.
>>>>
>>>> On Mon, 2 Jul 2012 11:08:57 -0400
>>>> feedly team <feedly...@gmail.com> wrote:
>>>>
>>>> > Hello,
>>>> > I recently set up a 2 node cassandra cluster on dedicated hardware. In
>>>> > the logs there have been a lot of "InetAddress xxx is now dead" or UP
>>>> > messages. Comparing the log messages between the 2 nodes, they seem to
>>>> > coincide with extremely long ParNew collections. I have seen some of
>>>> > up to 50 seconds. The installation is pretty vanilla, I didn't change
>>>> > any settings and the machines don't seem particularly busy - cassandra
>>>> > is the only thing running on the machine with an 8GB heap. The machine
>>>> > has 64GB of RAM and CPU/IO usage looks pretty light. I do see a lot of
>>>> > 'Heap is xxx full. You may need to reduce memtable and/or cache sizes'
>>>> > messages. Would this help with the long ParNew collections? That
>>>> > message seems to be triggered on a full collection.
>>>>
>>>> --
>>>> Marcus Both
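For anyone hitting the same promotion-failure pauses: the gc-logging and CMS knobs discussed up-thread all live in cassandra-env.sh as extra JVM_OPTS lines. A sketch of what that could look like, using standard HotSpot (Sun/Oracle JDK 6) flag names; the values here are illustrative starting points to experiment with, not recommendations:

```shell
# Illustrative additions to cassandra-env.sh (HotSpot flag names;
# values are examples, not recommendations -- measure before and after).

# GC logging: shows "promotion failed" events and pause times.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

# Start CMS earlier so the old gen has free (and less fragmented) space
# when ParNew needs to promote objects into it.
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=60"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# A smaller young gen promotes less per ParNew cycle, at the cost of
# more frequent minor collections.
JVM_OPTS="$JVM_OPTS -Xmn400M"
```

Lowering CMSInitiatingOccupancyFraction trades more concurrent-mark cycles for headroom at promotion time, which is the usual first step when the gc log shows "promotion failed" entries.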