Ah, missed that.

Thanks Aaron.

On Wed, Oct 20, 2010 at 11:49 PM, Aaron Morton <aa...@thelastpickle.com>wrote:

> Sounds like the problem discussed here:
> http://wiki.apache.org/cassandra/Operations?highlight=(allocate)|(memory)
>
> If you have the JNA jar it should work:
> http://www.mail-archive.com/user@cassandra.apache.org/msg06284.html
> http://www.riptano.com/blog/whats-new-cassandra-065
>
> Aaron
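Aaron's JNA point can be checked mechanically. A minimal diagnostic sketch, assuming a conventional install layout (the `/opt/cassandra` default below is illustrative, not from the thread): without JNA, snapshot hard links shell out to `ln`, and that fork can fail with error=12 under a strict kernel overcommit policy.

```shell
# Hedged diagnostic for the "Cannot run program 'ln'" failure quoted below.
# The install path is an assumption; adjust CASSANDRA_HOME for your layout.
CASSANDRA_HOME="${CASSANDRA_HOME:-/opt/cassandra}"

# 1. Is the JNA jar where Cassandra can load it?
ls "$CASSANDRA_HOME"/lib/jna*.jar 2>/dev/null || echo "JNA jar not found"

# 2. Without JNA, snapshots fork an external "ln"; under a strict overcommit
#    policy that fork can fail with error=12 even when memory is available.
#    Inspect the current policy (Linux only; 0 = heuristic, 2 = strict):
cat /proc/sys/vm/overcommit_memory 2>/dev/null || true
```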
>
>
> On 21 Oct, 2010, at 06:29 PM, Frank LoVecchio <fr...@isidorey.com> wrote:
>
> I have a cluster of three 0.7 beta 2 nodes (built today from the latest
> trunk) running on Large, EBS-backed, x64 EC2 instances; RF=3.  I attempted
> to write somewhere near 500,000 records every 15 minutes from a total of 5
> different machines (using Pelops and multi-threading).  Though my network
> blew up and I'm not quite sure how many records were inserted, I lost a
> node a couple of hours later, and the other two were at severely high
> memory usage.  Is this a memory leak of some kind, or something I can
> configure / watch for in the future?
>
> nodetool ring shows this:
>
> [ec2-u...@xxx bin]$ ./nodetool -h localhost ring
> Address  Status  State   Load       Token
>                                     XXX
> ipXXX    Down    Normal  564.76 MB  XXX
> ipXXX    Up      Normal  564.83 MB  XXX
> ipXXX    Up      Normal  563.06 MB  XXX
>
> top on the box that went down shows this (dual-core x64):
>
> Cpu(s): 19.9%us,  5.9%sy,  0.0%ni,  8.8%id, 57.3%wa,  0.0%hi,  0.0%si,
>  8.1%st
> Mem:   7651528k total,  7611112k used,    40416k free,    66056k buffers
> Swap:        0k total,        0k used,        0k free,  3294076k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 22514 root      20   0 5790m 4.0g 167m S 91.9 54.8 152:45.08 java
>
>
> I see this error in the log file:
>
> ERROR [CompactionExecutor:1] 2010-10-21 01:35:05,318
> AbstractCassandraDaemon.java (line 88) Fatal exception in thread
> Thread[CompactionExecutor:1,1,main]
> java.io.IOError: java.io.IOException: Cannot run program "ln":
> java.io.IOException: error=12, Cannot allocate memory
>         at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1368)
>         at org.apache.cassandra.db.Table.snapshot(Table.java:163)
>         at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:232)
>         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:106)
>         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:84)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:636)
> Caused by: java.io.IOException: Cannot run program "ln":
> java.io.IOException: error=12, Cannot allocate memory
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
>         at org.apache.cassandra.io.util.FileUtils.createHardLinkWithExec(FileUtils.java:263)
>         at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:229)
>         at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1360)
>         ... 9 more
> Caused by: java.io.IOException: java.io.IOException: error=12, Cannot
> allocate memory
>         at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
>         at java.lang.ProcessImpl.start(ProcessImpl.java:81)
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
>         ... 12 more
>
>
> On Wed, Oct 20, 2010 at 3:16 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> can you reproduce this by, say, running nodeprobe ring in a bash while
>> loop?
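A bounded sketch of that reproduction (the iteration count and the relative `bin/nodetool` path are illustrative; `nodeprobe` is the tool's older name):

```shell
# Hammer the ring call, then compare the server's thread count before and
# after.  Failures are swallowed so the loop keeps going even if a node is down.
for i in $(seq 1 100); do
  bin/nodetool -h localhost ring >/dev/null 2>&1 || true
done
echo "loop done"

# Then check the Cassandra JVM's thread count (substitute the real PID):
# ps -o nlwp= -p <cassandra-pid>
```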
>>
>>
>> On Wed, Oct 20, 2010 at 3:09 PM, Bill Au <bill.w...@gmail.com> wrote:
>> > One of my Cassandra servers crashed with the following:
>> >
>> > ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419
>> > CassandraDaemon.java (line 82) Uncaught exception in thread
>> > Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
>> > java.lang.OutOfMemoryError: unable to create new native thread
>> >         at java.lang.Thread.start0(Native Method)
>> >         at java.lang.Thread.start(Thread.java:597)
>> >         at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)
>> >
>> >
>> > I took thread dumps of the JVM on all the other Cassandra servers in my
>> > cluster.  They all have thousands of threads looking like this:
>> >
>> > "JMX server connection timeout 183373" daemon prio=10 tid=0x00002aad230db800
>> > nid=0x5cf6 in Object.wait() [0x00002aad7a316000]
>> >    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>> >         at java.lang.Object.wait(Native Method)
>> >         at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
>> >         - locked <0x00002aab056ccee0> (a [I)
>> >         at java.lang.Thread.run(Thread.java:619)
>> >
>> > It seems to me that there is a JMX thread leak in Cassandra.  NodeProbe
>> > creates a JMXConnector but never calls its close() method.  I tried
>> > setting jmx.remote.x.server.connection.timeout to 0, hoping that would
>> > disable the JMX server connection timeout threads, but that did not
>> > make any difference.
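One way to put a number on that growth, sketched under the assumption that the JDK's jstack is on the PATH and that the process command line matches "CassandraDaemon" (both assumptions, not from the thread):

```shell
# Count the suspect server-side threads in a live thread dump.
PID=$(pgrep -f CassandraDaemon | head -1)
if [ -n "$PID" ]; then
  jstack "$PID" | grep -c 'JMX server connection timeout'
else
  echo "no Cassandra process found"
fi
```

Re-running this while the client loop above the quoted messages is active should show the count climbing if the leak is real.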
>> >
>> > Has anyone else seen this?
>> >
>> > Bill
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>
