Thanks for the help so far. Is there any way to find out why my HintsColumnFamily is so large now? It wasn't this way before the upgrade, and it seems to just keep climbing.
I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints(), thinking I have a bunch of stale hints from the upgrade issues, but it just eventually times out. Plus, the node it gets invoked against gets thrashed and stops responding, forcing me to restart Cassandra.

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, November 03, 2011 5:06 PM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

I found the problem and posted a patch on https://issues.apache.org/jira/browse/CASSANDRA-3451. If you build with that patch and rerun scrub the exception should go away.

On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey <bryce.godf...@azaleos.com> wrote:
> A restart fixed the load numbers; they are back to where I expect them to be
> now, but disk utilization is double the load #. I'm also still getting the
> cfstats exception from any node.
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Thursday, November 03, 2011 11:52 AM
> To: user@cassandra.apache.org
> Subject: Re: Problem after upgrade to 1.0.1
>
> Does restarting the node fix this?
>
> On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey <bryce.godf...@azaleos.com>
> wrote:
>> Disk utilization is actually about 80% higher than what is reported
>> for nodetool ring across all my nodes on the data drive
>>
>> Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
>> 206.926.1978 | M: 206.849.2477
>>
>> From: Dan Hendry [mailto:dan.hendry.j...@gmail.com]
>> Sent: Thursday, November 03, 2011 11:47 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Problem after upgrade to 1.0.1
>>
>> Regarding load growth, presumably you are referring to the load as
>> reported by JMX/nodetool. Have you actually looked at the disk
>> utilization on the nodes themselves?
>> Potential issue I have seen:
>> http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html
>>
>> Dan
>>
>> From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
>> Sent: November-03-11 14:40
>> To: user@cassandra.apache.org
>> Subject: Problem after upgrade to 1.0.1
>>
>> I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go
>> just fine with the rolling upgrade. But now I'm having extreme load
>> growth on one of my nodes (and others are growing faster than usual
>> also). I attempted to run a cfstats against the extremely large node
>> that was seeing 2x the load of others and I get the error below.
>> I also went into the o.a.c.db.HintedHandoffManager mbean and
>> attempted to list pending hints to see if it was growing out of
>> control for some reason, but that just times out eventually for any node.
>> I'm not sure what to do next with this issue.
>>
>> Column Family: HintsColumnFamily
>> SSTable count: 3
>> Space used (live): 12681676437
>> Space used (total): 10233130272
>> Number of Keys (estimate): 384
>> Memtable Columns Count: 117704
>> Memtable Data Size: 115107307
>> Memtable Switch Count: 66
>> Read Count: 0
>> Read Latency: NaN ms.
>> Write Count: 21203290
>> Write Latency: 0.014 ms.
>> Pending Tasks: 0
>> Key cache capacity: 3
>> Key cache size: 0
>> Key cache hit rate: NaN
>> Row cache: disabled
>> Compacted row minimum size: 30130993
>> Compacted row maximum size: 9223372036854775807
>>
>> Exception in thread "main" java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
>>     at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
>>     at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
>>     at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>     at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
>>     at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
>>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
>>     at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
>>     at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
>>     at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>     at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>     at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>     at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
>>     at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>     at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>     at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
>> 206.926.1978 | M: 206.849.2477
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
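
A note for readers who hit the same cfstats failure: the "Compacted row maximum size" of 9223372036854775807 in the output above is Long.MAX_VALUE, which suggests at least one row landed past the last bucket of the row-size histogram, and the exception message says the mean cannot be computed once that has happened. The Java sketch below illustrates that general behaviour with a toy bucketed histogram. It is not the code in org.apache.cassandra.utils.EstimatedHistogram; the class name SketchHistogram, the bucket bounds, and the exception text are all made up for illustration.

```java
import java.util.Arrays;

/** Toy bucketed histogram with an overflow slot. Illustration only, not Cassandra's implementation. */
class SketchHistogram {
    private final long[] upperBounds; // inclusive upper bound of each regular bucket, ascending
    private final long[] counts;      // counts[upperBounds.length] is the overflow bucket

    SketchHistogram(long[] upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
    }

    void add(long value) {
        int i = Arrays.binarySearch(upperBounds, value);
        if (i < 0) {
            i = -i - 1; // insertion point: first bound above value, or length if there is none
        }
        counts[i]++;
    }

    boolean overflowed() {
        return counts[counts.length - 1] > 0;
    }

    /** Largest populated bucket bound, or Long.MAX_VALUE once a value exceeded every bound. */
    long max() {
        if (overflowed()) {
            return Long.MAX_VALUE;
        }
        for (int i = upperBounds.length - 1; i >= 0; i--) {
            if (counts[i] > 0) {
                return upperBounds[i];
            }
        }
        return 0;
    }

    /** Mean estimated from bucket bounds; refuses to answer once the overflow bucket is used. */
    long mean() {
        if (overflowed()) {
            throw new IllegalStateException("histogram overflowed; mean cannot be estimated");
        }
        long n = 0;
        long sum = 0;
        for (int i = 0; i < upperBounds.length; i++) {
            n += counts[i];
            sum += counts[i] * upperBounds[i];
        }
        return n == 0 ? 0 : (long) Math.ceil((double) sum / n);
    }

    public static void main(String[] args) {
        SketchHistogram h = new SketchHistogram(new long[] {10, 100, 1000});
        h.add(5);
        h.add(50);
        System.out.println("max=" + h.max() + " mean=" + h.mean());
        h.add(5000); // larger than every bound, so it lands in the overflow bucket
        System.out.println("max after overflow=" + h.max());
        try {
            h.mean();
        } catch (IllegalStateException e) {
            System.out.println("mean failed: " + e.getMessage());
        }
    }
}
```

In this toy version a single oversized value is enough to make max() report Long.MAX_VALUE and mean() throw, which is the same pairing of symptoms seen in the cfstats output quoted in this thread.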