Hi,

I tried running cfstats on the other nodes. It works on all except two. I then tried scrubbing the OSP CF on one of the nodes where it fails (in fact the node where the first exception I reported happened), but got this exception in the log:
[...]
 INFO 14:58:00,604 Scrub of SSTableReader(path='/data/cassandra/data/KeyspaceCumulus/OSP-h-9528-Data.db') complete: 44378258 rows in new sstable and 0 empty (tombstoned) rows dropped
 INFO 14:58:00,604 Scrubbing SSTableReader(path='/data/cassandra/data/KeyspaceCumulus/OSP-h-9709-Data.db')
 INFO 14:58:46,958 GC for ParNew: 347 ms for 1 collections, 1172065472 used; max is 2057306112
 INFO 14:59:51,360 Scrub of SSTableReader(path='/data/cassandra/data/KeyspaceCumulus/OSP-h-9709-Data.db') complete: 3833789 rows in new sstable and 0 empty (tombstoned) rows dropped
 INFO 14:59:51,361 Scrubbing SSTableReader(path='/data/cassandra/data/KeyspaceCumulus/OSP-h-9708-Data.db')
 INFO 14:59:56,355 Scrub of SSTableReader(path='/data/cassandra/data/KeyspaceCumulus/OSP-h-9708-Data.db') complete: 179681 rows in new sstable and 0 empty (tombstoned) rows dropped
 INFO 14:59:56,356 Scrubbing SSTableReader(path='/data/cassandra/data/KeyspaceCumulus/OSP-h-9530-Data.db')
ERROR 14:59:56,393 Fatal exception in thread Thread[CompactionExecutor:7,5,RMI Runtime]
java.io.IOException: disk full
	at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:477)
	at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:465)
	at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:63)
	at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:217)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)

The disk is nowhere near full:

root@ubuntu:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/vda1             1.9G  654M  1.2G  36% /
none                  2.0G  164K  2.0G   1% /dev
none                  2.0G     0  2.0G   0% /dev/shm
none                  2.0G   40K  2.0G   1% /var/run
none                  2.0G     0  2.0G   0% /var/lock
none                  2.0G     0  2.0G   0% /lib/init/rw
none                  1.9G  654M  1.2G  36% /var/lib/ureadahead/debugfs
/dev/vdb               79G   44G   32G  59% /data
/dev/vdc              4.0G  137M  3.7G   4% /data2

/data contains the data directory, /data2 the commitlog. This is the sstable on disk:

-rw-r--r-- 2 cassandra cassandra    3946478 2011-10-10 07:51 /data/cassandra/data/KeyspaceCumulus/OSP-h-9530-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra 3207243773 2011-10-10 07:51 /data/cassandra/data/KeyspaceCumulus/OSP-h-9530-Data.db
-rw-r--r-- 2 cassandra cassandra    1758976 2011-10-10 07:51 /data/cassandra/data/KeyspaceCumulus/OSP-h-9530-Filter.db
-rw-r--r-- 2 cassandra cassandra   71879406 2011-10-10 07:51 /data/cassandra/data/KeyspaceCumulus/OSP-h-9530-Index.db
-rw-r--r-- 2 cassandra cassandra       4284 2011-10-10 07:51 /data/cassandra/data/KeyspaceCumulus/OSP-h-9530-Statistics.db

I'm currently scrubbing the CF on the other node where cfstats fails, but it's not done yet.

Cheers,
Günter

On 12.10.2011, at 16:28, Jonathan Ellis wrote:

> Try scrubbing the CF ("nodetool scrub") and see if that fixes it.
>
> If not, then at least we have a reproducible problem. :)
>
> On Tue, Oct 11, 2011 at 4:43 PM, Günter Ladwig <guenter.lad...@kit.edu> wrote:
>> Hi all,
>>
>> I'm seeing the same problem on my 1.0.0-rc2 cluster. However, I do not have 5000, but just three (compressed) CFs.
>>
>> The exception does not happen for the Migrations CF, but for one of mine:
>>
>> Keyspace: KeyspaceCumulus
>>   Read Count: 816
>>   Read Latency: 8.926029411764706 ms.
>>   Write Count: 16808336
>>   Write Latency: 0.03914435902518846 ms.
>>   Pending Tasks: 0
>>     Column Family: OSP
>>     SSTable count: 22
>>     Space used (live): 22319610951
>>     Space used (total): 22227585112
>>     Number of Keys (estimate): 87322624
>>     Memtable Columns Count: 56028
>>     Memtable Data Size: 54362270
>>     Memtable Switch Count: 154
>>     Read Count: 277
>>     Read Latency: NaN ms.
>>     Write Count: 10913659
>>     Write Latency: NaN ms.
>>     Pending Tasks: 0
>>     Key cache: disabled
>>     Row cache: disabled
>>     Compacted row minimum size: 125
>>     Compacted row maximum size: 9223372036854775807
>>
>> Exception in thread "main" java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
>> 	at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
>> 	at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
>> 	at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:275)
>> [...snip…]
>>
>> I also had a look at the stats using JMX. The other CFs work fine; the only problem seems to be this one. In JMX it shows 'Unavailable' for the row mean size, and also that ridiculous value for the max size.
>>
>> The cluster consists of 15 nodes. The keyspace has three CFs (SPO, OSP and POS), of which only two contain any data (POS is empty), and uses replication factor 2. In total, there are about 2 billion columns in each CF. The data distribution is different between the two CFs: the row sizes for SPO should be fairly evenly distributed, whereas OSP will have a few very wide rows and a large number of small rows.
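[Editor's note: the 'Compacted row maximum size: 9223372036854775807' line and the exception quoted above come from the same mechanism. Cassandra tracks row sizes in an estimated histogram with fixed bucket boundaries; once a value lands beyond the largest bucket, the true maximum is unknown, so Long.MAX_VALUE is reported as a sentinel and the mean cannot be computed. A minimal sketch of that behaviour — not Cassandra's actual EstimatedHistogram, and the bucket offsets here are invented:]

```java
// Sketch of an estimated histogram with an overflow bucket. Values larger
// than the biggest bucket offset are counted but not binned, so max() can
// only return the Long.MAX_VALUE sentinel and mean() refuses to compute.
class HistogramOverflowSketch {
    private final long[] offsets = {128, 512, 2048, 8192}; // bucket ceilings (illustrative)
    private final long[] buckets = new long[offsets.length + 1]; // last slot counts overflow

    public void add(long value) {
        for (int i = 0; i < offsets.length; i++) {
            if (value <= offsets[i]) { buckets[i]++; return; }
        }
        buckets[offsets.length]++; // too large for any bucket
    }

    public long max() {
        if (buckets[offsets.length] > 0) return Long.MAX_VALUE; // overflow sentinel
        for (int i = offsets.length - 1; i >= 0; i--) {
            if (buckets[i] > 0) return offsets[i];
        }
        return 0;
    }

    public long mean() {
        if (buckets[offsets.length] > 0) {
            throw new IllegalStateException(
                "Unable to compute ceiling for max when histogram overflowed");
        }
        long total = 0, count = 0;
        for (int i = 0; i < offsets.length; i++) {
            total += buckets[i] * offsets[i]; // approximate each value by its bucket ceiling
            count += buckets[i];
        }
        return count == 0 ? 0 : total / count;
    }

    public static void main(String[] args) {
        HistogramOverflowSketch rowSizes = new HistogramOverflowSketch();
        rowSizes.add(100);       // fits in a bucket
        rowSizes.add(1_000_000); // wider than the largest bucket -> overflow
        System.out.println(rowSizes.max()); // 9223372036854775807
        try {
            rowSizes.mean();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // same message as in the trace above
        }
    }
}
```

This also explains why only the CF with a few very wide rows is affected: a single row wider than the histogram's largest bucket is enough to poison max and mean for the whole CF.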
>>
>> Here is the output from describe:
>>
>> Keyspace: KeyspaceCumulus:
>>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>   Durable Writes: true
>>     Options: [replication_factor:2]
>>   Column Families:
>>     ColumnFamily: OSP
>>       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>       Row cache size / save period in seconds / keys to save : 0.0/0/all
>>       Key cache size / save period in seconds: 0.0/0
>>       GC grace seconds: 0
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.0
>>       Replicate on write: false
>>       Built indexes: []
>>       Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>>       Compression Options:
>>         sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>>     ColumnFamily: POS
>>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>>       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>       Row cache size / save period in seconds / keys to save : 0.0/0/all
>>       Key cache size / save period in seconds: 0.0/0
>>       GC grace seconds: 0
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.0
>>       Replicate on write: false
>>       Built indexes: [POS.index_p]
>>       Column Metadata:
>>         Column Name: !o
>>           Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>         Column Name: !p
>>           Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>           Index Name: index_p
>>           Index Type: KEYS
>>       Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>>       Compression Options:
>>         sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>>     ColumnFamily: SPO
>>       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>       Row cache size / save period in seconds / keys to save : 0.0/0/all
>>       Key cache size / save period in seconds: 0.0/0
>>       GC grace seconds: 0
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.0
>>       Replicate on write: false
>>       Built indexes: []
>>       Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>>       Compression Options:
>>         sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>>
>> If you need additional information, let me know.
>>
>> Cheers,
>> Günter
>>
>> On 04.10.2011, at 10:20, aaron morton wrote:
>>
>>> That row has a size of 819 petabytes, so something is odd there. The error is a result of that value being so huge. When you ran the same script on 0.8.6, what was the max size of the Migrations CF?
>>>
>>> As Jonathan says, it's unlikely anyone would have tested creating 5000 CFs. Most people only create a few tens of CFs at most.
>>>
>>> Either use fewer CFs, or…
>>>
>>> * dump the Migrations CF using sstable2json to take a look around
>>> * work out steps to reproduce and report it on Jira
>>>
>>> Hope that helps.
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 4/10/2011, at 11:30 AM, Ramesh Natarajan wrote:
>>>
>>>> We recreated the schema using the same input file on both clusters and they are running identical load.
>>>>
>>>> Isn't the exception thrown in the system CF?
>>>>
>>>> This line looks strange:
>>>>
>>>> Compacted row maximum size: 9223372036854775807
>>>>
>>>> thanks
>>>> Ramesh
>>>>
>>>> On Mon, Oct 3, 2011 at 5:26 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>> Looks like you have unexpectedly large rows in your 1.0 cluster but not 0.8. I guess you could use sstable2json to manually check your row sizes.
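[Editor's note: one way to act on the sstable2json suggestion above is to dump the sstable and look for unusually long rows. A crude sketch that scans a dump for the longest line, under the assumption that the dump puts one row per line — a rough proxy for row size, not the exact on-disk size; the `RowSizeScan` helper and the dump path are hypothetical:]

```java
import java.io.BufferedReader;
import java.io.FileReader;

// Heuristic row-size check: after `bin/sstable2json OSP-h-9530-Data.db > dump.json`,
// find the longest line in the dump. If rows are laid out one per line, a
// single enormous line points at the wide row that skewed the histogram.
class RowSizeScan {
    static long longestLine(String path) throws Exception {
        long max = 0;
        try (BufferedReader r = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = r.readLine()) != null) {
                max = Math.max(max, line.length());
            }
        }
        return max;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(longestLine(args[0]) + " bytes in the longest line");
    }
}
```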
>>>>
>>>> On Mon, Oct 3, 2011 at 5:20 PM, Ramesh Natarajan <rames...@gmail.com> wrote:
>>>>> It happens all the time on 1.0. It doesn't happen on 0.8.6. Is there anything I can do to check?
>>>>> thanks
>>>>> Ramesh
>>>>>
>>>>> On Mon, Oct 3, 2011 at 5:15 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>>>
>>>>>> My suspicion would be that it has more to do with "rare case when running with 5000 CFs" than "1.0 regression."
>>>>>>
>>>>>> On Mon, Oct 3, 2011 at 5:00 PM, Ramesh Natarajan <rames...@gmail.com> wrote:
>>>>>>> We have about 5000 column families, and when we run nodetool cfstats it throws this exception... this is running 1.0.0-rc1. This seems to work on 0.8.6. Is this a bug in 1.0.0?
>>>>>>>
>>>>>>> thanks
>>>>>>> Ramesh
>>>>>>>
>>>>>>> Keyspace: system
>>>>>>>   Read Count: 28
>>>>>>>   Read Latency: 5.8675 ms.
>>>>>>>   Write Count: 3
>>>>>>>   Write Latency: 0.166 ms.
>>>>>>>   Pending Tasks: 0
>>>>>>>     Column Family: Schema
>>>>>>>     SSTable count: 4
>>>>>>>     Space used (live): 4293758276
>>>>>>>     Space used (total): 4293758276
>>>>>>>     Number of Keys (estimate): 5376
>>>>>>>     Memtable Columns Count: 0
>>>>>>>     Memtable Data Size: 0
>>>>>>>     Memtable Switch Count: 0
>>>>>>>     Read Count: 3
>>>>>>>     Read Latency: NaN ms.
>>>>>>>     Write Count: 0
>>>>>>>     Write Latency: NaN ms.
>>>>>>>     Pending Tasks: 0
>>>>>>>     Key cache capacity: 53
>>>>>>>     Key cache size: 2
>>>>>>>     Key cache hit rate: NaN
>>>>>>>     Row cache: disabled
>>>>>>>     Compacted row minimum size: 104
>>>>>>>     Compacted row maximum size: 1955666
>>>>>>>     Compacted row mean size: 1508515
>>>>>>>
>>>>>>>     Column Family: HintsColumnFamily
>>>>>>>     SSTable count: 0
>>>>>>>     Space used (live): 0
>>>>>>>     Space used (total): 0
>>>>>>>     Number of Keys (estimate): 0
>>>>>>>     Memtable Columns Count: 0
>>>>>>>     Memtable Data Size: 0
>>>>>>>     Memtable Switch Count: 0
>>>>>>>     Read Count: 5
>>>>>>>     Read Latency: NaN ms.
>>>>>>>     Write Count: 0
>>>>>>>     Write Latency: NaN ms.
>>>>>>>     Pending Tasks: 0
>>>>>>>     Key cache capacity: 1
>>>>>>>     Key cache size: 0
>>>>>>>     Key cache hit rate: NaN
>>>>>>>     Row cache: disabled
>>>>>>>     Compacted row minimum size: 0
>>>>>>>     Compacted row maximum size: 0
>>>>>>>     Compacted row mean size: 0
>>>>>>>
>>>>>>>     Column Family: LocationInfo
>>>>>>>     SSTable count: 1
>>>>>>>     Space used (live): 6947
>>>>>>>     Space used (total): 6947
>>>>>>>     Number of Keys (estimate): 128
>>>>>>>     Memtable Columns Count: 0
>>>>>>>     Memtable Data Size: 0
>>>>>>>     Memtable Switch Count: 2
>>>>>>>     Read Count: 20
>>>>>>>     Read Latency: NaN ms.
>>>>>>>     Write Count: 3
>>>>>>>     Write Latency: NaN ms.
>>>>>>>     Pending Tasks: 0
>>>>>>>     Key cache capacity: 1
>>>>>>>     Key cache size: 1
>>>>>>>     Key cache hit rate: NaN
>>>>>>>     Row cache: disabled
>>>>>>>     Compacted row minimum size: 73
>>>>>>>     Compacted row maximum size: 258
>>>>>>>     Compacted row mean size: 185
>>>>>>>
>>>>>>>     Column Family: Migrations
>>>>>>>     SSTable count: 4
>>>>>>>     Space used (live): 4315909643
>>>>>>>     Space used (total): 4315909643
>>>>>>>     Number of Keys (estimate): 512
>>>>>>>     Memtable Columns Count: 0
>>>>>>>     Memtable Data Size: 0
>>>>>>>     Memtable Switch Count: 0
>>>>>>>     Read Count: 0
>>>>>>>     Read Latency: NaN ms.
>>>>>>>     Write Count: 0
>>>>>>>     Write Latency: NaN ms.
>>>>>>>     Pending Tasks: 0
>>>>>>>     Key cache capacity: 5
>>>>>>>     Key cache size: 0
>>>>>>>     Key cache hit rate: NaN
>>>>>>>     Row cache: disabled
>>>>>>>     Compacted row minimum size: 5839589
>>>>>>>     Compacted row maximum size: 9223372036854775807
>>>>>>>
>>>>>>> Exception in thread "main" java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
>>>>>>> 	at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
>>>>>>> 	at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
>>>>>>> 	at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:275)
>>>>>>> 	at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>>>>>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>> 	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>>>>> 	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>>>>> 	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>>>>> 	at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
>>>>>>> 	at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
>>>>>>> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
>>>>>>> 	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
>>>>>>> 	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
>>>>>>> 	at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>>>>> 	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>>>>> 	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>>>>> 	at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
>>>>>>> 	at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>>>>>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>> 	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>>>>> 	at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> 	at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>>>>> 	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>>>>> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>>>>> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>> 	at java.lang.Thread.run(Thread.java:662)
>>>>>>
>>>>>> --
>>>>>> Jonathan Ellis
>>>>>> Project Chair, Apache Cassandra
>>>>>> co-founder of DataStax, the source for professional Cassandra support
>>>>>> http://www.datastax.com

--
Dipl.-Inform. Günter Ladwig

Karlsruhe Institute of Technology (KIT)
Institute AIFB

Englerstraße 11 (Building 11.40, Room 250)
76131 Karlsruhe, Germany
Phone: +49 721 608-47946
Email: guenter.lad...@kit.edu
Web: www.aifb.kit.edu

KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association
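[Editor's note on the "disk full" IOException during scrub at the top of this message: scrub rewrites each sstable into a new file, so before starting it checks whether any data directory has room for the estimated size of the rewrite, and reports "disk full" if none does. If that estimate were derived from uncompressed row data while the source sstable is Snappy-compressed on disk, it could overshoot badly and fail even with 32 GB free — that is speculation, not confirmed in this thread. The shape of such a pre-check, as a hypothetical helper rather than Cassandra's actual code:]

```java
import java.io.File;

// Sketch of a free-space pre-flight check of the kind that produces a
// "disk full" error before a rewrite even starts: the new file's size is
// only an estimate, so an inflated estimate aborts an operation that
// would actually have fit on disk.
class ScrubSpaceCheck {
    /** Returns the first directory with room for the estimate, or null. */
    static File locationFor(long estimatedNewSstableSize, File... dataDirs) {
        for (File dir : dataDirs) {
            if (dir.getUsableSpace() > estimatedNewSstableSize) {
                return dir;
            }
        }
        return null; // caller would throw IOException("disk full")
    }

    public static void main(String[] args) throws Exception {
        File dir = java.nio.file.Files.createTempDirectory("scrub").toFile();
        System.out.println(locationFor(1024L, dir) != null);       // plenty of room
        System.out.println(locationFor(Long.MAX_VALUE - 1, dir));  // null -> "disk full"
    }
}
```

For the sstable above, the compressed 3207243773-byte Data.db fits easily in the 32 GB free on /data, so any failure would have to come from the estimate, not the actual space needed.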