Rob Coli <rcoli <at> digg.com> writes:
> As I understand Julie's case, she is :
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction
>
> In other words, the problem isn't "temporary disk space occupied during
> the compact", it's permanent disk space occupied unless she compacts.
>
> Julie : when compaction occurs, it logs the number of bytes that it
> started with and the number it ended with, as well as the number of keys
> involved in the compaction. What do these messages say?
>
> example line :
> INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java
> (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db.
> 999999999/888888888 bytes for 12345678 keys.  Time: 123456ms.

Rob - I reran the original test:

8 nodes in the cluster (160GB drives on each node). Populated each node with
30GB of data using unique keys, CL.ALL, and repFactor=3. (I wrote 10GB of data
to each node, but with repFactor=3 that works out to about 30GB of data per
node.)

One hour after the last write, the ring distribution looks excellent:

Address         Status  Load       Range                                     Ring
                                   170141183460469231731687303715884105728
10.210.198.64   Up      28.32 GB   21267647932558653966460912964485513216   |<--|
10.210.157.187  Up      28.07 GB   42535295865117307932921825928971026432   |   ^
10.206.34.194   Up      28.12 GB   63802943797675961899382738893456539648   v   |
10.254.107.178  Up      28.15 GB   85070591730234615865843651857942052864   |   ^
10.254.234.226  Up      28.02 GB   106338239662793269832304564822427566080  v   |
10.254.242.159  Up      27.96 GB   127605887595351923798765477786913079296  |   ^
10.214.18.198   Up      28.18 GB   148873535527910577765226390751398592512  v   |
10.214.26.118   Up      29.82 GB   170141183460469231731687303715884105728  |-->|

But 8 hours after the last write (absolutely no activity), things don't look
as good:

Address         Status  Load       Range                                     Ring
                                   170141183460469231731687303715884105728
10.210.198.64   Up      30.28 GB   21267647932558653966460912964485513216   |<--|
10.210.157.187  Up      28.12 GB   42535295865117307932921825928971026432   |   ^
10.206.34.194   Up      122.41 GB  63802943797675961899382738893456539648   v   |
10.254.107.178  Up      33.89 GB   85070591730234615865843651857942052864   |   ^
10.254.234.226  Up      28.01 GB   106338239662793269832304564822427566080  v   |
10.254.242.159  Up      72.58 GB   127605887595351923798765477786913079296  |   ^
10.214.18.198   Up      83.41 GB   148873535527910577765226390751398592512  v   |
10.214.26.118   Up      62.01 GB   170141183460469231731687303715884105728  |-->|

The 122.41 GB node is named ec2-server3. Here's what cfstats reports on
ec2-server3:

    Write Latency: 0.21970486121446028 ms.
    Pending Tasks: 0
        Column Family: Standard1
        SSTable count: 9
        Space used (live): 131438293207
        Space used (total): 143577216419
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 454
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 302373
        Write Latency: 0.220 ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 100316
        Compacted row maximum size: 100323
        Compacted row mean size: 100322

On ec2-server3, df reports:

    /dev/sdb    153899044   140784388   5297032   97%   /mnt

So this node should (I would think) hold about 30GB of data on its 160GB hard
drive, but somehow it has grown to 122GB of data (live space), plus several
compacted files that have not yet been deleted (total space). Keep in mind
that no deletions or updates have taken place, just writes of unique keys.
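For what it's worth, here is the rough script I'd use to cross-check the
cfstats "Space used (live)" / "(total)" numbers straight from the data
directory. This is just a sketch: the "-Compacted" marker file naming is my
assumption about how 0.6.x flags obsolete SSTables, so adjust it if your files
are named differently.

    import os, glob

    data_dir = "/var/lib/cassandra/data/Keyspace1"
    live = dead = 0
    for data_file in glob.glob(os.path.join(data_dir, "Standard1-*-Data.db")):
        size = os.path.getsize(data_file)
        # Assumption: an obsolete SSTable has a "...-Compacted" marker file
        # sitting next to its Data.db, and is only deleted later.
        if os.path.exists(data_file.replace("-Data.db", "-Compacted")):
            dead += size    # compacted away, waiting to be deleted
        else:
            live += size    # should roughly match "Space used (live)"

    print("live:  %.2f GB" % (live / 1e9))
    print("dead:  %.2f GB" % (dead / 1e9))
    print("total: %.2f GB" % ((live + dead) / 1e9))

On this node, the live number comes out in the same ballpark as the 122GB that
nodetool ring and cfstats are reporting, so it isn't just un-deleted compacted
files inflating the total.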
At this point, here are the compaction lines from the ec2-server3 system.log
file, all taking place after all writes had completed:

INFO [COMPACTION-POOL:1] 2010-08-16 14:11:58,614 CompactionManager.java (line 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db.  8595448738/8595448738 bytes for 85678 keys.  Time: 984905ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:20:02,825 CompactionManager.java (line 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-466-Data.db.  2157339354/2157339354 bytes for 21504 keys.  Time: 484188ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:28:37,066 CompactionManager.java (line 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-468-Data.db.  2157339329/2157339329 bytes for 21504 keys.  Time: 514226ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:36:04,806 CompactionManager.java (line 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-469-Data.db.  2157339417/2157339417 bytes for 21504 keys.  Time: 447720ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:40:31,234 CompactionManager.java (line 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-470-Data.db.  2157339202/2157339202 bytes for 21504 keys.  Time: 266418ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:58:48,943 CompactionManager.java (line 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db.  8629357302/8629357302 bytes for 86016 keys.  Time: 1097659ms.
INFO [COMPACTION-POOL:1] 2010-08-16 15:02:43,428 CompactionManager.java (line 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db.  2157338687/2157338687 bytes for 21504 keys.  Time: 234464ms.
INFO [COMPACTION-POOL:1] 2010-08-16 15:05:08,807 CompactionManager.java (line 326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db.  1345327705/1345327705 bytes for 13410 keys.  Time: 145363ms.

Here's what's in the data directory on this node:

-rw-r--r-- 1 root root  9518412981 Aug 16 13:36 Standard1-247-Data.db
-rw-r--r-- 1 root root  8595448738 Aug 16 14:11 Standard1-465-Data.db
-rw-r--r-- 1 root root 22968417153 Aug 16 14:34 Standard1-467-Data.db
-rw-r--r-- 1 root root  8629357302 Aug 16 14:58 Standard1-471-Data.db
-rw-r--r-- 1 root root  2157338687 Aug 16 15:02 Standard1-473-Data.db
-rw-r--r-- 1 root root  1345327705 Aug 16 15:05 Standard1-474-Data.db
-rw-r--r-- 1 root root 28995571589 Aug 16 15:38 Standard1-475-Data.db
-rw-r--r-- 1 root root 30063580839 Aug 16 17:26 Standard1-476-Data.db
-rw-r--r-- 1 root root 19091100731 Aug 16 17:24 Standard1-477-Data.db

Plus a bunch of compacted files.
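Since Rob asked what the compaction messages say in aggregate, here is a small
throwaway sketch that tallies the "Compacted to ... X/Y bytes for N keys"
lines above. I'm reading the two numbers as the before/after sizes of each
compaction; in my logs they are always identical, which fits a workload of
purely unique keys with no overwrites.

    import re, sys

    # Matches e.g. "Compacted to /var/lib/.../Standard1-465-Data.db.
    #               8595448738/8595448738 bytes for 85678 keys."
    pattern = re.compile(r"Compacted to (\S+)\.\s+(\d+)/(\d+) bytes for (\d+) keys")

    before = after = keys = 0
    for line in open(sys.argv[1]):          # pass the path to system.log
        m = pattern.search(line)
        if m:
            before += int(m.group(2))       # bytes going into the compaction (my reading)
            after  += int(m.group(3))       # bytes in the resulting SSTable
            keys   += int(m.group(4))

    print("%d keys compacted: %d bytes in -> %d bytes out" % (keys, before, after))

So compaction is not reclaiming anything on this node; the inputs and outputs
are byte-for-byte the same size.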
At this point, I performed a manual "nodetool cleanup" on the super-large
node, and *very bad things* happened. Here's an excerpt from the large node's
log file upon issuing the cleanup:

INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 CompactionManager.java (line 345) AntiCompacting [
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db'),
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db'),
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db'),
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-477-Data.db'),
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-475-Data.db'),
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-467-Data.db'),
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-247-Data.db'),
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-476-Data.db'),
  org.apache.cassandra.io.SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db')]
INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 StorageService.java (line 1499) requesting GC to free disk space
INFO [GC inspection] 2010-08-16 23:04:04,470 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 233 ms, 299442152 reclaimed leaving 20577560 used; max is 1172766720
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:14,641 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-419-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:17,974 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-466-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,100 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-428-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,225 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-415-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,374 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-441-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,514 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-461-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,623 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-442-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,740 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-425-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,891 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-447-Data.db
...
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,331 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-451-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,423 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-450-Data.db
ERROR [COMPACTION-POOL:1] 2010-08-16 23:04:25,525 DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException: disk full
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
        at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:582)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: disk full
        at org.apache.cassandra.db.CompactionManager.doAntiCompaction(CompactionManager.java:351)
        at org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:417)
        at org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:49)
        at org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:130)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,558 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-449-Data.db
...
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:33,265 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-453-Data.db
INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java (line 246) Compacting []

The next morning (19 hours later):

INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 00:05:08,637 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 01:05:08,607 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 02:05:08,581 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 03:05:08,568 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 04:05:08,532 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 05:05:08,505 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 06:05:08,494 CompactionManager.java (line 246) Compacting []

Also, 19 hours later, the ring distribution is unchanged:

Address         Status  Load       Range                                     Ring
                                   170141183460469231731687303715884105728
10.210.198.64   Up      30.28 GB   21267647932558653966460912964485513216   |<--|
10.210.157.187  Up      28.12 GB   42535295865117307932921825928971026432   |   ^
10.206.34.194   Up      122.41 GB  63802943797675961899382738893456539648   v   |
10.254.107.178  Up      33.89 GB   85070591730234615865843651857942052864   |   ^
10.254.234.226  Up      28.01 GB   106338239662793269832304564822427566080  v   |
10.254.242.159  Up      72.58 GB   127605887595351923798765477786913079296  |   ^
10.214.18.198   Up      83.41 GB   148873535527910577765226390751398592512  v   |
10.214.26.118   Up      62.01 GB   170141183460469231731687303715884105728  |-->|

So nodetool cleanup seems to have resulted in a fatal "disk full" error on my
overly bloated (more than 4x) node. Can anyone help me understand why this
happened, given that the node should only contain 30GB of data on a 160GB
hard drive? I have the 8 system.log files from the 24-hour period (the nodes
were only alive for 24 hours total).

Thank you!
Julie