Rob Coli <rcoli <at> digg.com> writes:

> As I understand Julie's case, she is :
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction
> In other words, the problem isn't "temporary disk space occupied during
> the compact", it's permanent disk space occupied unless she compacts.
> 
> Julie : when compaction occurs, it logs the number of bytes that it 
> started with and the number it ended with, as well as the number of keys 
> involved in the compaction. What do these messages say?
> 
> example line :
> INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java 
> (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. 
> 999999999/888888888 bytes for 12345678 keys.  Time: 123456ms.
 
Rob -
I reran the original test: an 8-node cluster with a 160GB drive on each 
node. I wrote 10GB of unique-key data to each node at CL.ALL with 
repFactor=3, so each node should end up holding about 30GB of data.
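
For reference, here's the back-of-the-envelope arithmetic behind that 
expectation, written out as a throwaway Python sketch (the numbers are just 
the ones stated above):

nodes = 8
raw_gb_written_per_node = 10        # unique-key data inserted per node
replication_factor = 3

total_raw_gb = nodes * raw_gb_written_per_node        # 80 GB of unique data
total_stored_gb = total_raw_gb * replication_factor   # 240 GB including replicas

# With evenly spaced tokens, each node should end up holding roughly:
expected_per_node_gb = total_stored_gb / nodes        # 30 GB
print(f"expected load per node: ~{expected_per_node_gb:.0f} GB")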

One hour after the last write, the ring distribution looks excellent:
Address         Status  Load       Range                                      Ring
                                   170141183460469231731687303715884105728
10.210.198.64   Up      28.32 GB   21267647932558653966460912964485513216    |<--|
10.210.157.187  Up      28.07 GB   42535295865117307932921825928971026432    |   ^
10.206.34.194   Up      28.12 GB   63802943797675961899382738893456539648    v   |
10.254.107.178  Up      28.15 GB   85070591730234615865843651857942052864    |   ^
10.254.234.226  Up      28.02 GB   106338239662793269832304564822427566080   v   |
10.254.242.159  Up      27.96 GB   127605887595351923798765477786913079296   |   ^
10.214.18.198   Up      28.18 GB   148873535527910577765226390751398592512   v   |
10.214.26.118   Up      29.82 GB   170141183460469231731687303715884105728   |-->|

But 8 hours after the last write (with absolutely no activity in between), 
things don't look as good:

Address         Status  Load       Range                                      Ring
                                   170141183460469231731687303715884105728
10.210.198.64   Up      30.28 GB   21267647932558653966460912964485513216    |<--|
10.210.157.187  Up      28.12 GB   42535295865117307932921825928971026432    |   ^
10.206.34.194   Up      122.41 GB  63802943797675961899382738893456539648    v   |
10.254.107.178  Up      33.89 GB   85070591730234615865843651857942052864    |   ^
10.254.234.226  Up      28.01 GB   106338239662793269832304564822427566080   v   |
10.254.242.159  Up      72.58 GB   127605887595351923798765477786913079296   |   ^
10.214.18.198   Up      83.41 GB   148873535527910577765226390751398592512   v   |
10.214.26.118   Up      62.01 GB   170141183460469231731687303715884105728   |-->|

The 122.41 GB node is named ec2-server3.  Here's what cfstats reports on 
ec2-server3:

          Write Latency: 0.21970486121446028 ms.
          Pending Tasks: 0
                   Column Family: Standard1
                   SSTable count: 9
                        Space used (live): 131438293207
                        Space used (total): 143577216419
                   Memtable Columns Count: 0
                   Memtable Data Size: 0
                   Memtable Switch Count: 454
                   Read Count: 0
                   Read Latency: NaN ms.
                   Write Count: 302373
                   Write Latency: 0.220 ms.
                   Pending Tasks: 0
                   Key cache capacity: 200000
                   Key cache size: 0
                   Key cache hit rate: NaN
                   Row cache: disabled
                   Compacted row minimum size: 100316
                   Compacted row maximum size: 100323
                   Compacted row mean size: 100322

On ec2-server3, df reports:
/dev/sdb             153899044 140784388   5297032  97% /mnt

So this node should (I would think) hold about 30GB of data on its 160GB hard 
drive, but it has somehow grown to 122GB of live data (live space), plus 
several compacted files that have not yet been deleted (the difference between 
total and live space).  Keep in mind that no deletions or updates have taken 
place, just unique key writes.
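
To make the live vs. total distinction concrete, here's the trivial arithmetic 
on the cfstats numbers above (interpreting total minus live as compacted 
SSTables still waiting to be deleted, which is my reading of those counters):

# cfstats figures from ec2-server3, in bytes.
space_live = 131_438_293_207    # "Space used (live)"
space_total = 143_577_216_419   # "Space used (total)"

gib = 1024 ** 3
print(f"live:  {space_live / gib:.1f} GiB")   # ~122.4 GiB (the 122.41 GB in nodetool ring)
print(f"total: {space_total / gib:.1f} GiB")  # ~133.7 GiB
print(f"awaiting deletion: {(space_total - space_live) / gib:.1f} GiB")  # ~11.3 GiB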

Here are the compaction lines from the ec2-server3 system.log file, all of 
which occurred after the writes had completed:

INFO [COMPACTION-POOL:1] 2010-08-16 14:11:58,614 CompactionManager.java (line 
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db. 
8595448738/8595448738 bytes for 85678 keys.  Time: 984905ms.
 INFO [COMPACTION-POOL:1] 2010-08-16 14:20:02,825 CompactionManager.java (line 
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-466-Data.db. 
2157339354/2157339354 bytes for 21504 keys.  Time: 484188ms.
 INFO [COMPACTION-POOL:1] 2010-08-16 14:28:37,066 CompactionManager.java (line 
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-468-Data.db.  
2157339329/2157339329 bytes for 21504 keys.  Time: 514226ms.
 INFO [COMPACTION-POOL:1] 2010-08-16 14:36:04,806 CompactionManager.java (line 
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-469-Data.db. 
2157339417/2157339417 bytes for 21504 keys.  Time: 447720ms.
 INFO [COMPACTION-POOL:1] 2010-08-16 14:40:31,234 CompactionManager.java (line 
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-470-Data.db.  
2157339202/2157339202 bytes for 21504 keys.  Time: 266418ms.
 INFO [COMPACTION-POOL:1] 2010-08-16 14:58:48,943 CompactionManager.java (line 
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db.  
8629357302/8629357302 bytes for 86016 keys.  Time: 1097659ms.
 INFO [COMPACTION-POOL:1] 2010-08-16 15:02:43,428 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db.  
2157338687/2157338687 bytes for 21504 keys.  Time: 234464ms.
 INFO [COMPACTION-POOL:1] 2010-08-16 15:05:08,807 CompactionManager.java (line 
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db. 
1345327705/1345327705 bytes for 13410 keys.  Time: 145363ms.
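
In case it's useful, here's a quick way to total up those compaction lines; 
this is just a throwaway sketch whose regex matches the "Compacted to ... X/Y 
bytes for N keys" format shown above, and the log path at the bottom is only 
an example:

import re

# Matches lines like:
#   Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db. 8595448738/8595448738 bytes for 85678 keys.  Time: 984905ms.
PATTERN = re.compile(
    r"Compacted to (?P<path>\S+)\.\s+"
    r"(?P<in_bytes>\d+)/(?P<out_bytes>\d+) bytes for (?P<keys>\d+) keys"
)

def summarize(log_path):
    total_in = total_out = total_keys = 0
    with open(log_path) as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                total_in += int(match.group("in_bytes"))
                total_out += int(match.group("out_bytes"))
                total_keys += int(match.group("keys"))
    gib = 1024 ** 3
    print(f"compacted {total_keys} keys: "
          f"{total_in / gib:.1f} GiB in -> {total_out / gib:.1f} GiB out")

summarize("/var/log/cassandra/system.log")   # example path only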

Here’s what’s in the data directory on this node:

-rw-r--r-- 1 root root  9518412981 Aug 16 13:36 Standard1-247-Data.db
-rw-r--r-- 1 root root  8595448738 Aug 16 14:11 Standard1-465-Data.db
-rw-r--r-- 1 root root 22968417153 Aug 16 14:34 Standard1-467-Data.db
-rw-r--r-- 1 root root  8629357302 Aug 16 14:58 Standard1-471-Data.db
-rw-r--r-- 1 root root  2157338687 Aug 16 15:02 Standard1-473-Data.db
-rw-r--r-- 1 root root  1345327705 Aug 16 15:05 Standard1-474-Data.db
-rw-r--r-- 1 root root 28995571589 Aug 16 15:38 Standard1-475-Data.db
-rw-r--r-- 1 root root 30063580839 Aug 16 17:26 Standard1-476-Data.db
-rw-r--r-- 1 root root 19091100731 Aug 16 17:24 Standard1-477-Data.db
Plus a bunch of compacted files. 
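
For what it's worth, those nine Data.db files sum to 131,364,555,725 bytes, 
which lines up (to within about 70 MB) with the 131438293207 bytes of live 
space from cfstats, which is itself the 122.41 GB shown in nodetool ring since 
that figure is apparently base-1024. So the load really is data sitting on 
disk, not an accounting quirk. The throwaway check, with the sizes copied from 
the listing above:

# Data.db sizes in bytes, copied from the directory listing above.
data_db_sizes = [
    9_518_412_981,    # Standard1-247-Data.db
    8_595_448_738,    # Standard1-465-Data.db
    22_968_417_153,   # Standard1-467-Data.db
    8_629_357_302,    # Standard1-471-Data.db
    2_157_338_687,    # Standard1-473-Data.db
    1_345_327_705,    # Standard1-474-Data.db
    28_995_571_589,   # Standard1-475-Data.db
    30_063_580_839,   # Standard1-476-Data.db
    19_091_100_731,   # Standard1-477-Data.db
]

total = sum(data_db_sizes)
print(f"{total} bytes = {total / 1024**3:.1f} GiB")   # 131364555725 bytes, ~122.3 GiB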

At this point, I performed a manual "nodetool cleanup" on the super-large 
node, and *very bad things* happened.  Here's an excerpt from that node's 
log file upon issuing the cleanup:

INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 CompactionManager.java (line 
345) AntiCompacting [org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-477-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-475-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-467-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-247-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-476-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db')]

 INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 StorageService.java 
(line 1499) requesting GC to free disk space
 INFO [GC inspection] 2010-08-16 23:04:04,470 GCInspector.java (line 110) 
GC for ConcurrentMarkSweep: 233 ms, 299442152 reclaimed leaving 20577560 
used; max is 1172766720
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:14,641 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-419-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:17,974 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-466-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,100 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-428-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,225 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-415-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,374 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-441-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,514 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-461-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,623 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-442-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,740 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-425-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,891 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-447-Data.db
…
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,331 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-451-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,423 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-450-Data.db
ERROR [COMPACTION-POOL:1] 2010-08-16 23:04:25,525 
DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
java.util.concurrent.ExecutionException: 
java.lang.UnsupportedOperationException: disk full
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.after
Execute (DebuggableThreadPoolExecutor.java:86)
        at
org.apache.cassandra.db.CompactionManager$CompactionExecutor.after
Execute (CompactionManager.java:582)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:888)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: disk full
        at 
org.apache.cassandra.db.CompactionManager.doAntiCompaction
(CompactionManager.java:351)
        at 
org.apache.cassandra.db.CompactionManager.doCleanupCompaction
(CompactionManager.java:417)
        at 
org.apache.cassandra.db.CompactionManager.access$400
(CompactionManager.java:49)
        at 
org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:130)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
        ... 2 more
 INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,558 
SSTableDeletingReference.java (line 104) 
Deleted /var/lib/cassandra/data/Keyspace1/Standard1-449-Data.db
…
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:33,265 
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-453-Data.db
 INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java 
(line 246) Compacting []

The next morning (19 hours later):

INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java 
(line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-08-17 00:05:08,637 CompactionManager.java 
(line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-08-17 01:05:08,607 CompactionManager.java 
(line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-08-17 02:05:08,581 CompactionManager.java 
(line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-08-17 03:05:08,568 CompactionManager.java 
(line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-08-17 04:05:08,532 CompactionManager.java 
(line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-08-17 05:05:08,505 CompactionManager.java 
(line 246) Compacting []
 INFO [COMPACTION-POOL:1] 2010-08-17 06:05:08,494 CompactionManager.java 
(line 246) Compacting []

Also, 19 hours later, the ring distribution is unchanged:

Address         Status  Load       Range                                      Ring
                                   170141183460469231731687303715884105728
10.210.198.64   Up      30.28 GB   21267647932558653966460912964485513216    |<--|
10.210.157.187  Up      28.12 GB   42535295865117307932921825928971026432    |   ^
10.206.34.194   Up      122.41 GB  63802943797675961899382738893456539648    v   |
10.254.107.178  Up      33.89 GB   85070591730234615865843651857942052864    |   ^
10.254.234.226  Up      28.01 GB   106338239662793269832304564822427566080   v   |
10.254.242.159  Up      72.58 GB   127605887595351923798765477786913079296   |   ^
10.214.18.198   Up      83.41 GB   148873535527910577765226390751398592512   v   |
10.214.26.118   Up      62.01 GB   170141183460469231731687303715884105728   |-->|

So nodetool cleanup seems to have hit a fatal error on my badly bloated 
(more than 4x expected size) node.
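
My working assumption is that cleanup failed because the anticompaction it 
triggers has to rewrite SSTables to new files, and with df showing the disk 
97% full there was nowhere near enough room for that. If that's right, a 
rough pre-flight check like the sketch below would have flagged the problem 
before I issued the cleanup (the "as much free space as data" threshold is my 
own conservative guess, not an official requirement):

import os
import shutil

def cleanup_headroom_ok(data_dir, needed_factor=1.0):
    # Rough pre-flight check before "nodetool cleanup": compare free disk space
    # against the total size of the Data.db files that might be rewritten.
    # This can overcount if compacted-but-undeleted files are still on disk,
    # and needed_factor=1.0 is a guess, not a documented requirement.
    data_bytes = sum(
        entry.stat().st_size
        for entry in os.scandir(data_dir)
        if entry.name.endswith("-Data.db")
    )
    free_bytes = shutil.disk_usage(data_dir).free
    gib = 1024 ** 3
    print(f"data: {data_bytes / gib:.1f} GiB, free: {free_bytes / gib:.1f} GiB")
    return free_bytes >= needed_factor * data_bytes

# e.g. cleanup_headroom_ok("/var/lib/cassandra/data/Keyspace1")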

Can anyone help me understand why this happened, given that the node should 
only hold about 30GB of data on its 160GB hard drive?  I have the 8 
system.log files from the 24-hour period (the nodes were only alive for 24 
hours total).

Thank you!
Julie


