Rob Coli <rcoli <at> digg.com> writes:
> As I understand Julie's case, she is :
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction
> In other words, the problem isn't "temporary disk space occupied during
> the compact", it's permanent disk space occupied unless she compacts.
>
> Julie : when compaction occurs, it logs the number of bytes that it
> started with and the number it ended with, as well as the number of keys
> involved in the compaction. What do these messages say?
>
> example line :
> INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 CompactionManager.java
> (line 398) Compacted to /path/to/MyColumnFamily-26-Data.db.
> 999999999/888888888 bytes for 12345678 keys. Time: 123456ms.
Rob -
I reran the original test: 8 nodes in the cluster (160GB drives on each
node). Populated each node with 30GB of data using unique keys and CL.ALL
and repFactor=3. (Wrote 10GB of data to each node but with the repFactor=3,
it results in about 30GB of data on each node.)
One hour after the last write, the ring distribution looks excellent:
Address Status Load Range
Ring
170141183460469231731687303715884105728
10.210.198.64 Up 28.32 GB 21267647932558653966460912964485513216
|<--|
10.210.157.187Up 28.07 GB 42535295865117307932921825928971026432
| ^
10.206.34.194 Up 28.12 GB 63802943797675961899382738893456539648
v |
10.254.107.178Up 28.15 GB 85070591730234615865843651857942052864
| ^
10.254.234.226Up 28.02 GB 106338239662793269832304564822427566080
v |
10.254.242.159Up 27.96 GB 127605887595351923798765477786913079296
| ^
10.214.18.198 Up 28.18 GB 148873535527910577765226390751398592512
v |
10.214.26.118 Up 29.82 GB 170141183460469231731687303715884105728
|-->|
But 8 hours after the last write (absolutely no activity), things don't
look as good:
Address Status Load Range
Ring
170141183460469231731687303715884105728
10.210.198.64 Up 30.28 GB 21267647932558653966460912964485513216
|<--|
10.210.157.187Up 28.12 GB 42535295865117307932921825928971026432
| ^
10.206.34.194 Up 122.41 GB 63802943797675961899382738893456539648
v |
10.254.107.178Up 33.89 GB 85070591730234615865843651857942052864
| ^
10.254.234.226Up 28.01 GB 106338239662793269832304564822427566080
v |
10.254.242.159Up 72.58 GB 127605887595351923798765477786913079296
| ^
10.214.18.198 Up 83.41 GB 148873535527910577765226390751398592512
v |
10.214.26.118 Up 62.01 GB 170141183460469231731687303715884105728
|-->|
The 122.41GB node is named ec2-server3. Here's what cfstats reports:
On ec2-server3:
Write Latency: 0.21970486121446028 ms.
Pending Tasks: 0
Column Family: Standard1
SSTable count: 9
Space used (live): 131438293207
Space used (total): 143577216419
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 454
Read Count: 0
Read Latency: NaN ms.
Write Count: 302373
Write Latency: 0.220 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 100316
Compacted row maximum size: 100323
Compacted row mean size: 100322
On ec2-server3, df reports:
/dev/sdb 153899044 140784388 5297032 97% /mnt
So this node should (I would think) contain 30GB of data on a 160GB hard
drive but somehow it has grown to 122 GB of data (live space) plus several
compacted files that have not yet been deleted (total space). Keep in
mind that no deletions or updates have taken place, just unique key writes.
At this point, here are the compaction lines from the ec2-server3 system.log
file, taking place after all writes had completed:
INFO [COMPACTION-POOL:1] 2010-08-16 14:11:58,614 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db.
8595448738/8595448738 bytes for 85678 keys. Time: 984905ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:20:02,825 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-466-Data.db.
2157339354/2157339354 bytes for 21504 keys. Time: 484188ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:28:37,066 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-468-Data.db.
2157339329/2157339329 bytes for 21504 keys. Time: 514226ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:36:04,806 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-469-Data.db.
2157339417/2157339417 bytes for 21504 keys. Time: 447720ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:40:31,234 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-470-Data.db.
2157339202/2157339202 bytes for 21504 keys. Time: 266418ms.
INFO [COMPACTION-POOL:1] 2010-08-16 14:58:48,943 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db.
8629357302/8629357302 bytes for 86016 keys. Time: 1097659ms.
INFO [COMPACTION-POOL:1] 2010-08-16 15:02:43,428 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db.
2157338687/2157338687 bytes for 21504 keys. Time: 234464ms.
INFO [COMPACTION-POOL:1] 2010-08-16 15:05:08,807 CompactionManager.java (line
326) Compacted to /var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db.
1345327705/1345327705 bytes for 13410 keys. Time: 145363ms.
Here’s what’s in the data directory on this node:
-rw-r--r-- 1 root root 9518412981 Aug 16 13:36 Standard1-247-Data.db
-rw-r--r-- 1 root root 8595448738 Aug 16 14:11 Standard1-465-Data.db
-rw-r--r-- 1 root root 22968417153 Aug 16 14:34 Standard1-467-Data.db
-rw-r--r-- 1 root root 8629357302 Aug 16 14:58 Standard1-471-Data.db
-rw-r--r-- 1 root root 2157338687 Aug 16 15:02 Standard1-473-Data.db
-rw-r--r-- 1 root root 1345327705 Aug 16 15:05 Standard1-474-Data.db
-rw-r--r-- 1 root root 28995571589 Aug 16 15:38 Standard1-475-Data.db
-rw-r--r-- 1 root root 30063580839 Aug 16 17:26 Standard1-476-Data.db
-rw-r--r-- 1 root root 19091100731 Aug 16 17:24 Standard1-477-Data.db
Plus a bunch of compacted files.
At this point, I performed a manual "nodetool cleanup" on the super large
node when *very bad things* happened. Here's an excerpt from the large
node's log file upon issuing the cleanup:
INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 CompactionManager.java (line
345) AntiCompacting [org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-474-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-471-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-473-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-477-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-475-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-467-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-247-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-476-Data.db'),
org.apache.cassandra.io.SSTableReader(
path='/var/lib/cassandra/data/Keyspace1/Standard1-465-Data.db')]
INFO [COMPACTION-POOL:1] 2010-08-16 23:04:04,230 StorageService.java
(line 1499) requesting GC to free disk space
INFO [GC inspection] 2010-08-16 23:04:04,470 GCInspector.java (line 110)
GC for ConcurrentMarkSweep: 233 ms, 299442152 reclaimed leaving 20577560
used; max is 1172766720
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:14,641
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-419-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:17,974
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-466-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,100
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-428-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,225
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-415-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,374
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-441-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,514
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-461-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,623
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-442-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,740
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-425-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:18,891
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-447-Data.db
…
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,331
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-451-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,423
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-450-Data.db
ERROR [COMPACTION-POOL:1] 2010-08-16 23:04:25,525
DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
java.util.concurrent.ExecutionException:
java.lang.UnsupportedOperationException: disk full
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.after
Execute (DebuggableThreadPoolExecutor.java:86)
at
org.apache.cassandra.db.CompactionManager$CompactionExecutor.after
Execute (CompactionManager.java:582)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:888)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.UnsupportedOperationException: disk full
at
org.apache.cassandra.db.CompactionManager.doAntiCompaction
(CompactionManager.java:351)
at
org.apache.cassandra.db.CompactionManager.doCleanupCompaction
(CompactionManager.java:417)
at
org.apache.cassandra.db.CompactionManager.access$400
(CompactionManager.java:49)
at
org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:130)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
... 2 more
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:25,558
SSTableDeletingReference.java (line 104)
Deleted /var/lib/cassandra/data/Keyspace1/Standard1-449-Data.db
…
INFO [SSTABLE-CLEANUP-TIMER] 2010-08-16 23:04:33,265
SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra
/data/Keyspace1/Standard1-453-Data.db
INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java
(line 246) Compacting []
The next morning (19 hours later):
INFO [COMPACTION-POOL:1] 2010-08-16 23:05:08,656 CompactionManager.java
(line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 00:05:08,637 CompactionManager.java
(line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 01:05:08,607 CompactionManager.java
(line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 02:05:08,581 CompactionManager.java
(line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 03:05:08,568 CompactionManager.java
(line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 04:05:08,532 CompactionManager.java
(line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 05:05:08,505 CompactionManager.java
(line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-08-17 06:05:08,494 CompactionManager.java
(line 246) Compacting []
Also, 19 hours later, ring distribution is the same:
Address Status Load Range
Ring
170141183460469231731687303715884105728
10.210.198.64 Up 30.28 GB 21267647932558653966460912964485513216
|<--|
10.210.157.187Up 28.12 GB 42535295865117307932921825928971026432
| ^
10.206.34.194 Up 122.41 GB 63802943797675961899382738893456539648
v |
10.254.107.178Up 33.89 GB 85070591730234615865843651857942052864
| ^
10.254.234.226Up 28.01 GB 106338239662793269832304564822427566080
v |
10.254.242.159Up 72.58 GB 127605887595351923798765477786913079296
| ^
10.214.18.198 Up 83.41 GB 148873535527910577765226390751398592512
v |
10.214.26.118 Up 62.01 GB 170141183460469231731687303715884105728
|-->|
So nodetool cleanup seems to have resulted in a fatal error on my overly (>4x)
bloated node.
Can anyone help me with understanding why this happened, taking into account
the node should only contain 30GB of data on a 160GB hard drive? I have the 8
system.log files from the 24 hour period (the nodes were only alive for 24
hours total).
Thank you!
Julie