I'm attempting to plot how "busy" the node is with compactions, but there seem to be only a few reported metrics that might be suitable: CompletedTasks, PendingTasks, TotalBytesCompacted, TotalCompactionsCompleted.
It's not clear to me what the difference between CompletedTasks and TotalCompactionsCompleted is, but I am plotting TotalCompactionsCompleted / sec as one metric. However, this rate is nearly always less than 1 and doesn't capture how many resources the compaction uses. A compaction of the 4 smallest SSTables counts the same as a compaction of the 4 largest SSTables, but the cost is hugely different. Thus, I'm also plotting TotalBytesCompacted / sec.

Since the TotalBytesCompacted value sometimes moves backwards, I'm not confident that it's reporting what it is meant to report. The code and comments indicate that it should only be incremented by the final size of the newly created SSTable, or by the bytes-compacted-so-far for a larger compaction, so I don't see why it would be reasonable for it to sometimes decrease.

How should the impact of compaction be measured if not by bytes compacted?

-Bryan

On Sun, Oct 7, 2012 at 7:39 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> I have not looked at this JMX object in a while, however the
> compaction manager can support multiple threads. Also it moves from
> 0-filesize each time it has to compact a set of files.
>
> That is more useful for showing current progress rather than lifetime
> history.
>
> On Fri, Oct 5, 2012 at 7:27 PM, Bryan Talbot <btal...@aeriagames.com>
> wrote:
> > I've recently added compaction rate (in bytes / second) to my monitors
> > for cassandra and am seeing some odd values. I wasn't expecting the
> > values for TotalBytesCompacted to sometimes decrease from one reading
> > to the next. It seems that the value should be monotonically increasing
> > while a server is running -- obviously it would start again at 0 when
> > the server is restarted or if the counter rolls over (unlikely for a
> > 64 bit long).
> >
> > Below are two samples taken 60 seconds apart: the value decreased by
> > 2,954,369,012 between the two readings.
> >
> > reported_metric=[timestamp:1349476449, status:200,
> > request:[mbean:org.apache.cassandra.db:type=CompactionManager,
> > attribute:TotalBytesCompacted, type:read], value:7548675470069]
> >
> > previous_metric=[timestamp:1349476389, status:200,
> > request:[mbean:org.apache.cassandra.db:type=CompactionManager,
> > attribute:TotalBytesCompacted, type:read], value:7551629839081]
> >
> > I briefly looked at the code for CompactionManager and a few related
> > classes and don't see anyplace that is performing subtraction
> > explicitly; however, there are many additions of signed long values
> > that are not validated and could conceivably contain a negative value,
> > thus causing the totalBytesCompacted to decrease. It's interesting to
> > note that all of the differences I've seen so far are more than the
> > overflow value of a signed 32 bit value. The OS (CentOS 5.7) and sun
> > java vm (1.6.0_29) are both 64 bit. JNA is enabled.
> >
> > Is this expected and normal? If so, what is the correct interpretation
> > of this metric? I'm seeing the negative values a few times per hour
> > when reading it once every 60 seconds.
> >
> > -Bryan

--
Bryan Talbot
Architect / Platform team lead, Aeria Games and Entertainment
Silicon Valley | Berlin | Tokyo | Sao Paulo
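
For illustration only, here is a minimal sketch of how a bytes / sec rate could be derived from two TotalBytesCompacted readings while guarding against the backwards movement described in this thread. The function name counter_rate, the (timestamp, value) sample layout, and the choice to return None on a decrease are all assumptions made for this example; none of it comes from Cassandra or from the monitoring setup above.

```python
from typing import Optional, Tuple

# (unix timestamp in seconds, counter value); a hypothetical sample layout
Sample = Tuple[int, int]


def counter_rate(previous: Sample, current: Sample) -> Optional[float]:
    """Return the per-second rate between two counter readings.

    Returns None when the rate cannot be trusted: no time has elapsed,
    or the counter moved backwards (a restart, or the unexplained
    decrease reported in this thread).
    """
    prev_ts, prev_value = previous
    cur_ts, cur_value = current
    elapsed = cur_ts - prev_ts
    delta = cur_value - prev_value
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


# The two readings quoted in the thread, taken 60 seconds apart.
previous = (1349476389, 7551629839081)
current = (1349476449, 7548675470069)

print(current[1] - previous[1])         # -2954369012
print(counter_rate(previous, current))  # None
```

Dropping the sample rather than plotting a negative rate keeps the graph readable, but it also hides how often the decrease happens, so it may be worth counting the skipped samples separately.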