Thanks Julien. We ran repair. Increasing the RF should not make sstables obsolete. I can understand that reducing the RF, adding a new node, etc. can result in a few obsolete sstables, which eventually go away after you run cleanup.

On Wed, Dec 18, 2013 at 1:49 AM, Julien Campan <[email protected]> wrote:

> Hi,
>
> When you are increasing the RF, you need to perform repair for the
> keyspace on each node (because data is not automatically streamed).
> After that you should perform a cleanup on each node to remove obsolete
> sstables.
>
> Good luck :)
>
> Julien Campan.
>
> 2013/12/18 Aaron Morton <[email protected]>
>
>> -tmp- files will sit in the data dir; if there was an error creating them
>> during compaction or flushing to disk, they will sit around until a restart.
>>
>> Check the logs for errors to see if compaction was failing on something.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 17/12/2013, at 12:28 pm, Narendra Sharma <[email protected]>
>> wrote:
>>
>> No snapshots.
>>
>> I restarted the node and now the Load in ring is in sync with the disk
>> usage. Not sure what caused it to go out of sync. However, the live SSTable
>> count doesn't match exactly with the number of data files on disk.
>>
>> I am going through the Cassandra code to understand what could be the
>> reason for the mismatch in the sstable count, and also why there is no
>> reference to some of the data files in system.log.
>>
>> On Mon, Dec 16, 2013 at 2:45 PM, Arindam Barua <[email protected]> wrote:
>>
>>> Do you have any snapshots on the nodes where you are seeing this issue?
>>>
>>> Snapshots will link to sstables, which will cause them not to be deleted.
>>>
>>> -Arindam
>>>
>>> *From:* Narendra Sharma [mailto:[email protected]]
>>> *Sent:* Sunday, December 15, 2013 1:15 PM
>>> *To:* [email protected]
>>> *Subject:* Cassandra 1.1.6 - Disk usage and Load displayed in ring
>>> doesn't match
>>>
>>> We have an 8 node cluster. Replication factor is 3.
>>>
>>> For some of the nodes, the disk usage (du -ksh .) in the data directory
>>> for the CF doesn't match the Load reported by the nodetool ring command.
>>> When we expanded the cluster from 4 nodes to 8 nodes (4 weeks back),
>>> everything was okay. Over the last 2-3 weeks the disk usage has gone up.
>>> We increased the RF from 2 to 3 two weeks ago.
>>>
>>> I am not sure if increasing the RF is causing this issue.
>>>
>>> For one of the nodes that I analyzed:
>>>
>>> 1. nodetool ring reported load as 575.38 GB
>>>
>>> 2. nodetool cfstats for the CF reported:
>>>
>>>    SSTable count: 28
>>>    Space used (live): 572671381955
>>>    Space used (total): 572671381955
>>>
>>> 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned
>>>
>>>    46
>>>
>>> 4. 'du -ksh .' in the data folder for CF returned
>>>
>>>    720G
>>>
>>> The above numbers indicate that there are some sstables that are
>>> obsolete and are still occupying space on disk. What could be wrong? Will
>>> restarting the node help? The cassandra process has been running for the
>>> last 45 days with no downtime. However, because the disk usage is high,
>>> we are not able to run full compaction.
>>>
>>> Also, I can't find a reference to each of the sstables on disk in the
>>> system.log file. For example, I have one data file on disk as (ls -lth):
>>>
>>>    86G Nov 20 06:14
>>>
>>> I have a system.log file with first line:
>>>
>>>    INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line
>>>    101) Logging initialized
>>>
>>> The 86G file must be the result of some compaction. I see no reference to
>>> the data file in system.log between 11/18 and 11/25. What could be the
>>> reason for that? The only reference is dated 11/29, when the file was
>>> being streamed to another node (new node).
>>>
>>> How can I identify the obsolete files and remove them? I am thinking
>>> about the following. Let me know if it makes sense.
>>>
>>> 1. Restart the node and check the state.
>>>
>>> 2. Move the oldest data files to another location (to another mount
>>>    point)
>>>
>>> 3. Restart the node again
>>>
>>> 4. Run repair on the node so that it can get the missing data from its
>>>    peers.
>>>
>>> I compared the numbers of a healthy node for the same CF:
>>>
>>> 1. nodetool ring reported load as 662.95 GB
>>>
>>> 2. nodetool cfstats for the CF reported:
>>>
>>>    SSTable count: 16
>>>    Space used (live): 670524321067
>>>    Space used (total): 670524321067
>>>
>>> 3. 'ls -1 *Data* | wc -l' in the data folder for CF returned
>>>
>>>    16
>>>
>>> 4. 'du -ksh .' in the data folder for CF returned
>>>
>>>    625G
>>>
>>> -Naren
>>>
>>> --
>>> Narendra Sharma
>>> Software Engineer
>>> http://www.aeris.com
>>> http://narendrasharma.blogspot.com/
>>
>> --
>> Narendra Sharma
>> Software Engineer
>> http://www.aeris.com
>> http://narendrasharma.blogspot.com/

--
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/
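
The manual check in the original post (count the *Data* files in the CF directory, add up the disk usage, and compare both with the SSTable count and live space from cfstats) could be scripted roughly as below. This is only a sketch under assumptions: the function name `summarize_data_files` and the returned keys are made up for illustration, the cfstats figures have to be copied in by hand, and "-tmp-" in a file name is used as the marker for leftover temporary files, as mentioned earlier in the thread. It sums apparent file sizes, so the total can differ slightly from what `du` reports.

```python
import os


def summarize_data_files(cf_dir, reported_live_bytes, reported_sstable_count):
    """Compare files in a CF data directory with figures copied from cfstats.

    cf_dir: path to the column family data directory (assumption: flat layout).
    reported_live_bytes: "Space used (live)" value, copied by hand.
    reported_sstable_count: "SSTable count" value, copied by hand.
    """
    # Only regular files directly in the directory, like 'ls -1' would list.
    names = [
        n for n in os.listdir(cf_dir)
        if os.path.isfile(os.path.join(cf_dir, n))
    ]

    # Same selection as 'ls -1 *Data*': any name containing "Data".
    data_files = [n for n in names if "Data" in n]

    # Leftover temporary files carry "-tmp-" in their name.
    tmp_files = [n for n in data_files if "-tmp-" in n]

    # Apparent size of every file in the directory (not disk blocks as in du).
    total_bytes = sum(os.path.getsize(os.path.join(cf_dir, n)) for n in names)

    return {
        "data_file_count": len(data_files),
        "tmp_file_count": len(tmp_files),
        "extra_data_files": len(data_files) - reported_sstable_count,
        "total_bytes": total_bytes,
        "excess_bytes": total_bytes - reported_live_bytes,
    }
```

For the node described in the original post, the call would look like `summarize_data_files(".", 572671381955, 28)` when run from the CF data folder. A positive `extra_data_files` or `excess_bytes` only indicates a mismatch; it does not say which files are safe to remove.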
