Thanks for the reply. I tried "./nodetool cleanup -j 1" and it is very useful: 
the amount of free space needed is reduced because only one sstable is 
processed at a time.
 
From: Alain RODRIGUEZ
Date: 2017-06-20 17:03
To: user
Subject: Re: Large temporary files generated during cleaning up
Hi Simon,

I know for sure that cleanup (like compaction) needs to copy the entire SSTable 
(Data + Index), except for the part being evicted by the cleanup. As SSTables 
are immutable, to manipulate (remove) data, cleanup, like compaction, needs to 
copy the data we want to keep before removing the old SSTable. Given this, it 
is understandable that you have tmp files taking almost as much space as the 
original ones. That's why you can read in many places that it is good to keep 
over 50% of the disk space available at any moment.
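
If you want a rough idea of the headroom before kicking off a cleanup, 
something like the following can help (the angle-bracket names are 
placeholders and the path assumes the default data directory; adjust them to 
your data_file_directories):

# total on-disk footprint of the table's SSTables (default data path assumed)
du -sh /var/lib/cassandra/data/<keyspace>/<table>*
# free space left on the volume holding the data directory
df -h /var/lib/cassandra/data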

That being said:

- I am not sure why you have one tmp + one tmplink file for each sstable (like 
'tmplink-lb-59517-big-Data.db' + 'tmp-lb-59517-big-Data.db'). There must be a 
reason I am not aware of (I don't believe it's a bug, that would be quite a 
glaring one). Maybe someone else knows about this?

- If the new SSTable size is almost equal (or equal) to the old SSTable size, 
it means there was not much data to clean up. Remember that cleanup only 
deletes data outside of the token ranges owned by each node (primary + 
replicas), which only becomes relevant after adding nodes or otherwise changing 
the token ranges (see the quick check below).
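
A quick way to look at the ownership for the keyspace (the keyspace name here 
is a placeholder):

nodetool status <keyspace>

The 'Owns (effective)' column shows how much of the ring each node effectively 
owns; data a node still holds for ranges it no longer owns is what cleanup 
removes.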

And do I have a choice to do cleaning with less disk space?

I would say yes. As it seems you are using C* 2.1+, there is something you 
could try to make the cleanup process sequential instead of parallel (as it 
was pre C* 2.1), by doing this:

'nodetool cleanup -j 1' 

More information: 
http://cassandra.apache.org/doc/latest/tools/nodetool/cleanup.html

It could reduce the amount of free space needed, as only one cleanup would run 
at any moment, meaning only one SSTable will be processed at a time. Under 
these conditions, the maximum amount of extra disk space used by this process 
would be related to the size of the biggest existing SSTable.
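
To estimate that worst case, you could compare the biggest Data.db file with 
the free space (same placeholders as above; tmp files will also match the 
pattern while a cleanup is running):

# largest data file of this table, sorted by size
ls -lhS /var/lib/cassandra/data/<keyspace>/<table>*/*-Data.db | head -1
# free space on the volume
df -h /var/lib/cassandra/data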

The other thing I would explore is the reason why Cassandra is maintaining both 
'tmplink-lb-59517-big-Data.db' and 'tmp-lb-59517-big-Data.db'. I haven't run a 
cleanup for a while and I don't know your Cassandra version, which makes it 
hard to investigate properly.

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-06-20 6:51 GMT+01:00 wxn...@zjqunshuo.com <wxn...@zjqunshuo.com>:
Hi,
Cleaning up is generating temporary files which occupy a large amount of disk 
space. I noticed that for every source sstable file, it generates 4 temporary 
files, and two of them are almost as large as the source sstable file. If there 
are two concurrent cleaning tasks running, I have to leave free disk space at 
least twice as large as the combined size of the two sstable files being 
cleaned up.
Is this expected? And do I have a choice to do cleaning with less disk space?

Below are the temporary files generated during cleaning up:
-rw-r--r-- 2 root root 798M Jun 20 13:34 tmplink-lb-59516-big-Index.db
-rw-r--r-- 2 root root 798M Jun 20 13:34 tmp-lb-59516-big-Index.db
-rw-r--r-- 2 root root 219G Jun 20 13:34 tmplink-lb-59516-big-Data.db
-rw-r--r-- 2 root root 219G Jun 20 13:34 tmp-lb-59516-big-Data.db
-rw-r--r-- 2 root root 978M Jun 20 13:33 tmplink-lb-59517-big-Index.db
-rw-r--r-- 2 root root 978M Jun 20 13:33 tmp-lb-59517-big-Index.db
-rw-r--r-- 2 root root 245G Jun 20 13:34 tmplink-lb-59517-big-Data.db
-rw-r--r-- 2 root root 245G Jun 20 13:34 tmp-lb-59517-big-Data.db

Cheers,
-Simon
