I believe this is caused by two things (and sorry if I go into too
much detail):

1) See http://wiki.apache.org/cassandra/FAQ#i_deleted_what_gives.
That is, Cassandra has to wait GCGraceSeconds before it physically removes
deleted columns, and by default this is 10 days. For "normal" columns (not
the expiring ones from the patch), this is mitigated by the fact that only
a marker recording the deletion is kept. For example, if you have 1K
columns, each holding a 50MB blob, and you delete them, then after the
first compaction the blobs are gone but the columns are not. You still have
your 1K columns, but they are now small. Those columns are really removed
only by a compaction that occurs at least GCGraceSeconds after the
deletion. For ExpiringColumn, however, there is 2).
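
To make point 1) concrete, here is a toy model in Python of how a
compaction treats deleted (non-expiring) columns. It is only a sketch of
the behaviour described above, not Cassandra's actual code: the Column
class, the compact() function and the deleted_at field are hypothetical
names invented for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

GC_GRACE_SECONDS = 10 * 24 * 3600  # the 10-day default mentioned above


@dataclass
class Column:
    name: str
    value: bytes
    deleted_at: Optional[int] = None  # deletion time in seconds, None if live


def compact(columns: List[Column], now: int,
            gc_grace: int = GC_GRACE_SECONDS) -> List[Column]:
    """Toy model of one compaction pass (hypothetical, for illustration)."""
    kept = []
    for col in columns:
        if col.deleted_at is None:
            kept.append(col)  # live column: kept as-is
        elif now - col.deleted_at >= gc_grace:
            continue  # grace period is over: the column is really removed
        else:
            # Still inside the grace period: drop the value, keep the marker.
            kept.append(Column(col.name, b"", col.deleted_at))
    return kept


columns = [Column("blob", b"x" * 1000, deleted_at=0), Column("live", b"y" * 10)]
after_first = compact(columns, now=60)
after_grace = compact(after_first, now=GC_GRACE_SECONDS)
```

In this model, after_first still contains both columns but "blob" now has
an empty value, and after_grace contains only the "live" column.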

2) For expired columns, the value is not deleted until the whole column is
removed. That is, even though the column no longer shows up in a request,
nothing gets deleted until GCGraceSeconds after the column's expiration.
In theory, what is done for deleted columns could be done for expiring
columns: when a column expires, its value could be removed while the
column itself is kept as a marker. However, this is technically a bit
tricky. The natural place to do this would be when the column is
serialized to disk. But the size of the serialized column has to be known
before the actual serialization, during row indexing, so a column that
expires between the indexing and its serialization would screw us up.
There are ways around that, but they are not without drawbacks, and since
I have been able to live with this so far, I've postponed this
'optimisation' for later.
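
The timing problem in 2) can be sketched with a toy model in Python. This
is not the real serialization code or on-disk format; ExpiringColumn,
serialized_size(), serialize() and the strip_expired flag are hypothetical
names, and the "format" is simply name bytes followed by value bytes.

```python
from dataclasses import dataclass


@dataclass
class ExpiringColumn:
    name: bytes
    value: bytes
    expires_at: int  # expiration time in seconds


def serialized_size(col: ExpiringColumn, now: int, strip_expired: bool) -> int:
    """Size the column would have if it were written at time `now`."""
    expired = now >= col.expires_at
    value_len = 0 if (strip_expired and expired) else len(col.value)
    return len(col.name) + value_len


def serialize(col: ExpiringColumn, now: int, strip_expired: bool) -> bytes:
    """Toy serialization: name followed by value (value dropped if stripped)."""
    expired = now >= col.expires_at
    value = b"" if (strip_expired and expired) else col.value
    return col.name + value


col = ExpiringColumn(b"k", b"v" * 100, expires_at=10)

# Row indexing computes the size first, while the column is still live...
indexed_size = serialized_size(col, now=9, strip_expired=True)
# ...and the column expires before it is actually written.
written = serialize(col, now=11, strip_expired=True)
mismatch = indexed_size != len(written)

# Without the optimisation, both steps agree whatever the timing.
safe_size = serialized_size(col, now=9, strip_expired=False)
safe_written = serialize(col, now=11, strip_expired=False)
```

With stripping enabled, the size recorded at indexing time no longer
matches the bytes written, which is the inconsistency described above.
With stripping disabled, the two always agree, at the cost of keeping the
expired value on disk.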

If you cannot live with that for now, feel free to let me know.

--
Sylvain

On Wed, Mar 17, 2010 at 10:28 PM, Weijun Li <weiju...@gmail.com> wrote:
> I'm testing the ExpiringColumn patch in 0.6-beta2, inserted 26GB data with
> TTL, after columns have expired I use get_slice to verify that no columns
> can be retrieved. When I run "nodetool compact" I think all data should be
> gone. But the problem is:
>
> 1) After the first nodetool-compact, Cassandra duplicated data files to
> data-377* and then nothing happened. Total file size became 52GB. Some 0
> bytes *.Compacted files got generated.
> 2) After the second nodetool-compact, Cassandra again generated data-378*.
> Now I got 77GB data file that contains no valid columns. (See the list at
> the end)
> 3) Now I decided to run nodetool-clean and it ended up with 50GB data files
> like:
> total 53717104
> -rw-rw-r-- 1 cassandra cassandra           0 Mar 17 17:25 data-378-Compacted
> -rw-rw-r-- 1 cassandra cassandra 25563592504 Mar 17 16:25 data-378-Data.db
> -rw-rw-r-- 1 cassandra cassandra    54326245 Mar 17 16:25 data-378-Filter.db
> -rw-rw-r-- 1 cassandra cassandra  1871937928 Mar 17 16:25 data-378-Index.db
> -rw-rw-r-- 1 cassandra cassandra 25563592504 Mar 17 17:25 data-379-Data.db
> -rw-rw-r-- 1 cassandra cassandra    27163165 Mar 17 17:25 data-379-Filter.db
> -rw-rw-r-- 1 cassandra cassandra  1871937928 Mar 17 17:25 data-379-Index.db
>
> Any idea about what's going on here? I guess cleanup will remove all columns
> that don't belong to this node, but compact will remove all deleted columns
> and then merge small files into a big one. What exactly are the differences
> between cleanup and compact?
>
> -Weijun
>
> total 80615576
> -rw-rw-r-- 1 cassandra cassandra           0 Mar 17 15:27 data-327-Compacted
> -rw-rw-r-- 1 cassandra cassandra 21013367426 Mar 16 17:43 data-327-Data.db
> -rw-rw-r-- 1 cassandra cassandra    44660005 Mar 16 17:43 data-327-Filter.db
> -rw-rw-r-- 1 cassandra cassandra  1538760208 Mar 16 17:43 data-327-Index.db
> -rw-rw-r-- 1 cassandra cassandra           0 Mar 17 15:27 data-363-Compacted
> -rw-rw-r-- 1 cassandra cassandra  2767150915 Mar 16 17:46 data-363-Data.db
> -rw-rw-r-- 1 cassandra cassandra     5890885 Mar 16 17:46 data-363-Filter.db
> -rw-rw-r-- 1 cassandra cassandra   202590655 Mar 16 17:46 data-363-Index.db
> -rw-rw-r-- 1 cassandra cassandra           0 Mar 17 15:27 data-370-Compacted
> -rw-rw-r-- 1 cassandra cassandra  1383745492 Mar 16 17:47 data-370-Data.db
> -rw-rw-r-- 1 cassandra cassandra     2947045 Mar 16 17:47 data-370-Filter.db
> -rw-rw-r-- 1 cassandra cassandra   101350867 Mar 16 17:47 data-370-Index.db
> -rw-rw-r-- 1 cassandra cassandra           0 Mar 17 15:27 data-375-Compacted
> -rw-rw-r-- 1 cassandra cassandra   345870869 Mar 16 17:50 data-375-Data.db
> -rw-rw-r-- 1 cassandra cassandra      736405 Mar 16 17:50 data-375-Filter.db
> -rw-rw-r-- 1 cassandra cassandra    25315970 Mar 16 17:50 data-375-Index.db
> -rw-rw-r-- 1 cassandra cassandra           0 Mar 17 15:27 data-376-Compacted
> -rw-rw-r-- 1 cassandra cassandra    53457802 Mar 16 18:52 data-376-Data.db
> -rw-rw-r-- 1 cassandra cassandra      113853 Mar 16 18:52 data-376-Filter.db
> -rw-rw-r-- 1 cassandra cassandra     3920228 Mar 16 18:52 data-376-Index.db
> -rw-rw-r-- 1 cassandra cassandra           0 Mar 17 16:25 data-377-Compacted
> -rw-rw-r-- 1 cassandra cassandra 25563592504 Mar 17 15:27 data-377-Data.db
> -rw-rw-r-- 1 cassandra cassandra    54327685 Mar 17 15:27 data-377-Filter.db
> -rw-rw-r-- 1 cassandra cassandra  1871937928 Mar 17 15:27 data-377-Index.db
> -rw-rw-r-- 1 cassandra cassandra 25563592504 Mar 17 16:25 data-378-Data.db
> -rw-rw-r-- 1 cassandra cassandra    54326245 Mar 17 16:25 data-378-Filter.db
> -rw-rw-r-- 1 cassandra cassandra  1871937928 Mar 17 16:25 data-378-Index.db
>
