Benjamin and Jonathan,

It is easy to end up with thousands of small SSTables.

Under heavy insert load (many client threads), memtable flushes (each
generating a new sstable) are frequent, e.g. one every 30 seconds.

Compaction runs in a single thread and is CPU bound. Suppose the
CompactionManager is compacting 10 sstables (600GB in total) and this
compaction takes 10 hours. During that compaction, about 1,200 new small
sstables are generated.
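
(Back-of-the-envelope: 10 hours = 36,000 seconds; at one flush every 30
seconds, that is 36,000 / 30 = 1,200 new sstables piling up behind the
single running compaction.)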

Questions:
(1) Can we modify the compaction policy to compact smaller sstables with
higher priority, even while a larger compaction is running?
(2) Can we implement multi-threaded compaction? (A sketch illustrating
both ideas follows below.)
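
To make the two questions concrete, here is a minimal, self-contained Java
sketch. The names are hypothetical; this is not Cassandra's actual
CompactionManager API. It keeps candidate compaction buckets in a priority
queue ordered by total bytes, so piles of small sstables are merged first
even while a large compaction is pending, and it drains the queue with a
fixed-size thread pool rather than a single thread:

import java.util.*;
import java.util.concurrent.*;

// Sketch only: hypothetical types, not Cassandra's real CompactionManager.
// Illustrates (1) smallest-bucket-first ordering and (2) a pool of
// compaction workers instead of a single thread.
public class PrioritizedCompactor {

    // A candidate compaction: a bucket of similarly-sized sstables.
    static final class Bucket {
        final List<Long> sstableSizes;   // sizes in bytes
        Bucket(List<Long> sizes) { this.sstableSizes = sizes; }
        long totalBytes() {
            long sum = 0;
            for (long s : sstableSizes) sum += s;
            return sum;
        }
    }

    // When submissions outpace the workers, the smallest pending bucket
    // is always compacted next.
    private final PriorityQueue<Bucket> queue =
        new PriorityQueue<>(Comparator.comparingLong(Bucket::totalBytes));

    // (2) Multi-threaded compaction: N workers drain the queue.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    public synchronized void submit(Bucket b) {
        queue.add(b);
        workers.execute(this::runNext);
    }

    private void runNext() {
        Bucket b;
        synchronized (this) { b = queue.poll(); }
        if (b != null) compact(b);
    }

    private void compact(Bucket b) {
        // Placeholder for the real merge of the bucket's sstables.
        System.out.printf("compacting %d sstables, %d bytes total%n",
                          b.sstableSizes.size(), b.totalBytes());
    }

    public static void main(String[] args) throws InterruptedException {
        PrioritizedCompactor c = new PrioritizedCompactor();
        c.submit(new Bucket(Arrays.asList(600_000_000_000L)));         // one huge sstable
        c.submit(new Bucket(Arrays.asList(64_000_000L, 70_000_000L))); // pile of small ones
        c.workers.shutdown();
        c.workers.awaitTermination(1, TimeUnit.MINUTES);
    }
}

The obvious catch with concurrent compactions is making sure the workers
operate on disjoint sets of sstables, and that disk I/O does not become
the new bottleneck once the CPU is no longer the limit.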

Schubert

On Sun, Jul 18, 2010 at 3:34 PM, Schubert Zhang <zson...@gmail.com> wrote:

> Benjamin,
>
> It is easy to end up with thousands of SSTables.
> Under heavy insert load (many client threads), memtable flushes (each
> generating a new sstable) are frequent
>
>
> On Mon, Jun 14, 2010 at 2:03 AM, Benjamin Black <b...@b3k.us> wrote:
>
>> On Sat, Jun 12, 2010 at 7:46 PM, Anty <anty....@gmail.com> wrote:
>> > Hi all,
>> > I have a 10-node cluster. After inserting many records into the
>> > cluster, I compacted each node with nodetool compact.
>> > During the compaction process, something went wrong on one of the 10
>> > nodes when the size of the compacted temp file reached nearly 100GB
>> > (before compaction, the size was ~240GB).
>>
>> Compaction is not compression; it is merging of SSTables and tombstone
>> elimination.  If you are not doing many deletes or overwrites of
>> existing data, the compacted SSTable will be about the same size as
>> the total size of all the smaller SSTables that went into it.  It is
>> not clear to me how you ended up with 5000 SSTables (the *-data.db
>> files) of such small size if you have not disabled minor compactions.
>>
>> Can you post your storage-conf.xml someplace (pastie or
>> gist.github.com, for example)?
>>
>>
>> b
>>
>
>
