I've only got one cf, and haven't changed the default flush expiry period. I'm 
not sure whether the node had fully started. I had to restart my data insertion 
(for other reasons), so I can check the system log upon restart once the data 
has finished inserting.

 Do you know off-hand how long the default flush expiry period is?

 Cheers,
 Steve

-----Original Message-----
From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller
Sent: Tuesday, June 21, 2011 9:13 AM
To: user@cassandra.apache.org
Subject: Re: CommitLog replay

> I’ve got a single node deployment of 0.8 set up on my windows box. When I
> insert a bunch of data into it, the commitlogs directory doesn’t clear upon
> completion (should it?).

It is expected that commit logs are retained for a while, and that
there is replay going on when restarting a node. The main way to ensure
that a smaller amount of commit log is active at any given moment is
to ensure that all column families are flushed sufficiently often. This
is because once a column family is flushed, it no longer requires
the retention of the commit logs that contain the writes
that were just flushed.
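
As a rough illustration of that retention rule, here is a toy model in
Python. It is only a sketch of the idea, not Cassandra's actual code:
the class, its methods, and the cf names "busy" and "rare" are all made
up for the example. A segment stays around as long as any cf still has
unflushed writes in it.

```python
from collections import defaultdict


class ToyCommitLog:
    """Toy model (hypothetical, not Cassandra code): a commit log segment
    is retained while any column family has unflushed writes in it."""

    def __init__(self):
        # segment id -> set of cf names with unflushed writes in that segment
        self.dirty = defaultdict(set)
        self.current = 0

    def write(self, cf):
        # Record that `cf` has an unflushed write in the current segment.
        self.dirty[self.current].add(cf)

    def roll(self):
        # Start a new segment; older segments stay until nothing needs them.
        self.current += 1

    def flush(self, cf):
        # Flushing `cf` means no segment is needed on its behalf any more.
        for seg in list(self.dirty):
            self.dirty[seg].discard(cf)
            if not self.dirty[seg]:
                del self.dirty[seg]

    def retained_segments(self):
        # Segments that would still have to be replayed on restart.
        return sorted(self.dirty)


log = ToyCommitLog()
log.write("busy")
log.write("rare")   # one rare write lands in segment 0
log.roll()
log.write("busy")
log.roll()
log.write("busy")

log.flush("busy")
print(log.retained_segments())  # segment 0 is still held by "rare"

log.flush("rare")
print(log.retained_segments())  # nothing left to replay
```

In this model a single rarely written cf is enough to pin an old segment
until that cf is flushed.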

Pay attention to whether you maybe have some cf:s that are written
very rarely and won't flush until the flush expiry period.

> As a result, when I stop and restart Cassandra it
> replays all the commitlogs, then starts compacting (which seems like it’s
> taking a long time). While it’s compacting it won’t talk to my test client.

That it starts compacting is expected if the data flushed as a result
of the commit log replay triggers compactions. However, compaction does
not imply that the node refuses to talk to clients.

Are you sure the node has fully started? It should log when it starts
up the thrift interface - check system.log.
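
If it helps, a small Python sketch for pulling the relevant lines out of
the log. It simply looks for lines mentioning "thrift"; I am not
assuming the exact wording of the message. The function name is made up
for the example, and the path to system.log depends on your install, so
pass it on the command line.

```python
import sys


def find_lines(path, needle="thrift"):
    """Return the lines in the file at `path` that contain `needle`,
    compared case-insensitively."""
    needle = needle.lower()
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if needle in line.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_thrift.py <path to system.log>
    for line in find_lines(sys.argv[1]):
        print(line)
```

No matching line after a restart would suggest the node had not finished
starting up when the client tried to connect.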

-- 
/ Peter Schuller
