Version 0.7.3. Yes, I am talking about minor compactions. I have three nodes, RF=3, and 3 GB of data (before replication). Not many users (yet), so it seems like 3 nodes should be plenty. But when all 3 nodes are compacting at once, I sometimes get timeouts on the client, and each node's log fills with notifications that the other nodes have died (and come back to life about a second later). My cluster can tolerate one node being out of commission, so I would rather have longer compactions one at a time than shorter compactions all at the same time.
I think that our usage pattern of bursty writes causes the three nodes to decide to compact at the same time. These bursts are followed by periods of relative quiet, so there should be time for the other two nodes to compact one at a time.

On Mon, Jun 6, 2011 at 3:27 PM, David Boxenhorn <da...@citypath.com> wrote:
>
> On Mon, Jun 6, 2011 at 2:36 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>> Are you talking about minor (automatic) compactions? Can you provide some
>> more information on what's happening to make the node unusable, and what
>> version you are using? It's not a lightweight process, but it should not
>> hurt the node that badly. It is considered an online operation.
>>
>> Delaying compaction will only make it run for longer and take more
>> resources.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6 Jun 2011, at 20:14, David Boxenhorn wrote:
>>
>> > Is there some deep architectural reason why compaction can't be
>> > replication-aware?
>> >
>> > What I mean is, if one node is doing compaction, its replicas
>> > shouldn't be doing compaction at the same time. Or, at least a quorum
>> > of nodes should be available at all times.
>> >
>> > For example, if RF=3, and one node is doing compaction, the nodes to
>> > its right and left in the ring should wait on compaction until that
>> > node is done.
>> >
>> > Of course, my real problem is that compaction makes a node pretty much
>> > unavailable. If we can fix that problem then this is not necessary.
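For what it's worth, the left/right-neighbor rule proposed above can be sketched as a tiny toy model. This is plain Python, not Cassandra code; the ring, the class, and the method names are all invented for illustration of the coordination rule, under the assumption that a node simply defers compaction while either of its ring neighbors is compacting:

```python
# Toy model of "replication-aware" compaction scheduling:
# a node may start compacting only if neither of its ring
# neighbors is currently compacting. Purely illustrative.

class Ring:
    def __init__(self, size):
        self.size = size
        self.compacting = set()  # node ids currently compacting

    def neighbors(self, node):
        # Left and right neighbors on the ring.
        return {(node - 1) % self.size, (node + 1) % self.size}

    def try_start_compaction(self, node):
        """Start compacting only if neither ring neighbor is busy."""
        if self.compacting & self.neighbors(node):
            return False  # defer: a replica neighbor is compacting
        self.compacting.add(node)
        return True

    def finish_compaction(self, node):
        self.compacting.discard(node)


# Three nodes, RF=3: every node replicates every row.
ring = Ring(3)
assert ring.try_start_compaction(0)      # node 0 starts
assert not ring.try_start_compaction(1)  # node 1 defers (neighbor 0 busy)
assert not ring.try_start_compaction(2)  # node 2 defers (neighbor 0 busy)
ring.finish_compaction(0)
assert ring.try_start_compaction(1)      # now node 1 may proceed
```

Note that with 3 nodes and RF=3 this rule serializes compactions, which is exactly the behavior asked for above. On a larger ring, though, two nodes that are two positions apart are not neighbors and could still compact simultaneously, even though they share a replica set of 3 consecutive nodes, so the rule alone would not guarantee a quorum for every key range.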