Commit log replay is single-threaded, so if you have a ton of overwrites spread across a large amount of commit log (as you would with a queue pattern), replay itself may be what is backing up.
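If replay does turn out to be the bottleneck, the relevant lever is the keyspace-level `durable_writes` option, which makes writes to that keyspace skip the commit log. A rough sketch of the statement involved, assuming the queue table lives in its own keyspace (the name `queue_ks` is made up for illustration). The Python wrapper only builds the CQL text, which you would then run through cqlsh or a driver:

```python
import re


def durable_writes_cql(keyspace: str, enabled: bool) -> str:
    """Build the CQL statement that toggles durable_writes for a keyspace."""
    # Accept only plain unquoted identifiers so the name can be inlined safely.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", keyspace):
        raise ValueError(f"not a simple keyspace name: {keyspace!r}")
    return f"ALTER KEYSPACE {keyspace} WITH durable_writes = {str(enabled).lower()};"


# 'queue_ks' is a hypothetical keyspace name, not taken from the original report.
print(durable_writes_cql("queue_ks", False))
```

Keep in mind that the setting applies to every table in the keyspace, and that with durable writes off any data not yet flushed to SSTables is lost on a hard crash. That is probably acceptable for a dev box, less so for production.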
The only real workaround for this right now would be to turn off durable writes for the queue schema (keyspace). The following has some details in the context of changes to make commit log replay multi-threaded for the 2.1 release:
https://issues.apache.org/jira/browse/CASSANDRA-3578

I also recommend poking around the process a bit via jstack and jvmtop when this is happening, just to make sure the commit log is what is holding it up.

On Tue, Mar 4, 2014 at 2:34 PM, Charlie Mason <charlie....@gmail.com> wrote:

> Hi All,
>
> I have a single-node cluster I use for development on my local machine.
> After apt package upgrades and hard reboots the node takes a very long time
> to restart.
>
> The node will always eventually come back up, however it takes ages
> sometimes. It seems to be CPU bound, as all 4 cores are maxed out by
> Cassandra. The disk IO is relatively tiny (less than 1 MB/s) considering
> it's running on an SSD.
>
> According to the logs, start-up once took over 6 hours. From a development
> point of view it's not the end of the world, but should I suffer a Data
> Centre outage in production this could massively delay the time to come
> back on-line.
>
> I suspect the workload might be causing it. There's 16 gig of data
> actually stored in it. However, one of the tables holds a message queue,
> which may well have a few hundred thousand tombstones and up to 500Kb per
> record. Is this likely to have an impact on start-up time? Is there
> anything I can do to mitigate it? The queries on this are fast because it
> knows where to start, so using the table is not an issue.
>
> Any other suggestions to look at?
>
> Thanks,
>
> Charlie M
>

--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com