On Fri, Nov 12, 2010 at 3:19 PM, Chip Salzenberg <rev.c...@gmail.com> wrote:
> After I rebooted my 0.7.0beta3+ cluster to increase threads (read=100
> write=200 ... they're beefy machines), and putting them under load again, I
> find gossip reporting yoyo up-down-up-down status for the other nodes.
> Anyone know what this is a symptom of, and/or how to avoid it?

It means "the system is too overloaded to process gossip data in a timely manner." Usually this means GC storming, but that does not sound like the problem here. Swapping is a less frequent offender. Since you are seeing this after bumping to extremely high thread counts, I would guess context switching might be a factor. What does tpstats show?

> I haven't
> seen any particular symptoms other than the log messages; and I suppose I'm
> also dropping replication MUTATEs which had been happening already, anyway.

I don't see any WARN lines about that; did you elide them?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com