I can't seem to recover a failed node on a database where I have made many schema updates.

I have a small cluster with 2 nodes, around 1000 column families (I know it's a lot, but it can't be changed right now), and ReplicationFactor=2. I shut down a node and wiped its data entirely, then tried to bring it back up. The node starts fetching schema updates from the live node, but the operation fails partway through with an OutOfMemoryError (OOME).
After some investigation, what I found is that:

- I have a lot of schema updates (there are 2067 rows in the system.Schema CF).
- The live node loads migrations 1-1000 and sends them to the recovering node (Migration.getLocalMigrations()).
- Soon afterwards, the live node checks the schema version on the recovering node and finds it has only moved a little - say it has applied the first 3 migrations. It then loads migrations 3-1003 and sends them to the node.
- This process repeats very quickly (it sends migrations 6-1006, 9-1009, and so on - see the sketch below).
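
To make the pattern concrete, here is a toy model of the loop as I understand it from the logs. The class name and constants are made up for illustration; this is not Cassandra's actual code:

// Simplified model of the schema catch-up loop as I read it from the logs.
// All names here are illustrative, not Cassandra internals.
import java.util.ArrayList;
import java.util.List;

class MigrationPushModel {
    static final int BATCH = 1000;   // migrations loaded per round
    static final int TOTAL = 2067;   // rows in system.Schema

    public static void main(String[] args) {
        int applied = 0;             // schema version reached by the recovering node
        List<int[]> pushedBatches = new ArrayList<>();

        // Each time the live node sees the remote version lagging, it reloads
        // and re-sends up to 1000 migrations starting from that version.
        while (applied < TOTAL) {
            int from = applied + 1;
            int to = Math.min(from + BATCH - 1, TOTAL);
            pushedBatches.add(new int[]{from, to});

            // The recovering node only applies a handful of migrations before
            // the live node checks again, so the next batch overlaps heavily.
            applied += 3;
        }

        System.out.println("batches pushed: " + pushedBatches.size());
    }
}

With 2067 migrations and only ~3 applied per round, that works out to roughly 700 overlapping batches of ~1000 migrations each.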

Analyzing the memory dump and the logs, it looks like each of these 1000-migration blocks is composed into a single message and put on the OutboundTcpConnection queue. However, since the schema is big, these messages occupy a lot of space and are built faster than the connection can send them. They therefore accumulate in OutboundTcpConnection.queue until the heap is completely filled.
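
If I'm reading it right, this is the classic unbounded producer/consumer imbalance. Here is a standalone sketch of that effect, using a plain LinkedBlockingQueue rather than the real OutboundTcpConnection, with made-up sizes and timings:

// Sketch of an unbounded outbound queue being fed faster than it drains.
// Standalone illustration only, not Cassandra code.
import java.util.concurrent.LinkedBlockingQueue;

class UnboundedQueueGrowth {
    public static void main(String[] args) {
        // Unbounded queue, like OutboundTcpConnection.queue.
        LinkedBlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

        // Consumer: simulates a TCP connection that writes slowly.
        Thread sender = new Thread(() -> {
            try {
                while (true) {
                    queue.take();
                    Thread.sleep(50);   // slow network write
                }
            } catch (InterruptedException ignored) { }
        });
        sender.setDaemon(true);
        sender.start();

        // Producer: each schema-push round enqueues a large serialized message.
        for (int round = 0; round < 200; round++) {
            queue.add(new byte[100_000]);   // stand-in for a big migration batch
            if (round % 50 == 0) {
                System.out.printf("round %d, queued messages: %d%n", round, queue.size());
            }
        }
        // With real batch sizes and hundreds of rounds, all of these buffers
        // stay live in the queue until the JVM runs out of memory.
    }
}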

Any suggestions? Can I change something to make this work, apart from reducing the number of CFs?

Flavio
