On 21.04.2013 22:17, aaron morton wrote:
This is a tricky one to diagnose remotely. I could try using nodetool resetlocalschema on each node, it's just a wild guess in case there is something odd on one node.
I ran it on one node (let's call it A) and it finished without any problems. Then I ran it on the second one (B) and ended up with my production keyspace missing. I can still see entries in system.schema_column*, and the data is obviously still on disk, but I can't use the keyspace and the node reports 205 KB of data.
Shortly after this, node A went OOM (this had never happened before). After a restart it reported a different schema version at first (describe cluster in the CLI), but that resolved itself after a while. However, a few minutes later the other nodes reported it as Down, and it could not see the other nodes either... After another restart, the same thing happened.
I will try to investigate the problem with A first. For now, B seems to be a candidate for removing from the ring, setting up from scratch, and rejoining ;-)
M.