Hi, we are running a 3-node Cassandra (0.7.6-2) cluster, and some of our column families contain quite large rows (400k+ columns, 4-6 GB row size). The replication factor is 3 for all keyspaces. The cluster has been running fine for several months now, and we have never experienced any serious trouble.
A few days ago we noticed that some previously written columns could not be read. This does not always happen, and only a few dozen columns out of 400k are affected. After ruling out application logic as a cause, I dumped the row in question with sstable2json, and the columns are there (and are not marked for deletion). Next, I set up a fresh single-node cluster and copied the column family data to that node; the columns could not be read there either. Right now I'm running a nodetool compact on that column family to see whether the data can be read afterwards. Is there any explanation for such behavior? Are there any suggestions for further investigation? TIA, Thomas
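
For reference, this is roughly the kind of read that fails (a minimal sketch using pycassa over Thrift; the keyspace, column family, row key, and column name below are placeholders, not our real schema):

import pycassa
from pycassa import NotFoundException

# Connect to the cluster and the affected column family
# (names here are placeholders).
pool = pycassa.ConnectionPool('MyKeyspace', server_list=['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'MyColumnFamily')

try:
    # This column is present in the sstable2json dump of the row,
    # but the read does not return it.
    result = cf.get('large_row_key', columns=['affected_column'])
    print(result)
except NotFoundException:
    print('column not returned, although sstable2json shows it')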