I have a column family (CF) on our cluster that has several rows with 200k+ columns of TimeUUID data. I have noticed recently that this CF is reaching my memtable thresholds (128 MB or 1.5 million objects) far more frequently than I would expect, roughly every 10 minutes.

This CF is used as an index of items in another CF, so each column holds only a single value, but there are a lot of columns. In the other CF, each row has about 10-15 columns, but there are millions of rows. I have reviewed our code several times and cannot see where we would be writing millions of columns to the index CF at this rate.

Could this be caused by the replication of data between nodes? When one node has new data for a row, does it pass the entire row to the other nodes for replication, or only the portion of the row that has changed? I have two nodes with a replication factor of 2.

The end result is that both of my servers are constantly compacting the files for the index CF.
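
To put a number on "far more frequently than I would expect", here is a rough back-of-the-envelope sketch of the write rate those thresholds would imply. It assumes each flush is triggered by one of the two thresholds, that one column write counts as one object, and that 128M means 128 MiB; the function names are only for illustration.

```python
# Back-of-the-envelope: what sustained write rate would fill a memtable
# to either threshold in the observed flush interval?

OBJECT_THRESHOLD = 1_500_000            # 1.5 million objects
SIZE_THRESHOLD_BYTES = 128 * 1024 ** 2  # 128M, assumed to mean 128 MiB
FLUSH_INTERVAL_SECONDS = 10 * 60        # observed: about every 10 minutes


def implied_columns_per_second(threshold_objects, interval_seconds):
    """Columns per second needed to hit the object threshold in one interval.

    Assumes one column write counts as one object.
    """
    return threshold_objects / interval_seconds


def implied_bytes_per_second(threshold_bytes, interval_seconds):
    """Bytes per second needed to hit the size threshold in one interval."""
    return threshold_bytes / interval_seconds


if __name__ == "__main__":
    cols = implied_columns_per_second(OBJECT_THRESHOLD, FLUSH_INTERVAL_SECONDS)
    size = implied_bytes_per_second(SIZE_THRESHOLD_BYTES, FLUSH_INTERVAL_SECONDS)
    print(f"object threshold implies about {cols:.0f} columns/s")
    print(f"size threshold implies about {size / 1024:.0f} KiB/s")
```

Under those assumptions the object threshold works out to about 2,500 column writes per second sustained on each node, which is the figure I cannot account for from our application code.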
Lee Parker