replication with large rows

Lee Parker Mon, 03 May 2010 12:41:08 -0700

I have a CF on our cluster which has several rows with 200k+ columns of
TimeUUID data.  I have noticed recently that this CF is hitting my memtable
thresholds (128 MB or 1.5 million objects) far more frequently than I would
expect (every 10 minutes or so).  This CF is used as an index of items in
another CF, so each column holds only a single value, but there are lots of
columns.  In the other CF, each row has about 10-15 columns, but there are
millions of rows.

I have reviewed our code several times and cannot see where we would be
writing millions of columns to the index CF this frequently.  Could this be
caused by the replication of data between nodes?  When one node has new data
for a row, does it pass the entire row to the other nodes for replication, or
does it pass only the portion of the row that has changed?  I have two nodes
with a replication factor of 2.  The end result is that both of my servers
are constantly compacting the files for the index CF.
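
To put the numbers above in perspective, here is a rough back-of-the-envelope
sketch of the write rate that a flush every 10 minutes would imply.  It
assumes each column counts as one memtable object and that the flushes are
triggered by the thresholds themselves; the function and constant names are
only for illustration and are not Cassandra settings.

```python
# Rough estimate only: assumes one column == one memtable object, and that
# each flush is triggered by hitting a threshold (both are assumptions).

MEMTABLE_OBJECT_THRESHOLD = 1_500_000        # 1.5 million objects
MEMTABLE_SIZE_THRESHOLD_BYTES = 128 * 1024 * 1024  # 128 MB
FLUSH_INTERVAL_SECONDS = 10 * 60             # "every 10 minutes or so"


def implied_rate(threshold, interval_seconds):
    """Average units per second needed to reach `threshold` in the interval."""
    return threshold / interval_seconds


columns_per_second = implied_rate(MEMTABLE_OBJECT_THRESHOLD, FLUSH_INTERVAL_SECONDS)
bytes_per_second = implied_rate(MEMTABLE_SIZE_THRESHOLD_BYTES, FLUSH_INTERVAL_SECONDS)

print(f"object threshold implies ~{columns_per_second:.0f} columns/second")
print(f"size threshold implies ~{bytes_per_second / 1024:.0f} KB/second")
```

If the object threshold is the one being hit, that works out to roughly 2,500
columns per second into the index CF on each node, which is far more than I
can account for in our code.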


Lee Parker
