Indexed columns cause a read before each write, so that the existing
index entry can be updated if the column already exists.
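
As a rough illustration of why a write-only load can show up as reads
in the JMX metrics, here is a toy in-memory model in Python. It is only
a sketch of the read-before-write idea, not Cassandra's actual code:
the class IndexedColumnFamily, its counters, and the column names
"status", "owner" and "payload" are all made up for this example.

```python
class IndexedColumnFamily:
    """Toy model of a column family with secondary indexes (hypothetical)."""

    def __init__(self, indexed_columns):
        self.indexed_columns = set(indexed_columns)
        self.rows = {}        # row_key -> {column_name: value}
        self.index = {}       # (column_name, value) -> set of row keys
        self.read_count = 0   # stands in for the read metric seen via JMX
        self.write_count = 0

    def _read(self, row_key, column):
        self.read_count += 1
        return self.rows.get(row_key, {}).get(column)

    def write(self, row_key, column, value):
        if column in self.indexed_columns:
            # Read before write: look up the old value so that a stale
            # index entry can be removed when the value changes.
            old = self._read(row_key, column)
            if old is not None and old != value:
                self.index[(column, old)].discard(row_key)
            self.index.setdefault((column, value), set()).add(row_key)
        self.rows.setdefault(row_key, {})[column] = value
        self.write_count += 1


cf = IndexedColumnFamily(indexed_columns=["status", "owner"])
for _ in range(1000):
    cf.write("row1", "status", "active")   # indexed: one read per write
    cf.write("row1", "owner", "oleg")      # indexed: one read per write
    cf.write("row1", "payload", "x")       # not indexed: no read

print(cf.write_count, cf.read_count)       # prints: 3000 2000
```

In this model the two indexed columns account for every read, even
though the values written never change, which matches the pattern
described below.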
On 11/09/2011 02:46 PM, Oleg Tsernetsov wrote:
When monitoring the JMX metrics of Cassandra 0.8.7 under a write-only
test load, I observe significant read activity on the column family I
am writing to. This seems strange to me, as I expected no read
activity under a write-only load. The read activity is caused by the
writes: when I stop the write test, the read activity disappears. The
test performs parallel column writes to a single row, writing the
values of a fixed column set over and over again.

The second problem is that massive parallel reads of such a row
degrade over time (even without a parallel write load): Cassandra
starts burning 100% of CPU, and read latency degrades by a factor of
20 compared with exactly the same row created from scratch.

The test setup is 3 Cassandra nodes with read/write consistency =
Quorum. The row has 10 or more columns (tested with 10, 100, 1000 and
10000 columns); the higher the number of columns, the worse the
observed degradation. The column family has 2 indexed columns that are
written with exactly the same values on each and every write. Row key,
column name and column value are all UTF8Type. Compacting the column
family on all the nodes does not help; the row remains "degraded".

"Read" here means one of: reading all the columns with a slice query,
with or without bounds; or executing a column count query for the row,
with or without bounds. I use Hector as the Cassandra client.

I would be thankful if anyone could explain the read activity under
write load and give any hints on the row read degradation after
massive write load on that row.

Regards,
Oleg