> I.e. querying for a single column works, but the column does not appear in
> slice queries depending on the other columns in the query:
>
> cfq.getKey("foo").getColumn("A") returns "A"
> cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
> cfq.getKey("foo").withColumnSlice("A", "B", "C") returns "A", "B" and "C"

Can you replicate this using cassandra-cli or CQL? That makes it clearer what's happening and removes any potential issues with the client or your code. If you cannot repro it, show your Astyanax code.

Cheers
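For reference, a minimal cassandra-cli check along the lines suggested above might look like this. It assumes the keyspace BUGS, column family Test, and row key 'foo' from the thread, and a node running locally on the default Thrift port; this is a sketch, not a verified session.

```shell
# Connect to the local node (Cassandra 1.2 ships cassandra-cli in bin/).
# Compare a single-column read against a whole-row read for the same key.
cassandra-cli -h localhost -p 9160 <<'EOF'
use BUGS;
get Test['foo']['A'];
get Test['foo'];
EOF
```

If the single-column `get` shows column A but the whole-row `get` omits it, the problem is server-side rather than in the client library.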
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 1:15 PM, Elden Bishop <ebis...@exacttarget.com> wrote:

> I'm trying to track down some really worrying behavior. It appears that
> writing multiple columns while a table flush is occurring can result in
> Cassandra recording its data in a way that makes columns visible only to
> some queries but not others.
>
> I.e. querying for a single column works, but the column does not appear in
> slice queries depending on the other columns in the query:
>
> cfq.getKey("foo").getColumn("A") returns "A"
> cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
> cfq.getKey("foo").withColumnSlice("A", "B", "C") returns "A", "B" and "C"
>
> This is a permanent condition: even hours later, with no reads or writes,
> the DB returns the same results. I can reproduce this 100% of the time by
> writing multiple columns and then reading a different set of multiple
> columns. Columns written during the flush may or may not appear.
>
> Details
>
> # There are no log errors.
> # All single-column queries return correct data.
> # Slice queries may or may not return the column, depending on which other
> columns are in the query.
> # This is on a stock "unzip and run" installation of Cassandra using
> default options only; basically following the Cassandra getting-started
> tutorial and using the demo table described in that tutorial.
> # Cassandra 1.2.0 using Astyanax and Java 1.6.0_37.
> # There are no errors, but there is always a "flushing high traffic column
> family" message right before the incoherent state occurs.
> # To reproduce, just update multiple columns at the same time using random
> rows, then verify the writes by reading multiple columns. I can generate
> the error on 100% of runs. Once the state is screwed up, the multi-column
> read will not contain the column but the single-column read will.
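The three reads described above can be sketched as a small Astyanax harness. This is a minimal sketch only: it assumes an already-initialized `Keyspace` (built elsewhere via `AstyanaxContext` against the node from the thread), a column family named Test with string keys and column names, and row key "foo"; the class and method names here are hypothetical, and error handling is omitted.

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;

public class SliceRepro {
    // Column family from the thread: keyspace BUGS, CF Test, string/string.
    static final ColumnFamily<String, String> CF_TEST =
            new ColumnFamily<String, String>("Test",
                    StringSerializer.get(), StringSerializer.get());

    static void check(Keyspace keyspace) throws Exception {
        // Single-column read: reportedly always returns column A.
        String single = keyspace.prepareQuery(CF_TEST)
                .getKey("foo")
                .getColumn("A")
                .execute().getResult().getStringValue();

        // Two-column slice: reportedly returns only B.
        ColumnList<String> two = keyspace.prepareQuery(CF_TEST)
                .getKey("foo")
                .withColumnSlice("A", "B")
                .execute().getResult();

        // Three-column slice: reportedly returns A, B and C.
        ColumnList<String> three = keyspace.prepareQuery(CF_TEST)
                .getKey("foo")
                .withColumnSlice("A", "B", "C")
                .execute().getResult();

        System.out.println("single=" + single
                + " twoColSlice=" + two.size()
                + " threeColSlice=" + three.size());
    }
}
```

On a healthy node all three reads should agree; the bug report is that `two` is missing column A while `single` and `three` both see it.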
>
> Log snippet
>
> INFO 15:47:49,066 GC for ParNew: 320 ms for 1 collections, 207199992 used; max is 1052770304
> INFO 15:47:58,076 GC for ParNew: 330 ms for 1 collections, 232839680 used; max is 1052770304
> INFO 15:48:00,374 flushing high-traffic column family CFS(Keyspace='BUGS', ColumnFamily='Test') (estimated 50416978 bytes)
> INFO 15:48:00,374 Enqueuing flush of Memtable-Test@1575891161(4529586/50416978 serialized/live bytes, 279197 ops)
> INFO 15:48:00,378 Writing Memtable-Test@1575891161(4529586/50416978 serialized/live bytes, 279197 ops)
> INFO 15:48:01,142 GC for ParNew: 654 ms for 1 collections, 239478568 used; max is 1052770304
> INFO 15:48:01,474 Completed flushing /var/lib/cassandra/data/BUGS/Test/BUGS-Test-ia-45-Data.db (4580066 bytes) for commitlog position ReplayPosition(segmentId=1359415964165, position=7462737)
>
> Any ideas on what could be going on? I could not find anything like this in
> the open bugs, and the only workaround seems to be never doing multi-column
> reads or writes. I'm concerned that the DB can get into a state where
> different queries return such inconsistent results, all with no warning or
> errors. There is no way to even verify data correctness; every column can
> seem correct when queried and then disappear during slice queries depending
> on the other columns in the query.
>
> Thanks