That looks like a bug. Can you create a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA

Please include the C* version, the table and insert statements, and if you can, 
a repro using CQL 3. 

Thanks
Aaron

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/01/2013, at 8:10 AM, Elden Bishop <ebis...@exacttarget.com> wrote:

> Sure thing, Here is a console dump showing the error. Notice that column 
> '9801' is NOT NULL on the first two queries but IS NULL on the last query. I 
> get this behavior constantly on any writes that coincide with a flush. The 
> column is always readable by itself but disappears depending on the other 
> columns being queried.
> 
> $
> $ bin/cqlsh -2
> cqlsh>
> cqlsh> SELECT '9801' FROM BUGS.Test WHERE KEY='a';
> 
>  9801
> ---------------------
>  0.02271159951509616
> 
> cqlsh> SELECT '9801','6814' FROM BUGS.Test WHERE KEY='a';
> 
>  9801                | 6814
> ---------------------+--------------------
>  0.02271159951509616 | 0.6612351709326891
> 
> cqlsh> SELECT '9801','6814','3333' FROM BUGS.Test WHERE KEY='a';
> 
>  9801 | 6814               | 3333
> ------+--------------------+--------------------
>  null | 0.6612351709326891 | 0.8921380283891902
> 
> cqlsh> exit;
> $
> $
> 
> From: aaron morton <aa...@thelastpickle.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Tuesday, January 29, 2013 12:21 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Cass returns Incorrect column data on writes during flushing
> 
>> Ie. Query for a single column works but the column does not appear in slice 
>> queries depending on the other columns in the query
>> 
>> cfq.getKey("foo").getColumn("A") returns "A"
>> cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
>> cfq.getKey("foo").withColumnSlice("A","B","C") returns "A","B" and "C"
> Can you replicate this using cassandra-cli or CQL? That makes it clearer 
> what's happening and removes any potential issues with the client or your 
> code. If you cannot repro it, show your Astyanax code.
>  
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/01/2013, at 1:15 PM, Elden Bishop <ebis...@exacttarget.com> wrote:
> 
>> I'm trying to track down some really worrying behavior. It appears that 
>> writing multiple columns while a table flush is occurring can result in 
>> Cassandra recording its data in a way that makes columns visible only to 
>> some queries but not others.
>> 
>> Ie. Query for a single column works but the column does not appear in slice 
>> queries depending on the other columns in the query
>> 
>> cfq.getKey("foo").getColumn("A") returns "A"
>> cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
>> cfq.getKey("foo").withColumnSlice("A","B","C") returns "A","B" and "C"
>> 
>> This is a permanent condition meaning that even hours later with no reads or 
>> writes the DB will return the same results. I can reproduce this 100% of the 
>> time by writing multiple columns and then reading a different set of 
>> multiple columns. Columns written during the flush may or may not appear.
>> 
>> Details
>> 
>> # There are no log errors
>> # All single column queries return correct data.
>> # Slice queries may or may not return the column depending on which other 
>> columns are in the query.
>> # This is on a stock "unzip and run" installation of Cassandra using default 
>> options only; basically doing the cassandra getting started tutorial and 
>> using the Demo table described in that tutorial.
>> # Cassandra 1.2.0 using Astyanax and Java 1.6.0_37.
>> # There are no errors, but there is always a "flushing high-traffic column 
>> family" message that appears right before the incoherent state occurs.
>> # To reproduce, just update multiple columns at the same time, using random 
>> rows, and then verify the writes by reading multiple columns. I can 
>> generate the error on 100% of runs. Once the state is screwed up, the multi 
>> column read will not contain the column but the single column read will.
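>> 
>> A minimal cqlsh sketch of those repro steps (CQL 2, matching the console 
>> dump above; the keyspace and column family statements here are illustrative, 
>> not the exact setup used):
>> 
>> $ bin/cqlsh -2
>> cqlsh> CREATE KEYSPACE BUGS WITH strategy_class = 'SimpleStrategy'
>>    ...   AND strategy_options:replication_factor = 1;
>> cqlsh> USE BUGS;
>> cqlsh> CREATE COLUMNFAMILY Test (KEY text PRIMARY KEY)
>>    ...   WITH comparator = text AND default_validation = text;
>> cqlsh> -- write many random columns/rows concurrently until a memtable
>> cqlsh> -- flush fires (or force one with: bin/nodetool flush BUGS Test), e.g.
>> cqlsh> UPDATE Test SET '9801' = '0.0227', '6814' = '0.6612' WHERE KEY = 'a';
>> cqlsh> -- then compare a single-column read against a multi-column read:
>> cqlsh> SELECT '9801' FROM Test WHERE KEY = 'a';
>> cqlsh> SELECT '9801','6814','3333' FROM Test WHERE KEY = 'a';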
>> 
>> Log snippet
>>  INFO 15:47:49,066 GC for ParNew: 320 ms for 1 collections, 207199992 used; 
>> max is 1052770304
>>  INFO 15:47:58,076 GC for ParNew: 330 ms for 1 collections, 232839680 used; 
>> max is 1052770304
>>  INFO 15:48:00,374 flushing high-traffic column family CFS(Keyspace='BUGS', 
>> ColumnFamily='Test') (estimated 50416978 bytes)
>>  INFO 15:48:00,374 Enqueuing flush of 
>> Memtable-Test@1575891161(4529586/50416978 serialized/live bytes, 279197 ops)
>>  INFO 15:48:00,378 Writing Memtable-Test@1575891161(4529586/50416978 
>> serialized/live bytes, 279197 ops)
>>  INFO 15:48:01,142 GC for ParNew: 654 ms for 1 collections, 239478568 used; 
>> max is 1052770304
>>  INFO 15:48:01,474 Completed flushing 
>> /var/lib/cassandra/data/BUGS/Test/BUGS-Test-ia-45-Data.db (4580066 bytes) 
>> for commitlog position ReplayPosition(segmentId=1359415964165, 
>> position=7462737)
>> 
>> 
>> Any ideas on what could be going on? I could not find anything like this in 
>> the open bugs and the only workaround seems to be never doing multi-column 
>> reads or writes. I'm concerned that the DB can get into a state where 
>> different queries can return such inconsistent results. All with no warning 
>> or errors. There is no way to even verify data correctness; every column can 
>> seem correct when queried and then disappear during slice queries depending 
>> on the other columns in the query.
>> 
>> 
>> Thanks
> 
