> I.e. querying for a single column works, but the column does not appear in
> slice queries depending on the other columns in the query:
>
> cfq.getKey("foo").getColumn("A") returns "A"
> cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
> cfq.getKey("foo").withColumnSlice("A", "B", "C") returns "A", "B" and "C"

Can you replicate this using cassandra-cli or CQL? That makes it clearer what's happening and removes any potential issues with the client or your code. If you cannot repro it, show your Astyanax code.

Cheers
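For reference, a minimal cassandra-cli check along the lines suggested above might look like this. It assumes the keyspace BUGS, column family Test, and row key 'foo' from the thread, and a node running locally on the default Thrift port; this is a sketch, not a verified session.

```shell
# Connect to the local node (Cassandra 1.2 ships cassandra-cli in bin/).
# Compare a single-column read against a whole-row read for the same key.
cassandra-cli -h localhost -p 9160 <<'EOF'
use BUGS;
get Test['foo']['A'];
get Test['foo'];
EOF
```

If the single-column `get` shows column A but the whole-row `get` omits it, the problem is server-side rather than in the client library.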
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 1:15 PM, Elden Bishop <ebis...@exacttarget.com> wrote:

> I'm trying to track down some really worrying behavior. It appears that
> writing multiple columns while a table flush is occurring can result in
> Cassandra recording its data in a way that makes columns visible only to
> some queries but not others.
>
> I.e. querying for a single column works, but the column does not appear in
> slice queries depending on the other columns in the query:
>
> cfq.getKey("foo").getColumn("A") returns "A"
> cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
> cfq.getKey("foo").withColumnSlice("A", "B", "C") returns "A", "B" and "C"
>
> This is a permanent condition: even hours later, with no reads or writes,
> the DB returns the same results. I can reproduce this 100% of the time by
> writing multiple columns and then reading a different set of multiple
> columns. Columns written during the flush may or may not appear.
>
> Details
>
> # There are no log errors.
> # All single-column queries return correct data.
> # Slice queries may or may not return the column, depending on which other
> columns are in the query.
> # This is on a stock "unzip and run" installation of Cassandra using
> default options only; basically following the Cassandra getting-started
> tutorial and using the demo table described in that tutorial.
> # Cassandra 1.2.0 using Astyanax and Java 1.6.0_37.
> # There are no errors, but there is always a "flushing high traffic column
> family" message right before the incoherent state occurs.
> # To reproduce, just update multiple columns at the same time using random
> rows, then verify the writes by reading multiple columns. I can generate
> the error on 100% of runs. Once the state is screwed up, the multi-column
> read will not contain the column but the single-column read will.
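The three reads described above can be sketched as a small Astyanax harness. This is a minimal sketch only: it assumes an already-initialized `Keyspace` (built elsewhere via `AstyanaxContext` against the node from the thread), a column family named Test with string keys and column names, and row key "foo"; the class and method names here are hypothetical, and error handling is omitted.

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;

public class SliceRepro {
    // Column family from the thread: keyspace BUGS, CF Test, string/string.
    static final ColumnFamily<String, String> CF_TEST =
            new ColumnFamily<String, String>("Test",
                    StringSerializer.get(), StringSerializer.get());

    static void check(Keyspace keyspace) throws Exception {
        // Single-column read: reportedly always returns column A.
        String single = keyspace.prepareQuery(CF_TEST)
                .getKey("foo")
                .getColumn("A")
                .execute().getResult().getStringValue();

        // Two-column slice: reportedly returns only B.
        ColumnList<String> two = keyspace.prepareQuery(CF_TEST)
                .getKey("foo")
                .withColumnSlice("A", "B")
                .execute().getResult();

        // Three-column slice: reportedly returns A, B and C.
        ColumnList<String> three = keyspace.prepareQuery(CF_TEST)
                .getKey("foo")
                .withColumnSlice("A", "B", "C")
                .execute().getResult();

        System.out.println("single=" + single
                + " twoColSlice=" + two.size()
                + " threeColSlice=" + three.size());
    }
}
```

On a healthy node all three reads should agree; the bug report is that `two` is missing column A while `single` and `three` both see it.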
>
> Log snippet
>
> INFO 15:47:49,066 GC for ParNew: 320 ms for 1 collections, 207199992 used; max is 1052770304
> INFO 15:47:58,076 GC for ParNew: 330 ms for 1 collections, 232839680 used; max is 1052770304
> INFO 15:48:00,374 flushing high-traffic column family CFS(Keyspace='BUGS', ColumnFamily='Test') (estimated 50416978 bytes)
> INFO 15:48:00,374 Enqueuing flush of Memtable-Test@1575891161(4529586/50416978 serialized/live bytes, 279197 ops)
> INFO 15:48:00,378 Writing Memtable-Test@1575891161(4529586/50416978 serialized/live bytes, 279197 ops)
> INFO 15:48:01,142 GC for ParNew: 654 ms for 1 collections, 239478568 used; max is 1052770304
> INFO 15:48:01,474 Completed flushing /var/lib/cassandra/data/BUGS/Test/BUGS-Test-ia-45-Data.db (4580066 bytes) for commitlog position ReplayPosition(segmentId=1359415964165, position=7462737)
>
> Any ideas on what could be going on? I could not find anything like this in
> the open bugs, and the only workaround seems to be never doing multi-column
> reads or writes. I'm concerned that the DB can get into a state where
> different queries return such inconsistent results, all with no warning or
> errors. There is no way to even verify data correctness; every column can
> seem correct when queried and then disappear during slice queries depending
> on the other columns in the query.
>
> Thanks