https://issues.apache.org/jira/browse/CASSANDRA-1040
On Thu, Apr 29, 2010 at 6:55 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> Ok, I reproduced without mapred. Here is my recipe:
>
> On a single-node cassandra cluster with basic config (-Xmx:1G)
> loop {
>   * insert 5,000 records in a single columnfamily with UUID keys and
>     random string values (between 1 and 1000 chars) in 5 different
>     columns spanning two different supercolumns
>   * delete all the data by iterating over the rows with
>     get_range_slices(ONE) and calling remove(QUORUM) on each row id
>     returned (path containing only columnfamily)
>   * count number of non-tombstone rows by iterating over the rows
>     with get_range_slices(ONE) and testing data. Break if not zero.
> }
>
> Here's the flakey part: while this is running, call "bin/nodetool -h
> localhost -p 8081 flush KeySpace" in the background every minute or
> so. When the data hits some critical size, the loop will break.
> Anyone care to try this at home?
>
> On Thu, Apr 29, 2010 at 12:51 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> Good! :)
>>
>> Can you reproduce w/o map/reduce, with raw get_range_slices?
>>
>> On Wed, Apr 28, 2010 at 3:56 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>> Yes! Reproduced on single-node cluster:
>>>
>>> 10/04/28 16:30:24 INFO mapred.JobClient: ROWS=274884
>>> 10/04/28 16:30:24 INFO mapred.JobClient: TOMBSTONES=951083
>>>
>>> 10/04/28 16:42:49 INFO mapred.JobClient: ROWS=166580
>>> 10/04/28 16:42:49 INFO mapred.JobClient: TOMBSTONES=1059387
>>>
>>> On Wed, Apr 28, 2010 at 10:43 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>> It sounds like either there is a fairly obvious bug, or you're
>>>> doing something wrong. :)
>>>>
>>>> Can you reproduce against a single node?
>>>>
>>>> On Tue, Apr 27, 2010 at 5:14 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>> Update: I ran a test whereby I deleted ALL the rows in a column
>>>>> family, using a consistency level of ALL. To do this, I mapped
>>>>> the ColumnFamily and called remove on each row id. There were
>>>>> 1.5 million rows, so 1.5 million rows were deleted.
>>>>>
>>>>> I ran a counter job immediately after. This job maps the same
>>>>> column family and tests if any data is returned. If not, it
>>>>> considers the row a "tombstone". If yes, it considers the row
>>>>> not deleted. Below are the hadoop counters for those jobs. Note
>>>>> the fluctuation in the number of rows with data over time, and
>>>>> the increase in time to map the column family after the destroy
>>>>> job. No other clients were accessing cassandra during this time.
>>>>>
>>>>> I'm thoroughly confused.
>>>>>
>>>>> Count: started 13:02:30 EDT, finished 13:11:33 EDT (9 minutes 2 seconds)
>>>>> ROWS: 1,542,479
>>>>> TOMBSTONES: 69
>>>>>
>>>>> Destroy: started 16:48:45 EDT, finished 17:07:36 EDT (18 minutes 50 seconds)
>>>>> DESTROYED: 1,542,548
>>>>>
>>>>> Count: started 17:15:42 EDT, finished 17:31:03 EDT (15 minutes 21 seconds)
>>>>> ROWS: 876,464
>>>>> TOMBSTONES: 666,084
>>>>>
>>>>> Count: started 17:31:32, finished 17:47:16 (15 minutes, 44 seconds)
>>>>> ROWS: 1,451,665
>>>>> TOMBSTONES: 90,883
>>>>>
>>>>> Count: started 17:52:34, finished 18:10:28 (17 minutes, 53 seconds)
>>>>> ROWS: 1,425,644
>>>>> TOMBSTONES: 116,904
>>>>>
>>>>> On Tue, Apr 27, 2010 at 5:37 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>> Clocks are in sync:
>>>>>>
>>>>>> cluster04:~/cassandra$ dsh -g development "date"
>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>>
>>>>>> On Tue, Apr 27, 2010 at 5:35 PM, Nathan McCall <n...@vervewireless.com> wrote:
>>>>>>> Have you confirmed that your clocks are all synced in the
>>>>>>> cluster? This may be the result of an unintentional
>>>>>>> read-repair occurring if that were the case.
>>>>>>>
>>>>>>> -Nate
>>>>>>>
>>>>>>> On Tue, Apr 27, 2010 at 2:20 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>> Hmm... Even after deleting with cl.ALL, I'm getting data back
>>>>>>>> for some rows after having deleted them. Which rows return
>>>>>>>> data is inconsistent from one run of the job to the next.
>>>>>>>>
>>>>>>>> On Tue, Apr 27, 2010 at 1:44 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>>> To check that rows are gone, I check that KeySlice.columns
>>>>>>>>> is empty. And as I mentioned, immediately after the delete
>>>>>>>>> job, this returns the expected number.
>>>>>>>>>
>>>>>>>>> Unfortunately I reproduced with QUORUM this morning. No node
>>>>>>>>> outages. I am going to try ALL to see if that changes
>>>>>>>>> anything, but I am starting to wonder if I'm doing something
>>>>>>>>> else wrong.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 26, 2010 at 9:45 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>>>>>>> How are you checking that the rows are gone?
>>>>>>>>>>
>>>>>>>>>> Are you experiencing node outages during this?
>>>>>>>>>>
>>>>>>>>>> DC_QUORUM is unfinished code right now, you should avoid
>>>>>>>>>> using it. Can you reproduce with normal QUORUM?
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 24, 2010 at 12:23 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>>>> > I'm having trouble deleting rows in Cassandra. After
>>>>>>>>>> > running a job that deletes hundreds of rows, I run
>>>>>>>>>> > another job that verifies that the rows are gone. Both
>>>>>>>>>> > jobs run correctly. However, when I run the verification
>>>>>>>>>> > job an hour later, the rows have re-appeared. This is not
>>>>>>>>>> > a case of "ghosting" because the verification job
>>>>>>>>>> > actually checks that there is data in the columns.
>>>>>>>>>> >
>>>>>>>>>> > I am running a cluster with 12 nodes and a replication
>>>>>>>>>> > factor of 3. I am using DC_QUORUM consistency when
>>>>>>>>>> > deleting.
>>>>>>>>>> >
>>>>>>>>>> > Any ideas?
>>>>>>>>>> > Joost.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jonathan Ellis
>>>>>>>>>> Project Chair, Apache Cassandra
>>>>>>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>>>>>>> http://riptano.com
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> http://riptano.com
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
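For anyone who wants to try this at home, here is a minimal sketch of the repro loop, written against the 0.6-era Thrift API under discussion (Python bindings generated from interface/cassandra.thrift are assumed, along with the default unframed transport). The keyspace, column family, supercolumn, and column names are placeholders, not taken from the thread; only get_range_slices, remove, and the consistency levels follow the recipe.

import random
import string
import time
import uuid

from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport

# Assumed: bindings generated with "thrift --gen py" from Cassandra
# 0.6's interface/cassandra.thrift.
from cassandra import Cassandra
from cassandra.ttypes import (ColumnParent, ColumnPath, ConsistencyLevel,
                              KeyRange, SlicePredicate, SliceRange)

KEYSPACE = 'KeySpace'                      # placeholder names
CF = 'Super1'
SUPERCOLUMNS = ['sc1', 'sc2']              # two different supercolumns
COLUMNS = ['c1', 'c2', 'c3', 'c4', 'c5']   # five columns spread across them


def connect(host='localhost', port=9160):
    # Assumes the default unframed (buffered) Thrift transport.
    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    transport.open()
    return Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))


def now_usec():
    return int(time.time() * 1000000)


def random_value():
    # Random string value between 1 and 1000 chars, per the recipe.
    return ''.join(random.choice(string.ascii_letters)
                   for _ in range(random.randint(1, 1000)))


def insert_batch(client, n=5000):
    # 5,000 records with UUID keys, 5 columns over 2 supercolumns.
    for _ in range(n):
        key = uuid.uuid4().hex
        for i, col in enumerate(COLUMNS):
            path = ColumnPath(column_family=CF,
                              super_column=SUPERCOLUMNS[i % 2],
                              column=col)
            client.insert(KEYSPACE, key, path, random_value(),
                          now_usec(), ConsistencyLevel.ONE)


def all_keyslices(client, page_size=1000):
    # Iterate every row in the CF with get_range_slices at CL.ONE.
    predicate = SlicePredicate(slice_range=SliceRange(
        start='', finish='', reversed=False, count=100))
    parent = ColumnParent(column_family=CF)
    start, last = '', None
    while True:
        batch = client.get_range_slices(
            KEYSPACE, parent, predicate,
            KeyRange(start_key=start, end_key='', count=page_size),
            ConsistencyLevel.ONE)
        for ks in batch:
            if ks.key != last:    # start_key is inclusive; skip resume key
                yield ks
        if len(batch) < page_size:
            return
        last = batch[-1].key
        start = last


def destroy_all(client):
    # Row-level remove(QUORUM): the ColumnPath names only the columnfamily.
    for ks in all_keyslices(client):
        client.remove(KEYSPACE, ks.key, ColumnPath(column_family=CF),
                      now_usec(), ConsistencyLevel.QUORUM)


def count_live(client):
    # A deleted row still comes back as a KeySlice with no columns (a
    # tombstone); only rows that actually return data count as live.
    return sum(1 for ks in all_keyslices(client) if ks.columns)


client = connect()
while True:
    insert_batch(client)
    destroy_all(client)
    live = count_live(client)
    print('live rows after delete: %d' % live)
    if live != 0:
        break                              # bug reproduced

The count_live check mirrors what Joost describes in the thread: a deleted row is still returned by get_range_slices as a KeySlice whose columns list is empty, so a row only counts as live (non-tombstone) if it returns data.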
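The "flakey part" of the recipe, the background flush, can be driven from the same script. This companion sketch just shells out to the exact nodetool invocation quoted in the recipe once a minute; run it from the Cassandra install directory so the bin/nodetool path resolves.

import subprocess
import threading
import time


def flush_periodically(interval_secs=60):
    # The exact command from the recipe: JMX port 8081, keyspace "KeySpace".
    while True:
        subprocess.call(['bin/nodetool', '-h', 'localhost', '-p', '8081',
                         'flush', 'KeySpace'])
        time.sleep(interval_secs)


# Daemon thread, so the flusher dies when the repro loop exits.
threading.Thread(target=flush_periodically, daemon=True).start()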