Great, thank you. Do you have a hypothesis about where things might be going wrong? Let me know what I can do to help.
On Fri, Apr 30, 2010 at 9:33 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-1040
>
> On Thu, Apr 29, 2010 at 6:55 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>> Ok, I reproduced without mapred. Here is my recipe:
>>
>> On a single-node cassandra cluster with basic config (-Xmx:1G)
>> loop {
>>   * insert 5,000 records in a single columnfamily with UUID keys and
>>     random string values (between 1 and 1000 chars) in 5 different
>>     columns spanning two different supercolumns
>>   * delete all the data by iterating over the rows with
>>     get_range_slices(ONE) and calling remove(QUORUM) on each row id
>>     returned (path containing only the columnfamily)
>>   * count the number of non-tombstone rows by iterating over the rows
>>     with get_range_slices(ONE) and testing for data. Break if not zero.
>> }
>>
>> Here's the flaky part: while this is running, call "bin/nodetool -h
>> localhost -p 8081 flush KeySpace" in the background every minute or
>> so. When the data hits some critical size, the loop will break.
>> Anyone care to try this at home?
>>
>> On Thu, Apr 29, 2010 at 12:51 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> Good! :)
>>>
>>> Can you reproduce w/o map/reduce, with raw get_range_slices?
>>>
>>> On Wed, Apr 28, 2010 at 3:56 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>> Yes! Reproduced on single-node cluster:
>>>>
>>>> 10/04/28 16:30:24 INFO mapred.JobClient: ROWS=274884
>>>> 10/04/28 16:30:24 INFO mapred.JobClient: TOMBSTONES=951083
>>>>
>>>> 10/04/28 16:42:49 INFO mapred.JobClient: ROWS=166580
>>>> 10/04/28 16:42:49 INFO mapred.JobClient: TOMBSTONES=1059387
>>>>
>>>> On Wed, Apr 28, 2010 at 10:43 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>> It sounds like either there is a fairly obvious bug, or you're doing
>>>>> something wrong. :)
>>>>>
>>>>> Can you reproduce against a single node?
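The insert/delete/count loop in the recipe above can be sketched against a toy in-memory store standing in for the live cluster. This is purely illustrative: `store`, `insert_batch`, and the other names are inventions of this sketch (there is no Thrift client here), but it captures the invariant the recipe keeps checking, namely that after removing every row returned by a range scan, no row should still return column data.

```python
# Toy model of the repro loop: no live Cassandra, no Thrift; the store,
# helper names, and tombstone representation are all illustrative.
import random
import string
import uuid

store = {}  # row key -> {(supercolumn, column): value}; {} models a tombstoned row

def insert_batch(n=5000):
    """Insert n rows with UUID keys: 5 columns across two supercolumns,
    random string values between 1 and 1000 chars (as in the recipe)."""
    for _ in range(n):
        key = str(uuid.uuid1())
        store[key] = {
            ("Super1" if i < 3 else "Super2", "col%d" % i):
                "".join(random.choices(string.ascii_letters,
                                       k=random.randint(1, 1000)))
            for i in range(5)
        }

def delete_all_rows():
    """Mirrors get_range_slices(ONE) + remove(QUORUM) on each returned key:
    the key remains visible (tombstone) but its columns are gone."""
    for key in list(store):
        store[key] = {}

def count_live_rows():
    """A row counts as live (non-tombstone) iff it still returns data."""
    return sum(1 for cols in store.values() if cols)

insert_batch()
delete_all_rows()
assert count_live_rows() == 0  # the real loop breaks when this stops holding
```

In the toy model the assertion trivially holds every iteration; the point of the recipe is that against a real node, with periodic `nodetool flush` in the background, the equivalent check eventually fails.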
>>>>>
>>>>> On Tue, Apr 27, 2010 at 5:14 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>> Update: I ran a test whereby I deleted ALL the rows in a column
>>>>>> family, using a consistency level of ALL. To do this, I mapped the
>>>>>> ColumnFamily and called remove on each row id. There were 1.5 million
>>>>>> rows, so 1.5 million rows were deleted.
>>>>>>
>>>>>> I ran a counter job immediately after. This job maps the same column
>>>>>> family and tests whether any data is returned. If not, it considers
>>>>>> the row a "tombstone". If yes, it considers the row not deleted.
>>>>>> Below are the hadoop counters for those jobs. Note the fluctuation
>>>>>> in the number of rows with data over time, and the increase in the
>>>>>> time to map the column family after the destroy job. No other
>>>>>> clients were accessing cassandra during this time.
>>>>>>
>>>>>> I'm thoroughly confused.
>>>>>>
>>>>>> Count: started 13:02:30 EDT, finished 13:11:33 EDT (9 minutes 2 seconds)
>>>>>> ROWS: 1,542,479
>>>>>> TOMBSTONES: 69
>>>>>>
>>>>>> Destroy: started 16:48:45 EDT, finished 17:07:36 EDT (18 minutes 50 seconds)
>>>>>> DESTROYED: 1,542,548
>>>>>>
>>>>>> Count: started 17:15:42 EDT, finished 17:31:03 EDT (15 minutes 21 seconds)
>>>>>> ROWS: 876,464
>>>>>> TOMBSTONES: 666,084
>>>>>>
>>>>>> Count: started 17:31:32, finished 17:47:16 (15 minutes 44 seconds)
>>>>>> ROWS: 1,451,665
>>>>>> TOMBSTONES: 90,883
>>>>>>
>>>>>> Count: started 17:52:34, finished 18:10:28 (17 minutes 53 seconds)
>>>>>> ROWS: 1,425,644
>>>>>> TOMBSTONES: 116,904
>>>>>>
>>>>>> On Tue, Apr 27, 2010 at 5:37 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>> Clocks are in sync:
>>>>>>>
>>>>>>> cluster04:~/cassandra$ dsh -g development "date"
>>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>>>
>>>>>>> On Tue, Apr 27, 2010 at 5:35 PM, Nathan McCall <n...@vervewireless.com> wrote:
>>>>>>>> Have you confirmed that your clocks are all synced in the cluster?
>>>>>>>> This may be the result of an unintentional read-repair occurring if
>>>>>>>> that were the case.
>>>>>>>>
>>>>>>>> -Nate
>>>>>>>>
>>>>>>>> On Tue, Apr 27, 2010 at 2:20 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>>> Hmm... Even after deleting with cl.ALL, I'm getting data back for
>>>>>>>>> some rows after having deleted them. Which rows return data is
>>>>>>>>> inconsistent from one run of the job to the next.
>>>>>>>>>
>>>>>>>>> On Tue, Apr 27, 2010 at 1:44 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>>>> To check that rows are gone, I check that KeySlice.columns is
>>>>>>>>>> empty. And as I mentioned, immediately after the delete job, this
>>>>>>>>>> returns the expected number.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, I reproduced with QUORUM this morning. No node
>>>>>>>>>> outages. I am going to try ALL to see if that changes anything,
>>>>>>>>>> but I am starting to wonder if I'm doing something else wrong.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 26, 2010 at 9:45 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> How are you checking that the rows are gone?
>>>>>>>>>>>
>>>>>>>>>>> Are you experiencing node outages during this?
>>>>>>>>>>>
>>>>>>>>>>> DC_QUORUM is unfinished code right now; you should avoid using
>>>>>>>>>>> it. Can you reproduce with normal QUORUM?
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Apr 24, 2010 at 12:23 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>>>>> > I'm having trouble deleting rows in Cassandra. After running
>>>>>>>>>>> > a job that deletes hundreds of rows, I run another job that
>>>>>>>>>>> > verifies that the rows are gone. Both jobs run correctly.
>>>>>>>>>>> > However, when I run the verification job an hour later, the
>>>>>>>>>>> > rows have re-appeared. This is not a case of "ghosting",
>>>>>>>>>>> > because the verification job actually checks that there is
>>>>>>>>>>> > data in the columns.
>>>>>>>>>>> >
>>>>>>>>>>> > I am running a cluster with 12 nodes and a replication factor
>>>>>>>>>>> > of 3. I am using DC_QUORUM consistency when deleting.
>>>>>>>>>>> >
>>>>>>>>>>> > Any ideas?
>>>>>>>>>>> > Joost.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jonathan Ellis
>>>>>>>>>>> Project Chair, Apache Cassandra
>>>>>>>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>>>>>>>> http://riptano.com
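Nathan's clock-sync question matters because Cassandra reconciles a column against a tombstone by comparing their client-supplied timestamps: a delete stamped earlier than the write it targets is a no-op at read time, so the data appears to resurrect. A toy reconcile function illustrating that failure mode (the function name and tie-breaking detail here are a simplified model, not Cassandra's actual code):

```python
# Toy model of timestamp-based reconciliation between a column and a
# tombstone; illustrative only, not Cassandra's internal implementation.
def reconcile(column_ts, column_value, tombstone_ts):
    """Return the surviving value: the tombstone suppresses the column
    only if its timestamp is at least as new (tombstone wins ties)."""
    if tombstone_ts is not None and tombstone_ts >= column_ts:
        return None  # deleted
    return column_value

# writer's clock reads 1005; the deleting node's lagging clock reads 1001,
# so the delete is silently ignored and the row "re-appears"
assert reconcile(column_ts=1005, column_value="data", tombstone_ts=1001) == "data"

# with synced clocks the later delete removes the data as expected
assert reconcile(column_ts=1005, column_value="data", tombstone_ts=1010) is None
```

Since the `dsh` output above shows the clocks within a second of each other, skew of this kind is an unlikely explanation here, which is consistent with the thread converging on a server-side bug (CASSANDRA-1040) instead.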