Great, thank you. Do you have a hypothesis about where things might be going wrong? Let me know what I can do to help.
On Fri, Apr 30, 2010 at 9:33 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-1040
>
> On Thu, Apr 29, 2010 at 6:55 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>> Ok, I reproduced without mapred. Here is my recipe:
>>
>> On a single-node cassandra cluster with basic config (-Xmx:1G)
>> loop {
>>   * insert 5,000 records in a single columnfamily with UUID keys and
>>     random string values (between 1 and 1000 chars) in 5 different
>>     columns spanning two different supercolumns
>>   * delete all the data by iterating over the rows with
>>     get_range_slices(ONE) and calling remove(QUORUM) on each row id
>>     returned (path containing only the columnfamily)
>>   * count the number of non-tombstone rows by iterating over the rows
>>     with get_range_slices(ONE) and testing for data. Break if not zero.
>> }
>>
>> Here's the flaky part: while this is running, call "bin/nodetool -h
>> localhost -p 8081 flush KeySpace" in the background every minute or
>> so. When the data hits some critical size, the loop will break.
>> Anyone care to try this at home?
>>
>> On Thu, Apr 29, 2010 at 12:51 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> Good! :)
>>>
>>> Can you reproduce w/o map/reduce, with raw get_range_slices?
>>>
>>> On Wed, Apr 28, 2010 at 3:56 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>> Yes! Reproduced on single-node cluster:
>>>>
>>>> 10/04/28 16:30:24 INFO mapred.JobClient: ROWS=274884
>>>> 10/04/28 16:30:24 INFO mapred.JobClient: TOMBSTONES=951083
>>>>
>>>> 10/04/28 16:42:49 INFO mapred.JobClient: ROWS=166580
>>>> 10/04/28 16:42:49 INFO mapred.JobClient: TOMBSTONES=1059387
>>>>
>>>> On Wed, Apr 28, 2010 at 10:43 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>> It sounds like either there is a fairly obvious bug, or you're doing
>>>>> something wrong. :)
>>>>>
>>>>> Can you reproduce against a single node?
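The insert/delete/count loop in the recipe above can be sketched against a toy in-memory store standing in for the live cluster. This is purely illustrative: `store`, `insert_batch`, and the other names are inventions of this sketch (there is no Thrift client here), but it captures the invariant the recipe keeps checking, namely that after removing every row returned by a range scan, no row should still return column data.

```python
# Toy model of the repro loop: no live Cassandra, no Thrift; the store,
# helper names, and tombstone representation are all illustrative.
import random
import string
import uuid

store = {}  # row key -> {(supercolumn, column): value}; {} models a tombstoned row

def insert_batch(n=5000):
    """Insert n rows with UUID keys: 5 columns across two supercolumns,
    random string values between 1 and 1000 chars (as in the recipe)."""
    for _ in range(n):
        key = str(uuid.uuid1())
        store[key] = {
            ("Super1" if i < 3 else "Super2", "col%d" % i):
                "".join(random.choices(string.ascii_letters,
                                       k=random.randint(1, 1000)))
            for i in range(5)
        }

def delete_all_rows():
    """Mirrors get_range_slices(ONE) + remove(QUORUM) on each returned key:
    the key remains visible (tombstone) but its columns are gone."""
    for key in list(store):
        store[key] = {}

def count_live_rows():
    """A row counts as live (non-tombstone) iff it still returns data."""
    return sum(1 for cols in store.values() if cols)

insert_batch()
delete_all_rows()
assert count_live_rows() == 0  # the real loop breaks when this stops holding
```

In the toy model the assertion trivially holds every iteration; the point of the recipe is that against a real node, with periodic `nodetool flush` in the background, the equivalent check eventually fails.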
>>>>>
>>>>> On Tue, Apr 27, 2010 at 5:14 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>> Update: I ran a test whereby I deleted ALL the rows in a column
>>>>>> family, using a consistency level of ALL. To do this, I mapped the
>>>>>> ColumnFamily and called remove on each row id. There were 1.5 million
>>>>>> rows, so 1.5 million rows were deleted.
>>>>>>
>>>>>> I ran a counter job immediately after. This job maps the same column
>>>>>> family and tests whether any data is returned. If not, it considers
>>>>>> the row a "tombstone". If yes, it considers the row not deleted.
>>>>>> Below are the hadoop counters for those jobs. Note the fluctuation
>>>>>> in the number of rows with data over time, and the increase in the
>>>>>> time to map the column family after the destroy job. No other
>>>>>> clients were accessing cassandra during this time.
>>>>>>
>>>>>> I'm thoroughly confused.
>>>>>>
>>>>>> Count: started 13:02:30 EDT, finished 13:11:33 EDT (9 minutes 2 seconds)
>>>>>> ROWS: 1,542,479
>>>>>> TOMBSTONES: 69
>>>>>>
>>>>>> Destroy: started 16:48:45 EDT, finished 17:07:36 EDT (18 minutes 50 seconds)
>>>>>> DESTROYED: 1,542,548
>>>>>>
>>>>>> Count: started 17:15:42 EDT, finished 17:31:03 EDT (15 minutes 21 seconds)
>>>>>> ROWS: 876,464
>>>>>> TOMBSTONES: 666,084
>>>>>>
>>>>>> Count: started 17:31:32, finished 17:47:16 (15 minutes 44 seconds)
>>>>>> ROWS: 1,451,665
>>>>>> TOMBSTONES: 90,883
>>>>>>
>>>>>> Count: started 17:52:34, finished 18:10:28 (17 minutes 53 seconds)
>>>>>> ROWS: 1,425,644
>>>>>> TOMBSTONES: 116,904
>>>>>>
>>>>>> On Tue, Apr 27, 2010 at 5:37 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>> Clocks are in sync:
>>>>>>>
>>>>>>> cluster04:~/cassandra$ dsh -g development "date"
>>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>>>
>>>>>>> On Tue, Apr 27, 2010 at 5:35 PM, Nathan McCall <n...@vervewireless.com> wrote:
>>>>>>>> Have you confirmed that your clocks are all synced in the cluster?
>>>>>>>> This may be the result of an unintentional read-repair occurring if
>>>>>>>> that were the case.
>>>>>>>>
>>>>>>>> -Nate
>>>>>>>>
>>>>>>>> On Tue, Apr 27, 2010 at 2:20 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>>> Hmm... Even after deleting with cl.ALL, I'm getting data back for
>>>>>>>>> some rows after having deleted them. Which rows return data is
>>>>>>>>> inconsistent from one run of the job to the next.
>>>>>>>>>
>>>>>>>>> On Tue, Apr 27, 2010 at 1:44 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>>>> To check that rows are gone, I check that KeySlice.columns is
>>>>>>>>>> empty. And as I mentioned, immediately after the delete job, this
>>>>>>>>>> returns the expected number.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, I reproduced with QUORUM this morning. No node
>>>>>>>>>> outages. I am going to try ALL to see if that changes anything,
>>>>>>>>>> but I am starting to wonder if I'm doing something else wrong.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 26, 2010 at 9:45 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> How are you checking that the rows are gone?
>>>>>>>>>>>
>>>>>>>>>>> Are you experiencing node outages during this?
>>>>>>>>>>>
>>>>>>>>>>> DC_QUORUM is unfinished code right now; you should avoid using
>>>>>>>>>>> it. Can you reproduce with normal QUORUM?
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Apr 24, 2010 at 12:23 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>>>>>>>>>>> > I'm having trouble deleting rows in Cassandra. After running
>>>>>>>>>>> > a job that deletes hundreds of rows, I run another job that
>>>>>>>>>>> > verifies that the rows are gone. Both jobs run correctly.
>>>>>>>>>>> > However, when I run the verification job an hour later, the
>>>>>>>>>>> > rows have re-appeared. This is not a case of "ghosting",
>>>>>>>>>>> > because the verification job actually checks that there is
>>>>>>>>>>> > data in the columns.
>>>>>>>>>>> >
>>>>>>>>>>> > I am running a cluster with 12 nodes and a replication factor
>>>>>>>>>>> > of 3. I am using DC_QUORUM consistency when deleting.
>>>>>>>>>>> >
>>>>>>>>>>> > Any ideas?
>>>>>>>>>>> > Joost.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jonathan Ellis
>>>>>>>>>>> Project Chair, Apache Cassandra
>>>>>>>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>>>>>>>> http://riptano.com
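Nathan's clock-sync question matters because Cassandra reconciles a column against a tombstone by comparing their client-supplied timestamps: a delete stamped earlier than the write it targets is a no-op at read time, so the data appears to resurrect. A toy reconcile function illustrating that failure mode (the function name and tie-breaking detail here are a simplified model, not Cassandra's actual code):

```python
# Toy model of timestamp-based reconciliation between a column and a
# tombstone; illustrative only, not Cassandra's internal implementation.
def reconcile(column_ts, column_value, tombstone_ts):
    """Return the surviving value: the tombstone suppresses the column
    only if its timestamp is at least as new (tombstone wins ties)."""
    if tombstone_ts is not None and tombstone_ts >= column_ts:
        return None  # deleted
    return column_value

# writer's clock reads 1005; the deleting node's lagging clock reads 1001,
# so the delete is silently ignored and the row "re-appears"
assert reconcile(column_ts=1005, column_value="data", tombstone_ts=1001) == "data"

# with synced clocks the later delete removes the data as expected
assert reconcile(column_ts=1005, column_value="data", tombstone_ts=1010) is None
```

Since the `dsh` output above shows the clocks within a second of each other, skew of this kind is an unlikely explanation here, which is consistent with the thread converging on a server-side bug (CASSANDRA-1040) instead.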