Hi Luke,

I've never found the Load reported by nodetool status to be useful beyond a general indicator.
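As a rough cross-check of what a node actually holds, comparing the reported load against the on-disk footprint is usually more telling; something like the following (the keyspace name and data directory here are only examples, adjust for your install):

    # effective ownership for a specific keyspace
    nodetool status my_keyspace
    # actual on-disk footprint of that keyspace's SSTables
    du -sh /var/lib/cassandra/data/my_keyspace/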
You should expect some small skew, as this will depend on your current compaction status, tombstones, etc. IIRC, repair will not provide consistency of intermediate states, nor will it remove tombstones; it only guarantees consistency in the final state. This means that, in the case of dropped hints or mutations, you will see differences in intermediate states, and therefore in storage footprint, even on fully repaired nodes. This includes intermediate UPDATE operations as well.

Your one node with under 1 GB sticks out like a sore thumb, though. Which node did you run the nodetool repair from? Remember that repair will only ensure consistency for ranges held by the node you're running it on. I am not sure whether missing ranges are included in this, but if you ran nodetool repair only on a machine with partial ownership, you will need to complete repairs across the ring before the data returns to full consistency.

I would query some older data using consistency = ONE on the affected machine to determine whether you are actually missing data.

There are a few outstanding bugs in the 2.1.x and older release families that may result in tombstone creation even without deletes, for example CASSANDRA-10547, which impacts updates on collections in pre-2.1.13 Cassandra.

You can also try examining the output of nodetool ring, which will give you a breakdown of tokens and their associations within your cluster.

--Bryan

On Tue, May 24, 2016 at 3:49 PM, kurt Greaves <k...@instaclustr.com> wrote:

> Not necessarily, considering RF is 2, so both nodes should have all partitions. Luke, are you sure the repair is succeeding? You don't have other keyspaces/duplicate data/extra data in your cassandra data directory? Also, you could try querying on the node with less data to confirm whether it has the same dataset.
>
> On 24 May 2016 at 22:03, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> For the other DC, it can be acceptable because each partition resides on one node, so if you have a large partition, it may skew things a bit.
>>
>> On May 25, 2016 2:41 AM, "Luke Jolly" <l...@getadmiral.com> wrote:
>>
>>> So I guess the problem may have been with the initial addition of the 10.128.0.20 node, because when I added it in it never synced data, I guess? It was at around 50 MB when it first came up and transitioned to "UN". After it was in, I did the 1->2 replication change and tried repair, but it didn't fix it. From what I can tell, all the data on it is stuff that has been written since it came up. We never delete data, ever, so we should have zero tombstones.
>>>
>>> If I am not mistaken, only two of my nodes actually have all the data, 10.128.0.3 and 10.142.0.14, since they agree on the data amount. 10.142.0.13 is almost a GB lower, and then of course there is 10.128.0.20, which is missing over 5 GB of data. I tried running nodetool repair -local on both DCs and it didn't fix either one.
>>>
>>> Am I running into a bug of some kind?
>>>
>>> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>
>>>> Hi Luke,
>>>>
>>>> You mentioned that the replication factor was increased from 1 to 2. In that case, was the node bearing IP 10.128.0.20 carrying around 3 GB of data earlier?
>>>>
>>>> You can run nodetool repair with the -local option to initiate a repair of the local datacenter for gce-us-central1.
>>>>
>>>> Also, you may suspect that if a lot of data was deleted while the node was down, it may be holding a lot of tombstones which do not need to be replicated to the other node.
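>>>> As a quick, rough check on whether tombstones are piling up, nodetool cfstats reports per-table tombstone statistics (the keyspace/table name below is only a placeholder):
>>>>
>>>>     nodetool cfstats my_keyspace.my_table
>>>>
>>>> The tombstones-per-slice lines in that output should give a first hint before running anything heavier.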
>>>> To verify this, you can issue a select count(*) query on the column families (with the amount of data you have, it should not be an issue) with tracing on and with consistency local_all, by connecting to either 10.128.0.3 or 10.128.0.20, and store the output in a file. It will give you a fair idea of how many deleted cells the nodes have. I tried searching for a reference on whether tombstones are moved around during repair, but I didn't find evidence of it. However, I see no reason they would be, because if the node didn't have the data, then streaming tombstones does not make a lot of sense.
>>>>
>>>> Regards,
>>>> Bhuvan
>>>>
>>>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly <l...@getadmiral.com> wrote:
>>>>
>>>>> Here's my setup:
>>>>>
>>>>> Datacenter: gce-us-central1
>>>>> ===========================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>> UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>>>>> UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default
>>>>>
>>>>> Datacenter: gce-us-east1
>>>>> ========================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>> UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>>>>> UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>>>>
>>>>> And my replication settings are:
>>>>>
>>>>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2', 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>>>>
>>>>> As you can see, 10.128.0.20 in the gce-us-central1 DC only has a load of 943 MB even though it's supposed to own 100% and should have 6.4 GB. Also, 10.142.0.13 seems not to have everything either, as it only has a load of 5.55 GB.
>>>>>
>>>>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves <k...@instaclustr.com> wrote:
>>>>>
>>>>>> Do you have 1 node in each DC or 2? If you're saying you have 1 node in each DC, then an RF of 2 doesn't make sense. Can you clarify what your setup is?
>>>>>>
>>>>>> On 23 May 2016 at 19:31, Luke Jolly <l...@getadmiral.com> wrote:
>>>>>>
>>>>>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and gce-us-east1. I increased the replication factor of gce-us-central1 from 1 to 2. Then I ran 'nodetool repair -dc gce-us-central1'. The "Owns" for the node switched to 100% as it should, but the Load showed that it didn't actually sync the data. I then ran a full 'nodetool repair' and it still didn't fix it. This scares me, as I thought 'nodetool repair' was a way to assure consistency and that all the nodes were synced, but it doesn't seem to be. Outside of that command, I have no idea how I would assure all the data was synced or how to get the data correctly synced without decommissioning the node and re-adding it.
>>>>>>
>>>>>> --
>>>>>> Kurt Greaves
>>>>>> k...@instaclustr.com
>>>>>> www.instaclustr.com
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com