For the other DC it can be acceptable, because each partition resides on one
node, so if you have a large partition it may skew things a bit.
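A quick way to check whether a single large partition is behind the skew, just
a sketch with placeholder keyspace and table names, is to look at the partition
size histogram on each node:

    nodetool cfhistograms my_keyspace my_table

The Max row of the "Partition Size" column shows the largest partition that
node holds; comparing it across the nodes would show whether one oversized
partition accounts for the difference. (On newer nodetool versions the same
command is also available as tablehistograms.)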
On May 25, 2016 2:41 AM, "Luke Jolly" <l...@getadmiral.com> wrote:

> So I guess the problem may have been with the initial addition of the
> 10.128.0.20 node, because when I added it, it never synced data, I guess?
> It was at around 50 MB when it first came up and transitioned to "UN".
> After it was in, I did the 1 -> 2 replication change and tried repair, but
> it didn't fix it. From what I can tell, all the data on it is stuff that
> has been written since it came up. We never delete data, ever, so we
> should have zero tombstones.
>
> If I am not mistaken, only two of my nodes actually have all the data,
> 10.128.0.3 and 10.142.0.14, since they agree on the data amount.
> 10.142.0.13 is almost a GB lower, and then of course 10.128.0.20, which is
> missing over 5 GB of data. I tried running nodetool repair -local on both
> DCs and it didn't fix either one.
>
> Am I running into a bug of some kind?
>
> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>
>> Hi Luke,
>>
>> You mentioned that the replication factor was increased from 1 to 2. In
>> that case, was the node bearing IP 10.128.0.20 carrying around 3 GB of
>> data earlier?
>>
>> You can run nodetool repair with the -local option to initiate a repair
>> local to the gce-us-central1 datacenter.
>>
>> You may also suspect that, if a lot of data was deleted while the node
>> was down, it may be holding a lot of tombstones that do not need to be
>> replicated to the other node. To verify this, you can issue a select
>> count(*) query on the column families (with the amount of data you have
>> it should not be an issue) with tracing on and with consistency
>> local_all, connecting to either 10.128.0.3 or 10.128.0.20, and store the
>> output in a file. It will give you a fair idea of how many deleted cells
>> the nodes have. I tried searching for a reference on whether tombstones
>> are moved around during repair, but I didn't find evidence of it.
>> However, I see no reason why they would be, because if the node didn't
>> have the data then streaming tombstones does not make a lot of sense.
>>
>> Regards,
>> Bhuvan
>>
>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly <l...@getadmiral.com> wrote:
>>
>>> Here's my setup:
>>>
>>> Datacenter: gce-us-central1
>>> ===========================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>> UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>>> UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default
>>>
>>> Datacenter: gce-us-east1
>>> ========================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>> UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>>> UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>>
>>> And my replication settings are:
>>>
>>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
>>> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>>
>>> As you can see, 10.128.0.20 in the gce-us-central1 DC only has a load of
>>> 943 MB, even though it's supposed to own 100% and should have 6.4 GB.
>>> 10.142.0.13 also seems not to have everything, as it only has a load of
>>> 5.55 GB.
>>>
>>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves <k...@instaclustr.com> wrote:
>>>
>>>> Do you have 1 node in each DC or 2? If you're saying you have 1 node in
>>>> each DC, then an RF of 2 doesn't make sense. Can you clarify what your
>>>> setup is?
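The checks Bhuvan suggests above would look roughly like the following. This
is only a sketch: my_keyspace and my_table are placeholders for the real
names, and since cqlsh has no LOCAL_ALL consistency level, ALL is used as the
closest match:

    # DC-local repair, run on one of the gce-us-central1 nodes
    nodetool repair -local

    # count rows with tracing enabled to see how many tombstone cells get read
    cqlsh 10.128.0.3
    cqlsh> CONSISTENCY ALL;
    cqlsh> TRACING ON;
    cqlsh> SELECT count(*) FROM my_keyspace.my_table;

With tracing on, the trace printed after the query should include messages
along the lines of "Read N live rows and M tombstone cells" per replica, which
is the tombstone count being discussed.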
>>>> On 23 May 2016 at 19:31, Luke Jolly <l...@getadmiral.com> wrote:
>>>>
>>>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>>>>> gce-us-east1. I increased the replication factor of gce-us-central1
>>>>> from 1 to 2. Then I ran 'nodetool repair -dc gce-us-central1'. The
>>>>> "Owns" for the node switched to 100% as it should, but the Load showed
>>>>> that it didn't actually sync the data. I then ran a full 'nodetool
>>>>> repair' and it still didn't fix it. This scares me, as I thought
>>>>> 'nodetool repair' was a way to assure consistency and that all the
>>>>> nodes were synced, but that doesn't seem to be the case. Outside of
>>>>> that command, I have no idea how I would assure all the data was
>>>>> synced, or how to get the data correctly synced, without
>>>>> decommissioning the node and re-adding it.
>>>>
>>>> --
>>>> Kurt Greaves
>>>> k...@instaclustr.com
>>>> www.instaclustr.com
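For completeness, the replication change and repairs described in Luke's first
message would look roughly like this. It is a sketch only: my_keyspace is a
placeholder, and the aws-us-west entry is carried over from the replication
settings quoted earlier in the thread:

    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'aws-us-west': '2',
                          'gce-us-central1': '2',
                          'gce-us-east1': '2'};

    nodetool repair -dc gce-us-central1   # repair restricted to one DC
    nodetool repair                       # subsequent cluster-wide repair

Repair run on a node covers the token ranges that node replicates and
synchronizes all replicas of those ranges; the -dc and -local options only
restrict which datacenters take part.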