Re: Adding new DC results in clients failing to connect

Jorge Bay Gondra Thu, 30 Apr 2020 01:46:31 -0700

Hi,
You can enable logging in the driver to see what's happening under the hood:
https://docs.datastax.com/en/developer/csharp-driver/3.14/faq/#how-can-i-enable-logging-in-the-driver
With the log output, it should be easier to track the issue down.


Can you query system.local and system.peers on a seed node / contact point
to check that the node list and token information match what you expect?
You can compare the result with the output of nodetool ring.
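
One way to run that comparison is sketched below in Python. It assumes the rows have already been fetched with the two queries shown in the comments and reduced to (host, tokens) pairs. The function name and the sample addresses are hypothetical, and the idea that a token claimed by two hosts is behind the 'duplicate keys' error is only a hypothesis to test, not something established in this thread.

```python
from collections import defaultdict

# Queries to run on a contact point (these columns exist in Cassandra 3.x):
#   SELECT broadcast_address, data_center, tokens FROM system.local;
#   SELECT peer, data_center, tokens FROM system.peers;


def find_duplicate_tokens(rows):
    """Return {token: sorted list of hosts} for tokens claimed by 2+ hosts.

    `rows` is an iterable of (host, tokens) pairs, where `tokens` is the
    set of token strings read from the `tokens` column for that host.
    """
    owners = defaultdict(set)
    for host, tokens in rows:
        for token in tokens:
            owners[token].add(host)
    return {t: sorted(hosts) for t, hosts in owners.items() if len(hosts) > 1}


# Hypothetical sample data, not taken from the cluster in this thread.
sample_rows = [
    ("10.0.0.1", {"-9120", "100", "4200"}),
    ("10.0.0.2", {"-512", "7300"}),
    ("10.0.7.1", {"100", "8800"}),
]

duplicates = find_duplicate_tokens(sample_rows)
print(duplicates)  # {'100': ['10.0.0.1', '10.0.7.1']}
```

An empty result would rule out duplicated tokens in the system tables, leaving the driver logs as the next place to look.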

Not directly related: 256 vnodes per node is probably more than you want.
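
To put a number on that, here is a small arithmetic sketch using the node counts from the quoted message below (five DCs of 18 nodes plus DC6 and DC7 at 60 nodes each). It only counts tokens in the ring; it makes no claim about how any particular driver stores them.

```python
# Node counts per DC, as listed in the quoted message (DC7 included).
nodes_per_dc = {
    "DC1": 18, "DC2": 18, "DC3": 18, "DC4": 18, "DC5": 18,
    "DC6": 60, "DC7": 60,
}
vnodes_per_node = 256  # num_tokens setting reported in the quoted message

total_nodes = sum(nodes_per_dc.values())
total_tokens = total_nodes * vnodes_per_node

print(total_nodes)   # 210
print(total_tokens)  # 53760
```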

Thanks,
Jorge

On Thu, Apr 30, 2020 at 9:48 AM Gediminas Blazys
<gediminas.bla...@microsoft.com.invalid> wrote:

> Hello,
>
>
>
> We have run into a very interesting issue and maybe some of you have
> encountered it or just have an idea where to look.
>
>
>
> We are working towards adding new DCs to our cluster. Here's the current
> topology:
>
> DC1 - 18 nodes
>
> DC2 - 18 nodes
>
> DC3 - 18 nodes
>
> DC4 - 18 nodes
>
> DC5 - 18 nodes
>
>
>
> Recently we introduced a new DC6 (60 nodes) into our cluster. The joining
> and rebuilding of DC6 went smoothly, and clients are using it without issue.
> This is how it looked after joining DC6:
>
> DC1 - 18 nodes
>
> DC2 - 18 nodes
>
> DC3 - 18 nodes
>
> DC4 - 18 nodes
>
> DC5 - 18 nodes
>
> DC6 - 60 nodes
>
>
>
> Next we wanted to add another DC7 (also 60 nodes) making it a total of 210
> nodes in the cluster, and while joining new nodes went smoothly, once we
> changed the replication of user defined keyspaces to include DC7, no
> clients were able to connect to Cassandra (regardless of which DC is being
> addressed). They would throw an exception that I have provided at the end
> of the email.
>
>
>
> Cassandra version 3.11.4.
>
> C# driver version 3.12.0. Also tested with 3.14.0. We use a DC round-robin
> policy and update ring metadata for connecting clients.
>
> Number of vnodes per node: 256
>
>
>
> The stack trace starts with the exception 'The source argument contains
> duplicate keys.' Maybe you know what kind of data is in this dictionary?
> What data could be duplicated here?
>
>
>
> Clients are unable to connect until the moment we remove DC7 from
> replication. Once replication is adjusted to exclude DC7, clients can
> connect normally.
>
>
>
> Stack trace (every line was logged at 2020/04/29 10:19:27.51410636):
>
> Cassandra.NoHostAvailableException: All hosts tried for query failed (tried <<IPaddress>>:9042: ArgumentException 'The source argument contains duplicate keys.')
> at Cassandra.Connections.ControlConnection.<Connect>d__39.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
> Cassandra.Connections.ControlConnection.<InitAsync>d__36.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
> Cassandra.Tasks.TaskHelper.<WaitToCompleteAsync>d__10.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
> Cassandra.Cluster.<Cassandra-SessionManagement-IInternalCluster-OnInitializeAsync>d__50.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
> Cassandra.ClusterLifecycleManager.<InitializeAsync>d__3.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
> Cassandra.Cluster.<Cassandra-SessionManagement-IInternalCluster-ConnectAsync>d__47`1.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
> Cassandra.Cluster.<ConnectAsync>d__46.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
> Cassandra.Tasks.TaskHelper.WaitToComplete(Task task, Int32 timeout)
> Cassandra.Cluster.Connect()
>
>
>
> We would really appreciate your input, big thanks in advance.
>
>
>
> Gediminas
>
>
>
