Hi Igniters, hi Raymond,

That was a really good point. I will try to address it as best I can.
First of all, this new mode will be configurable for now. As Val suggested, "TcpCommunicationSpi#forceClientToServerConnections" will be a new setting that triggers this behavior. It is disabled by default.

About the issues with K8s deployments: I'm not an expert, but from what I've heard, server and client nodes are sometimes not in the same environment. For example, there is an Ignite cluster, and a user tries to start a client node in an isolated K8s pod. In this case the client cannot properly resolve its own addresses and send them to the servers, which makes it impossible for the servers to connect to such a client. In other words, the client is used as if it were a thin client. In your case everything is fine: clients and servers share the same network and can resolve each other's addresses.

Now, the CQ issue [1]. You can pass a custom event filter when you register a new continuous query, but, depending on the setup, the class of this filter may not be on the classpath of the server node that holds the data and invokes the filter. There are two possible outcomes:

- the server fails to resolve the class name and fails to register the CQ;
- or the server has p2p deployment enabled.

Let's assume that it was a client node that requested the CQ. In the second case, the server will try to download the "class" file directly from the node that sent the filter object in the first place. Due to a poor design decision, this is done synchronously while registering the query, and query registration happens in the "discovery" thread. Under normal circumstances the server will load the class and finish the query registration; it's just a little bit slow.

That second case is not compatible with the new "forceClientToServerConnections" setting, because the server is no longer allowed to open a connection back to the client to fetch the class. I'm not sure I need to go into all the technical details, but the result is a cluster that cannot process any discovery messages for the duration of the TCP connection timeout; we're talking about tens of seconds, or maybe even several minutes, depending on the settings and the environment. All this time the server will be in a "deadlock" state inside the "discovery" thread. This means that some cluster operations, like a new node joining or a new cache being started, will be unavailable during this period. Node failures will not be processed properly either. It's hard for me to predict the real behavior until we reproduce the situation in a live environment; I have only seen this in tests.

To make this a little more concrete, I've put a few rough sketches below.
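First, a minimal sketch of what enabling the new mode could look like. Note that the setter name here is my assumption, derived from the property name above; the final API may end up looking different:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class ForcedModeSketch {
    public static void main(String[] args) {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

        // Assumed setter for the proposed property; disabled by default.
        // When enabled, communication connections to a client are expected
        // to be initiated by the client itself, not by the servers.
        commSpi.setForceClientToServerConnections(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setCommunicationSpi(commSpi);

        Ignite ignite = Ignition.start(cfg);
    }
}
```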
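Here is also a rough sketch of the CQ scenario. The cache name and the filter logic are made up for illustration; the important part is that the filter class is invoked on the server nodes that hold the data:

```java
import javax.cache.configuration.FactoryBuilder;
import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheEntryEventSerializableFilter;
import org.apache.ignite.cache.query.ContinuousQuery;

public class CqSketch {
    /** Custom remote filter. Servers invoke it for every update, so its class
     *  must be resolvable on the server side: either from the classpath or
     *  via p2p deployment from the node that registered the query. */
    public static class EvenKeyFilter
        implements CacheEntryEventSerializableFilter<Integer, String> {
        @Override public boolean evaluate(
            CacheEntryEvent<? extends Integer, ? extends String> evt) {
            return evt.getKey() % 2 == 0;
        }
    }

    public static void registerCq(Ignite client) {
        IgniteCache<Integer, String> cache = client.cache("myCache"); // hypothetical cache

        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        // Registration ships the filter factory to the servers. A server that
        // cannot resolve EvenKeyFilter either fails the registration or, with
        // p2p deployment enabled, downloads the class file from this node
        // synchronously, inside the discovery thread.
        qry.setRemoteFilterFactory(FactoryBuilder.factoryOf(EvenKeyFilter.class));

        qry.setLocalListener(evts -> evts.forEach(
            e -> System.out.println("Updated: " + e.getKey() + " -> " + e.getValue())));

        // The returned cursor must stay open for as long as the CQ is needed.
        cache.query(qry);
    }
}
```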
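For completeness, p2p deployment itself is just a flag on the node configuration. It is exactly this combination, peer class loading plus the forced client-to-server mode, that leads to the stall described above, because the server has no way to establish a connection back to the client to fetch the class:

```java
import org.apache.ignite.configuration.IgniteConfiguration;

public class P2pSketch {
    public static IgniteConfiguration config() {
        // With peer class loading enabled, a server that cannot resolve a CQ
        // filter class will try to fetch it from the originating node.
        return new IgniteConfiguration()
            .setPeerClassLoadingEnabled(true);
    }
}
```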
I hope this message clarifies the situation, or at least doesn't cause more confusion. These changes will not affect your infrastructure or your Ignite installations; they are aimed at adding more flexibility to other ways of using Ignite.

[1] https://issues.apache.org/jira/browse/IGNITE-13156

Sat, Jun 27, 2020 at 09:54, Raymond Wilson <raymond_wil...@trimble.com>:

> I have just caught up with this discussion and wanted to outline a set of
> use cases we have that rely on server nodes communicating with client
> nodes.
>
> Firstly, I'd like to confirm my mental model of server & client nodes
> within a grid (ignoring thin clients for now):
>
> A grid contains a set of nodes somewhat arbitrarily labelled 'server' and
> 'client', where the distinction of a 'server' node is that it is
> responsible for containing data (in-memory only, or also with
> persistence). Apart from that distinction, all nodes are essentially peers
> in the grid and may use the messaging fabric, compute layer and other grid
> features on an equal footing.
>
> In our solution we leverage these capabilities to build and orchestrate
> complex analytics queries that utilise compute functions that are
> initiated in three distinct ways: client -> client, client -> server and
> server -> client, and where all three styles of initiation are used within
> a single analytics request made to the grid itself. I can go into more
> detail about the exact sequencing of these activities if you like, but it
> may be sufficient to know they are used to reason about the problem
> statement and proposals outlined here.
>
> Our infrastructure is deployed to Kubernetes using EKS on AWS, and all
> three relationships between client and server nodes noted above function
> well (caveat: we do see odd things though, such as long pauses on critical
> worker threads, and occasional empty topology warnings when locating
> client nodes to send requests to). We also use continuous queries in three
> contexts (all within server nodes).
>
> If this thread is suggesting changing the functional relationship between
> server and client nodes, then this may have impacts on our architecture
> and implementation that we will need to consider.
>
> This thread has highlighted issues with K8s deployments and also CQ
> issues. The suggestion is that server-to-client just doesn't work on K8s,
> which does not agree with our experience of it working. I'd also like to
> understand better the bounds of the issue with CQ: when does it not work,
> and what are the symptoms we would see if there was an issue with the way
> we are using it, or the K8s infrastructure we deploy to?
>
> Thanks,
> Raymond.
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

--
Sincerely yours,
Ivan Bessonov