Hi Igniters, hi Raymond,

That was a really good point. I will try to address it as best I can.
First of all, this new mode will be configurable for now. As Val suggested, "TcpCommunicationSpi#forceClientToServerConnections" will be a new setting that triggers this behavior. It is disabled by default.

About the issues with K8s deployments: I'm not an expert, but from what I've heard, server and client nodes are sometimes not in the same environment. For example, there is an Ignite cluster, and a user tries to start a client node in an isolated K8s pod. In this case the client cannot properly resolve its own addresses and send them to the servers, which makes it impossible for the servers to connect to such a client. In other words, the client is used as if it were a thin client. In your case everything is fine: clients and servers share the same network and can resolve each other's addresses.

Now, the CQ issue [1]. You can pass a custom event filter when you register a new continuous query, but, depending on the setup, the class of this filter may not be on the classpath of the server node that holds the data and invokes the filter. There are two possible outcomes:

- the server fails to resolve the class name and fails to register the CQ;
- or the server has p2p deployment enabled.

Let's assume that it was a client node that requested the CQ. In the second case, the server will try to download the "class" file directly from the node that sent the filter object in the first place. Due to a poor design decision, this is done synchronously while registering the query, and query registration happens in the "discovery" thread. Under normal circumstances the server will load the class and finish the query registration; it's just a little bit slow.

That second case is not compatible with the new "forceClientToServerConnections" setting, because the server is no longer allowed to open a connection back to the client to fetch the class. I'm not sure I need to go into all the technical details, but the result is a cluster that cannot process any discovery messages for the duration of the TCP connection timeout; we're talking about tens of seconds, or maybe even several minutes, depending on the settings and the environment. All this time the server will be in a "deadlock" state inside the "discovery" thread. This means that some cluster operations, like a new node joining or a new cache being started, will be unavailable during this period. Node failures will not be processed properly either. It's hard for me to predict the real behavior until we reproduce the situation in a live environment; I have only seen this in tests.

To make this a little more concrete, I've put a few rough sketches below.
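First, a minimal sketch of what enabling the new mode could look like. Note that the setter name here is my assumption, derived from the property name above; the final API may end up looking different:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class ForcedModeSketch {
    public static void main(String[] args) {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

        // Assumed setter for the proposed property; disabled by default.
        // When enabled, communication connections to a client are expected
        // to be initiated by the client itself, not by the servers.
        commSpi.setForceClientToServerConnections(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setCommunicationSpi(commSpi);

        Ignite ignite = Ignition.start(cfg);
    }
}
```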
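Here is also a rough sketch of the CQ scenario. The cache name and the filter logic are made up for illustration; the important part is that the filter class is invoked on the server nodes that hold the data:

```java
import javax.cache.configuration.FactoryBuilder;
import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheEntryEventSerializableFilter;
import org.apache.ignite.cache.query.ContinuousQuery;

public class CqSketch {
    /** Custom remote filter. Servers invoke it for every update, so its class
     *  must be resolvable on the server side: either from the classpath or
     *  via p2p deployment from the node that registered the query. */
    public static class EvenKeyFilter
        implements CacheEntryEventSerializableFilter<Integer, String> {
        @Override public boolean evaluate(
            CacheEntryEvent<? extends Integer, ? extends String> evt) {
            return evt.getKey() % 2 == 0;
        }
    }

    public static void registerCq(Ignite client) {
        IgniteCache<Integer, String> cache = client.cache("myCache"); // hypothetical cache

        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        // Registration ships the filter factory to the servers. A server that
        // cannot resolve EvenKeyFilter either fails the registration or, with
        // p2p deployment enabled, downloads the class file from this node
        // synchronously, inside the discovery thread.
        qry.setRemoteFilterFactory(FactoryBuilder.factoryOf(EvenKeyFilter.class));

        qry.setLocalListener(evts -> evts.forEach(
            e -> System.out.println("Updated: " + e.getKey() + " -> " + e.getValue())));

        // The returned cursor must stay open for as long as the CQ is needed.
        cache.query(qry);
    }
}
```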
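For completeness, p2p deployment itself is just a flag on the node configuration. It is exactly this combination, peer class loading plus the forced client-to-server mode, that leads to the stall described above, because the server has no way to establish a connection back to the client to fetch the class:

```java
import org.apache.ignite.configuration.IgniteConfiguration;

public class P2pSketch {
    public static IgniteConfiguration config() {
        // With peer class loading enabled, a server that cannot resolve a CQ
        // filter class will try to fetch it from the originating node.
        return new IgniteConfiguration()
            .setPeerClassLoadingEnabled(true);
    }
}
```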
I hope this message clarifies the situation, or at least doesn't cause more confusion. These changes will not affect your infrastructure or your Ignite installations; they are aimed at adding more flexibility to other ways of using Ignite.

[1] https://issues.apache.org/jira/browse/IGNITE-13156

Sat, Jun 27, 2020 at 09:54, Raymond Wilson <raymond_wil...@trimble.com>:

> I have just caught up with this discussion and wanted to outline a set of
> use cases we have that rely on server nodes communicating with client
> nodes.
>
> Firstly, I'd like to confirm my mental model of server & client nodes
> within a grid (ignoring thin clients for now):
>
> A grid contains a set of nodes somewhat arbitrarily labelled 'server' and
> 'client', where the distinction of a 'server' node is that it is
> responsible for containing data (in-memory only, or also with
> persistence). Apart from that distinction, all nodes are essentially peers
> in the grid and may use the messaging fabric, compute layer and other grid
> features on an equal footing.
>
> In our solution we leverage these capabilities to build and orchestrate
> complex analytics queries that utilise compute functions that are
> initiated in three distinct ways: client -> client, client -> server and
> server -> client, and where all three styles of initiation are used within
> a single analytics request made to the grid itself. I can go into more
> detail about the exact sequencing of these activities if you like, but it
> may be sufficient to know they are used to reason about the problem
> statement and proposals outlined here.
>
> Our infrastructure is deployed to Kubernetes using EKS on AWS, and all
> three relationships between client and server nodes noted above function
> well (caveat: we do see odd things though, such as long pauses on critical
> worker threads, and occasional empty topology warnings when locating
> client nodes to send requests to). We also use continuous queries in three
> contexts (all within server nodes).
>
> If this thread is suggesting changing the functional relationship between
> server and client nodes, then this may have impacts on our architecture
> and implementation that we will need to consider.
>
> This thread has highlighted issues with K8s deployments and also CQ
> issues. The suggestion is that server-to-client just doesn't work on K8s,
> which does not agree with our experience of it working. I'd also like to
> understand better the bounds of the issue with CQ: when does it not work,
> and what are the symptoms we would see if there was an issue with the way
> we are using it, or the K8s infrastructure we deploy to?
>
> Thanks,
> Raymond.
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

--
Sincerely yours,
Ivan Bessonov