Hi all,

I would like to share some thoughts on improving queryable state by
introducing a QueryServerProxy component.

I think the current queryable state client is hard to use, because it
requires users to know the TaskManager's address and the proxy's port. In
production, some business users do not have good knowledge of Flink's
internals or runtime, yet they sometimes need to query the values of
states.

IMO, the root cause of this problem is the queryable state architecture.
Currently, the queryable state clients interact directly with the query
state client proxy components hosted on each TaskManager. This design makes
it difficult to encapsulate the points of change and exposes too much
detail to the user.

My personal idea is that we could introduce a real queryable state server,
named e.g. *QueryStateProxyServer*, which would accept all query state
requests, look up its local registry, and then redirect each request to the
specific *QueryStateClientProxy* (which runs on each TaskManager). This
server is the only thing users would need to care about, and it would spare
them from having to know the TaskManagers' addresses and the proxies'
ports. The current *QueryStateClientProxy* would become
*QueryStateProxyClient*.

Generally speaking, the roles of the QueryStateProxyServer are listed below:

   - act as the proxy for all query clients, receiving all requests and
   sending the responses;
   - act as a router that redirects the actual query requests to the
   specific proxy client;
   - maintain a route table registry (state <-> TaskManager,
   TaskManager <-> proxy client address);
   - provide more fine-grained control, such as result caching, ACL, TTL,
   SLA (rate limiting) and so on.
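
To make the route table idea a bit more concrete, here is a rough sketch in
plain Java. It is only an illustration under my own assumptions: the class
name RouteTable, its method names, and the use of plain strings for
TaskManager ids and proxy addresses are all hypothetical and do not exist in
Flink. It also simplifies things by mapping each state name to a single
TaskManager, whereas a keyed state is normally spread over several
TaskManagers.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical route table for the proposed proxy server.
 * Keeps the two mappings from the list above:
 * state -> TaskManager and TaskManager -> proxy client address.
 */
final class RouteTable {
    private final Map<String, String> stateToTaskManager = new ConcurrentHashMap<>();
    private final Map<String, String> taskManagerToProxy = new ConcurrentHashMap<>();

    /** Records the proxy client address of a TaskManager. */
    void registerTaskManager(String taskManagerId, String proxyAddress) {
        taskManagerToProxy.put(taskManagerId, proxyAddress);
    }

    /** Records which TaskManager hosts the given queryable state. */
    void registerState(String stateName, String taskManagerId) {
        stateToTaskManager.put(stateName, taskManagerId);
    }

    /**
     * Resolves a state name to the proxy client address that should receive
     * the query. Returns an empty Optional if either mapping is missing.
     */
    Optional<String> resolve(String stateName) {
        return Optional.ofNullable(stateToTaskManager.get(stateName))
                .map(taskManagerToProxy::get);
    }
}
```

With something like this, a client would only send the state name (plus job
id and key) to the server, and the server would call resolve(...) to find
the proxy client to forward the request to.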

About the implementation, there are three options:

opt 1:

Let the JobManager act as the query proxy server.

   - pros: reuses the existing JM; not having to introduce a new process
   reduces the complexity;
   - cons: puts a heavy burden on the JM; depending on the query frequency,
   this may impact its stability


[image: Screen Shot 2019-04-25 at 5.12.07 PM.png]

opt 2:

Introduce a new component which runs as its own process and acts as the
query proxy server:


   - pros: reduces the burden on the JM and makes it more stable
   - cons: introducing a new component makes the implementation more
   complex

[image: Screen Shot 2019-04-25 at 5.14.05 PM.png]

opt 3 (suggested by Stefan Richter):

Combine the two options: the query server could run either as a standalone
entry point (process) or integrated with the JobManager.

If we keep it well encapsulated, the only difference would be how we
register new TMs with the query server in the different scenarios: in the
JM we might have this information already, while in standalone mode the TMs
could, e.g., be started with the query server address to register with.
This would give the convenience of starting QS with the JM and the
flexibility for power users to reduce the load on their JM.
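
A very rough sketch of what "well encapsulated" could mean here, again in
plain Java and with purely hypothetical names (ProxyRegistry,
RegistrationSource, JobManagerSource, StandaloneSource; none of them are
existing Flink classes): the registry is shared by both modes, and only the
source of TM registrations differs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry used by the query server in both deployment modes. */
final class ProxyRegistry {
    private final Map<String, String> proxyAddressByTaskManager = new ConcurrentHashMap<>();

    void register(String taskManagerId, String proxyAddress) {
        proxyAddressByTaskManager.put(taskManagerId, proxyAddress);
    }

    /** Returns the registered proxy address, or null if the TM is unknown. */
    String addressOf(String taskManagerId) {
        return proxyAddressByTaskManager.get(taskManagerId);
    }
}

/** The only mode-specific part: where TM registrations come from. */
interface RegistrationSource {
    void populate(ProxyRegistry registry);
}

/** Embedded mode: the JM already knows its TMs, so we copy that information. */
final class JobManagerSource implements RegistrationSource {
    private final Map<String, String> knownTaskManagers;

    JobManagerSource(Map<String, String> knownTaskManagers) {
        this.knownTaskManagers = knownTaskManagers;
    }

    @Override
    public void populate(ProxyRegistry registry) {
        knownTaskManagers.forEach(registry::register);
    }
}

/** Standalone mode: TMs are started with the server address and announce themselves. */
final class StandaloneSource implements RegistrationSource {
    private final Map<String, String> announced = new ConcurrentHashMap<>();

    void announce(String taskManagerId, String proxyAddress) {
        announced.put(taskManagerId, proxyAddress);
    }

    @Override
    public void populate(ProxyRegistry registry) {
        announced.forEach(registry::register);
    }
}
```

The rest of the query server would only depend on ProxyRegistry, so
switching between the embedded and the standalone setup would not touch the
routing logic.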

IMO, queryable state is a very valuable feature. It lets users query
real-time measurement results. I hope it will get the attention of the
community.

These are just rough thoughts. If they are valuable to the community, I
will write up a design draft.

What's your opinion? Any feedback and comments are welcome!

Best,
Vino.
