Hi all,

Over the past period, I have tried to further refine the design discussed in this thread and have written a design document with more detailed diagrams and text descriptions, to make the discussion easier.[1]
Note: The document is not yet complete; for example, the "Implementation" section is still missing. It is therefore still open for discussion, and I will fill in the rest as I gather the community's opinions. More discussion and feedback are welcome and appreciated.

Best,
Vino

[1]: https://docs.google.com/document/d/181qYVIiHQGrc3hCj3QBn1iEHF4bUztdw4XO8VSaf_uI/edit?usp=sharing

yanghua1127 <yanghua1...@gmail.com> wrote on Friday, June 7, 2019, 11:32 PM:

> Hi Georgi,
>
> Thanks for your feedback, and I'm glad to hear you are using queryable state.
>
> I agree that option 1 is easier to implement than the others. However,
> when we design the new architecture we need to consider more aspects, e.g.
> scalability, so option 3 seems more suitable. In fact, some committers,
> such as Stefan, Gordon and Aljoscha, have already given me feedback and
> direction.
>
> I am currently writing the design document. Once it is ready to be
> presented, I will post it to this thread and we can discuss further details.
>
> ----
> Best,
> Vino
>
>
> On 2019-06-07 19:03, Georgi Stoyanov <gstoya...@live.com> wrote:
> > Hi Vino,
> >
> > I was investigating the current architecture, and AFAIK the first proposal
> > will be a lot easier to implement, because the JM currently has the
> > information about the states (where they are, which ones, etc., thanks to
> > KvStateLocationRegistry; correct me if I'm wrong).
> > We are using the feature, and it's indeed not very convenient to iterate
> > through ports, check which TM is the responsible one, etc.
> >
> > It would be very useful if one of the committers joined the topic and
> > gave us some insight into what's going to happen with that feature.
> >
> > Kind Regards,
> > Georgi
> >
> >
> > *From:* vino yang <yanghua1...@gmail.com>
> > *Sent:* Thursday, April 25, 2019 5:18 PM
> > *To:* dev <dev@flink.apache.org>; user <u...@flink.apache.org>
> > *Cc:* Stefan Richter <s.rich...@ververica.com>; Aljoscha Krettek <aljos...@apache.org>; kklou...@gmail.com
> > *Subject:* [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component
> >
> > Hi all,
> >
> > I want to share my thoughts with you about improving queryable state
> > and introducing a QueryServerProxy component.
> >
> > I think the current queryable state client is hard to use, because it
> > requires users to know the TaskManager's address and the proxy's port.
> > In production, some business users do not have a good knowledge of
> > Flink's internals or runtime, yet they sometimes need to query the values
> > of states.
> >
> > IMO, the root cause of this problem is the queryable state architecture.
> > Currently, queryable state clients interact with the query state client
> > proxy components hosted on each TaskManager. This design makes it
> > difficult to encapsulate the points of change and exposes too many
> > details to the user.
> >
> > My personal idea is that we could introduce a real queryable state
> > server, named e.g. *QueryStateProxyServer*, which would receive all query
> > state requests, look up its local registry, and then redirect each request
> > to the specific *QueryStateClientProxy* (running on each TaskManager). The
> > server is the only component users would really need to care about, and it
> > would free them from knowing the TaskManagers' addresses and the proxies'
> > ports. The current *QueryStateClientProxy* would become
> > *QueryStateProxyClient*.
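The routing role described above can be sketched in Java as a two-hop route table. This is a minimal illustration only, assuming a simplified model in which each state (identified by job ID and state name) lives on a single TaskManager; the names `QueryRouteTable`, `registerState`, `registerProxyClient` and `route` are hypothetical and not part of Flink's API. A real design would likely need finer-grained locations (e.g. per key group) and would forward the request rather than only resolve an address.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical route table for a QueryStateProxyServer; names are illustrative, not Flink APIs.
final class QueryRouteTable {
    // "jobId/stateName" -> TaskManager ID (simplification: one TaskManager per state).
    private final Map<String, String> stateToTaskManager = new ConcurrentHashMap<>();
    // TaskManager ID -> address of the QueryStateProxyClient running on that TaskManager.
    private final Map<String, InetSocketAddress> taskManagerToProxy = new ConcurrentHashMap<>();

    private static String stateKey(String jobId, String stateName) {
        return jobId + "/" + stateName;
    }

    void registerState(String jobId, String stateName, String taskManagerId) {
        stateToTaskManager.put(stateKey(jobId, stateName), taskManagerId);
    }

    void registerProxyClient(String taskManagerId, InetSocketAddress proxyClientAddress) {
        taskManagerToProxy.put(taskManagerId, proxyClientAddress);
    }

    // Resolves which proxy client a query should be redirected to; empty if either hop is unknown.
    Optional<InetSocketAddress> route(String jobId, String stateName) {
        return Optional.ofNullable(stateToTaskManager.get(stateKey(jobId, stateName)))
                .map(taskManagerToProxy::get);
    }

    public static void main(String[] args) {
        QueryRouteTable table = new QueryRouteTable();
        // Example values only.
        table.registerProxyClient("tm-1", InetSocketAddress.createUnresolved("tm-1.example", 9069));
        table.registerState("job-42", "word-counts", "tm-1");
        // The user-facing client only needs the server; the server resolves the TM-side proxy.
        System.out.println(table.route("job-42", "word-counts"));
        System.out.println(table.route("job-42", "unknown-state"));
    }
}
```

Keeping the two mappings separate means that if a TaskManager comes back with a new proxy port, only one address entry changes instead of every state entry it hosts.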
> >
> > Generally speaking, the roles of the QueryStateProxyServer are listed below:
> >
> > - act as the proxy for all query clients, receiving all requests and
> >   sending responses;
> > - act as a router that redirects the actual query requests to the
> >   specific proxy client;
> > - maintain a route table registry (state <-> TaskManager,
> >   TaskManager <-> proxy client address);
> > - provide more fine-grained control, such as result caching, ACL, TTL,
> >   and SLA (rate limiting).
> >
> > Regarding the implementation, there are three options:
> >
> > opt 1:
> >
> > Let the JobManager act as the query proxy server.
> >
> > · pros: reuses the existing JM; not introducing a new process keeps the
> >   complexity down;
> > · cons: would put a heavy burden on the JM and, depending on the query
> >   frequency, may impact its stability.
> >
> > [image: Screen Shot 2019-04-25 at 5.12.07 PM.png]
> >
> > opt 2:
> >
> > Introduce a new component that runs as a separate process and acts as the
> > query proxy server:
> >
> > · pros: reduces the burden on the JM and makes it more stable;
> > · cons: introducing a new component makes the implementation more
> >   complex.
> >
> > [image: Screen Shot 2019-04-25 at 5.14.05 PM.png]
> >
> > opt 3 (suggested by Stefan Richter):
> >
> > Combine the two options: the query server could run as a standalone entry
> > point (process) and also be integrated with the JobManager.
> >
> > If we keep it well encapsulated, the only difference would be how we
> > register new TMs with the query server in the different scenarios: in the
> > JM we might already have this information, while in standalone mode the
> > TMs could, e.g., be started with the query server's address to register
> > with. This would give the convenience of starting the QS with the JM and
> > the flexibility for power users to reduce the load on their JM.
> >
> > IMO, queryable state is a very valuable feature. It lets users query
> > real-time metrics. I hope it will get the attention of the community.
> >
> > It is just a rough idea.
> > If it is valuable to the community, I will write a design draft.
> >
> > What's your opinion? Any feedback and comments are welcome!
> >
> > Best,
> > Vino.
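For opt 3, the claim that only the TM registration path differs between deployments can be sketched as follows. This is a hypothetical Java illustration, not Flink code: `TaskManagerRegistrar`, `QueryServer`, `JobManagerRegistrationPath` and `TaskManagerSelfRegistration` are invented names, and the network call a standalone TM would make to the configured query server address is replaced by a direct method call so the sketch stays runnable.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registration contract exposed by the query server (not a Flink API).
interface TaskManagerRegistrar {
    void register(String taskManagerId, InetSocketAddress proxyClientAddress);
    void unregister(String taskManagerId);
}

// The same query server is used in both deployment modes; only the caller of register() differs.
final class QueryServer implements TaskManagerRegistrar {
    private final Map<String, InetSocketAddress> proxyClients = new ConcurrentHashMap<>();

    @Override
    public void register(String taskManagerId, InetSocketAddress proxyClientAddress) {
        proxyClients.put(taskManagerId, proxyClientAddress);
    }

    @Override
    public void unregister(String taskManagerId) {
        proxyClients.remove(taskManagerId);
    }

    Optional<InetSocketAddress> proxyClientOf(String taskManagerId) {
        return Optional.ofNullable(proxyClients.get(taskManagerId));
    }
}

// Embedded mode: the JobManager already tracks TMs and forwards their lifecycle events.
final class JobManagerRegistrationPath {
    private final TaskManagerRegistrar queryServer;

    JobManagerRegistrationPath(TaskManagerRegistrar queryServer) {
        this.queryServer = queryServer;
    }

    // Hypothetical hooks; a real JM would call these when TMs join or are lost.
    void onTaskManagerRegistered(String taskManagerId, InetSocketAddress proxyClientAddress) {
        queryServer.register(taskManagerId, proxyClientAddress);
    }

    void onTaskManagerLost(String taskManagerId) {
        queryServer.unregister(taskManagerId);
    }
}

// Standalone mode: each TM is started with the query server's address and registers itself.
final class TaskManagerSelfRegistration {
    private TaskManagerSelfRegistration() {}

    // Stands in for a remote call to the configured query server address.
    static void onStartup(TaskManagerRegistrar configuredQueryServer,
                          String taskManagerId, InetSocketAddress proxyClientAddress) {
        configuredQueryServer.register(taskManagerId, proxyClientAddress);
    }
}

final class Opt3Demo {
    public static void main(String[] args) {
        InetSocketAddress proxy = InetSocketAddress.createUnresolved("tm-1", 9069); // example value

        QueryServer embedded = new QueryServer();
        new JobManagerRegistrationPath(embedded).onTaskManagerRegistered("tm-1", proxy);

        QueryServer standalone = new QueryServer();
        TaskManagerSelfRegistration.onStartup(standalone, "tm-1", proxy);

        // Both modes end up with the same route information.
        System.out.println(embedded.proxyClientOf("tm-1").equals(standalone.proxyClientOf("tm-1")));
    }
}
```

Because both paths go through the same `register`/`unregister` contract, the query server itself does not need to know whether it is embedded in the JM or running as its own process.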