It’s a good idea to get the process information of large ongoing window. +1 from my side.
> 在 2019年7月4日,11:41,vino yang <yanghua1...@gmail.com> 写道: > > Hi folks, > > Currently, the queryable state is not widely used in production. IMO, there > are two key reasons caused this result. 1) the client of the queryable > state is hard to use. Because it requires users to know the address of > TaskManager and the port of the proxy. Actually, most business users who do > not have good knowledge about the Flink's inner and runtime in production. > 2) The benefit of this feature has not been excavated. In Flink DataStream > API, State is the first level citizen, it’s Flink key advantage compared > with other compute engines. Because the queryable state is the most > effective way to pry the latest computing progress. > > Three months ago, I started a discussion about improving the queryable > state and introducing a proxy component.[1] It brings a lot of attention > and discussion. Recently, I have submitted a design document about the > proposal.[2] These efforts try to process the first problem. > > About the second question, the most essential solution is that we should > really make the queryable state work. The window operator is one of the > most valuable and most frequently used operators of all Flink operators. > And it also uses keyed state which is queryable. So we propose to let the > state of the window operator be queried. This is not only for increasing > the value of the queryable state but also for the real business needs. > > IMO, allowing window state to be queried will provide great value. In many > scenarios, we often use large windows for aggregate calculations. A very > common example is a day-level window that counts the PV of a day. But > usually, the user is not only satisfied to wait until the end of the window > to get the result. They want to get "intermediate results" at a smaller > time granularity to analyze trends. Because Flink does not provide periodic > triggers for fixed windows. We have extended this and implemented an > "incremental window". It can trigger a fixed window with a smaller interval > period and feedback intermediate results. However, we believe that this > approach is still not flexible enough. We should let the user query the > current calculation result of the window through the API at any time. > > However, I know that if we want to implement it, we still have some details > that need to be discussed, such as how to let users know the state > descriptors in the window, namespace and so on. > > This discussion thread is mainly to listen to the community's opinion on > this proposal. > > Any feedback and ideas are welcome and appreciated. > > Best, > Vino > > [1]: > http://mail-archives.apache.org/mod_mbox/flink-dev/201907.mbox/%3ctencent_35a56d6858408be2e2064...@qq.com%3E > [2]: > https://docs.google.com/document/d/181qYVIiHQGrc3hCj3QBn1iEHF4bUztdw4XO8VSaf_uI/edit?usp=sharing