Hi Márton and Gabor,

Thanks for sharing the context!
Yes, I admit that users need a friendlier way to explore state, and it seems Flink lacks something like a state metadata store. I'd suggest we think about this as a whole: store enough information for querying, including operator names, UIDs, hashes, and the state types or descriptors, and also provide a tool to list that metadata. My thought is to provide a complete solution instead of adding one or two specific pieces of data alongside the checkpoint. WDYT? I believe that once the state schema is queryable, the State Processor API could become more powerful and easier to use. But such a solution requires more design and discussion.

Regarding the current issue you are facing, here's my idea: if you have access to the web UI, you can get the hash (vertex ID) from the URL by clicking on the operator you want to query and zooming in. IIUC, this hash can be used to query the state. Is this feasible?

Additionally, I think we could show user-defined UIDs in the web UI and the related REST APIs. That way users could easily identify an operator by its UID, or look up the UID of a given operator.

Best,
Zakelly

On Thu, Aug 8, 2024 at 11:03 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

> Hi Zakelly,
>
> Thanks for the feedback, let me elaborate on this.
>
> In short, Databricks has created a much more user-friendly solution[1] for
> state observability (based on Flink's State Processor API) than what we
> have now.
>
> Up until now our State Processor API was good enough, but now we're
> lagging behind. We see users (just like in Spark) for whom the first-class
> citizen is the state itself, and they're pointing to the new Spark
> solution. Since state has become a first-class citizen, there is a natural
> need to use it for business logic validation, debugging, exploratory
> browsing, etc.
>
> The main message here is that there are cases where users are not able to
> identify operators because the hash is a one-way conversion.
> I'm open to any suggestion, but somehow the initial human-readable operator
> identifier must be available. Let me give some examples where users are
> completely blind.
>
> > Are you saying the user can set the operator uid but then doesn't know
> > what they set when debugging?
>
> There are cases where the user sets the UID in the job. In such cases it's
> not user-friendly to parse git repos, but it's doable.
> But there are cases where the user has limited or no control over UIDs:
> * SQL jobs generate operators with meaningful names, but I think it's not
> realistic to expect users to understand all the internals of the Flink SQL
> implementation (which operator is named where and how).
> * Iceberg uses the given UID as a prefix and generates more operators with
> it.
> * Weak justification, but it exists: since operator name and UID are both
> optional, some users set only the name. In such cases Flink generates the
> hash automatically, and only the name can give some pointers.
>
> Hope I've given better context.
>
> [1]
> https://www.databricks.com/blog/announcing-state-reader-api-new-statestore-data-source
>
> BR,
> G
>
>
> On Thu, Aug 8, 2024 at 12:06 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>
> > Hi Gabor,
> >
> > Thanks for the proposal! However, I find it a little strange. Are you
> > saying the user can set the operator uid but then doesn't know what they
> > set when debugging? If not, is `OperatorIdentifier.forUid("my-uid")`
> > feasible? I understand your point about potential cross-team work, but
> > the person may not be able to debug code they did not write. Things get
> > complex in that scenario. Could you provide more details about the issue
> > you are facing?
> >
> > Regarding the checkpoint, it is not designed to be self-contained or
> > human-readable. I suggest not introducing such columns for debugging
> > purposes.
> >
> > Best,
> > Zakelly
> >
> > On Wed, Aug 7, 2024 at 10:07 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > wrote:
> >
> > > Hi Devs,
> > >
> > > I would like to start a discussion on FLIP-474: Store operator name and
> > > UID in state metadata[1].
> > >
> > > In short, users want to know which operators are inside checkpoint
> > > data, and this can be improved from a user-experience perspective. The
> > > details can be found in FLIP-474[1].
> > >
> > > Please share your thoughts on this.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-474%3A+Store+operator+name+and+UID+in+state+metadata
> > >
> > > BR,
> > > G
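The thread touches a workaround without spelling it out. Because the uid-to-hash mapping is one-way, the only way back is to hash candidate UIDs (for example, ones found in the job's source) and compare them with the vertex ID copied from the web UI URL.

Below is a minimal JDK-only sketch of that comparison. It rests on one assumption: Flink derives an operator ID from a user-set uid as a 128-bit Murmur3 hash (seed 0) of the uid's UTF-8 bytes, hex-encoded in output byte order. That is our reading of what `OperatorIdentifier.forUid` does. Please verify it against a job with a known uid before relying on it. `UidHashMatcher`, `uidToOperatorIdHex` and `findUid` are hypothetical names, not Flink APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper: recomputes a uid's hash so a vertex ID copied from the
 * web UI can be matched against candidate uids. ASSUMPTION: Flink's
 * uid -> operator ID mapping is Murmur3 x64 128-bit (seed 0) over UTF-8 bytes.
 */
final class UidHashMatcher {
    private static final long C1 = 0x87c37b91114253d5L;
    private static final long C2 = 0x4cf5ad432745937fL;

    // Murmur3 final avalanche step.
    private static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    private static long mixK1(long k1) {
        return Long.rotateLeft(k1 * C1, 31) * C2;
    }

    private static long mixK2(long k2) {
        return Long.rotateLeft(k2 * C2, 33) * C1;
    }

    /** Murmur3 x64 128-bit hash (seed 0) of the uid, as 32 lowercase hex chars. */
    static String uidToOperatorIdHex(String uid) {
        byte[] data = uid.getBytes(StandardCharsets.UTF_8);
        ByteBuffer in = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long h1 = 0;
        long h2 = 0;
        int blocks = data.length / 16;
        // Body: consume 16-byte blocks as two little-endian longs.
        for (int i = 0; i < blocks; i++) {
            long k1 = in.getLong(i * 16);
            long k2 = in.getLong(i * 16 + 8);
            h1 ^= mixK1(k1);
            h1 = Long.rotateLeft(h1, 27) + h2;
            h1 = h1 * 5 + 0x52dce729;
            h2 ^= mixK2(k2);
            h2 = Long.rotateLeft(h2, 31) + h1;
            h2 = h2 * 5 + 0x38495ab5;
        }
        // Tail: up to 15 leftover bytes, bytes 0-7 into tail1, bytes 8-14 into tail2.
        long tail1 = 0;
        long tail2 = 0;
        int tailStart = blocks * 16;
        for (int i = tailStart; i < data.length; i++) {
            int offset = i - tailStart;
            long b = data[i] & 0xffL;
            if (offset >= 8) {
                tail2 |= b << ((offset - 8) * 8);
            } else {
                tail1 |= b << (offset * 8);
            }
        }
        h1 ^= mixK1(tail1); // mixK of 0 is 0, so this is a no-op without a tail
        h2 ^= mixK2(tail2);
        // Finalization: fold in the length, then avalanche both halves.
        h1 ^= data.length;
        h2 ^= data.length;
        h1 += h2;
        h2 += h1;
        h1 = fmix64(h1);
        h2 = fmix64(h2);
        h1 += h2;
        h2 += h1;
        byte[] out = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(h1).putLong(h2).array();
        StringBuilder hex = new StringBuilder(32);
        for (byte b : out) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Returns the first candidate uid whose hash equals the given ID, if any. */
    static Optional<String> findUid(String operatorIdHex, List<String> candidateUids) {
        for (String uid : candidateUids) {
            if (uidToOperatorIdHex(uid).equalsIgnoreCase(operatorIdHex)) {
                return Optional.of(uid);
            }
        }
        return Optional.empty();
    }
}
```

Usage, with hypothetical uids: `UidHashMatcher.findUid(vertexId, List.of("kafka-source", "dedup", "sink"))` returns the matching uid, if there is one.

This approach has limits:
* It does not help for operators whose hash Flink generated automatically because no uid was set, which is exactly the gap Gabor describes.
* As far as we know, the vertex ID in the web UI reflects only the head operator of a chain.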