Thanks for the feedback :) I agree that combining this with SQL would give a very nice layer for analysing state.
Our goal is to contribute this to Flink. I think it should live as part of the Flink project to make deeper integration possible in the long run. Of course, a prerequisite for this is that there is enough production interest in such a tool, but I believe there should be :)

Gyula

Piotr Nowojski <pi...@data-artisans.com> wrote (on 17 Aug 2018 at 15:07):

> Hi,
>
> A very big +1 from my side. I have found the lack of such a tool to be a
> big problem for the long-term maintainability of Flink jobs.
>
> In the long run, I would be delighted to see Flink SQL support for these
> things as well. Ad hoc analysis is one of the prime use cases of SQL. This
> tool would make such analyses possible, while SQL could make them easy to
> use and shorten the feedback loop, especially in cases when you are not
> sure what you are looking for in the state.
>
> Just to clarify: is your end goal to contribute this tool to Apache Flink,
> or do you want it to be a separate tool?
>
> Piotrek
>
> > On 17 Aug 2018, at 12:28, Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > Hi All!
> >
> > I want to share with you a little project we have been working on at King
> > (with some help from some dataArtisans folks). I think this would be a
> > valuable addition to Flink and would solve a bunch of outstanding
> > production use-cases and headaches around state bootstrapping and state
> > analytics.
> >
> > We have built a quick and dirty POC implementation on top of Flink 1.6;
> > please check the README for some nice examples to get a quick idea:
> >
> > https://github.com/king/bravo
> >
> > *Short story*
> > Bravo is a convenient state reader and writer library leveraging Flink's
> > batch processing capabilities. It supports processing and writing Flink
> > streaming savepoints. At the moment it only supports processing RocksDB
> > savepoints, but this can be extended in the future to other state
> > backends and checkpoint types.
> >
> > Our goal is to cover a few basic features:
> >
> >   - Converting keyed states to Flink DataSets for processing and
> >     analytics
> >   - Reading/writing non-keyed operator states
> >   - Bootstrapping keyed states from Flink DataSets and creating new
> >     valid savepoints
> >   - Transforming existing savepoints by replacing/changing some states
> >
> > Some example use-cases:
> >
> >   - Point-in-time state analytics across all operators and keys
> >   - Bootstrapping the state of a streaming job from external resources,
> >     such as a database or filesystem
> >   - Validating and potentially repairing the corrupted state of a
> >     streaming job
> >   - Changing the max parallelism of a job
> >
> > Our main goal is to start working together with other Flink production
> > users and make this something useful that can be part of Flink. So if
> > you have use-cases, please talk to us :)
> > I have also started a Google doc which contains a bit more info than
> > the README and could be a starting place for discussions:
> >
> > https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing
> >
> > I know there are a bunch of rough edges and bugs (and no tests), but
> > our motto is: if you are not embarrassed, you released too late :)
> >
> > Please let me know what you think!
> >
> > Cheers,
> > Gyula
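To give a flavour of the read-transform-write workflow described above, here is an illustrative Java sketch. This is not runnable as-is (it needs a Flink runtime and a real savepoint), and every class and method name in it is a hypothetical stand-in for whatever the actual Bravo API ends up looking like; see the README for the real examples.

```java
// ILLUSTRATIVE SKETCH ONLY -- all names below (Savepoint, OperatorStateReader,
// OperatorStateWriter, readKeyedStates, writeSavepoint, the "my-operator" uid
// and the HDFS paths) are hypothetical placeholders, not the actual Bravo API.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// 1. Load the metadata of an existing savepoint (hypothetical entry point).
Savepoint savepoint = loadSavepoint("hdfs:///savepoints/savepoint-1234");

// 2. Read a keyed ValueState of one operator into an ordinary DataSet.
OperatorStateReader reader = new OperatorStateReader(env, savepoint, "my-operator");
DataSet<Tuple2<String, Integer>> counts = reader.readKeyedStates("count");

// 3. Analyse or repair the state with regular DataSet operations,
//    e.g. clamp corrupted negative counters back to zero.
DataSet<Tuple2<String, Integer>> repaired =
    counts.map(kv -> Tuple2.of(kv.f0, Math.max(0, kv.f1)));

// 4. Write the changed state back out as a new, valid savepoint that the
//    streaming job can then be restored from.
OperatorStateWriter writer = new OperatorStateWriter(savepoint, "my-operator");
writer.replaceKeyedState("count", repaired);
writer.writeSavepoint("hdfs:///savepoints/repaired");
```

The same read/write pair would also cover the bootstrapping use-case: instead of reading the DataSet out of an old savepoint in step 2, one would build it from an external source (a database dump, files, etc.) and only run steps 3-4.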