Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-25 Thread Jark Wu
Hi Gordon, The API for modifying max parallelism looks good to me. Thanks to you and Seth for the great work on it. Cheers, Jark On Tue, 25 Jun 2019 at 16:01, Tzu-Li (Gordon) Tai wrote: > Hi Jark, > > Thanks for the reminder. I've updated the FLIP name in Confluence to match > the new name "State Process

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-25 Thread Tzu-Li (Gordon) Tai
Hi Jark, Thanks for the reminder. I've updated the FLIP name in Confluence to match the new name "State Processor API". Concerning an API for changing max parallelism: that is actually in the works and has been considered, and it would look something like: ``` ExistingSavepoint savepoint = Savepoin
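
For context on why max parallelism needs its own API: Flink splits keyed state into key groups, and the number of key groups equals the max parallelism, so that value is fixed into every savepoint. The stdlib-only Java sketch below shows how contiguous key-group ranges map to subtasks. The range formulas are assumed to mirror Flink's `KeyGroupRangeAssignment`, while `assignToKeyGroup` is a simplified stand-in (Flink murmur-hashes `key.hashCode()` first), so this is illustrative only and not compatible with real savepoints.

```java
class KeyGroupSketch {

    // Simplified key -> key group mapping (assumption: Flink additionally
    // murmur-hashes key.hashCode() before taking the modulus).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Index of the subtask that owns a given key group.
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroupId) {
        return keyGroupId * parallelism / maxParallelism;
    }

    // Inclusive [start, end] range of key groups owned by one subtask.
    static int[] keyGroupRangeForOperator(int maxParallelism, int parallelism, int operatorIndex) {
        int start = (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int parallelism = 3;
        for (int i = 0; i < parallelism; i++) {
            int[] range = keyGroupRangeForOperator(maxParallelism, parallelism, i);
            System.out.println("subtask " + i + " -> key groups " + range[0] + ".." + range[1]);
        }
        // The same key may fall into a different key group once max parallelism changes.
        System.out.println("user-42 @ 128: " + assignToKeyGroup("user-42", 128));
        System.out.println("user-42 @ 256: " + assignToKeyGroup("user-42", 256));
    }
}
```

With a max parallelism of 128 and three subtasks, `main` prints the ranges 0..42, 43..85 and 86..127, followed by the key group of one sample key under two settings. Changing the parallelism only moves whole key groups between subtasks, whereas changing the max parallelism changes the modulus itself, so keyed state has to be repartitioned rather than merely reassigned.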

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-24 Thread Jark Wu
Thanks for the awesome FLIP. I think it will be very useful in state migration scenarios. We are also looking for a state reuse solution for SQL jobs, and I think this feature will help a lot. Looking forward to having it in the near future. Regarding the naming, I'm +1 for "State Processing API".

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Tzu-Li (Gordon) Tai
On Wed, Jun 5, 2019 at 6:39 AM Xiaowei Jiang wrote: > Hi Gordon & Seth, this looks like a very useful feature for analyzing and > managing state. > I agree that using DataSet is probably the most practical choice right > now. But in the longer term, adding Table API support for this would be nice. > A

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Xiaowei Jiang
Hi Gordon & Seth, this looks like a very useful feature for analyzing and managing state. I agree that using DataSet is probably the most practical choice right now, but in the longer term adding Table API support for this would be nice. When analyzing the savepoint, I assume that the state backend r

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Aljoscha Krettek
+1 I think this is a very valuable new addition, and we should try not to get stuck trying to design the perfect solution for everything. > On 4. Jun 2019, at 13:24, Tzu-Li (Gordon) Tai wrote: > > +1 to renaming it to State Processing API and adding it under the > flink-libraries module. > >

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Tzu-Li (Gordon) Tai
+1 to renaming it to State Processing API and adding it under the flink-libraries module. I also think we can start developing the feature. From the feedback so far, it seems like we're in a good spot to add at least the initial version of this API, hopefully making it ready for 1.

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Seth Wiesman
It seems like a recurring piece of feedback was a request for a different name. I’d like to propose moving the functionality to the libraries module and naming this the State Processing API. Seth > On May 31, 2019, at 3:47 PM, Seth Wiesman wrote: > > The SavepointOutputFormat only writes out the savepoint

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Seth Wiesman
The SavepointOutputFormat only writes out the savepoint metadata file and should mostly be ignored. The actual state is written out by stream operators and tied into the Flink runtime [1, 2, 3]. This is the most important part, and the piece that I don’t think can reasonably be extracted. Seth

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Jan Lukavský
Hi Seth, yes, that helped! :-) What I was looking for is essentially `org.apache.flink.connectors.savepoint.output.SavepointOutputFormat`. It would be great if this could be written in a way that would enable reusing it in a different engine (as I mentioned, Apache Spark). There seem to be s

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Seth Wiesman
@Jan Gotcha. So, as far as reusing components goes, it explicitly is not a writer. This is not a savepoint output format in the way we have a Parquet output format. The reason for the Transform API is to hide the underlying details; it does not simply append an output writer to the end of a dataset. This get

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Piotr Nowojski
> I think it’s best to keep this initial implementation focused and add those > changes if there is adoption and interest in the community. I agree. I didn’t mean to hold the implementation/acceptance of this until someone solves the SQL story :) Piotrek > On 31 May 2019, at 13:18, Seth Wiesma

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Jan Lukavský
Hi Seth, that sounds reasonable. What I was asking for was not to reverse-engineer the binary format, but to make the savepoint write API a little more reusable, so that it could be wrapped into some other technology. I don't know the details well enough to propose a solution, but it seems to me that

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Seth Wiesman
And I can definitely imagine a “savepoint catalog” at some point in the future. Seth > On May 31, 2019, at 4:39 AM, Tzu-Li (Gordon) Tai wrote: > > @Piotr > Yes, we're aiming for the 1.9 release. This was also mentioned in the > recent 1.9 feature discussion thread [1]. > > [1] > http://a

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Seth Wiesman
@Konstantin Agreed, that was a large impetus for this feature. Also, I am happy to change the name to something that better describes the feature set. @Jan Savepoints depend heavily on a number of Flink-internal components that may change between versions: state backend internals, type seria

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Seth Wiesman
@Piotr I would definitely like to see this have SQL integrations at some point. The reason for holding off is that doing so would require changes to the savepoint format; it is not currently possible to discover states and schemas in a robust way without state descriptors. I think it’s best to keep this in

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Jan Lukavský
Hi, this is an awesome and really useful feature. If I might ask for one thing to consider: would it be possible to make the Savepoint manipulation API (at least writing the Savepoint) less dependent on other parts of Flink internals (e.g. `KeyedStateBootstrapFunction`) and provide something m

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Tzu-Li (Gordon) Tai
@Piotr Yes, we're aiming for the 1.9 release. This was also mentioned in the recent 1.9 feature discussion thread [1]. [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-for-Apache-Flink-1-9-0-td28701.html On Fri, May 31, 2019 at 4:34 PM Piotr Nowojski wrote

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Piotr Nowojski
I have long been awaiting this feature! I cannot help much with reviewing, but a big +1 from my side :) One thing that would be great for analyzing the state and possibly making smaller modifications would be to hook this into Flink SQL :) Especially if it could be done in a way that would work out of th

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-31 Thread Konstantin Knauf
Hi Seth, big +1, happy to see this moving forward :) I have seen plenty of users who refrained from using managed state for some of their data/use cases due to the lack of something like this. I am not sure about the name "Savepoint Connector", but for a different reason. While it is technically a "co

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-30 Thread Steven Wu
This is an awesome feature. > The name "Savepoint Connector" might indeed not be that good, as it doesn't point out the fact that with the current design, all kinds of snapshots (savepoint / full or incremental checkpoints) can be read. @Gordon, can you add the above clarification to the FLIP page

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-30 Thread Paul Lam
Hi @Gordon @Seth, thanks a lot for your input! In general, I agree with you. The metadata querying feature is a nice-to-have but not a must-have, and it’s reasonable to make it a follow-up since it requires some extra work. Best, Paul Lam > On May 30, 2019, at 19:22, Seth Wiesman wrote: > > @Paul

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-30 Thread Seth Wiesman
@Paul I agree with Gordon that those are useful features. The only thing I’d like to add is that I don’t believe listing operator IDs will be useful to most users; they want to see UIDs, which would also require changes to the Savepoint metadata file. I think that would be a good follow-up but
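
Why operator IDs alone are of limited use: when a UID is set, Flink derives the 128-bit operator ID by hashing it, so a savepoint can list IDs but cannot recover the UIDs that produced them. The stdlib-only Java sketch below illustrates this; it substitutes SHA-256 truncated to 128 bits for the murmur3 hash Flink uses, and the `Set` of stored IDs is a hypothetical stand-in for the savepoint metadata, not its real format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashSet;
import java.util.Set;

class OperatorIdSketch {

    // Derives a 128-bit (32 hex chars) operator ID from a UID.
    // Stand-in hash: SHA-256 truncated to 16 bytes, not Flink's murmur3.
    static String operatorIdFor(String uid) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(uid.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 16; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available in the JDK", e);
        }
    }

    // A listing yields only opaque IDs, but a known UID can be checked by re-hashing it.
    static boolean containsUid(Set<String> operatorIds, String uid) {
        return operatorIds.contains(operatorIdFor(uid));
    }

    public static void main(String[] args) {
        // Hypothetical savepoint contents: only hashed IDs are stored, not UIDs.
        Set<String> storedIds = new LinkedHashSet<>();
        storedIds.add(operatorIdFor("kafka-source"));
        storedIds.add(operatorIdFor("session-window"));

        storedIds.forEach(id -> System.out.println("operator id: " + id));
        System.out.println("has session-window? " + containsUid(storedIds, "session-window"));
        System.out.println("has removed-operator? " + containsUid(storedIds, "removed-operator"));
    }
}
```

The listing prints two opaque hex strings, which is exactly the limitation raised in the thread: a user can confirm a UID they already know, but cannot discover an unknown one without the metadata file also storing UIDs.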

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-30 Thread Louis
+1 from my side. I think it will be a good feature. Best -- Louis Email: xu_soft39211...@163.com > On 30 May 2019, at 15:57, Tzu-Li (Gordon) Tai wrote: > > The name "Savepoint Connector" might indeed not be that good, as it doesn't > point out the fact that with the current design, all kinds o

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-30 Thread Tzu-Li (Gordon) Tai
The name "Savepoint Connector" might indeed not be that good, as it doesn't point out the fact that with the current design, all kinds of snapshots (savepoint / full or incremental checkpoints) can be read. @Paul That would be a very valid requirement. Querying the list of existing operator IDs sh

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-29 Thread Paul Lam
Hi Seth, Sorry for the confusion. What I mean is that currently we need to know the operator ID, state name, and state type (e.g. ListState, MapState) beforehand to get the states. Is it possible to perform a scan to get all existing operator IDs or state names in the savepoint? It would be good to

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-29 Thread Seth Wiesman
Hi Paul, I’m not following; could you provide an example of the kind of operation you’re describing? Seth > On May 29, 2019, at 7:37 PM, Paul Lam wrote: > > Hi Seth, > > +1 from my side. > > I was wondering if we can add a reader method to provide a full view of the > states instead of t

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-29 Thread Terry Wang
Hi Seth, Big +1 from my side. I like this idea. IMO, it’s better to choose another FLIP name instead of ‘connector’, which is a little confusing. > On May 30, 2019, at 10:37 AM, Paul Lam wrote: > > Hi Seth, > > +1 from my side. > > I was wondering if we can add a reader method to provide a full view of t

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-29 Thread Paul Lam
Hi Seth, +1 from my side. I was wondering if we could add a reader method to provide a full view of the states instead of the state of a specific operator. It would be helpful when there are unrestored states of a previously removed operator in the savepoint. Best, Paul Lam > On May 30, 2019

Re: [Discuss] FLIP-43: Savepoint Connector

2019-05-29 Thread vino yang
Hi Seth, Glad to see this FLIP, big +1 for this feature! Best, Vino Seth Wiesman wrote on Thu, May 30, 2019 at 7:14 AM: > Hey Everyone! > > Gordon and I have been discussing adding a savepoint connector to Flink > for reading, writing and modifying savepoints. > > This is useful for: > > Analyzing

[Discuss] FLIP-43: Savepoint Connector

2019-05-29 Thread Seth Wiesman
Hey Everyone! Gordon and I have been discussing adding a savepoint connector to Flink for reading, writing and modifying savepoints. This is useful for: * Analyzing state for interesting patterns * Troubleshooting or auditing jobs by checking for discrepancies in state * Bootstrapping

[Discuss] FLIP-43: Savepoint Connector

2019-05-29 Thread Seth Wiesman
Hey Everyone! Gordon and I have been discussing adding a savepoint connector to Flink for reading, writing, and modifying savepoints. This is useful for: * Analyzing state for interesting patterns * Troubleshooting or auditing jobs by checking for discrepancies in state * Bootstrapping state fo