Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-04-06 Thread chenqin
Friendly ping: the fix for the entropy marker is ready. -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-31 Thread Till Rohrmann
Thanks for creating this PR. I think it would be good to re-open the issue and post your analysis there together with the proposal for the fix. Cheers, Till

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-30 Thread chenqin
Link to the fix PR: https://github.com/apache/flink/pull/15442
In the meantime, we might need someone to help review and merge it.

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-24 Thread Till Rohrmann
There is indeed a ticket that tried to fix this for the 1.11 release [1, 2]. Maybe the fix is not working properly.
[1] https://issues.apache.org/jira/browse/FLINK-17359
[2] https://github.com/apache/flink/pull/11891

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-24 Thread Till Rohrmann
Thanks for looking into this issue, Chenqin. To me, this looks like a bug in Flink. I am not entirely sure, but somehow the wrapping order might have changed when using file systems from the plugin system. Maybe Arvid knows about any changes in this area. I think we should open a JIRA ticket for

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-23 Thread chenqin
We also noticed that the actual state entries stored in _metadata still contain the entropy marker after we fixed the metadata directory issue. This issue seems related to a code refactoring and is not covered by tests.

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-23 Thread chenqin
To make it easier to read:

```java
@Nullable
private static EntropyInjectingFileSystem getEntropyFs(FileSystem fs) {
    LOG.warn(fs.getClass().toGenericString());
    if (fs instanceof EntropyInjectingFileSystem) {
        return (EntropyInjectingFileSy
```
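The suspicion in this thread is that, with plugin-loaded file systems, the entropy-injecting file system may sit behind one or more wrapper layers, so a single `instanceof` check on the outermost object can miss it. A minimal sketch of walking the whole wrapper chain instead, assuming hypothetical `Fs`, `EntropyFs`, and `Wrapper` interfaces that stand in for Flink's real classes; this is not the code from the linked PR.

```java
import java.util.Optional;

class UnwrapSketch {
    // Hypothetical stand-ins for Flink's FileSystem hierarchy (not Flink APIs).
    interface Fs {}
    interface EntropyFs extends Fs { String entropyKey(); }
    interface Wrapper extends Fs { Fs delegate(); }

    // Follow delegate() links until an EntropyFs is found or the chain ends.
    static Optional<EntropyFs> findEntropyFs(Fs fs) {
        Fs current = fs;
        while (current != null) {
            if (current instanceof EntropyFs) {
                return Optional.of((EntropyFs) current);
            }
            if (current instanceof Wrapper) {
                current = ((Wrapper) current).delegate();
            } else {
                return Optional.empty(); // plain file system, no entropy support
            }
        }
        return Optional.empty(); // a wrapper returned a null delegate
    }

    public static void main(String[] args) {
        EntropyFs s3 = () -> "_entropy_";
        Wrapper pluginWrapper = () -> s3;        // inner wrapper (hypothetical)
        Wrapper safetyNet = () -> pluginWrapper; // outer wrapper (hypothetical)
        System.out.println(findEntropyFs(safetyNet).isPresent());
        System.out.println(findEntropyFs(new Fs() {}).isPresent());
    }
}
```

Unlike a fixed two-level check, the loop does not depend on the wrapping order. It assumes the wrapper chain is acyclic; a cyclic chain would loop forever.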

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-23 Thread chenqin
Hi Till, thanks for sharing pointers related to the entropy injection feature in 1.11. We did some investigation, and so far it looks like a bug in edge-case handling.
Testing environment:
Flink 1.11.2 release with plugins plugins/s3-fs-hadoop/flink-s3-fs-hadoop
state.backend.rocksdb.timer-service.fac

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-19 Thread Rainie Li
I see, thanks for the info, Till. I appreciate your help. Best regards, Rainie

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-18 Thread Till Rohrmann
Hi Rainie, if I remember correctly (unfortunately I don't have an S3 deployment at hand to try it out), in v1.9 you should find the data files for the checkpoint under s3a://{bucket name}/dev/checkpoints/_entropy_/{job_id}/chk-2230. A checkpoint consists of these data files and a metadata file

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-17 Thread Rainie Li
Thanks for checking, Till. I have a follow-up question on #2: do you know why the same job does not show up under the entropy checkpoint path in version 1.9? For example:
*When it's running in v1.11, the checkpoint path is:* s3a://{bucket name}/dev/checkpoints/_entropy_/{job_id}/chk-1537
*When it's running in

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-17 Thread Till Rohrmann
Hi Rainie,
1. I think you need to look for the {job_id} in all possible subfolders of the dev/checkpoints/ folder, or extract the entropy from the logs.
2. According to [1], entropy should only be used for the data files and not for the metadata files. The idea was to keep th
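Point 2 matches our understanding of the documented S3 entropy behavior: for files created with entropy injection (checkpoint data files), the entropy key in the path is replaced by random characters, while for other files such as `_metadata` the key is removed so the metadata lands at a predictable path. A minimal sketch of that path rewriting, assuming a lowercase-alphanumeric character set and treating `EntropySketch`, `bucket`, and `job-123` as hypothetical; Flink's actual resolver may differ in details.

```java
import java.util.concurrent.ThreadLocalRandom;

class EntropySketch {
    // Assumed character set for this sketch; Flink's generator may use a different one.
    private static final String CHARS = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Replace the first occurrence of target in path with replacement.
    static String replaceFirst(String path, String target, String replacement) {
        int idx = path.indexOf(target);
        if (idx < 0) {
            return path; // key not present: leave the path unchanged
        }
        return path.substring(0, idx) + replacement + path.substring(idx + target.length());
    }

    // Data files: substitute the key with `length` random characters.
    static String injectEntropy(String path, String key, int length) {
        StringBuilder entropy = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            entropy.append(CHARS.charAt(ThreadLocalRandom.current().nextInt(CHARS.length())));
        }
        return replaceFirst(path, key, entropy.toString());
    }

    // Metadata and other files: drop the "/<key>" segment entirely.
    static String removeEntropyKey(String path, String key) {
        return replaceFirst(path, "/" + key, "");
    }

    public static void main(String[] args) {
        String chkDir = "s3a://bucket/dev/checkpoints/_entropy_/job-123/chk-1537";
        // Random entropy segment, e.g. one character with s3.entropy.length: 1
        System.out.println(injectEntropy(chkDir + "/data-file", "_entropy_", 1));
        // Entropy-free path for the metadata file
        System.out.println(removeEntropyKey(chkDir + "/_metadata", "_entropy_"));
    }
}
```

With `s3.entropy.length: 1`, the data files for one checkpoint can therefore be spread over several single-character folders, while the `_metadata` file should appear under dev/checkpoints/{job_id}/chk-N. If `_metadata` still references paths containing the literal `_entropy_` key, that suggests the key was never resolved.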

Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-16 Thread Rainie Li
Hi Flink Developers, we enabled entropy injection for S3. Here are our settings on the YARN cluster:
s3.entropy.key: _entropy_
s3.entropy.length: 1
state.checkpoints.dir: 's3a://{bucket name}/dev/checkpoints/_entropy_'
I have two questions:
1. After enabling entropy, the job's checkpoint path changed to: *
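Till's first suggestion (search every subfolder of dev/checkpoints/ for the {job_id}) can be automated once a recursive listing of object keys is available. A minimal sketch, assuming the keys are already plain strings relative to the bucket and that entropy occupies at most one path segment (as with `s3.entropy.length: 1`); `FindCheckpointDirs`, `job1`, and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindCheckpointDirs {
    // Collect distinct checkpoint directories for jobId, with or without
    // a single entropy segment between the base directory and the job id.
    static List<String> find(List<String> keys, String base, String jobId) {
        Pattern pattern = Pattern.compile(
                Pattern.quote(base) + "(?:/([^/]+))?/" + Pattern.quote(jobId) + "/(chk-\\d+)/");
        TreeSet<String> dirs = new TreeSet<>(); // sorted, de-duplicated
        for (String key : keys) {
            Matcher m = pattern.matcher(key);
            if (m.lookingAt()) {
                String entropy = m.group(1); // null when the key has no entropy segment
                String prefix = entropy == null ? base : base + "/" + entropy;
                dirs.add(prefix + "/" + jobId + "/" + m.group(2));
            }
        }
        return new ArrayList<>(dirs);
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "dev/checkpoints/a/job1/chk-7/f1",
                "dev/checkpoints/b/job1/chk-7/f2",
                "dev/checkpoints/job1/chk-7/_metadata",
                "dev/checkpoints/a/job2/chk-3/f3");
        find(keys, "dev/checkpoints", "job1").forEach(System.out::println);
    }
}
```

For the sample keys this prints the two entropy folders holding data files (`a` and `b`) plus the entropy-free dev/checkpoints/job1/chk-7 directory, which is where the metadata file is expected if entropy is applied only to data files.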