Re: Savepoint failure along with JobManager crash

2021-08-31 Thread Matthias Pohl
Hi Prasanna, thanks for reaching out to the community. What you're experiencing is that the savepoint was created but the job itself ended up in an inconsistent state with Executions being cancelled instead of being finished. This should have triggered a global failover resulting in a job restart.

Re: savepoint failure

2021-07-14 Thread Till Rohrmann
> Whenever I try to trigger a savepoint after my state is bootstrapped I get the following error for different operators:
>
> Caused by: java.lang.IllegalArgumentException: Key group 0 is not in

Re: savepoint failure

2021-07-13 Thread Dan Hill
> at org.apache.flink.runtime.state.KeyGroupRangeOffsets.computeKeyGroupIndex(KeyGroupRangeOffsets.java:142)
> at org.apache.flink.runtime.state.KeyGroupRangeOffsets.setKeyGroupOffset(KeyGroupRangeOffsets.java:104)
> at org.apache.flink.contrib.streaming.state.snapshot.RocksFullSnapshotStrategy$SnapshotAsynchronousPartCallable.writeKVStateData(RocksFullSnapshotStrategy.java:319)
> at org.apache.flink.contrib.streaming.state.snapshot.RocksFullSnapshotStrategy$SnapshotAsynchronousPartCallable.writeSnapshotToOutputStream(RocksFullSnapshotStrategy.java:261)
>
> Note: key group might vary.
>
> I found this article <https://stackoverflow.com/questions/49140654/flink-error-key-group-is-not-in-keygrouprange> on Stack Overflow, which relates to such an exception (btw my job graph looks similar to the one described in the article, except that my job has more joins). I double-checked my hashcodes and I think that they are fine.
>
> I tried to reduce the parallelism to 1 with 1 task slot per task manager, and this configuration seems to work. This leads me to suspect that it might be a concurrency issue.
>
> I would like to understand what is causing the savepoint failure. Do you have any suggestions as to what I might be missing?
>
> Thanks in advance!
>
> Best Regards,
> Rado
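One possible reading of the report quoted above, offered as a sketch and not as a diagnosis: with parallelism 1 the single subtask owns every key group, so a key whose hash differs between routing and snapshotting can never fall outside the owned range, and the exception cannot appear. That would make the parallelism-1 result consistent with an unstable hashCode as well as with a concurrency issue. The model below is simplified. `MAX_PARALLELISM = 128`, the plain modulo in `key_group_for_hash` (Flink also scrambles the hash before taking the modulo) and the two example hash values are assumptions for illustration, and the range split follows my understanding of Flink's `KeyGroupRangeAssignment` and is not copied from it.

```python
MAX_PARALLELISM = 128  # assumed value, chosen only for this illustration


def key_group_range(max_parallelism, parallelism, operator_index):
    """Inclusive (start, end) of the key groups owned by one subtask (simplified model)."""
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return start, end


def key_group_for_hash(key_hash, max_parallelism):
    """Stand-in assignment: plain modulo, without the extra hash scrambling Flink applies."""
    return key_hash % max_parallelism


def owns(group_range, key_group):
    """True if key_group lies inside the inclusive range."""
    start, end = group_range
    return start <= key_group <= end


# Hypothetical key whose hash changed between routing and snapshot time.
hash_when_routed = 5
hash_at_snapshot = 40

for parallelism in (4, 1):
    routed_group = key_group_for_hash(hash_when_routed, MAX_PARALLELISM)
    # Find the subtask that received the record when it was routed.
    owner = next(
        i for i in range(parallelism)
        if owns(key_group_range(MAX_PARALLELISM, parallelism, i), routed_group)
    )
    owner_range = key_group_range(MAX_PARALLELISM, parallelism, owner)
    snapshot_group = key_group_for_hash(hash_at_snapshot, MAX_PARALLELISM)
    in_range = owns(owner_range, snapshot_group)
    print(f"parallelism={parallelism} owner_range={owner_range} "
          f"snapshot_group={snapshot_group} in_range={in_range}")
```

In this model the snapshot-time key group falls outside the owner's range at parallelism 4 and inside it at parallelism 1, which matches the symptom described but does not prove the cause.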

Re: savepoint failure

2021-07-13 Thread Dan Hill
> pIndex(KeyGroupRangeOffsets.java:142)
> at org.apache.flink.runtime.state.KeyGroupRangeOffsets.setKeyGroupOffset(KeyGroupRangeOffsets.java:104)
> at org.apache.flink.contrib.streaming.state.snapshot.RocksFullSnapshotStrate

Re: savepoint failure

2021-07-13 Thread Dan Hill
> 9)
> at org.apache.flink.contrib.streaming.state.snapshot.RocksFullSnapshotStrategy$SnapshotAsynchronousPartCallable.writeSnapshotToOutputStream(RocksFullSnapshotStrategy.java:261)
>
> Note: key group might vary.
>
> I found this article <https://stackoverflow.com/questions/49140654/flink-error-key-group-is-not-in-keygrouprange> on Stack Overflow, which relates to such an exception (btw my job graph looks similar to the one described in the article, except that my job has more joins). I double-checked my hashcodes and I think that they are fine.
>
> I tried to reduce the parallelism to 1 with 1 task slot per task manager, and this configuration seems to work. This leads me to suspect that it might be a concurrency issue.
>
> I would like to understand what is causing the savepoint failure. Do you have any suggestions as to what I might be missing?
>
> Thanks in advance!
>
> Best Regards,
> Rado

Re: Savepoint failure with operation not found under key

2021-06-29 Thread Rainie Li
I see, then more than 5 minutes had passed. Thanks for the help.

Best regards
Rainie

On Tue, Jun 29, 2021 at 12:29 AM Chesnay Schepler wrote:
> How much time has passed between the requests? (You can only query the status for about 5 minutes)
>
> On 6/29/2021 6:37 AM, Rainie Li wrote:
> > Thank

Re: Savepoint failure with operation not found under key

2021-06-29 Thread Chesnay Schepler
How much time has passed between the requests? (You can only query the status for about 5 minutes)

On 6/29/2021 6:37 AM, Rainie Li wrote:
Thanks for the context Chesnay. Yes, I sent both requests to the same JM.

Best regards
Rainie

On Mon, Jun 28, 2021 at 8:33 AM Chesnay Schepler

Re: Savepoint failure with operation not found under key

2021-06-28 Thread Rainie Li
Thanks for the context Chesnay. Yes, I sent both requests to the same JM.

Best regards
Rainie

On Mon, Jun 28, 2021 at 8:33 AM Chesnay Schepler wrote:
> Ordinarily this happens because the status request is sent to a different JM than the one that received the request for creating a savepoint.

Re: Savepoint failure with operation not found under key

2021-06-28 Thread Chesnay Schepler
Ordinarily this happens because the status request is sent to a different JM than the one that received the request for creating a savepoint. The meta information for such requests is only stored locally on each JM and is neither distributed to all JMs nor persisted anywhere. Did you send both requ
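For readers hitting the same error, here is a minimal polling sketch based on the two points made in this thread: ask the same JM that accepted the trigger request, and do so within the roughly 5 minutes during which the status can be queried. It assumes the savepoint status endpoint `GET /jobs/<jobid>/savepoints/<triggerid>` and its `status.id`, `operation.location` and `operation.failure-cause` fields as I understand them from the Flink REST API documentation; the 240-second deadline, the function names, and the host, job id and trigger id in the usage note are hypothetical.

```python
import json
import time
from urllib import request


def savepoint_status_url(base_url, job_id, trigger_id):
    """Build the status URL on the JM that accepted the trigger request."""
    return f"{base_url.rstrip('/')}/jobs/{job_id}/savepoints/{trigger_id}"


def http_get_json(url):
    """Fetch a URL and parse the JSON body (an HTTP error propagates to the caller)."""
    with request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_savepoint(base_url, job_id, trigger_id, fetch=http_get_json,
                       deadline_s=240.0, interval_s=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the savepoint completes; return its location.

    deadline_s is an assumed safety margin below the ~5 minute window
    mentioned in the thread, not a value taken from Flink.
    """
    url = savepoint_status_url(base_url, job_id, trigger_id)
    give_up_at = clock() + deadline_s
    while clock() < give_up_at:
        body = fetch(url)
        if body.get("status", {}).get("id") == "COMPLETED":
            operation = body.get("operation", {})
            if "failure-cause" in operation:
                raise RuntimeError(f"savepoint failed: {operation['failure-cause']}")
            return operation.get("location")
        sleep(interval_s)
    raise TimeoutError(f"no final savepoint status within {deadline_s} seconds")
```

A call would look like `wait_for_savepoint("http://jobmanager:8081", job_id, trigger_id)`, where `base_url` has to point at the JM that returned the trigger id, not at a load balancer that may route to a different JM.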

Savepoint failure with operation not found under key

2021-06-26 Thread Rainie Li
Hi Flink Community,

I found this error when I tried to create a savepoint for my Flink job. It's in version 1.9.

{ "errors": [ "Operation not found under key: org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@57b9711e" ] }

Here is the error from the JM log: 2021-06-2

Re: savepoint failure

2020-10-23 Thread Till Rohrmann
> Note: key group might vary.
>
> I found this article <https://stackoverflow.com/questions/49140654/flink-error-key-group-is-not-in-keygrouprange> on Stack Overflow, which relates to such an exception (btw my job graph looks similar to the one described in the article, except that my job has more joins). I double-checked my hashcodes and I think that they are fine.
>
> I tried to reduce the parallelism to 1 with 1 task slot per task manager, and this configuration seems to work. This leads me to suspect that it might be a concurrency issue.
>
> I would like to understand what is causing the savepoint failure. Do you have any suggestions as to what I might be missing?
>
> Thanks in advance!
>
> Best Regards,
> Rado

Re: savepoint failure

2020-10-23 Thread Till Rohrmann
> k slot per task manager, and this configuration seems to work. This leads me to suspect that it might be a concurrency issue.
>
> I would like to understand what is causing the savepoint failure. Do you have any suggestions as to what I might be missing?
>
> Thanks in advance!
>
> Best Regards,
> Rado

savepoint failure

2020-10-21 Thread Radoslav Smilyanov
. This leads me to suspect that it might be a concurrency issue.

I would like to understand what is causing the savepoint failure. Do you have any suggestions as to what I might be missing?

Thanks in advance!

Best Regards,
Rado