Yes Nico, I have evaluated this. Here is what I tried:
1. Take the savepoint
2. Stop the job
3. Shut down the instances
4. Start a new pod using the command below:

/docker-entrypoint.sh "standalone-job" "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" "-Ds3.endpoint=${AWS_S3_ENDPOINT}" "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" "--job-classname" "com.test.MySpringBootApp" "--fromSavepoint" "s3://s3-health-service-discovery/savepoints" ${args}

I haven't observed any errors during start-up in the logs, but the state got reset, i.e. the values stored inside the accumulator were flushed.

On Tue, Oct 5, 2021 at 9:40 PM Nicolaus Weidner <nicolaus.weid...@ververica.com> wrote:

> Hi Parag,
>
> I am not so familiar with the setup you are using, but did you check out
> [1]? Maybe the parameter
> [--fromSavepoint /path/to/savepoint [--allowNonRestoredState]]
> is what you are looking for?
>
> Best regards,
> Nico
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#application-mode-on-docker
>
> On Tue, Oct 5, 2021 at 12:37 PM Parag Somani <somanipa...@gmail.com> wrote:
>
>> Hello,
>>
>> We are currently using Apache Flink 1.12.0 deployed on a k8s cluster (1.18) with ZooKeeper for HA. Due to certain vulnerabilities in the container related to a few jars (like netty-*, mesos), we are forced to upgrade.
>>
>> While upgrading Flink to 1.14.0, we faced an NPE:
>> https://issues.apache.org/jira/browse/FLINK-23901?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17402570#comment-17402570
>>
>> To address it, I followed these steps:
>>
>> 1. Create a savepoint
>> 2. Stop the job
>> 3. Restore from the savepoint -- which is where I am facing a challenge.
>>
>> For step 3 above, I was able to restore from a savepoint mainly via
>> "bin/flink run -s :savepointPath [:runArgs]",
>> but that is mainly about restarting an uploaded jar file. As our application is based on k8s and runs using Docker, I was not able to restore it that way. Because of that, the state of the variables in the accumulator got corrupted and I lost the data in one of the environments.
>>
>> My query is: what is the preferred way to restore from a savepoint if the application is running on k8s using Docker?
>>
>> We are using the following command to run the job manager:
>> /docker-entrypoint.sh "standalone-job" "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" "-Ds3.endpoint=${AWS_S3_ENDPOINT}" "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" "--job-classname" "<class-name>" ${args}
>>
>> Thank you in advance...!
>>
>> --
>> Regards,
>> Parag Surajmal Somani.

--
Regards,
Parag Surajmal Somani.
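
One thing worth double-checking in the command above: --fromSavepoint is given the savepoints *base* directory, not one specific savepoint inside it. Flink restores from the directory that contains the savepoint's _metadata file, so passing the parent directory may leave the job starting with fresh state, which would match the reset accumulator. A minimal sketch of the restart, assuming a hypothetical savepoint directory name (the real one can be found by listing the bucket):

```shell
#!/bin/sh
# Sketch: restart the standalone-job from a *concrete* savepoint directory.
# "savepoint-abc123-0123456789ab" is a hypothetical name; the real one
# (the directory holding the _metadata file) can be found with e.g.:
#   aws s3 ls s3://s3-health-service-discovery/savepoints/
SAVEPOINT_PATH="s3://s3-health-service-discovery/savepoints/savepoint-abc123-0123456789ab"

set -- \
  "standalone-job" \
  "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" \
  "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" \
  "-Ds3.endpoint=${AWS_S3_ENDPOINT}" \
  "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" \
  "--job-classname" "com.test.MySpringBootApp" \
  "--fromSavepoint" "${SAVEPOINT_PATH}"

# Echoed here so the assembled command can be inspected; the real pod would
# instead exec:  /docker-entrypoint.sh "$@" ${args}
echo /docker-entrypoint.sh "$@"
```

Two further things worth checking: Nico's [--allowNonRestoredState] flag can be appended if some state should deliberately be dropped after the 1.12 to 1.14 upgrade, and since ZooKeeper HA is enabled, a restarted job that still finds old HA entries may recover from the latest HA checkpoint rather than the savepoint, so clearing the stale HA state (or using a fresh cluster-id) before restarting may also matter.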