Hi Martijn,

Thanks for the info. We are in the process of moving to 1.15. Is this version actively supported by the community?
And coming to my original and follow-up questions: I checked the RocksDBStateBackend code from 1.11 and 1.15, and it is the same. Given a K8s configuration with a Volume and mount path, I would like to know the design recommendation for the RocksDB local storage path.

Thanks,
Vidya

On Mon, Nov 14, 2022 at 6:57 AM Martijn Visser <martijnvis...@apache.org> wrote:

> Hi Vidya,
>
> Given that you are still on Flink 1.11, which was released in July 2020
> and is no longer supported by the community, I would recommend first
> upgrading to a later, supported version like Flink 1.16.
>
> Best regards,
>
> Martijn
>
> On Sat, Nov 12, 2022 at 8:07 PM Vidya Sagar Mula <mulasa...@gmail.com>
> wrote:
>
>> Hi Yanfei,
>>
>> Thank you for the response. I have a follow-up answer and questions.
>>
>> I have two setups: one is the local environment, and the other is a
>> deployment scenario on K8s.
>>
>> - In the K8s setup, I have a Volume on the cluster node, and a mount
>> path is specified for the RocksDB checkpoints location. So, when the
>> application TM pod is restarted, the older checkpoints are read back
>> from the host path again when the TM is up again.
>> In this case, the RocksDB local directory is populated with all the
>> older data, which is not useful for the job ID, as the
>> "instanceBasePath" is calculated with a new random UUID.
>>
>> Questions:
>> - What do you think about deleting the older files that were pulled
>> from the host path to the mount path first, and then creating the new
>> instanceBasePath? Otherwise, we will end up with GBs of unwanted data.
>>
>> What is the general design recommendation in such cases, where RocksDB
>> has a mount path to a Volume on the host node?
>> Please clarify.
>>
>> Thanks,
>> Vidya Sagar.
>>
>>
>> On Thu, Nov 10, 2022 at 7:52 PM Yanfei Lei <fredia...@gmail.com> wrote:
>>
>>> Hi Vidya Sagar,
>>>
>>> Could you please share the reason for the TaskManager restart?
>>> If the machine or JVM process of the TaskManager crashes, the
>>> `RocksDBKeyedStateBackend` can't be disposed/closed normally, so the
>>> existing RocksDB instance directory would remain.
>>>
>>> BTW, if you use Application Mode on K8s and a TaskManager (pod)
>>> crashes, the RocksDB directory would be deleted as the pod is
>>> released.
>>>
>>> Best,
>>> Yanfei
>>>
>>> Vidya Sagar Mula <mulasa...@gmail.com> wrote on Fri, Nov 11, 2022 at
>>> 01:39:
>>>
>>>> Hi,
>>>>
>>>> I am using the RocksDB state backend for incremental checkpointing
>>>> with Flink 1.11.
>>>>
>>>> Question:
>>>> ----------
>>>> For a given job ID, intermediate RocksDB checkpoints are stored under
>>>> the path defined with ""
>>>>
>>>> The files are stored with "_jobID + random UUID" prefixed to the
>>>> location.
>>>>
>>>> Case 1:
>>>> ---------
>>>> - When I cancel the job, all the RocksDB checkpoints are deleted
>>>> properly from the location corresponding to that job ID (based on the
>>>> "instanceBasePath" variable stored in the RocksDBKeyedStateBackend
>>>> object).
>>>> No issue here; working as expected.
>>>>
>>>> Case 2:
>>>> ---------
>>>> - When my TaskManager is restarted, the existing RocksDB checkpoints
>>>> are not deleted.
>>>> A new "instanceBasePath" is constructed with a new random UUID
>>>> appended to the directory, and the old checkpoint directories are
>>>> still there.
>>>>
>>>> Questions:
>>>> - Is this expected behaviour, not deleting the existing checkpoint
>>>> directories under the RocksDB local directory?
>>>> - I see "StreamTaskStateInitializerImpl.java", where new StateBackend
>>>> objects are created. In this case, a new directory is created for
>>>> this job ID, appended with a new random UUID.
>>>> What happens to the old directories? Are they going to be purged
>>>> later on? If not, the disk is going to be filled up with the older
>>>> checkpoints.
>>>> Please clarify this.
>>>>
>>>> Thanks,
>>>> Vidya Sagar.
>>>>
>>>
>>>
>>>
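[Editor's note: the cleanup the thread asks about can be sketched as a startup step. Since every restart builds a fresh instanceBasePath with a new random UUID, any pre-existing entries under the RocksDB local directory are stale. A minimal sketch, e.g. for a K8s initContainer, assuming the mount path and that nothing else shares the directory; the path is a placeholder, not Flink's guaranteed layout:]

```shell
#!/bin/sh
# Hypothetical cleanup run before the TaskManager starts: wipe RocksDB
# instance directories left over from a previous pod incarnation.
# The default path below is an assumption for illustration.
ROCKSDB_LOCAL_DIR="${ROCKSDB_LOCAL_DIR:-/tmp/flink-rocksdb}"

mkdir -p "$ROCKSDB_LOCAL_DIR"
# Remove every top-level subdirectory: each restart gets a new random
# UUID in instanceBasePath, so old entries are never reused.
find "$ROCKSDB_LOCAL_DIR" -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +
```

[An alternative, per Yanfei's point, is to back the local directory with an emptyDir volume instead of a host path, so the data is released together with the pod and no manual cleanup is needed.]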