Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

chandan prakash Sun, 28 Oct 2018 22:18:02 -0700

Thanks a lot Andrey .
Your answer to 1st question clarifies well for both the questions.
Appreciate your help in clearing doubt



Regards,
Chandan

On Thu, Oct 25, 2018 at 6:10 PM Andrey Zagrebin <and...@data-artisans.com>
wrote:

> Hi Chandan,
>
> > 1. Why did we took 2 different approaches using different RocksDB apis ?
> > We could have used Checkpoint api of RocksDB for fullSnapshot as well .
>
> The reason here is partially historical. Full snapshot in RocksDB backend
> was implemented before incremental and rescaling for incremental snapshot
> but after heap backend. Full snapshot in RocksDB uses approach close to
> heap backend because Flink community plans to support the unified format
> for savepoints. The unified format would make it possible to switch
> backends and restore from savepoint. The formats still differ due to
> backend specifics to optimise snapshotting and restore but it is
> technically possible to unify them in future.
>
> > 2. Is there any specific reason to use Snapshot API of rocksDB
> over Checkpoint api of RocksDB for fullSnapshot?
>
> I think Checkpoint API produces separate SST file list to copy them to
> HDFS in case of incremental snapshot.
>
> Full snapshot does not need the file list, it just needs an iterator over
> snapshotted (frozen) data. Internally RocksDB just hard-links immutable
> already existing SST files and iterates their data for Snapshot API.
>
> Best,
> Andrey
>
>
> On 24 Oct 2018, at 18:40, chandan prakash <chandanbaran...@gmail.com>
> wrote:
>
> Thanks Tzu-Li for redirecting.
> Would also like to be corrected if my any inference from the code is
> incorrect or incomplete.
> I am sure it will help to clear doubts of more developers like me  :)
> Thanks in advance.
>
> Regards,
> Chandan
>
>
> On Wed, Oct 24, 2018 at 9:19 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
>> Hi,
>>
>> I’m forwarding this question to Stefan (cc’ed).
>> He would most likely be able to answer your question, as he has done
>> substantial work in the RocksDB state backends.
>>
>> Cheers,
>> Gordon
>>
>>
>> On 24 October 2018 at 8:47:24 PM, chandan prakash (
>> chandanbaran...@gmail.com) wrote:
>>
>> Hi,
>> I am new to Flink.
>> Was looking into the code to understand how Flink does FullSnapshot and
>> Incremental Snapshot using RocksDB
>>
>> What I understood:
>> 1. *For full snapshot, we call RocksDb snapshot api* which basically an
>> iterator handle to the entries in RocksDB instance. We iterate over every
>> entry one by one and serialize that to some distributed file system.
>> Similarly in restore for fullSnapshot, we read the file to get every
>> entry and apply that to the rocksDb instance one by one to fully construct
>> the db instance.
>>
>> 2. On the other hand in *for Incremental Snapshot, we rely on RocksDB
>> Checkpoint api* to copy the sst files to HDFS/S3 incrementally.
>> Similarly on restore, we copy the sst files to local directory and
>> instantiate rocksDB instance with the path of the directory.
>>
>> *My Question is:*
>> 1. Why did we took 2 different approaches using different RocksDB apis ?
>> We could have used Checkpoint api of RocksDB for fullSnapshot as well .
>> 2. Is there any specific reason to use *Snapshot API of rocksDB*  over 
>> *Checkpoint
>> api of RocksDB* for *fullSnapshot*?
>>
>> I am sure, I am missing some important point, really curious to know that.
>> Any explanation will be really great. Thanks in advance.
>>
>>
>> Regards,
>> Chandan
>>
>>
>>
>>
>>
>> --
>> Chandan Prakash
>>
>>
>
> --
> Chandan Prakash
>
>
>

-- 
Chandan Prakash

Re: Question over Incremental Snapshot vs Full Snapshot in rocksDb state backend

Reply via email to