Hi Gowri,
Please let us know if you run into any problems.
Best,
Congxian

Gowri Sundaram <gowripsunda...@gmail.com> wrote on Wed, May 6, 2020 at 1:53 PM:

> Hi Congxian,
> Thank you so much for your response! We will go ahead and do a POC to test
> out how Flink performs at scale.
>
> Regards,
> - Gowri
>
> On Wed, May 6, 2020 at 8:34 AM Congxian Qiu <qcx978132...@gmail.com> wrote:
>
>> Hi
>>
>> From my experience, you should care about the state size of a single task
>> (not the whole job's state size). The download speed for a single thread
>> is roughly 100 MB/s (this may vary across environments). I do not have
>> much performance data for loading state into RocksDB (we use an internal
>> KV store at my company), but in my experience loading state into RocksDB
>> is not slower than downloading it.
>>
>> Best,
>> Congxian
>>
>> Gowri Sundaram <gowripsunda...@gmail.com> wrote on Sun, May 3, 2020 at 11:25 PM:
>>
>>> Hi Congxian,
>>> Thank you so much for your response, that really helps!
>>>
>>> From your experience, how long does it take for Flink to redistribute
>>> terabytes of state data on node addition / node failure?
>>>
>>> Thanks!
>>>
>>> On Sun, May 3, 2020 at 6:56 PM Congxian Qiu <qcx978132...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> 1. From my experience, Flink can support state this large; set an
>>>> appropriate parallelism for the stateful operator. With RocksDB you may
>>>> also need to watch disk performance.
>>>> 2. Internally, Flink partitions state by key group, and each
>>>> task/parallel instance owns multiple key groups. Flink does not need to
>>>> restart when you add a node to the cluster, but every restart from a
>>>> savepoint/checkpoint (or failover) requires redistributing the
>>>> checkpoint data. On failover, this can be skipped if local recovery [1]
>>>> is enabled.
>>>> 3. To speed up uploading/downloading state, see the multi-threaded
>>>> upload/download work in [2][3].
>>>>
>>>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/large_state_tuning.html#task-local-recovery
>>>> [2] https://issues.apache.org/jira/browse/FLINK-10461
>>>> [3] https://issues.apache.org/jira/browse/FLINK-11008
>>>>
>>>> Best,
>>>> Congxian
>>>>
>>>> Gowri Sundaram <gowripsunda...@gmail.com> wrote on Fri, May 1, 2020 at 6:29 PM:
>>>>
>>>>> Hello all,
>>>>> We have read in multiple sources
>>>>> (<https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html>,
>>>>> <https://flink.apache.org/usecases.html>) that Flink has been used for
>>>>> use cases with terabytes of application state.
>>>>>
>>>>> We are considering using Flink for a similar use case with *keyed
>>>>> state in the range of 20 to 30 TB*. We had a few questions regarding
>>>>> the same.
>>>>>
>>>>> - *Is Flink a good option for this kind of scale of data?* We are
>>>>>   considering using RocksDB as the state backend.
>>>>> - *What happens when we want to add a node to the cluster?*
>>>>>   - As per our understanding, if we have 10 nodes in our cluster with
>>>>>     20 TB of state, adding a node would require the entire 20 TB to be
>>>>>     shipped again from the remote checkpoint storage to the
>>>>>     taskmanager nodes.
>>>>>   - Assuming a 1 GB/s network, and assuming all nodes can read their
>>>>>     respective 2 TB of state in parallel, this would mean a *minimum
>>>>>     downtime of roughly half an hour* (2 TB / 1 GB/s ≈ 2,000 s ≈ 33
>>>>>     minutes). And this assumes the throughput of the remote storage
>>>>>     does not become the bottleneck.
>>>>>   - Is there any way to reduce this estimated downtime?
>>>>>
>>>>> Thank you!
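
For anyone landing on this thread later: the local recovery and
multi-threaded transfer that [1][2][3] above describe are plain
flink-conf.yaml settings. A minimal sketch, assuming the Flink 1.10 option
names (check the docs for your version; the thread count of 4 is just an
example value, not a recommendation):

    # Keep a secondary copy of each task's state on local disk so that,
    # on failover, tasks rescheduled to the same TaskManager can restore
    # from local disk instead of re-downloading from remote checkpoint
    # storage.
    state.backend.local-recovery: true

    # Threads per task for transferring RocksDB checkpoint files to/from
    # remote storage (default is 1; raising it parallelizes the download
    # that dominates restore time).
    state.backend.rocksdb.checkpoint.transfer.thread.num: 4

Note that local recovery only helps on failover; a restore from a
savepoint, or a restore after rescaling, still downloads from remote
storage.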
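Combining the numbers in the thread gives a rough restore-time model; a
back-of-envelope sketch (the ~100 MB/s per-thread figure is Congxian's,
the parallelism and thread count below are made-up examples):

    restore time ≈ (total state / parallelism) / (transfer threads × 100 MB/s)

    e.g. 20 TB of state at parallelism 100 is ~200 GB per task; with 4
    transfer threads that is ~200,000 MB / 400 MB/s ≈ 500 s, i.e. roughly
    8-9 minutes, assuming remote storage and local disks keep up.

This is why the per-task state size matters more than the total: raising
parallelism and transfer threads both shrink the restore window.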
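And for the original question about running RocksDB at this scale:
incremental checkpoints and a generous max parallelism (the key-group
count, which permanently caps how far the job can later scale out) are the
usual first levers. A hedged sketch against the Flink 1.10 Java API; the
checkpoint URI and all numbers are placeholders, and the tiny pipeline at
the end exists only so the sketch runs end to end:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LargeStateJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Incremental checkpoints upload only new/changed RocksDB SST
            // files, which matters when total state is tens of terabytes.
            // The checkpoint URI is a placeholder.
            env.setStateBackend(
                    new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

            // Max parallelism = number of key groups. It is fixed for the
            // life of the job's state, so pick it with future scale-out in
            // mind.
            env.setMaxParallelism(720);

            // Checkpoint every 10 minutes (placeholder interval).
            env.enableCheckpointing(10 * 60 * 1000);

            // Placeholder keyed pipeline; a real job's keyed state would
            // live in operators here.
            env.fromElements(1, 2, 3, 4)
                    .keyBy(v -> v % 2)
                    .reduce((a, b) -> a + b)
                    .print();

            env.execute("large-keyed-state-poc");
        }
    }

The sketch assumes the flink-statebackend-rocksdb dependency is on the
classpath and that the checkpoint filesystem is reachable.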