Hi Gowri,
Please let us know if you run into any problems.
Best,
Congxian

Gowri Sundaram <gowripsunda...@gmail.com> wrote on Wed, May 6, 2020 at 1:53 PM:

> Hi Congxian,
> Thank you so much for your response! We will go ahead and do a POC to test
> out how Flink performs at scale.
>
> Regards,
> - Gowri
>
> On Wed, May 6, 2020 at 8:34 AM Congxian Qiu <qcx978132...@gmail.com> wrote:
>
>> Hi
>>
>> From my experience, you should care about the state size of a single task
>> (not the whole job's state size). The download speed for a single thread
>> is roughly 100 MB/s (this may vary across environments). I do not have
>> much performance data for loading state into RocksDB (we use an internal
>> KV store at my company), but in my experience loading state into RocksDB
>> is not slower than downloading it.
>>
>> Best,
>> Congxian
>>
>> Gowri Sundaram <gowripsunda...@gmail.com> wrote on Sun, May 3, 2020 at 11:25 PM:
>>
>>> Hi Congxian,
>>> Thank you so much for your response, that really helps!
>>>
>>> From your experience, how long does it take for Flink to redistribute
>>> terabytes of state data on node addition / node failure?
>>>
>>> Thanks!
>>>
>>> On Sun, May 3, 2020 at 6:56 PM Congxian Qiu <qcx978132...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> 1. From my experience, Flink can support state this large; set an
>>>> appropriate parallelism for the stateful operator. With RocksDB you may
>>>> also need to watch disk performance.
>>>> 2. Internally, Flink partitions state by key group, and each
>>>> task/parallel instance owns multiple key groups. Flink does not need to
>>>> restart when you add a node to the cluster, but every restart from a
>>>> savepoint/checkpoint (or failover) requires redistributing the
>>>> checkpoint data. On failover, this can be skipped if local recovery [1]
>>>> is enabled.
>>>> 3. To speed up uploading/downloading state, see the multi-threaded
>>>> upload/download work in [2][3].
>>>>
>>>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/large_state_tuning.html#task-local-recovery
>>>> [2] https://issues.apache.org/jira/browse/FLINK-10461
>>>> [3] https://issues.apache.org/jira/browse/FLINK-11008
>>>>
>>>> Best,
>>>> Congxian
>>>>
>>>> Gowri Sundaram <gowripsunda...@gmail.com> wrote on Fri, May 1, 2020 at 6:29 PM:
>>>>
>>>>> Hello all,
>>>>> We have read in multiple sources
>>>>> (<https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html>,
>>>>> <https://flink.apache.org/usecases.html>) that Flink has been used for
>>>>> use cases with terabytes of application state.
>>>>>
>>>>> We are considering using Flink for a similar use case with *keyed
>>>>> state in the range of 20 to 30 TB*. We had a few questions regarding
>>>>> the same.
>>>>>
>>>>> - *Is Flink a good option for this kind of scale of data?* We are
>>>>>   considering using RocksDB as the state backend.
>>>>> - *What happens when we want to add a node to the cluster?*
>>>>>   - As per our understanding, if we have 10 nodes in our cluster with
>>>>>     20 TB of state, adding a node would require the entire 20 TB to be
>>>>>     shipped again from the remote checkpoint storage to the
>>>>>     taskmanager nodes.
>>>>>   - Assuming a 1 GB/s network, and assuming all nodes can read their
>>>>>     respective 2 TB of state in parallel, this would mean a *minimum
>>>>>     downtime of roughly half an hour* (2 TB / 1 GB/s ≈ 2,000 s ≈ 33
>>>>>     minutes). And this assumes the throughput of the remote storage
>>>>>     does not become the bottleneck.
>>>>>   - Is there any way to reduce this estimated downtime?
>>>>>
>>>>> Thank you!
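
For anyone landing on this thread later: the local recovery and
multi-threaded transfer that [1][2][3] above describe are plain
flink-conf.yaml settings. A minimal sketch, assuming the Flink 1.10 option
names (check the docs for your version; the thread count of 4 is just an
example value, not a recommendation):

    # Keep a secondary copy of each task's state on local disk so that,
    # on failover, tasks rescheduled to the same TaskManager can restore
    # from local disk instead of re-downloading from remote checkpoint
    # storage.
    state.backend.local-recovery: true

    # Threads per task for transferring RocksDB checkpoint files to/from
    # remote storage (default is 1; raising it parallelizes the download
    # that dominates restore time).
    state.backend.rocksdb.checkpoint.transfer.thread.num: 4

Note that local recovery only helps on failover; a restore from a
savepoint, or a restore after rescaling, still downloads from remote
storage.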
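Combining the numbers in the thread gives a rough restore-time model; a
back-of-envelope sketch (the ~100 MB/s per-thread figure is Congxian's,
the parallelism and thread count below are made-up examples):

    restore time ≈ (total state / parallelism) / (transfer threads × 100 MB/s)

    e.g. 20 TB of state at parallelism 100 is ~200 GB per task; with 4
    transfer threads that is ~200,000 MB / 400 MB/s ≈ 500 s, i.e. roughly
    8-9 minutes, assuming remote storage and local disks keep up.

This is why the per-task state size matters more than the total: raising
parallelism and transfer threads both shrink the restore window.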
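And for the original question about running RocksDB at this scale:
incremental checkpoints and a generous max parallelism (the key-group
count, which permanently caps how far the job can later scale out) are the
usual first levers. A hedged sketch against the Flink 1.10 Java API; the
checkpoint URI and all numbers are placeholders, and the tiny pipeline at
the end exists only so the sketch runs end to end:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LargeStateJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Incremental checkpoints upload only new/changed RocksDB SST
            // files, which matters when total state is tens of terabytes.
            // The checkpoint URI is a placeholder.
            env.setStateBackend(
                    new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

            // Max parallelism = number of key groups. It is fixed for the
            // life of the job's state, so pick it with future scale-out in
            // mind.
            env.setMaxParallelism(720);

            // Checkpoint every 10 minutes (placeholder interval).
            env.enableCheckpointing(10 * 60 * 1000);

            // Placeholder keyed pipeline; a real job's keyed state would
            // live in operators here.
            env.fromElements(1, 2, 3, 4)
                    .keyBy(v -> v % 2)
                    .reduce((a, b) -> a + b)
                    .print();

            env.execute("large-keyed-state-poc");
        }
    }

The sketch assumes the flink-statebackend-rocksdb dependency is on the
classpath and that the checkpoint filesystem is reachable.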