Hi Ken,

> Broadcast state is weird in that it’s duplicated, apparently to avoid “hot 
> spots” when restoring from state. So I’m wondering how Flink handles the case 
> of restoring broadcast state when the parallelism increases.

The Flink doc is here: 
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/fault-tolerance/broadcast_state/.
In particular:
> Upon scaling up, each task reads its own state, and the remaining tasks 
> (p_new-p_old) read checkpoints of previous tasks in a round-robin manner.
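
To make that concrete, here is a rough sketch (plain Java, not Flink’s internal 
code) of what such a round-robin assignment would look like for your 300 -> 400 
case; the exact mapping Flink uses internally may differ:

    // Illustration only: which old subtask's broadcast-state checkpoint a new
    // subtask would read under the round-robin scheme described in the doc.
    public final class BroadcastRestoreSketch {

        // New subtasks below pOld read their own state; the extra ones
        // (pOld..pNew-1) cycle over the old subtasks' checkpoints.
        static int oldSubtaskToReadFrom(int newSubtaskIndex, int pOld) {
            return newSubtaskIndex < pOld ? newSubtaskIndex : newSubtaskIndex % pOld;
        }

        public static void main(String[] args) {
            int pOld = 300;
            int pNew = 400;
            for (int i = 0; i < pNew; i++) {
                System.out.printf("new subtask %d <- old subtask %d%n",
                        i, oldSubtaskToReadFrom(i, pOld));
            }
        }
    }

Under such a scheme the extra 100 subtasks would re-read the checkpoints of old 
subtasks 0-99, i.e. those files get downloaded a second time, which could make 
the tail of the restore noticeably slower than the first ~300 subtasks.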

You could also consider enabling DEBUG logs (for the relevant classes) when you 
give it another try, to see what happens in the TMs. I also suggest checking 
all of your state storage metrics for any possible indication of throttling.
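
For example, with Flink’s default log4j 2 setup you could add something like 
the following to conf/log4j.properties; the logger names below are just 
examples of state/checkpoint-related packages, the classes worth watching may 
differ in your setup:

    # Example only: enable DEBUG for state-restore / checkpoint related loggers
    logger.staterestore.name = org.apache.flink.runtime.state
    logger.staterestore.level = DEBUG
    logger.checkpointing.name = org.apache.flink.runtime.checkpoint
    logger.checkpointing.level = DEBUG

Depending on your setup you may need to restart the TMs (or wait for the 
logging configuration to be reloaded) for the change to take effect.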

Thanks
Jun

> On Dec 16, 2022, at 4:48 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
> 
> Hi Jun,
> 
> Thanks for following up.
> 
> The state storage is hosted internally at a client, and isn’t throttled.
> 
> Also restoring from the savepoint when we didn’t change the parallelism was 
> fine.
> 
> I didn’t see any errors in the TM logs, but I didn’t carefully inspect them - 
> I’ll do that when we give this another test.
> 
> Broadcast state is weird in that it’s duplicated, apparently to avoid “hot 
> spots” when restoring from state. So I’m wondering how Flink handles the case 
> of restoring broadcast state when the parallelism increases.
> 
> Regards,
> 
> — Ken
>  
> 
>> On Dec 15, 2022, at 4:33 PM, Jun Qin <qinjunje...@gmail.com> wrote:
>> 
>> Hi Ken,
>> 
>> Without knowing the details, the first thing I would suggest checking is 
>> whether you have reached a threshold configured in your state storage (e.g., 
>> S3), such that your further downloads were throttled. Checking your storage 
>> metrics or logs should help confirm whether this is the case.
>> 
>> In addition, in those TMs where the restarting was slow, do you see anything 
>> suspicious in the logs, e.g., reconnecting?
>> 
>> Thanks
>> Jun
>> 
>> 
>> 
>> 
>> Sent from my phone
>> 
>> 
>> -------- Original Message --------
>> From: Ken Krugler <kkrugler_li...@transpac.com>
>> Date: Wed, Dec 14, 2022, 19:32
>> To: User <user@flink.apache.org>
>> Subject: Slow restart from savepoint with large broadcast state when
>> increasing parallelism
>> Hi all,
>> 
>> I have a job with a large amount of broadcast state (62MB).
>> 
>> I took a savepoint when my workflow was running with parallelism 300.
>> 
>> I then restarted the workflow with parallelism 400.
>> 
>> The first 297 sub-tasks restored their broadcast state fairly quickly, but 
>> after that it slowed to a crawl (maybe 2 sub-tasks finished per minute).
>> 
>> After 10 minutes we killed the job, so I don’t know if it would have 
>> ultimately succeeded.
>> 
>> Is this expected? Seems like it could lead to a bad situation, where it 
>> would take an hour to restart the workflow.
>> 
>> Thanks,
>> 
>> — Ken
>> 
>> --------------------------
>> Ken Krugler
>> http://www.scaleunlimited.com
>> Custom big data solutions
>> Flink, Pinot, Solr, Elasticsearch
>> 
> 
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch
> 
> 
> 
