Hi Ken,

> Broadcast state is weird in that it’s duplicated, apparently to avoid “hot spots” when restoring from state. So I’m wondering how Flink handles the case of restoring broadcast state when the parallelism increases.

The Flink doc is here: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/fault-tolerance/broadcast_state/. In particular:

> Upon scaling up, each task reads its own state, and the remaining tasks (p_new - p_old) read checkpoints of previous tasks in a round-robin manner.

You could also consider enabling DEBUG logs (for the relevant classes) when you give it another try, to see what happens in the TMs. I also suggest checking all of your state storage metrics for any indication of throttling.
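To make the round-robin part concrete, here is a minimal sketch of how I read that sentence (my interpretation of the docs, not the actual Flink restore code; the class and method names are just for illustration):

public class BroadcastRescaleSketch {

    // Which old subtask's broadcast-state checkpoint does each new subtask
    // read on restore? Per the doc quote above: existing subtasks re-read
    // their own copy, and the extra (pNew - pOld) subtasks cycle over the
    // old checkpoints in round-robin order.
    static int oldSubtaskToReadFrom(int newSubtask, int pOld) {
        return (newSubtask < pOld) ? newSubtask : (newSubtask - pOld) % pOld;
    }

    public static void main(String[] args) {
        int pOld = 300, pNew = 400;   // the parallelism change from your test
        for (int i = 0; i < pNew; i++) {
            System.out.printf("new subtask %d reads old subtask %d%n",
                i, oldSubtaskToReadFrom(i, pOld));
        }
    }
}

If that reading is right, going from 300 to 400 means the 100 extra subtasks each re-read one of the first 100 old checkpoints, so those files are read twice; the DEBUG logs should show whether the slow subtasks are the ones doing the second read.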
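For the DEBUG logs, assuming you are on Flink’s default log4j2 setup, you could add something like this to conf/log4j.properties on the TMs (the two packages are my guess at the relevant restore code paths, so adjust as needed):

logger.state.name = org.apache.flink.runtime.state
logger.state.level = DEBUG
logger.checkpoint.name = org.apache.flink.runtime.checkpoint
logger.checkpoint.level = DEBUG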
Thanks
Jun

> On Dec 16, 2022, at 4:48 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
>
> Hi Jun,
>
> Thanks for following up.
>
> The state storage is internal at a client, and isn’t throttled.
>
> Also, restoring from the savepoint when we didn’t change the parallelism was fine.
>
> I didn’t see any errors in the TM logs, but I didn’t inspect them carefully - I’ll do that when we give this another test.
>
> Broadcast state is weird in that it’s duplicated, apparently to avoid “hot spots” when restoring from state. So I’m wondering how Flink handles the case of restoring broadcast state when the parallelism increases.
>
> Regards,
>
> — Ken
>
>> On Dec 15, 2022, at 4:33 PM, Jun Qin <qinjunje...@gmail.com> wrote:
>>
>> Hi Ken,
>>
>> Without knowing the details, the first thing I would suggest checking is whether you have reached a threshold configured in your state storage (e.g., S3), so that your further downloads were throttled. Checking your storage metrics or logs should help confirm whether this is the case.
>>
>> In addition, in those TMs where the restore was slow, do you see anything suspicious in the logs, e.g., reconnecting?
>>
>> Thanks
>> Jun
>>
>> Sent from my phone
>>
>> -------- Original message --------
>> From: Ken Krugler <kkrugler_li...@transpac.com>
>> Date: Wed, Dec 14, 2022, 19:32
>> To: User <user@flink.apache.org>
>> Subject: Slow restart from savepoint with large broadcast state when increasing parallelism
>>
>> Hi all,
>>
>> I have a job with a large amount of broadcast state (62MB).
>>
>> I took a savepoint when my workflow was running with parallelism 300.
>>
>> I then restarted the workflow with parallelism 400.
>>
>> The first 297 sub-tasks restored their broadcast state fairly quickly, but after that it slowed to a crawl (maybe 2 sub-tasks finished per minute).
>>
>> After 10 minutes we killed the job, so I don’t know whether it would ultimately have succeeded.
>>
>> Is this expected? It seems like it could lead to a bad situation, where it would take an hour to restart the workflow.
>>
>> Thanks,
>>
>> — Ken
>>
>> --------------------------
>> Ken Krugler
>> http://www.scaleunlimited.com
>> Custom big data solutions
>> Flink, Pinot, Solr, Elasticsearch
>>
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch