Hi all,

I have a job with a large amount of broadcast state (62MB).
I took a savepoint while my workflow was running with parallelism 300, then restarted the workflow from it with parallelism 400. The first 297 sub-tasks restored their broadcast state fairly quickly, but after that progress slowed to a crawl (roughly 2 sub-tasks finishing per minute). After 10 minutes we killed the job, so I don't know whether it would ultimately have succeeded.

Is this expected? It seems like it could lead to a bad situation: at about 2 sub-tasks per minute, the remaining 103 sub-tasks would need around 50 minutes, so restarting the workflow could take close to an hour.

Thanks,

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch