Hi,

A slot request usually fails because of a lack of resources. If you put part of the workflow into a specific slot sharing group, the workflow may need more slots to run than it did before. Could you share the logs of the ResourceManager and SlotManager? I think they will contain more clues.
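To make the slot arithmetic concrete, here is a rough sketch in plain Python (it does not use any Flink API). It assumes the usual rule of thumb that a job needs, for each slot sharing group, as many slots as the highest parallelism in that group, summed over all groups. The operator names and the group name "isolated" are hypothetical, chosen only to resemble the job described below.

```python
from collections import defaultdict


def required_slots(operators):
    """Estimate the slots a job needs.

    operators: iterable of (name, slot_sharing_group, parallelism).
    Assumption: each group needs max(parallelism) slots, and groups
    never share a slot with each other.
    """
    max_parallelism = defaultdict(int)
    for _name, group, parallelism in operators:
        max_parallelism[group] = max(max_parallelism[group], parallelism)
    return sum(max_parallelism.values()), dict(max_parallelism)


# Hypothetical job: one operator moved to its own group with parallelism 3,
# the rest left in the default group with parallelism 1 or 2.
operators = [
    ("source", "default", 1),
    ("enrich", "default", 2),
    ("heavy-map", "isolated", 3),
    ("sink", "default", 2),
]

total, per_group = required_slots(operators)
print(per_group)  # {'default': 2, 'isolated': 3}
print(total)      # 5
```

With everything in the default group the same job would need only 3 slots, so if the local environment offers just 3, the request for the remaining 2 would never be fulfilled. Whether that is what happens in your setup is something the logs should confirm.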

Best,
Yangze Guo

On Thu, Sep 3, 2020 at 4:39 AM Ken Krugler <kkrugler_li...@transpac.com> wrote:
>
> Hi all,
>
> I’ve got a streaming workflow (using Flink 1.11.1) that runs fine locally
> (via Eclipse), with a parallelism of either 3 or 6.
>
> If I set up part of the workflow to use a specific (not “default”) slot
> sharing group with a parallelism of 3, and the remaining portions of the
> workflow have a parallelism of either 1 or 2, then the workflow never starts
> running, and eventually fails due to a slot request not being fulfilled in
> time.
>
> So I’m wondering how best to debug this.
>
> I don’t see any information (even at DEBUG level) being logged about which
> operators are in what slot sharing group, or which slots are assigned to what
> groups.
>
> Thanks,
>
> — Ken
>
> PS - I’ve looked at https://issues.apache.org/jira/browse/FLINK-8712, and
> tried the approach of setting # of slots in the config, but that didn’t
> change anything. I see that issue is still open, so wondering what Til and
> Konstantin have to say about it.
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>