Hi Till,

> On Sep 3, 2020, at 12:31 AM, Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hi Ken,
>
> I believe that we don't have much, if any, explicit logging about the slot
> sharing group in the code. You can, however, learn about it indirectly by
> looking at the required number of AllocatedSlots in the SlotPool. Also, the
> number of "multi task slots" that are created should vary, because every group
> of slot sharing tasks will create one of them. To learn about the
> SlotPoolImpl's status, you can also take a look at SlotPoolImpl.printStatus.
>
> For the underlying problem, I believe that Yangze could be right. How many
> resources do you have in your cluster?

I've got a Flink MiniCluster with 12 slots. Even with only 6 pipelined operators, each with a parallelism of 1, it still hangs while starting. So I don't think it's a resource issue.

One odd thing I've noticed: I've got three streams that I union together. Two of the streams are in separate slot sharing groups, and the third is not assigned to a group. But when I check the logs, I see three "Create multi task slot" entries. I'm wondering if unioning streams that are in different slot sharing groups creates a problem.

Thanks,

-- Ken

> On Thu, Sep 3, 2020 at 4:25 AM Yangze Guo <karma...@gmail.com> wrote:
> Hi,
>
> The failure of requesting slots is usually because of a lack of
> resources. If you put part of the workflow into a specific slot sharing
> group, it may require more slots to run the workflow than before.
> Could you share the logs of the ResourceManager and SlotManager? I think
> there are more clues in them.
>
> Best,
> Yangze Guo
>
> On Thu, Sep 3, 2020 at 4:39 AM Ken Krugler <kkrugler_li...@transpac.com> wrote:
> >
> > Hi all,
> >
> > I’ve got a streaming workflow (using Flink 1.11.1) that runs fine locally
> > (via Eclipse), with a parallelism of either 3 or 6.
> >
> > If I set up part of the workflow to use a specific (not “default”) slot
> > sharing group with a parallelism of 3, and the remaining portions of the
> > workflow have a parallelism of either 1 or 2, then the workflow never
> > starts running, and eventually fails due to a slot request not being
> > fulfilled in time.
> >
> > So I’m wondering how best to debug this.
> >
> > I don’t see any information (even at DEBUG level) being logged about which
> > operators are in what slot sharing group, or which slots are assigned to
> > what groups.
> >
> > Thanks,
> >
> > -- Ken
> >
> > PS - I’ve looked at https://issues.apache.org/jira/browse/FLINK-8712, and
> > tried the approach of setting the number of slots in the config, but that
> > didn’t change anything. I see that the issue is still open, so I’m
> > wondering what Till and Konstantin have to say about it.
> >
> > --------------------------
> > Ken Krugler
> > http://www.scaleunlimited.com
> > custom big data solutions & training
> > Hadoop, Cascading, Cassandra & Solr

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
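
For reference, the slot arithmetic discussed in this thread can be sketched in plain Java, with no Flink dependency. This is only a rough sketch, not Flink's scheduler code. It assumes that each slot sharing group needs as many slots as the highest parallelism among its operators, and that the job needs the sum over all groups. It also assumes that an operator whose inputs sit in different groups (such as one downstream of a union) falls back to the "default" group unless it is assigned one explicitly. The class name, operator names, group names, and parallelisms are all made up for illustration and are not taken from the workflow above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for estimating slot demand; not part of Flink.
class SlotEstimate {

    // Hypothetical operator description: name, slot sharing group, parallelism.
    static final class Op {
        final String name;
        final String group;
        final int parallelism;

        Op(String name, String group, int parallelism) {
            this.name = name;
            this.group = group;
            this.parallelism = parallelism;
        }
    }

    // Assumption: a group needs as many slots as its most parallel operator.
    static Map<String, Integer> slotsPerGroup(Op... ops) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        for (Op op : ops) {
            slots.merge(op.group, op.parallelism, Math::max);
        }
        return slots;
    }

    // Assumption: groups never share slots, so the job needs the sum.
    static int requiredSlots(Op... ops) {
        int total = 0;
        for (int perGroup : slotsPerGroup(ops).values()) {
            total += perGroup;
        }
        return total;
    }

    public static void main(String[] args) {
        Op[] ops = {
            new Op("streamA", "groupA", 3),     // explicitly assigned group
            new Op("streamB", "groupB", 1),     // explicitly assigned group
            new Op("streamC", "default", 2),    // no group assigned
            new Op("unionSink", "default", 1),  // inputs span several groups
        };
        System.out.println("Slots per group: " + slotsPerGroup(ops));
        System.out.println("Required slots:  " + requiredSlots(ops));
    }
}
```

With these made-up numbers the estimate is 6 slots across three groups. Under the same assumptions, two explicitly named groups plus "default" would give three groups in total, which would be consistent with seeing three "Create multi task slot" entries.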