Re: Use of slot sharing groups causing workflow to hang

2020-09-09 Xintong Song
Hi Ken,

> I've got a Flink MiniCluster with 12 slots. Even with only 6 pipelined
> operators, each with a parallelism of 1, it still hangs while starting.

Could you double-check that the MiniCluster has 12 slots when each of your operators has a parallelism of only 1? I've looked into the code. Curr…
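As a rough cross-check of the slot count, here is a simplified model. It is an assumption for illustration, not Flink code: each slot sharing group needs as many slots as the highest parallelism among its operators, and different groups never share slots, so the total is the sum over groups. The operator and group names are hypothetical.

```python
from collections import defaultdict


def required_slots(operators):
    """Estimate the slots a job needs under a simplified slot sharing model.

    `operators` is a list of (name, slot_sharing_group, parallelism) tuples.
    Within one group, subtasks can share a slot, so the group needs as many
    slots as its largest parallelism; separate groups never share slots.
    """
    max_parallelism = defaultdict(int)
    for _name, group, parallelism in operators:
        max_parallelism[group] = max(max_parallelism[group], parallelism)
    return sum(max_parallelism.values())


# Hypothetical worst case for the job described above: 6 pipelined operators,
# parallelism 1, each placed in its own slot sharing group.
ops = [(f"op{i}", f"group{i}", 1) for i in range(6)]
available_slots = 12
needed = required_slots(ops)
print(needed, needed <= available_slots)  # 6 True
```

Even in that worst case the model asks for only 6 slots, well under 12, which is why a hang at startup is surprising if the MiniCluster really has 12 slots.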

2020-09-09 Yangze Guo
Hi Ken,

From the RM perspective, could you share the following log lines:
- "Request slot with profile {} for job {} with allocation id {}."
- "Requesting new slot [{}] and profile {} with allocation id {} from resource manager."

This will help figure out how many slots your job actually requests. A…
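To turn those log lines into a number, a small script can tally distinct allocation ids per job. This is a sketch under stated assumptions: the regular expression is derived only from the first message template quoted above (the `{}` placeholders), it ignores the second message, and the sample lines are invented for illustration.

```python
import re

# Pattern for the first log message quoted above:
# "Request slot with profile {} for job {} with allocation id {}."
REQUEST_SLOT = re.compile(
    r"Request slot with profile (?P<profile>.+?) for job (?P<job>\S+) "
    r"with allocation id (?P<alloc>\S+?)\.?$"
)


def count_slot_requests(lines):
    """Map each job id to the number of distinct allocation ids it requested."""
    allocations = {}
    for line in lines:
        match = REQUEST_SLOT.search(line.strip())
        if match:
            allocations.setdefault(match["job"], set()).add(match["alloc"])
    return {job: len(ids) for job, ids in allocations.items()}


# Made-up sample lines; real log lines carry a timestamp/logger prefix and
# real profile, job, and allocation id values.
sample = [
    "Request slot with profile P for job job-1 with allocation id a1.",
    "Request slot with profile P for job job-1 with allocation id a2.",
    "Request slot with profile P for job job-1 with allocation id a1.",
    "Some unrelated log line.",
]
print(count_slot_requests(sample))  # {'job-1': 2}
```

Counting distinct allocation ids rather than raw lines avoids double-counting a request that is logged more than once.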

2020-09-09 Ken Krugler
Hi Till,

> On Sep 3, 2020, at 12:31 AM, Till Rohrmann wrote:
>
> Hi Ken,
>
> I believe that we don't have much, if any, explicit logging about the slot
> sharing group in the code. You can, however, learn indirectly about it by
> looking at the required number of AllocatedSlots in the SlotP…

2020-09-03 Till Rohrmann
Hi Ken,

I believe that we don't have much, if any, explicit logging about the slot sharing group in the code. You can, however, learn indirectly about it by looking at the required number of AllocatedSlots in the SlotPool. Also, the number of "multi task slots" that are created should vary becau…
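The idea that the number of shared ("multi task") slots depends on the slot sharing configuration can be illustrated with a toy model. It is an assumption, not Flink's scheduler: real placement also depends on locality and other constraints, and the operator and group names are hypothetical. Here subtask i of each operator in a group goes into that group's i-th shared slot, so the slot count per group equals its widest operator.

```python
from collections import defaultdict


def pack_into_shared_slots(operators):
    """Group subtasks into shared slots under a simplified model.

    `operators` is a list of (name, slot_sharing_group, parallelism) tuples.
    Within a group, subtask i of every operator is placed in shared slot i, so
    each group opens as many shared slots as its largest parallelism.
    Returns {group: [[subtask, ...], ...]}.
    """
    slots = defaultdict(list)
    for name, group, parallelism in operators:
        group_slots = slots[group]
        # Open extra shared slots if this operator is wider than any before it.
        while len(group_slots) < parallelism:
            group_slots.append([])
        for i in range(parallelism):
            group_slots[i].append(f"{name}[{i}]")
    return dict(slots)


def total_shared_slots(layout):
    """Total number of shared slots across all groups in a layout."""
    return sum(len(group_slots) for group_slots in layout.values())


# Hypothetical pipeline: source and map share "default", sink has its own group.
before = [("source", "default", 2), ("map", "default", 2), ("sink", "sinks", 1)]
# Same pipeline with map moved into its own (hypothetical) group "maps".
after = [("source", "default", 2), ("map", "maps", 2), ("sink", "sinks", 1)]

print(pack_into_shared_slots(before))
print(total_shared_slots(pack_into_shared_slots(before)))  # 3
print(total_shared_slots(pack_into_shared_slots(after)))   # 5
```

Moving one operator into its own group raises the slot count from 3 to 5 in this model, which is the kind of change that shows up indirectly as a different number of required slots.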

2020-09-02 Yangze Guo
Hi,

A failed slot request is usually caused by a lack of resources. If you put part of the workflow into a specific slot sharing group, running the workflow may require more slots than before. Could you share the ResourceManager and SlotManager logs? I think there are more clues in them.