Re: Use of slot sharing groups causing workflow to hang

Yangze Guo Wed, 02 Sep 2020 19:26:01 -0700

Hi,

A slot request usually fails because of a lack of resources. If you put
part of the workflow into a specific slot sharing group, the workflow
may need more slots than before. Could you share the logs of the
ResourceManager and SlotManager? I think they will contain more clues.
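
To make the slot arithmetic concrete, here is a minimal sketch in plain
Java. It does not use any Flink API, and the class and method names are
hypothetical, chosen only for illustration. It assumes the usual rule of
thumb: each slot sharing group needs as many slots as the highest
parallelism among its operators, and the job needs the sum over all groups.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: estimates how many slots a job needs.
final class SlotEstimate {

    // Assumption: a slot sharing group needs as many slots as its highest
    // operator parallelism, and the job needs the sum over all groups.
    static int requiredSlots(Map<String, List<Integer>> parallelismByGroup) {
        int total = 0;
        for (List<Integer> parallelisms : parallelismByGroup.values()) {
            int max = 0;
            for (int p : parallelisms) {
                max = Math.max(max, p);
            }
            total += max;
        }
        return total;
    }

    public static void main(String[] args) {
        // Group names and parallelism values mirror the setup in the quoted mail.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        groups.put("default", Arrays.asList(1, 2));
        groups.put("custom", Arrays.asList(3));
        System.out.println("slots needed: " + requiredSlots(groups));
    }
}
```

Under that assumption, one custom group at parallelism 3 plus a default
group at parallelism 1 or 2 needs 5 slots, while the same operators all
in the default group would need only 3.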


Best,
Yangze Guo

On Thu, Sep 3, 2020 at 4:39 AM Ken Krugler <kkrugler_li...@transpac.com> wrote:
>
> Hi all,
>
> I’ve got a streaming workflow (using Flink 1.11.1) that runs fine locally 
> (via Eclipse), with a parallelism of either 3 or 6.
>
> If I set up part of the workflow to use a specific (not “default”) slot 
> sharing group with a parallelism of 3, and the remaining portions of the 
> workflow have a parallelism of either 1 or 2, then the workflow never starts 
> running, and eventually fails due to a slot request not being fulfilled in 
> time.
>
> So I’m wondering how best to debug this.
>
> I don’t see any information (even at DEBUG level) being logged about which 
> operators are in what slot sharing group, or which slots are assigned to what 
> groups.
>
> Thanks,
>
> — Ken
>
> PS - I’ve looked at https://issues.apache.org/jira/browse/FLINK-8712, and 
> tried the approach of setting # of slots in the config, but that didn’t 
> change anything. I see that issue is still open, so wondering what Til and 
> Konstantin have to say about it.
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
