Thanks Guowei! I'll check it out.

Best,
Zhanghao Chen
________________________________
From: Guowei Ma <guowei....@gmail.com>
Sent: Wednesday, April 6, 2022 16:01
To: Zhanghao Chen <zhanghao.c...@outlook.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: Why first op after union cannot be chained?

Hi Zhanghao

AFAIK, you might want to look at `StreamingJobGraphGenerator` rather than 
`JobGraphGenerator`, which is only used by the old Flink stream SQL stack.
According to the comment in `StreamingJobGraphGenerator::isChainableInput`, a 
union operation does not currently support chaining.
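
To make the rule concrete, here is a rough sketch of the kind of check that comment describes. It is a simplified model, not the actual Flink code: the `Node` and `Edge` classes, their fields, and the operator names are hypothetical stand-ins, and the assumption is that a union shows up as several incoming edges sharing the same input position on the downstream operator.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionChainingSketch {

    /** Hypothetical stand-in for an operator in the stream graph. */
    static final class Node {
        final String name;
        final List<Edge> inEdges = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for an edge between two operators. */
    static final class Edge {
        final String source;
        final Node target;
        final int typeNumber; // input position on the target operator

        Edge(String source, Node target, int typeNumber) {
            this.source = source;
            this.target = target;
            this.typeNumber = typeNumber;
            target.inEdges.add(this); // register on the downstream operator
        }
    }

    /**
     * Assumed rule: an edge is not chainable if another incoming edge of the
     * same downstream operator uses the same input position, i.e. the input
     * is a union of several upstream operators.
     */
    static boolean isChainableInput(Edge edge) {
        for (Edge other : edge.target.inEdges) {
            if (other != edge && other.typeNumber == edge.typeNumber) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two filtered branches feed the same input of the sink (a union).
        Node sink = new Node("blackhole_sink");
        Edge left = new Edge("calc-1", sink, 0);
        Edge right = new Edge("calc-2", sink, 0);
        System.out.println(isChainableInput(left));  // false
        System.out.println(isChainableInput(right)); // false

        // A single upstream operator: no union, so the edge may be chained.
        Node calc = new Node("calc-1");
        Edge only = new Edge("datagen_source", calc, 0);
        System.out.println(isChainableInput(only));  // true
    }
}
```

Under this model the sink in the query below has two incoming edges on the same input, so it starts a new chain even though the parallelism matches.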

Best,
Guowei


On Wed, Apr 6, 2022 at 12:11 AM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:
Dear all,

I was recently investigating the chaining behavior of a Flink SQL job 
containing union ops, which I found a bit surprising. The SQL, simplified as 
far as possible, is as follows:

CREATE  TABLE datagen_source (word VARCHAR)
        WITH ('connector' = 'datagen', 'rows-per-second' = '5');

CREATE  TABLE blackhole_sink (word VARCHAR)
        WITH ('connector' = 'blackhole');

INSERT INTO blackhole_sink
SELECT  word
FROM    (
            SELECT  word
            FROM    datagen_source
            WHERE   word = '1'
            UNION ALL
            SELECT  word
            FROM    datagen_source
            WHERE   word = '1'
        );

With all the operators having the same parallelism, I expected all the ops 
to be chained, but it turns out that the sink is not chained. I found the 
following comment in the code that checks chaining eligibility in 
JobGraphGenerator::createSingleInputVertex:
"first op after union is stand-alone, because union is merged". It could be 
relevant, but I'm not sure what it means.

Could anyone help me understand this?

Best,
Zhanghao Chen