Dear all,

I was recently investigating why the chaining behavior of a Flink SQL job 
containing union ops is a bit surprising. The SQL, simplified to the extreme, 
is as below:

CREATE  TABLE datagen_source (word VARCHAR)
        WITH ('connector' = 'datagen', 'rows-per-second' = '5');

CREATE  TABLE blackhole_sink (word VARCHAR)
        WITH ('connector' = 'blackhole');

INSERT INTO blackhole_sink
SELECT  word
FROM    (
            SELECT  word
            FROM    datagen_source
            WHERE   word = '1'
            UNION ALL
            SELECT  word
            FROM    datagen_source
            WHERE   word = '1'
        )

With all the operators having the same parallelism, I thought all the ops 
should be chained, but it turns out that the sink is not chained. I found the 
following comment in the code piece for checking the eligibility of chaining in 
JobGraphGenerator::createSingleInputVertex:
"first op after union is stand-alone, because union is merged" that could be 
relevant, but I'm not sure what it means.

Could anyone enlighten me how to understand this?

Best,
Zhanghao Chen

Reply via email to