Hi team, I'm running a Flink SQL job via the Flink SQL Gateway on version
1.20.

The SQL reads from Hive and writes into Kafka, but it needs to join with a
sub-query that finds a problematic uuid so that the main query can filter
it out. It looks like this:

INSERT INTO
    kafka_sink
SELECT /*+ BROADCAST(t1) */
    *
FROM `default`.user_event
CROSS JOIN (
    SELECT uuid AS problematic_uuid, COUNT(*) AS cnt
    FROM `default`.user_event
    WHERE
        (
            (dt = '2025-03-25' AND hh BETWEEN '14' AND '23')
            OR (dt = '2025-03-26' AND hh BETWEEN '00' AND '11')
        )
        AND uuid IS NOT NULL
    GROUP BY uuid
    ORDER BY cnt DESC
    LIMIT 1
) t1
WHERE
    (
        (dt = '2025-03-25' AND hh BETWEEN '14' AND '23')
        OR (dt = '2025-03-26' AND hh BETWEEN '00' AND '11')
    )
    AND uuid IS NOT NULL
    AND uuid <> t1.problematic_uuid;
My question is: since the sub-query t1 needs to scan billions of records
and run an aggregation, I would expect the whole job to wait until the
sub-query is done before it starts emitting records into Kafka.

But that's not what I observe: the job starts producing records into
Kafka immediately. This doesn't make sense to me, so my theories are:
1. I've got something wrong in my SQL.
2. There is something wrong with Flink.
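
In case it helps, I assume the ordering behavior should be visible in the
optimized plan, which Flink's EXPLAIN statement prints without running the
job. A minimal sketch against the same table (partition filters on dt/hh
omitted here for brevity):

```sql
-- Sketch only: print the AST, optimized physical plan, and execution plan
-- for the join, to see whether the aggregate in t1 is a blocking input.
EXPLAIN
SELECT /*+ BROADCAST(t1) */ *
FROM `default`.user_event
CROSS JOIN (
    SELECT uuid AS problematic_uuid, COUNT(*) AS cnt
    FROM `default`.user_event
    WHERE uuid IS NOT NULL
    GROUP BY uuid
    ORDER BY cnt DESC
    LIMIT 1
) t1
WHERE uuid IS NOT NULL
  AND uuid <> t1.problematic_uuid;
```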

Could anyone be kind enough to take a look?

Thanks in advance!
