When I use DataStream and implement the join myself, I can get 50x the
throughput.  I assume I'm doing something wrong with Flink's Table API and
SQL interface.

On Tue, Sep 22, 2020 at 11:21 PM Dan Hill <quietgol...@gmail.com> wrote:

> Hi!
>
> My goal is to better understand how my code impacts streaming throughput.
>
> I have a streaming job where I join multiple tables (A, B, C, D) using
> interval joins.
>
> Case 1) If I have 3 joins in the same query, I don't hit back pressure.
>
> SELECT ...
> FROM A
> LEFT JOIN B
> ON...
> LEFT JOIN C
> ON...
> LEFT JOIN D
> ON...
>
>
> Case 2) If I create temporary views for two of the joins (for reuse with
> another query), I hit back a lot of back pressure.  This is selecting
> slightly more fields than the first.
>
> CREATE TEMPORARY VIEW `AB`
>
> SELECT ...
> FROM A
> LEFT JOIN B
> ...
>
> CREATE TEMPORARY VIEW `ABC`
> SELECT ...
> FROM AB
> LEFT JOIN C
> ...
>
>
>
> Can Temporary Views increase back pressure?
>
> If A, B, C and D are roughly the same size (fake data), does the join
> order matter?  E.g. I assume reducing the size of the columns in each join
> stage would help.
>
> Thanks!
> - Dan
>
>
>

Reply via email to