Hi Dan,

could you share the plan with us using `TableEnvironment.explainSql()` for both queries?

In general, views should not have an impact on the performance. They are a logical concept that gives a bunch of operations a name. The contained operations are inlined into the bigger query during optimization.

Unless you execute multiple queries in a StatementSet, the the data is read twice from the source. How do you execute the SQL stataments?

For deterministic plans, join reordering is disabled by default. You can set it via:

org.apache.flink.table.api.config.OptimizerConfigOptions#TABLE_OPTIMIZER_JOIN_REORDER_ENABLED

Regards,
Timo

On 23.09.20 08:23, Dan Hill wrote:
When I use DataStream and implement the join myself, I can get 50x the throughput.  I assume I'm doing something wrong with Flink's Table API and SQL interface.

On Tue, Sep 22, 2020 at 11:21 PM Dan Hill <quietgol...@gmail.com <mailto:quietgol...@gmail.com>> wrote:

    Hi!

    My goal is to better understand how my code impacts streaming
    throughput.

    I have a streaming job where I join multiple tables (A, B, C, D)
    using interval joins.

    Case 1) If I have 3 joins in the same query, I don't hit back pressure.

        SELECT ...
        FROM A
        LEFT JOIN B
        ON...
        LEFT JOIN C
        ON...
        LEFT JOIN D
        ON...


    Case 2) If I create temporary views for two of the joins (for reuse
    with another query), I hit back a lot of back pressure.  This is
    selecting slightly more fields than the first.

        CREATE TEMPORARY VIEW `AB`

        SELECT ...
        FROM A
        LEFT JOIN B
        ...

        CREATE TEMPORARY VIEW `ABC`
        SELECT ...
        FROM AB
        LEFT JOIN C
        ...



    Can Temporary Views increase back pressure?

    If A, B, C and D are roughly the same size (fake data), does the
    join order matter?  E.g. I assume reducing the size of the columns
    in each join stage would help.

    Thanks!
    - Dan



Reply via email to