Hi Dan,
could you share the plan with us using `TableEnvironment.explainSql()`
for both queries?
In general, views should not have an impact on the performance. They are
a logical concept that gives a bunch of operations a name. The contained
operations are inlined into the bigger query during optimization.
Unless you execute multiple queries in a StatementSet, the the data is
read twice from the source. How do you execute the SQL stataments?
For deterministic plans, join reordering is disabled by default. You can
set it via:
org.apache.flink.table.api.config.OptimizerConfigOptions#TABLE_OPTIMIZER_JOIN_REORDER_ENABLED
Regards,
Timo
On 23.09.20 08:23, Dan Hill wrote:
When I use DataStream and implement the join myself, I can get 50x the
throughput. I assume I'm doing something wrong with Flink's Table API
and SQL interface.
On Tue, Sep 22, 2020 at 11:21 PM Dan Hill <quietgol...@gmail.com
<mailto:quietgol...@gmail.com>> wrote:
Hi!
My goal is to better understand how my code impacts streaming
throughput.
I have a streaming job where I join multiple tables (A, B, C, D)
using interval joins.
Case 1) If I have 3 joins in the same query, I don't hit back pressure.
SELECT ...
FROM A
LEFT JOIN B
ON...
LEFT JOIN C
ON...
LEFT JOIN D
ON...
Case 2) If I create temporary views for two of the joins (for reuse
with another query), I hit back a lot of back pressure. This is
selecting slightly more fields than the first.
CREATE TEMPORARY VIEW `AB`
SELECT ...
FROM A
LEFT JOIN B
...
CREATE TEMPORARY VIEW `ABC`
SELECT ...
FROM AB
LEFT JOIN C
...
Can Temporary Views increase back pressure?
If A, B, C and D are roughly the same size (fake data), does the
join order matter? E.g. I assume reducing the size of the columns
in each join stage would help.
Thanks!
- Dan