When I use the DataStream API and implement the join myself, I can get 50x the throughput. I assume I'm doing something wrong with Flink's Table API and SQL interface.
On Tue, Sep 22, 2020 at 11:21 PM Dan Hill <quietgol...@gmail.com> wrote:
> Hi!
>
> My goal is to better understand how my code impacts streaming throughput.
>
> I have a streaming job where I join multiple tables (A, B, C, D) using
> interval joins.
>
> Case 1) If I have 3 joins in the same query, I don't hit back pressure.
>
> SELECT ...
> FROM A
> LEFT JOIN B
> ON...
> LEFT JOIN C
> ON...
> LEFT JOIN D
> ON...
>
>
> Case 2) If I create temporary views for two of the joins (for reuse with
> another query), I hit a lot of back pressure. This query selects
> slightly more fields than the first.
>
> CREATE TEMPORARY VIEW `AB` AS
> SELECT ...
> FROM A
> LEFT JOIN B
> ...
>
> CREATE TEMPORARY VIEW `ABC` AS
> SELECT ...
> FROM AB
> LEFT JOIN C
> ...
>
>
>
> Can temporary views increase back pressure?
>
> If A, B, C and D are roughly the same size (fake data), does the join
> order matter? E.g. I assume reducing the size of the columns in each join
> stage would help.
>
> Thanks!
> - Dan