Hi,
I'd like to understand a little more how joins work. I have a fairly simple
LEFT JOIN query, and I'm seeing spotty results on the joins. I know there
is a record on the right side, but sometimes it produces a result, and
sometimes it doesn't.

Sample query:
SELECT a.id, b.val1, c.val2
FROM table1 a
LEFT JOIN table2 b ON a.bId = b.Id
LEFT JOIN table3 c ON a.cId = c.Id

I'm using Flink 1.13.2. All tables are loaded from kafka. Using the
Table/SQL API.

My suspicion is that the distributed nature of the join causes the problem.
If I reduce the parallelism to match the number of slots, resulting in a
single task manager, the joins always (I think) seem to work. So, my
assumption is that records in table3 that should join to table1 are not
present on the same task manager, so the join produces null values for
table3.

I thought there might be a way to specify the partitioning. If the tables
are from a multi-tenant database, I could specify the tenant-id as the
partition key, but the Flink SQL "PARTITION BY" statement doesn't seem to
work that way.

Is there a way to confirm this? Does anyone know if this is a known
problem, and is there a solution?

Thanks,
Curt

Reply via email to