Hi! Streaming joins will not throw away records in state unless they exceed the state TTL. Have you tried increasing the parallelism of the join operators (and maybe decreasing the parallelism of the large Kafka source)?
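
In case it's useful, here's a rough sketch of that suggestion, assuming the job uses the DataStream interval join and the KafkaSource connector. The topic names, join bounds, key extraction, and parallelism values below are made-up placeholders, not taken from your actual job:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

import java.time.Duration;

public class BackfillJoinSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical sources; the real job reads several Kafka topics.
        DataStream<String> left = env
                .fromSource(kafkaSource("left-topic"),
                        WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofMinutes(1)),
                        "left-source")
                .setParallelism(2);   // keep the large source's parallelism low

        DataStream<String> right = env
                .fromSource(kafkaSource("right-topic"),
                        WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofMinutes(1)),
                        "right-source")
                .setParallelism(2);

        left.keyBy(BackfillJoinSketch::key)
                .intervalJoin(right.keyBy(BackfillJoinSketch::key))
                .between(Time.hours(-1), Time.hours(1))   // placeholder join window bounds
                .process(new ProcessJoinFunction<String, String, String>() {
                    @Override
                    public void processElement(String l, String r, Context ctx, Collector<String> out) {
                        out.collect(l + "|" + r);
                    }
                })
                .setParallelism(8)    // give the join operator more subtasks than the sources
                .print();

        env.execute("backfill-join-sketch");
    }

    private static String key(String record) {
        // Placeholder key extraction; a real job would parse the record.
        return record;
    }

    private static KafkaSource<String> kafkaSource(String topic) {
        return KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics(topic)
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }
}

The idea is that with the source parallelism kept low relative to the join, the large topic is read more slowly while the join state is spread across more subtasks.
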
Dan Hill <quietgol...@gmail.com> wrote on Wed, Jul 21, 2021 at 4:19 AM:
> Hi. My team's Flink job has cascading interval joins. The problem I'm
> outlining below is fine when streaming normally. It's an issue with
> backfills. We've been running a bunch of backfills to evaluate the
> job over older data.
>
> When running backfills, I've noticed that sometimes one of the downstream
> Kafka inputs will read in a lot of data from its Kafka source before the
> upstream Kafka sources make much progress. The downstream Kafka source
> gets far ahead of the interval join window constrained by the upstream
> sources. This appears to cause the state to grow unnecessarily and has
> caused checkpoint failures. I assumed Flink had built-in logic to keep a
> single downstream Kafka source from getting too far ahead. Looking through
> the code, I don't think this exists.
>
> Is this a known issue with trying to use Flink to backfill? Am I
> misunderstanding something?
>
> Here's an example flow chart for a cascading join job. One of the right
> Kafka data sources has 10x-100x more records than the left data sources
> and causes state to grow.
> [image: Screen Shot 2021-07-20 at 1.02.27 PM.png]