lincoln-lil opened a new pull request, #25217:
URL: https://github.com/apache/flink/pull/25217

   ## What is the purpose of the change
   If user encounter the NDU issue caused by a lookup join and enable the 
[TRY_RESOLVE 
mode](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/determinism/#33-how-to-eliminate-the-impact-of-non-deterministic-update-in-streaming),
 the current KeyedLookupJoinWrapper implementation use the join key as the 
shuffle key, this may lead to changelog disordering issue. It should be fixed 
to use input upsertKey(or the complete row if upsertKey is empty) instead of 
join key.
   
   Also, there's a hotfix for test values sink: KeyedUpsertingSinkFunction 
which didn't implement state recovery properly and will encounter a 
RuntimeException says  "Tried to delete a value that wasn't inserted first. "
   
   ## Brief change log
   * fix StreamExecLookupJoin#createSyncLookupJoinWithState to use input 
upsertKey as the shuffle key.
   * update all related tests to verify the upsertKey takes effect
   * add a new restore case: LookupJoinTestPrograms.LOOKUP_JOIN_WITH_TRY_RESOLVE
   * fix KeyedUpsertingSinkFunction and add restore tests
   
   ## Verifying this change
   * existing NonDeterministicDagTest &  LookupJoinXx tests
   * newly added LookupJoinTestPrograms.LOOKUP_JOIN_WITH_TRY_RESOLVE
   * newly added TableSinkTestPrograms.SINK_UPSERT
   
   ## Does this pull request potentially affect one of the following parts:
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
@Public(Evolving): (no)
     - The serializers: (no )
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
     - Does this pull request introduce a new feature? (no)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to