Hello,

I am looking to implement a Flink sink for the following use case, where
the steps below are executed for each microbatch (using Spark terminology)
or trigger:

   1. Group data by category and write it to S3 under its respective prefix.
   2. Update category metrics in a manifest file stored in the S3 manifest
   prefix.
   3. Send category metrics to an external system to initiate consumption.

In case of failure, any previously completed steps should be rolled back,
i.e., the files already written to S3 should be deleted and the entire
microbatch reprocessed.
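
To make this concrete, here is a rough, untested sketch of how I picture
step 1 mapping onto the pre-commit (writer) side of the Sink V2
TwoPhaseCommittingSink interfaces (Flink 1.15+). CategoryRecord,
CategoryCommittable and stageToS3() are just placeholder names I made up:

import org.apache.flink.api.connector.sink2.SinkWriter;
import org.apache.flink.api.connector.sink2.TwoPhaseCommittingSink.PrecommittingSinkWriter;

import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CategoryWriter
        implements PrecommittingSinkWriter<CategoryRecord, CategoryCommittable> {

    // Records buffered since the last checkpoint, grouped by category.
    private final Map<String, List<CategoryRecord>> buffer = new HashMap<>();

    @Override
    public void write(CategoryRecord element, SinkWriter.Context context) {
        buffer.computeIfAbsent(element.category, k -> new ArrayList<>()).add(element);
    }

    @Override
    public void flush(boolean endOfInput) {
        // Nothing to do here; S3 staging happens in prepareCommit().
    }

    @Override
    public Collection<CategoryCommittable> prepareCommit() throws IOException {
        // Step 1: write each category's buffered data under its S3 prefix
        // (ideally a staging location) and pass the metrics on as committables.
        List<CategoryCommittable> committables = new ArrayList<>();
        for (Map.Entry<String, List<CategoryRecord>> entry : buffer.entrySet()) {
            String path = stageToS3(entry.getKey(), entry.getValue()); // placeholder helper
            committables.add(
                    new CategoryCommittable(entry.getKey(), path, entry.getValue().size()));
        }
        buffer.clear();
        return committables;
    }

    @Override
    public void close() {}

    // Placeholder: upload the records under the category's S3 prefix and return the key.
    private String stageToS3(String category, List<CategoryRecord> records) throws IOException {
        return "s3://my-bucket/" + category + "/part-0";
    }
}

// Placeholder element and committable types.
class CategoryRecord implements Serializable {
    String category;
    String payload;
}

class CategoryCommittable implements Serializable {
    String category;
    String stagedPath;
    long recordCount;

    CategoryCommittable(String category, String stagedPath, long recordCount) {
        this.category = category;
        this.stagedPath = stagedPath;
        this.recordCount = recordCount;
    }
}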

It seems this pattern can be implemented using the Unified Sink API, as
discussed in this video: https://www.youtube.com/watch?v=0GVI25OEs4A.
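
If I understand that API correctly, the post-checkpoint steps (2 and 3)
and their retries would then live in a Committer roughly like the
following, reusing the committable type from the sketch above
(updateManifest() and notifyExternalSystem() are again placeholders):

import org.apache.flink.api.connector.sink2.Committer;

import java.io.IOException;
import java.util.Collection;

public class CategoryCommitter implements Committer<CategoryCommittable> {

    @Override
    public void commit(Collection<CommitRequest<CategoryCommittable>> requests) {
        for (CommitRequest<CategoryCommittable> request : requests) {
            CategoryCommittable committable = request.getCommittable();
            try {
                updateManifest(committable);       // step 2: update category metrics in the manifest
                notifyExternalSystem(committable); // step 3: tell the external system to start consuming
            } catch (Exception e) {
                // Ask the framework to retry this committable later. The "rollback"
                // (deleting the staged S3 files and reprocessing the microbatch)
                // would have to be handled separately if the job ultimately fails over.
                request.retryLater();
            }
        }
    }

    @Override
    public void close() {}

    // Placeholder: rewrite the manifest object under the S3 manifest prefix.
    private void updateManifest(CategoryCommittable committable) throws IOException {}

    // Placeholder: call the downstream system's API with the category metrics.
    private void notifyExternalSystem(CategoryCommittable committable) throws IOException {}
}

If I'm reading the interfaces right, the sink class itself would then
implement TwoPhaseCommittingSink, returning the writer and committer from
createWriter()/createCommitter() and supplying a SimpleVersionedSerializer
for the committable so it can be persisted across failures.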

I'm currently reviewing FileSink and IcebergSink (still trying to locate
the source code) to understand their implementations before writing a new one.

Are there any step-by-step docs or examples available for writing a new
unified sink?

Thanks
