Hello, I had a question and was hoping this was the right place to ask.

Let's say I'm moving data from a NoSQL database to a SQL database. Here is
an example document in the NoSQL database:

{
  "id": "1234",
  "identifier": "5678"
}

The id is system-generated, and the identifier is user-provided. This is
being moved into a SQL database with two columns:

   - id
   - identifier

In the SQL database there is a UNIQUE index on identifier, however the same
thing cannot be enforced on the NoSQL side. Now I could check for this like
so:

   1. Get source data
   2. Check to see if identifier has already been inserted
   3. Move duplicates to a dead letter queue
   4. Write the data
   5. Success

But what could happen is:

   1. Get source data
   2. Check to see if identifier has already been inserted
   3. Move duplicates to a dead letter queue
   4. Another worker inserts a duplicate identifier
   5. Write the data
   6. Failure
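The race in the steps above can be simulated with plain sqlite3 (a
minimal sketch; the table name, column names, and the two hard-coded
"workers" are illustrative, not part of the real pipeline):

```python
import sqlite3

# Shared database with the UNIQUE constraint from the SQL side.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id TEXT PRIMARY KEY, identifier TEXT UNIQUE)"
)

def exists(identifier):
    # Step 2: check whether the identifier was already inserted.
    row = conn.execute(
        "SELECT 1 FROM records WHERE identifier = ?", (identifier,)
    ).fetchone()
    return row is not None

# Worker A checks first: the identifier is not present yet.
assert not exists("5678")

# Step 4: another worker inserts the same identifier in the meantime.
conn.execute("INSERT INTO records VALUES (?, ?)", ("9999", "5678"))

# Step 5: worker A's write now fails despite the earlier check.
try:
    conn.execute("INSERT INTO records VALUES (?, ?)", ("1234", "5678"))
    raised = False
except sqlite3.IntegrityError:
    raised = True
# raised is True: check-then-write is not atomic across workers.
```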


If I were doing this outside of the Beam context, I would try the write,
capture the errors, and then redirect the failures to some kind of dead
letter queue. However, for the life of me I can't figure out how to do
this in Beam. When a failing write will never succeed on retry, and the
trigger of the failure can't reliably be checked for ahead of time, what
is the recommended pattern?
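For reference, the non-Beam version described above (try the write,
catch the constraint violation, redirect the failure) might look like
this sqlite3 sketch; the dead_letters list and write_row helper are
illustrative stand-ins for a real dead letter queue and sink:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id TEXT PRIMARY KEY, identifier TEXT UNIQUE)"
)

dead_letters = []  # stand-in for a real dead letter queue

def write_row(row):
    # Attempt the insert; on a UNIQUE violation, redirect the row to
    # the DLQ instead of retrying (a retry of a duplicate can never
    # succeed).
    try:
        conn.execute(
            "INSERT INTO records (id, identifier) VALUES (?, ?)",
            (row["id"], row["identifier"]),
        )
    except sqlite3.IntegrityError:
        dead_letters.append(row)

write_row({"id": "1234", "identifier": "5678"})
write_row({"id": "9999", "identifier": "5678"})  # duplicate identifier
# records now holds only the first row; the duplicate is in dead_letters.
```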

Thanks!
