Hi Guys,

I just wanted to run this by the group to see if this strategy makes sense.

I have a Kafka Streams App that is using the JDBC connector to stream in
updates and inserts happening on multiple tables at the RDBMS layer.

Inside the Streams app, we are performing joins between two or more
KStreams and KTables to get new KTable and KStream pipelines.

The final messages are pushed to output topics that are eventually pushed
into ElasticSearch permanently.

I need to add support so that when records are physically deleted from the
RDBMS tables, we can pick up those changes at the Kafka Streams layer and
get rid of the documents linked to this deleted objects from ElasticSearch.

I also need to make sure that the streams and stores do not have any
records associated with these deleted objects.

The deletes for each RDBMS table are being captured and INSERTed into RDBMS
audit tables that is used to create another set of streams via the JDBC
Source connector

The delete streams are being pushed into Kafka Stream and picked up as
KStreams inside the app.

http://docs.confluent.io/3.2.0/connect/connect-jdbc/docs/index.html

When the delete event comes in, I have logic that picks up each event via
the Kafka Streams Processor API to remove the documents from ElasticSearch
that are tied to this deleted object.

http://docs.confluent.io/3.2.0/streams/developer-guide.html?highlight=processor%20api#applying-processors-and-transformers-processor-api-integration

Since we also need to remove all dependent records that were associated
with this deleted parent records, I also have logic that performs a LEFT
JOIN first between the delete stream and the individuals streams and then
setting a delete flag on messages in the results of the LEFT join that have
to be removed.

Records whose keys are in the DELETED object streams are passed as non-null
values and those are flagged to be deleted in the result from the Value
joiner.

After the LEFT join result is obtained, I am filtering to exclude records
that have the DELETED flag set on those messages.

I am basically attempting to perform an ANTI JOIN that can be described
like this using SQL

SELECT * FROM each stream WHERE id NOT IN (SELECT id FROM
stream_of_deleted_objects)

Is there a simpler, better way of implementing this?

Thanks in advance for any feedback.

Reply via email to