Hi!
We want to reliably produce events into a remote Kafka cluster in (mostly) near real-time. We have to provide an at-least-once guarantee. Examples are a "Customer logged in" event that will be consumed by a data warehouse for reporting (numbers should be correct), or a "Customer unsubscribed from newsletter" event that determines whether the customer gets emails (if she unsubscribes but the message is lost, she will not be happy).

Context:

* We run an ecommerce website on a cluster of up to ten servers and an Oracle database.
* We have a small Kafka cluster at a different site. In the past we have had a small number of network issues where the web servers could not reach the other site for maybe an hour.
* We don't persist all events in the database. If the application is restarted, events that occurred before the restart can no longer be sent to Kafka. The row of a customer might have a newer timestamp, but we couldn't tell which columns were changed.

Concerns:

* In case of, for example, a network outage between the web servers and the Kafka cluster, we may accumulate thousands of events on each web server that cannot be sent to Kafka. If a server is shut down during that time, the messages would be lost.
* If we produce to Kafka from within the application in addition to writing to the database, the data may become inconsistent if one of the writes fails.

The more I read about Kafka, the more options I see, but I cannot assess how well the options might work and what the trade-offs between them are.

1. produce records directly within the application
2. produce records from the Oracle database via Kafka Connect
3. produce records from the Oracle database via a CDC solution (GoldenGate, Attunity, Striim, others?)
4. persist events in log files and produce to Kafka via Elastic Logstash/Filebeat
5. persist events in log files and produce to Kafka via a Kafka Connect source connector
6. persist events in a local, embedded database and produce to Kafka via an existing source connector
7. produce records directly within the application to a new Kafka cluster in the same network and mirror to the remote cluster
8. ?

These are all the options I could gather so far. Some of the options probably won't work for my situation -- for example, Oracle GoldenGate might be too expensive -- but I don't want to rule anything out just yet.

How would you approach this, and why? Which options might work? Which options would you advise against?

I appreciate any advice. Thank you in advance.

Thanks,
Philip
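
P.S. To make option 6 a bit more concrete, here is a rough sketch of the idea as I currently picture it: write each event to a local store first, and delete it only after the send has succeeded. It is Python with SQLite purely for illustration. The names open_outbox, record_event and forward_pending, the topic name, and the send callable are all made up; send is only a stand-in for whatever producer call would block until the Kafka broker has acknowledged the write, and I have not tried this against a real cluster.

```python
import json
import sqlite3


def open_outbox(path):
    """Open (or create) the local outbox; rows on disk survive a restart."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record_event(conn, topic, event):
    """Persist the event locally before anything goes over the network."""
    conn.execute(
        "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
        (topic, json.dumps(event)),
    )
    conn.commit()


def forward_pending(conn, send, batch_size=100):
    """Send pending events in insertion order and return how many were sent.

    A row is deleted only after send() has returned. If send() raises (for
    example during a network outage), that row and all later rows stay in the
    outbox for the next attempt. A crash between send() and the delete leads
    to a resend, so consumers can see duplicates: at-least-once delivery.
    """
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    forwarded = 0
    for row_id, topic, payload in rows:
        try:
            send(topic, json.loads(payload))  # hypothetical producer call
        except Exception:
            break  # keep this row and the rest for the next attempt
        conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        conn.commit()
        forwarded += 1
    return forwarded
```

The part I am unsure about is whether running something like forward_pending on every web server is reasonable, or whether an existing source connector would do the same job more robustly.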