Reliably producing records to remote cluster: what are my options?

Philip Schmitt Tue, 12 Sep 2017 12:20:05 -0700

Hi!



We want to reliably produce events into a remote Kafka cluster in (mostly) near 
real-time. We have to provide an at-least-once guarantee.

Examples are a "Customer logged in" event that will be consumed by a data 
warehouse for reporting (numbers should be correct), or a "Customer unsubscribed 
from newsletter" event that determines whether the customer gets emails (if 
she unsubscribes but the message is lost, she will not be happy).



Context:

  *   We run an ecommerce website on a cluster of up to ten servers and an 
Oracle database.
  *   We have a small Kafka cluster at a different site. In the past we have 
had a small number of network issues where the web servers could not reach the 
other site for maybe an hour.
  *   We don't persist all events in the database. If the application is 
restarted, events that occurred before the restart cannot be sent to Kafka. The 
row of a customer might have a newer timestamp, but we couldn't tell which 
columns were changed.



Concerns:

  *   In case of, for example, a network outage between the web servers and the 
Kafka cluster, we may accumulate thousands of events on each web server that 
cannot be sent to Kafka. If a server is shut down during that time, the 
messages would be lost.
  *   If we produce to Kafka from within the application in addition to writing 
to the database, the data may become inconsistent if one of the writes fails.
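
To make the second concern concrete, here is a minimal sketch of one way the 
two writes could be tied together: the event is written to an "outbox" table 
in the same database transaction as the business row, so either both are 
committed or neither is. Everything here is an assumption for illustration: 
the table names `customer` and `event_outbox`, the function names, and the 
use of in-memory SQLite as a stand-in for Oracle. A separate process (not 
shown) would still have to read the outbox and produce to Kafka.

```python
import json
import sqlite3


def make_db():
    # In-memory SQLite stands in for the Oracle database (assumption).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, newsletter INTEGER NOT NULL)"
    )
    db.execute(
        "CREATE TABLE event_outbox ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    db.execute("INSERT INTO customer (id, newsletter) VALUES (1, 1)")
    db.commit()
    return db


def unsubscribe(db, customer_id, fail=False):
    # "with db" commits if the block succeeds and rolls back if it raises,
    # so the row update and the event are stored together or not at all.
    with db:
        db.execute(
            "UPDATE customer SET newsletter = 0 WHERE id = ?", (customer_id,)
        )
        if fail:
            # Hypothetical switch to simulate a crash between the two writes.
            raise RuntimeError("simulated failure between the two writes")
        event = {"type": "customer_unsubscribed", "customer_id": customer_id}
        db.execute(
            "INSERT INTO event_outbox (payload) VALUES (?)", (json.dumps(event),)
        )
```

With this shape, a failure between the two statements leaves neither the 
changed row nor the event behind, which is the inconsistency the concern 
describes.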





The more I read about Kafka, the more options I see, but I cannot assess how 
well the options might work or what the trade-offs between them are.



  1.  produce records directly within the application
  2.  produce records from the Oracle database via Kafka Connect
  3.  produce records from the Oracle database via a CDC solution (GoldenGate, 
Attunity, Striim, others?)
  4.  persist events in log files and produce to Kafka via Elastic 
Logstash/Filebeat
  5.  persist events in log files and produce to Kafka via a Kafka Connect 
source connector
  6.  persist events in a local, embedded database and produce to Kafka via an 
existing source connector
  7.  produce records directly within the application to a new Kafka cluster in 
the same network and mirror to remote cluster
  8.  ?
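
Options 4 to 6 share one idea: persist each event locally first, then forward 
it and discard the local copy only after the send has succeeded. A minimal 
sketch of that idea follows, assuming SQLite as the embedded database. The 
`spool` table, the function names and the `send` callable are hypothetical; 
in a real setup `send` would wrap a Kafka producer call that blocks until the 
broker acknowledges the record and raises on failure.

```python
import json
import sqlite3


def open_spool(path):
    # A file path makes the spool survive restarts; ":memory:" does not.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS spool ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    db.commit()
    return db


def record_event(db, event):
    # Commit the event locally before anything is sent over the network.
    with db:
        db.execute("INSERT INTO spool (payload) VALUES (?)", (json.dumps(event),))


def forward_pending(db, send):
    # Send spooled events in order; delete each one only after send() returns.
    # If send() raises, stop and keep the remaining rows for the next attempt.
    sent = 0
    rows = db.execute("SELECT seq, payload FROM spool ORDER BY seq").fetchall()
    for seq, payload in rows:
        try:
            send(payload)
        except Exception:
            break
        with db:
            db.execute("DELETE FROM spool WHERE seq = ?", (seq,))
        sent += 1
    return sent
```

Because a row is deleted only after a successful send, a crash between the 
send and the delete leads to the event being sent again on the next run. That 
is at-least-once delivery, so consumers would have to tolerate duplicates.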



These are all the options I could gather so far. Some of the options probably 
won't work for my situation -- for example Oracle GoldenGate might be too 
expensive -- but I don't want to rule anything out just yet.





How would you approach this, and why? Which options might work? Which options 
would you advise against?




I appreciate any advice. Thank you in advance.


Thanks,

Philip
