Hi Adam & Márton,

Thanks for bringing the discussion here. 

The Flink CDC project provides the Oracle CDC Connector, which can be used to 
capture both historical data and transaction-log changes from an Oracle 
database and ingest them into Flink. As of the latest version, 2.3, the Oracle 
CDC Connector supports the parallel incremental snapshot algorithm, which 
enables parallel reading of historical data and a lock-free switch from the 
historical read to the transaction-log read. In the transaction-log capture 
phase, the connector uses Debezium as the underlying library, which supports 
both LogMiner and the XStream API for capturing change data. IIUC, 
OpenLogReplicator could serve as a third option. 
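
For context, this is roughly what consuming Oracle changes with the connector's 
DataStream API looks like today. A minimal sketch: the connection details are 
placeholders and the builder methods follow the 2.x documentation, so please 
check the 2.3 docs for the exact surface (e.g. how to enable the incremental 
snapshot):

import com.ververica.cdc.connectors.oracle.OracleSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class OracleCdcExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust to your Oracle instance.
        SourceFunction<String> source = OracleSource.<String>builder()
            .hostname("localhost")
            .port(1521)
            .database("ORCLCDB")
            .schemaList("INVENTORY")
            .tableList("INVENTORY.PRODUCTS")
            .username("flinkuser")
            .password("flinkpw")
            // Emit Debezium change events as JSON strings.
            .deserializer(new JsonDebeziumDeserializationSchema())
            .build();

        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Changes are processed directly in Flink, no message queue in between.
        env.addSource(source).print();
        env.execute("Oracle CDC example");
    }
}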

For integrating OpenLogReplicator, there are several interesting points that we 
can discuss further:
(1) Flink CDC connectors do not rely on Kafka or any other message-queue 
storage; data is processed directly after capture. I think OpenLogReplicator's 
network-stream mode would need to be adapted to fit this model (see the sketch 
after this list). 
(2) The Flink CDC project, like Flink itself, is mainly developed in Java. Does 
OpenLogReplicator provide a Java SDK for easy integration?
(3) If OpenLogReplicator plans to be integrated into the Debezium project 
first, the Flink CDC project could then pick it up easily by bumping the 
Debezium version. 
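
To make (1) and (2) concrete, here is a rough sketch of what a thin adapter 
might look like on the Flink side. Everything in it is an assumption on my 
part: it assumes OpenLogReplicator streams newline-delimited JSON transactions 
over a plain TCP socket, the class name and port are hypothetical, and 
checkpointing, reconnects, and offset tracking are omitted:

import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical adapter: reads newline-delimited JSON transactions from an
 * OpenLogReplicator TCP stream and emits them as Flink records. The framing
 * and endpoint are assumptions; OpenLogReplicator's actual wire protocol
 * may differ.
 */
public class OpenLogReplicatorTcpSource extends RichSourceFunction<String> {

    private final String host; // assumed OpenLogReplicator stream host
    private final int port;    // assumed OpenLogReplicator stream port

    private volatile boolean running = true;

    public OpenLogReplicatorTcpSource(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        try (Socket socket = new Socket(host, port);
             BufferedReader reader = new BufferedReader(
                 new InputStreamReader(socket.getInputStream(),
                     StandardCharsets.UTF_8))) {
            String line;
            while (running && (line = reader.readLine()) != null) {
                // One JSON-encoded transaction per line (an assumption).
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(line);
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

A job would then simply call env.addSource(new 
OpenLogReplicatorTcpSource("olr-host", 9099)). A real integration would also 
need to track the Oracle SCN/offset in Flink state so reading can resume 
consistently after a failure, which is exactly where a Java SDK (point 2) or a 
Debezium-level integration (point 3) would help. 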

Best,
Leonard


> On Jan 4, 2023, at 7:07 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> 
> (cc Leonard)
> 
> Hi Adam,
> 
> From an architectural perspective, if you land the records in Kafka or 
> another message broker, Flink will be able to process them; at this point I 
> do not see much merit in trying to circumvent that step. 
> There is a related project in the Flink space called CDC Connectors [1]. I 
> highly encourage you to check it out for context, and I have cc'd Leonard, 
> one of its primary maintainers. 
> 
> [1] https://github.com/ververica/flink-cdc-connectors/ 
> On Tue, Jan 3, 2023 at 8:40 PM Adam Leszczyński <aleszczyn...@bersler.com> 
> wrote:
> Hi Flink Team,
> 
> I’m the author of OpenLogReplicator, an open-source parser of Oracle redo 
> logs that allows sending transactions to a message bus. Currently the 
> implemented sinks are just a text file or a Kafka topic. 
> Transactions can also be sent over a plain TCP connection or a message queue 
> like ZeroMQ. 
> The code is GPL, and all Oracle versions from 11.2 are supported. No LogMiner 
> needed. 
> 
> Transactions can be sent in JSON or protobuf format. The code has reached GA 
> and is actually used in production. 
> 
> The architecture is modular and makes it very easy to add other sinks, for 
> example for Apache Flink. 
> I’m actually moving towards an approach where OpenLogReplicator could run on 
> Kubernetes and work in HA. 
> 
> Well… that is the general direction. Do you think there could be some 
> application of this software with Apache Flink? 
> For example, a client could very easily connect to OpenLogReplicator over a 
> TCP connection, receive transactions, and just send them to Apache Flink. An 
> example of such a client is also present in the GitHub repo: 
> https://github.com/bersler/OpenLogReplicator 
> 
> Is there any rationale for such an integration? Or is it just a waste of time 
> because nobody would use it anyway?
> 
> Kind regards,
> Adam Leszczyński
> 
