Hi Adam & Márton,

Thanks for bringing the discussion here.
The Flink CDC project provides the Oracle CDC Connector, which can be used to capture historical and transaction-log data from an Oracle database and ingest it into Flink. In the latest version, 2.3, the Oracle CDC Connector already supports the incremental snapshot algorithm, which enables parallel reading of historical data and lock-free switching from historical reading to transaction-log reading. In the transaction-log capture phase, the connector uses Debezium as a library, which supports the LogMiner and XStream APIs for capturing change data. IIUC, OpenLogReplicator could be used as a third way. For integrating OpenLogReplicator, there are several interesting points that we can discuss further:

(1) None of the Flink CDC connectors relies on Kafka or other message-queue storage; data is processed directly after capture. I think OpenLogReplicator's network-stream mode would need some adaptation to fit this model.
(2) The Flink CDC project, like Flink itself, is mainly developed in Java. Does OpenLogReplicator provide a Java SDK for easy integration?
(3) If OpenLogReplicator plans to be integrated into the Debezium project first, the Flink CDC project could integrate OpenLogReplicator simply by bumping the Debezium version.

Two short sketches follow below the quoted messages: one showing the connector usage, and a hypothetical source for OpenLogReplicator's network stream.

Best,
Leonard

> On Jan 4, 2023, at 7:07 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>
> (cc Leonard)
>
> Hi Adam,
>
> From an architectural perspective, if you land the records in Kafka or
> another message broker, Flink will be able to process them; at this point I
> do not see much merit in trying to circumvent that step.
> There is a related project in the Flink space called CDC Connectors [1]. I
> highly encourage you to check it out for context, and I have cc'd Leonard,
> one of its primary maintainers.
>
> [1] https://github.com/ververica/flink-cdc-connectors/
>
> On Tue, Jan 3, 2023 at 8:40 PM Adam Leszczyński <aleszczyn...@bersler.com> wrote:
> Hi Flink Team,
>
> I'm the author of OpenLogReplicator, an open-source parser of Oracle redo
> logs that allows sending transactions to a message bus. Currently the
> implemented sinks are a plain text file and a Kafka topic.
> Transactions can also be sent over a plain TCP connection or a message
> queue such as ZeroMQ.
> The code is GPL, and all Oracle versions from 11.2 are supported. No
> LogMiner is needed.
>
> Transactions can be sent in JSON or protobuf format. The code has reached
> GA and is actually used in production.
>
> The architecture is modular and makes it very easy to add other sinks, for
> example Apache Flink.
> I'm also moving toward an approach where OpenLogReplicator could use
> Kubernetes and work in HA.
>
> Well… that is the general direction. Do you think there could be some
> application of this software with Apache Flink?
> For example, there could very easily be a client that connects to
> OpenLogReplicator over a TCP connection, receives transactions, and sends
> them on to Apache Flink. An example of such a client is also present in
> the GitHub repo:
> https://github.com/bersler/OpenLogReplicator
>
> Is there any rationale for such an integration? Or is it just a waste of
> time because nobody would use it anyway?
>
> Kind regards,
> Adam Leszczyński
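For context, here is a minimal sketch of how the Oracle CDC Connector is used from Flink's Java Table API. The connection settings, schema, and column names are placeholders, and it assumes the flink-sql-connector-oracle-cdc jar (version 2.3) is on the classpath; 'scan.incremental.snapshot.enabled' switches on the parallel incremental snapshot described above.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class OracleCdcExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Placeholder connection settings; adjust to the actual Oracle instance.
        tEnv.executeSql(
                "CREATE TABLE products (" +
                "  ID INT," +
                "  NAME STRING," +
                "  PRIMARY KEY (ID) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'oracle-cdc'," +
                "  'hostname' = 'localhost'," +
                "  'port' = '1521'," +
                "  'username' = 'flinkuser'," +
                "  'password' = 'flinkpw'," +
                "  'database-name' = 'ORCLCDB'," +
                "  'schema-name' = 'INVENTORY'," +
                "  'table-name' = 'PRODUCTS'," +
                // Enables the parallel incremental snapshot with lock-free
                // switching from snapshot reading to redo-log reading.
                "  'scan.incremental.snapshot.enabled' = 'true'" +
                ")");

        // Reads the historical snapshot first, then streams ongoing changes.
        tEnv.executeSql("SELECT * FROM products").print();
    }
}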
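And a hypothetical sketch of point (1): a Flink source that consumes OpenLogReplicator's network stream directly, with no message broker in between. It assumes, purely for illustration, that the server emits one JSON transaction per line; the actual OpenLogReplicator wire protocol includes a handshake and confirmation messages that a real client would also have to implement.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Hypothetical adapter: streams change records from an OpenLogReplicator
// network endpoint straight into Flink, without an intermediate broker.
public class OpenLogReplicatorSource extends RichSourceFunction<String> {

    private final String host;
    private final int port;
    private volatile boolean running = true;

    public OpenLogReplicatorSource(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Illustrative assumption: one JSON transaction per line. The real
        // protocol has a handshake and expects confirmations of processed
        // positions, which a production client would implement here.
        try (Socket socket = new Socket(host, port);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while (running && (line = reader.readLine()) != null) {
                // Hold the checkpoint lock so record emission and
                // checkpointing do not interleave.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(line);
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

A job would use it as env.addSource(new OpenLogReplicatorSource("olr-host", 9000)), where host and port are placeholders. The interesting production questions are exactly those of point (1): reconnection after failure, and confirming processed positions back to OpenLogReplicator when checkpoints complete.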