Ewen,

Thanks for the reply. We'll proceed while keeping all of your points in
mind. I looked around for a more focused forum for the JDBC connector
before posting here but didn't come across the confluent-platform group.
I'll direct any more questions about the JDBC connector there. I'll also
close the GitHub issue with a link to this thread.
Thanks again,
Mark.

On Wed, Dec 16, 2015 at 9:51 PM Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Mark,
>
> There are definitely limitations to using JDBC for change data capture.
> Using a database-specific implementation, especially if you can read
> directly off the database's log, will be able to handle more situations
> like this. Cases like the one you describe are difficult to address
> efficiently working only with simple queries.
>
> The JDBC connector offers a few different modes for handling incremental
> queries. One of them uses both a timestamp and a unique ID, which will be
> more robust to issues like these. However, even with both, you can still
> come up with variants that cause issues like the one you describe. You
> also have the option of using a custom query, which might help if you can
> do something smarter by making assumptions about your table, but for now
> that's pretty limited for constructing incremental queries since the
> connector doesn't provide a way to track offset columns with custom
> queries. I'd like to improve the support for this in the future, but at
> some point it starts making sense to look at database-specific connectors.
>
> (By the way, this gets even messier once you start thinking about the
> variety of different isolation levels people may be using...)
>
> -Ewen
>
> P.S. Where to ask these questions is a bit confusing since Connect is
> part of Kafka. In general, for specific connectors I'd suggest asking on
> the corresponding mailing list for the project, which in the case of the
> JDBC connector would be the Confluent Platform mailing list here:
> https://groups.google.com/forum/#!forum/confluent-platform
>
> On Wed, Dec 16, 2015 at 5:27 AM, Mark Drago <markdr...@gmail.com> wrote:
>
> > I had asked this in a GitHub issue but I'm reposting here to try to
> > get an answer from a wider audience.
> >
> > Has any thought gone into how kafka-connect-jdbc will be impacted by
> > SQL transactions committing IDs and timestamps out of order? Let me
> > give an example with two connections.
> >
> > 1: begin transaction
> > 1: insert (get id 1)
> > 2: begin transaction
> > 2: insert (get id 2)
> > 2: commit (recording id 2)
> > kafka-connect-jdbc runs and thinks it has handled everything through
> > id 2
> > 1: commit (recording id 1)
> >
> > This would result in kafka-connect-jdbc missing id 1. The same thing
> > could happen with timestamps. I've read through some of the
> > kafka-connect-jdbc code and I think it may be susceptible to this
> > problem, but I haven't run it or verified that it would be an issue.
> > Has this come up before? Are there plans to deal with this situation?
> >
> > Obviously something like bottled-water for PostgreSQL would handle
> > this nicely, as it would get the changes once they're committed.
> >
> > Thanks for any insight,
> >
> > Mark.
> >
> > Original GitHub issue:
> > https://github.com/confluentinc/kafka-connect-jdbc/issues/27
>
>
> --
> Thanks,
> Ewen
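
The out-of-order commit in Mark's example can be reproduced directly in
SQL. A minimal sketch, assuming a table `events` with an auto-incrementing
`id` column (all names here are hypothetical):

    -- Session 1
    BEGIN;
    INSERT INTO events (payload) VALUES ('first');   -- assigned id 1

    -- Session 2, interleaved
    BEGIN;
    INSERT INTO events (payload) VALUES ('second');  -- assigned id 2
    COMMIT;                                          -- id 2 becomes visible

    -- The connector polls here: WHERE id > 0 returns only id 2,
    -- so the stored offset advances to id = 2.

    -- Session 1, later
    COMMIT;                                          -- id 1 becomes visible

    -- Next poll: WHERE id > 2 can never return id 1; the row is skipped.

The same interleaving breaks timestamp mode, because a timestamp column is
populated when the row is written, not when the transaction commits, so a
long-running transaction can expose a row whose timestamp is already behind
the connector's offset.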
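
As for the timestamp+incrementing mode Ewen mentions, it tracks both
columns in a single incremental query. A simplified sketch of the query's
shape (the exact SQL kafka-connect-jdbc generates varies by version and
dialect; the table `events` and columns `id` and `modified` are
illustrative):

    SELECT *
    FROM events
    WHERE (modified = ? AND id > ?)  -- bound to last-seen timestamp and id
       OR modified > ?               -- bound to last-seen timestamp
    ORDER BY modified, id ASC;

Even with both columns tracked, the connector stores the largest
(modified, id) pair it has returned as its offset, so a row that commits
late with a smaller id or an earlier timestamp falls behind that offset and
neither branch of the WHERE clause will ever match it again, which is the
same failure as in the sketch above.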