Sunny,

As I said on Twitter, I'm stoked to hear you're working on a Mongo
connector! It struck me as a pretty natural source to tackle since it does
such a nice job of cleanly exposing the op log.

Regarding the problem of only getting deltas, unfortunately there is not a
trivial solution here -- if you want to generate the full updated record,
you're going to have to have a way to recover the original document.

In fact, I'm curious how you were thinking of even bootstrapping. Are you
going to do a full dump and then start reading the op log? Is there a good
way to do the dump and figure out the exact position in the op log at which
the query generating the dump was performed? I know that internally Mongo
effectively does these two steps, but I'm not sure if the necessary info is
exposed via normal queries.
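
For what it's worth, a rough sketch of the tailing side with the MongoDB
Java driver might look like the following (the timestamp value and how you
record it alongside the dump are assumptions on my part, so treat it as
illustrative only):

    import static com.mongodb.client.model.Filters.gt;

    import com.mongodb.CursorType;
    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoCursor;
    import org.bson.BsonTimestamp;
    import org.bson.Document;

    public class OplogTail {
        public static void main(String[] args) {
            MongoClient client = new MongoClient("localhost", 27017);

            // The oplog lives in the "local" database on replica set members.
            MongoCollection<Document> oplog =
                client.getDatabase("local").getCollection("oplog.rs");

            // Resume point recorded alongside the initial dump (how you pin
            // this to a consistent snapshot of the dump is the open question
            // above).
            BsonTimestamp lastSeen = new BsonTimestamp(1453766400, 0);

            // Tail everything after the resume point with an awaiting cursor.
            MongoCursor<Document> cursor = oplog
                .find(gt("ts", lastSeen))
                .cursorType(CursorType.TailableAwait)
                .iterator();

            while (cursor.hasNext()) {
                Document entry = cursor.next();
                // "op" is the operation type ("i", "u", "d"), "o" holds the
                // delta/document, and "o2" holds the _id for updates.
                System.out.println(entry.toJson());
            }
        }
    }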

If you want to reconstitute the data, I can think of a couple of options:

1. Try to reconstitute inline in the connector. This seems difficult to
make work in practice. At some point you basically have to query for the
entire data set to bring it into memory; from then on the connector is
effectively just applying the deltas to its in-memory copy and emitting one
output record containing the full document each time it applies an update.
2. Make the connector send just the updates and have a separate stream
processing job perform the reconstitution and write to another topic (see
the rough sketch after this list). In this case, the first topic should not
be compacted, but the second one could be.
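
To make option 2 concrete, here's roughly what that separate job could look
like if you used Kafka Streams for it (the topic names and the applyDelta
merge below are made up for illustration; you could just as easily do this
with another stream processor or a plain consumer/producer pair):

    import java.io.IOException;
    import java.util.Properties;

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ObjectNode;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KTable;

    public class Reconstitute {
        static final ObjectMapper MAPPER = new ObjectMapper();

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "mongo-reconstitute");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // "mongo.deltas": uncompacted topic of per-_id deltas from the
            // connector. "mongo.full": compacted topic of full documents.
            KTable<String, String> fullDocs = builder
                .<String, String>stream("mongo.deltas")
                .groupByKey()
                .aggregate(
                    () -> "{}",    // start from an empty document
                    (id, deltaJson, currentJson) ->
                        applyDelta(currentJson, deltaJson));

            fullDocs.toStream().to("mongo.full");

            new KafkaStreams(builder.build(), props).start();
        }

        // Naive merge: overwrite top-level fields from the delta. A real
        // implementation would interpret $set/$unset and nested paths.
        static String applyDelta(String currentJson, String deltaJson) {
            try {
                ObjectNode current = (ObjectNode) MAPPER.readTree(currentJson);
                ObjectNode delta = (ObjectNode) MAPPER.readTree(deltaJson);
                current.setAll(delta);
                return MAPPER.writeValueAsString(current);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

The nice property is that the reconstitution state lives in the stream
processing job rather than in the connector, and since the output topic
holds full documents it can safely be compacted.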

Unfortunately, without additional hooks into the database, there's not much
you can do besides this pretty heavyweight approach. There may be some
tricks you can use to reduce the amount of memory it needs (e.g. keep a
small cache of actual records and, for the rest, only store the Kafka
offset of the last full value, doing a (possibly expensive) random read
when necessary to get the full document back; see the sketch below), but to
get full correctness you will need to perform this reconstitution.
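
To make that cache idea a bit more concrete, something along these lines
(again just a sketch with made-up names; it assumes the full documents are
also being written to a topic you can seek into and re-read):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class DocumentLookup {
        private static final int CACHE_SIZE = 10_000;

        // _id -> full document, bounded LRU
        private final Map<String, String> cache =
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                    return size() > CACHE_SIZE;
                }
            };

        // _id -> offset of the last full value in the full-document topic
        private final Map<String, Long> lastOffsets = new HashMap<>();

        // Single-partition simplification; in general you'd track a
        // (partition, offset) pair per key.
        private final TopicPartition fullDocs = new TopicPartition("mongo.full", 0);
        private final KafkaConsumer<String, String> consumer;

        public DocumentLookup(Properties consumerProps) {
            consumer = new KafkaConsumer<>(consumerProps,
                new StringDeserializer(), new StringDeserializer());
            consumer.assign(Collections.singletonList(fullDocs));
        }

        // Record where the latest full value for this _id was written.
        public void remember(String id, String fullDoc, long offset) {
            cache.put(id, fullDoc);
            lastOffsets.put(id, offset);
        }

        // Return the last full document for this _id, re-reading it from
        // Kafka if it has fallen out of the in-memory cache.
        public String lookup(String id) {
            String doc = cache.get(id);
            if (doc != null)
                return doc;
            Long offset = lastOffsets.get(id);
            if (offset == null)
                return null;    // never seen; caller starts from an empty doc
            consumer.seek(fullDocs, offset);
            for (ConsumerRecord<String, String> rec :
                     consumer.poll(Duration.ofSeconds(5))) {
                if (rec.offset() == offset && id.equals(rec.key())) {
                    cache.put(id, rec.value());
                    return rec.value();
                }
            }
            return null;        // not found within this poll
        }
    }

On a cache miss you pay a seek plus a poll against the full-document topic,
which is exactly the possibly expensive random read mentioned above.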

In terms of Kafka Connect supporting something like this, I'm not sure how
general it could be made, or that you even want to perform the process
inline with the Kafka Connect job. If it's an issue that repeatedly arises
across a variety of systems, then we should consider how to address it more
generally.

-Ewen

On Tue, Jan 26, 2016 at 8:43 PM, Sunny Shah <su...@tinyowl.co.in> wrote:

>
> Hi,
>
> We are trying to write a Kafka Connect connector for MongoDB. The issue
> is, MongoDB does not provide the entire changed document for update
> operations; it just provides the modified fields.
>
> If Kafka allowed custom log compaction, then it would be possible to
> eventually merge an entire document and subsequent updates to create an
> entire record again.
>
> As Ewen pointed out to me on Twitter, this is not possible, so what is
> the Kafka Connect way of solving this issue?
>
> @Ewen, thanks a lot for a really quick answer on Twitter.
>
> --
> Thanks and Regards,
>  Sunny
>
>



-- 
Thanks,
Ewen
