Re: [Copycat] How will copycat serialize its metadata

Ewen Cheslack-Postava Fri, 14 Aug 2015 01:30:02 -0700

I'm not sure the existing discussion is clear about how the format of
offset data is decided. One possibility is that we choose one fixed format
and that is what we use internally to store offsets no matter what
serializer you choose. This would be similar to how the __offsets topic is
currently handled (with a custom serialization format). In other words, we
use format X to store offsets. If you serialize your data with Y or Z, we
don't care, we still use format X. The other option (which is used in the
current PR-99 patch) would still make offset serialization pluggable, but
there wouldn't be a separate option for it. Offset serialization would use
the same format as the data serialization. If you use X for data, we use X
for offsets; you use Y for data, we use Y for offsets.
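
To make the two options concrete, here is a minimal sketch in Python. It is not Copycat code: the class names, the `serialize_offset` method, and the choice of JSON as the fixed format "X" are all assumptions made purely for illustration.

```python
import json
import pickle


# Hypothetical serializers standing in for the formats "X" and "Y" above.
def json_serialize(obj):
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def pickle_serialize(obj):
    return pickle.dumps(obj)


class FixedFormatOffsetWriter:
    """Option 1: offsets always use one fixed internal format.

    The data serializer is configurable, but it has no effect on how
    offsets are stored (JSON is an assumed stand-in for format X).
    """

    def __init__(self, data_serializer):
        self.data_serializer = data_serializer
        self.offset_serializer = json_serialize

    def serialize_offset(self, offset):
        return self.offset_serializer(offset)


class SharedFormatOffsetWriter:
    """Option 2: offsets reuse whatever serializer is configured for data.

    There is no separate offset setting; picking the data format
    implicitly picks the offset format as well.
    """

    def __init__(self, data_serializer):
        self.data_serializer = data_serializer
        self.offset_serializer = data_serializer

    def serialize_offset(self, offset):
        return self.offset_serializer(offset)
```

With option 1, switching the data serializer leaves stored offsets unchanged; with option 2, the stored offset bytes change along with it.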


@neha wrt providing access through a REST API, I guess you are suggesting
that we can serialize that data to JSON for that API. I think it's
important to point out that this is arbitrarily structured,
connector-specific data. In many ways, it's not that different from the
actual message data in that it is highly dependent on the connector and
downstream consumers need to understand the connector and its data format
to do anything meaningful with the data. Because of this, I'm not convinced
that serializing it in a format other than the one used for the data will
be particularly useful.


On Thu, Aug 13, 2015 at 11:22 PM, Neha Narkhede <[email protected]> wrote:

> Copycat enables streaming data in and out of Kafka. Connector writers need
> to define the serde of the data as it is different per system. Metadata
> should be entirely hidden by the copycat framework and isn't something
> users or connector implementors need to serialize differently as long as we
> provide tools/REST APIs to access the metadata where required. Moreover, as
> you suggest, evolution, maintenance and configs are much simpler if it
> remains hidden.
>
> +1 on keeping just the serializers for data configurable.
>
> On Thu, Aug 13, 2015 at 9:59 PM, Gwen Shapira <[email protected]> wrote:
>
> > Hi Team Kafka,
> >
> > As you know from KIP-26 and PR-99, when you use Copycat to move data
> > from an external system to Kafka, in addition to storing the data itself,
> > Copycat will also need to store some metadata.
> >
> > This metadata is currently offsets on the source system (say, SCN# from
> > Oracle redo log), but I can imagine storing a bit more.
> >
> > When storing data, we obviously want pluggable serializers, so users will
> > get the data in a format they like.
> >
> > But the metadata seems internal, i.e., users shouldn't care about it, and if
> > we want them to read or change anything, we want to provide them with tools
> > to do it.
> >
> > Moreover, by controlling the format we can do three important things:
> > * Read the metadata for monitoring / audit purposes
> > * Evolve / modify it. If users serialize it in their own format, and
> > actually write clients to use this metadata, we don't know if it's safe to
> > evolve.
> > * Keep configuration a bit simpler. This adds at least 4 new configuration
> > items...
> >
> > What do you guys think?
> >
> > Gwen
> >
>
>
>
> --
> Thanks,
> Neha
>



-- 
Thanks,
Ewen
