Ewen,

I meant that we use format X to store offsets regardless of whether you
serialize your data with Y or Z, and that we don't expose it as something
that can be configured. As far as the serialization format goes, I was
suggesting simple base64-encoded strings for simplicity (maybe there is a
reason you are saying this doesn't work?), though I can see how we could
just use the same one used for the data. I don't have a strong preference
either way, as long as the tooling and REST APIs can expose the data
effortlessly.
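
To make the base64 idea concrete, here is a minimal sketch in Python. It
assumes the framework treats each connector's offset as opaque bytes and
only needs a text-safe form for storage and for a REST response. The
function names and the JSON field names are hypothetical and are not taken
from KIP-26 or PR-99.

```python
import base64
import json


def encode_offset(raw: bytes) -> str:
    """Encode opaque, connector-specific offset bytes as a base64 string."""
    return base64.b64encode(raw).decode("ascii")


def decode_offset(encoded: str) -> bytes:
    """Recover the original offset bytes from the base64 string."""
    return base64.b64decode(encoded.encode("ascii"))


def offset_to_rest_payload(connector: str, raw: bytes) -> str:
    """Build a JSON document a REST endpoint could return (hypothetical shape).

    The framework never interprets the offset; it only wraps the encoded
    string together with the connector name.
    """
    return json.dumps(
        {"connector": connector, "offset": encode_offset(raw)},
        sort_keys=True,
    )
```

The framework can store and expose such a string without understanding it,
but a reader of the REST payload still has to know the connector's own
layout to interpret the decoded bytes.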

Thanks,
Neha

On Fri, Aug 14, 2015 at 1:29 AM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> I'm not sure the existing discussion is clear about how the format of
> offset data is decided. One possibility is that we choose one fixed format
> and that is what we use internally to store offsets no matter what
> serializer you choose. This would be similar to how the __offsets topic is
> currently handled (with a custom serialization format). In other words, we
> use format X to store offsets. If you serialize your data with Y or Z, we
> don't care, we still use format X. The other option (which is used in the
> current PR-99 patch) would still make offset serialization pluggable, but
> there wouldn't be a separate option for it. Offset serialization would use
> the same format as the data serialization. If you use X for data, we use X
> for offsets; you use Y for data, we use Y for offsets.
>
> @neha wrt providing access through a REST API, I guess you are suggesting
> that we can serialize that data to JSON for that API. I think it's
> important to point out that this is arbitrarily structured,
> connector-specific data. In many ways, it's not that different from the
> actual message data in that it is highly dependent on the connector and
> downstream consumers need to understand the connector and its data format
> to do anything meaningful with the data. Because of this, I'm not convinced
> that serializing it in a format other than the one used for the data will
> be particularly useful.
>
>
> On Thu, Aug 13, 2015 at 11:22 PM, Neha Narkhede <n...@confluent.io> wrote:
>
> > Copycat enables streaming data in and out of Kafka. Connector writers
> > need to define the serde of the data as it is different per system.
> > Metadata should be entirely hidden by the copycat framework and isn't
> > something users or connector implementors need to serialize differently
> > as long as we provide tools/REST APIs to access the metadata where
> > required. Moreover, as you suggest, evolution, maintenance and configs
> > are much simpler if it remains hidden.
> >
> > +1 on keeping just the serializers for data configurable.
> >
> > On Thu, Aug 13, 2015 at 9:59 PM, Gwen Shapira <g...@confluent.io> wrote:
> >
> > > Hi Team Kafka,
> > >
> > > As you know from KIP-26 and PR-99, when you use Copycat to move data
> > > from an external system to Kafka, in addition to storing the data
> > > itself, Copycat will also need to store some metadata.
> > >
> > > This metadata is currently offsets on the source system (say, SCN# from
> > > Oracle redo log), but I can imagine storing a bit more.
> > >
> > > When storing data, we obviously want pluggable serializers, so users
> > > will get the data in a format they like.
> > >
> > > But the metadata seems internal, i.e., users shouldn't care about it,
> > > and if we want them to read or change anything, we want to provide them
> > > with tools to do it.
> > >
> > > Moreover, by controlling the format we can do three important things:
> > > * Read the metadata for monitoring / audit purposes
> > > * Evolve / modify it. If users serialize it in their own format, and
> > > actually write clients to use this metadata, we don't know if it's safe
> > > to evolve.
> > > * Keep configuration a bit simpler. This adds at least 4 new
> > > configuration items...
> > >
> > > What do you guys think?
> > >
> > > Gwen
> > >
> >
> >
> >
> > --
> > Thanks,
> > Neha
> >
>
>
>
> --
> Thanks,
> Ewen
>



-- 
Thanks,
Neha
