I'm not sure the existing discussion is clear about how the format of offset data is decided. One possibility is that we pick one fixed format and always use it internally to store offsets, no matter which serializer you choose. This would be similar to how the __offsets topic is currently handled (with a custom serialization format). In other words, we use format X to store offsets. If you serialize your data with Y or Z, we don't care; we still use format X. The other option (which is used in the current PR-99 patch) would still make offset serialization pluggable, but there wouldn't be a separate config setting for it. Offset serialization would use the same format as data serialization: if you use X for data, we use X for offsets; if you use Y for data, we use Y for offsets.
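To make the two alternatives concrete, here is a minimal Python sketch. Everything in it is hypothetical: `OffsetWriter`, `json_serializer`, `repr_serializer` and the `fixed_format` flag are illustrative names, not Copycat APIs, and JSON merely stands in for whatever fixed internal format ("X") would be chosen. Modelling an offset as a source-partition key plus an offset value (say, an SCN), both plain dicts, is also an assumption.

```python
import json
from typing import Any, Callable, Dict, Tuple

# A serializer turns a value into bytes. Both functions below are
# hypothetical stand-ins for user-configurable data formats.
Serializer = Callable[[Any], bytes]


def json_serializer(value: Any) -> bytes:
    # Plays the role of the fixed internal format "X". sort_keys keeps the
    # byte output deterministic for equal dicts.
    return json.dumps(value, sort_keys=True).encode("utf-8")


def repr_serializer(value: Any) -> bytes:
    # Plays the role of some other user-chosen data format ("Y").
    return repr(value).encode("utf-8")


class OffsetWriter:
    """Hypothetical sketch, not the Copycat API.

    fixed_format=True:  offsets always use the internal JSON format,
                        whatever serializer the user picked for data.
    fixed_format=False: offsets reuse the data serializer, so the
                        offset format follows the data format.
    """

    def __init__(self, data_serializer: Serializer, fixed_format: bool) -> None:
        self.data_serializer = data_serializer
        self.offset_serializer = json_serializer if fixed_format else data_serializer

    def serialize_record(self, value: Any) -> bytes:
        # Record data always goes through the user's serializer.
        return self.data_serializer(value)

    def serialize_offset(
        self, partition: Dict[str, Any], offset: Dict[str, Any]
    ) -> Tuple[bytes, bytes]:
        # The partition (e.g. which table) is the key and the offset (e.g. an
        # SCN) is the value; both use whichever offset serializer was chosen.
        return self.offset_serializer(partition), self.offset_serializer(offset)


if __name__ == "__main__":
    partition, offset = {"table": "orders"}, {"scn": 12345}
    fixed = OffsetWriter(repr_serializer, fixed_format=True)
    pluggable = OffsetWriter(repr_serializer, fixed_format=False)
    print(fixed.serialize_offset(partition, offset))
    # (b'{"table": "orders"}', b'{"scn": 12345}')
    print(pluggable.serialize_offset(partition, offset))
    # (b"{'table': 'orders'}", b"{'scn': 12345}")
```

The `fixed_format` flag exists only to compare the two designs side by side. With the fixed format, offset bytes stay JSON even though records go through the user's serializer, so a tool can decode them without knowing which serializer was configured. With the reused serializer there is nothing extra to configure, but the stored offset format changes whenever the data serializer does.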
@neha, wrt providing access through a REST API, I guess you're suggesting that we could serialize that data to JSON for the API. I think it's important to point out that this is arbitrarily structured, connector-specific data. In many ways it's not that different from the actual message data: it is highly dependent on the connector, and downstream consumers need to understand the connector and its data format to do anything meaningful with it. Because of this, I'm not convinced that serializing it in a format other than the one used for the data will be particularly useful.

On Thu, Aug 13, 2015 at 11:22 PM, Neha Narkhede <n...@confluent.io> wrote:

> Copycat enables streaming data in and out of Kafka. Connector writers need to define the serde of the data, as it is different per system. Metadata should be entirely hidden by the Copycat framework and isn't something users or connector implementors need to serialize differently, as long as we provide tools/REST APIs to access the metadata where required. Moreover, as you suggest, evolution, maintenance and configs are much simpler if it remains hidden.
>
> +1 on keeping just the serializers for data configurable.
>
> On Thu, Aug 13, 2015 at 9:59 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > Hi Team Kafka,
> >
> > As you know from KIP-26 and PR-99, when you use Copycat to move data from an external system to Kafka, in addition to storing the data itself, Copycat will also need to store some metadata.
> >
> > This metadata is currently offsets on the source system (say, SCN# from Oracle redo log), but I can imagine storing a bit more.
> >
> > When storing data, we obviously want pluggable serializers, so users will get the data in a format they like.
> >
> > But the metadata seems internal, i.e., users shouldn't care about it, and if we want them to read or change anything, we want to provide them with tools to do it.
> >
> > Moreover, by controlling the format we can do three important things:
> > * Read the metadata for monitoring / audit purposes.
> > * Evolve / modify it. If users serialize it in their own format and actually write clients that use this metadata, we don't know if it's safe to evolve.
> > * Keep configuration a bit simpler. This adds at least 4 new configuration items...
> >
> > What do you guys think?
> >
> > Gwen
>
> --
> Thanks,
> Neha

--
Thanks,
Ewen