Ewen, I meant that we use format X to store offsets regardless of whether you serialize your data with Y or Z, and that we don't expose it as something that can be configured. As far as the serialization format goes, I was suggesting just going with simple base64-encoded strings for simplicity (maybe there is a reason you are saying this doesn't work?), though I can see how we could just use the same format as the one used for the data. I don't have a strong preference either way, as long as the tooling and REST APIs can expose the data effortlessly.
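For concreteness, here is a minimal sketch of what the base64 option could look like on the framework side. This is just an illustration, not the actual Copycat API; the class and method names are made up. The idea is only that the framework encodes whatever offset bytes the connector hands it, independent of the data serializer:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Made-up sketch of the "fixed internal format" idea: the framework base64-encodes
// whatever offset bytes the connector hands it before writing them to offset
// storage, regardless of which serializer is configured for the data itself.
public class OffsetEncodingSketch {

    // Encode raw offset bytes (e.g. a serialized {"scn": 12345} from an Oracle
    // redo log connector) into a base64 string.
    static String encodeOffset(byte[] rawOffset) {
        return Base64.getEncoder().encodeToString(rawOffset);
    }

    // Decode the stored string back into the raw bytes for the connector.
    static byte[] decodeOffset(String stored) {
        return Base64.getDecoder().decode(stored);
    }

    public static void main(String[] args) {
        byte[] offset = "{\"scn\": 12345}".getBytes(StandardCharsets.UTF_8);
        String stored = encodeOffset(offset);
        System.out.println("stored:     " + stored);
        System.out.println("round trip: " + new String(decodeOffset(stored), StandardCharsets.UTF_8));
    }
}

The point is just that with a fixed encoding like this, the tooling and REST APIs only have to understand one format for offsets, no matter which serializers are configured for the data.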
Thanks,
Neha

On Fri, Aug 14, 2015 at 1:29 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

> I'm not sure the existing discussion is clear about how the format of
> offset data is decided. One possibility is that we choose one fixed
> format and that is what we use internally to store offsets no matter
> what serializer you choose. This would be similar to how the __offsets
> topic is currently handled (with a custom serialization format). In
> other words, we use format X to store offsets. If you serialize your
> data with Y or Z, we don't care, we still use format X. The other
> option (which is used in the current PR-99 patch) would still make
> offset serialization pluggable, but there wouldn't be a separate option
> for it. Offset serialization would use the same format as the data
> serialization. If you use X for data, we use X for offsets; if you use
> Y for data, we use Y for offsets.
>
> @neha wrt providing access through a REST API, I guess you are
> suggesting that we can serialize that data to JSON for that API. I
> think it's important to point out that this is arbitrarily structured,
> connector-specific data. In many ways, it's not that different from the
> actual message data in that it is highly dependent on the connector,
> and downstream consumers need to understand the connector and its data
> format to do anything meaningful with the data. Because of this, I'm
> not convinced that serializing it in a format other than the one used
> for the data will be particularly useful.
>
>
> On Thu, Aug 13, 2015 at 11:22 PM, Neha Narkhede <n...@confluent.io> wrote:
>
> > Copycat enables streaming data in and out of Kafka. Connector writers
> > need to define the serde of the data, as it is different per system.
> > Metadata should be entirely hidden by the Copycat framework and isn't
> > something users or connector implementors need to serialize
> > differently, as long as we provide tools/REST APIs to access the
> > metadata where required. Moreover, as you suggest, evolution,
> > maintenance, and configs are much simpler if it remains hidden.
> >
> > +1 on keeping just the serializers for data configurable.
> >
> > On Thu, Aug 13, 2015 at 9:59 PM, Gwen Shapira <g...@confluent.io> wrote:
> >
> > > Hi Team Kafka,
> > >
> > > As you know from KIP-26 and PR-99, when you use Copycat to move data
> > > from an external system to Kafka, in addition to storing the data
> > > itself, Copycat will also need to store some metadata.
> > >
> > > This metadata is currently offsets on the source system (say, SCN#
> > > from an Oracle redo log), but I can imagine storing a bit more.
> > >
> > > When storing data, we obviously want pluggable serializers, so users
> > > will get the data in a format they like.
> > >
> > > But the metadata seems internal, i.e. users shouldn't care about it,
> > > and if we want them to read or change anything, we want to provide
> > > them with tools to do it.
> > >
> > > Moreover, by controlling the format we can do three important things:
> > > * Read the metadata for monitoring / audit purposes.
> > > * Evolve / modify it. If users serialize it in their own format, and
> > > actually write clients to use this metadata, we don't know if it's
> > > safe to evolve.
> > > * Keep configuration a bit simpler. This adds at least 4 new
> > > configuration items...
> > >
> > > What do you guys think?
> > >
> > > Gwen
> > >
> >
> >
> > --
> > Thanks,
> > Neha
> >
>
>
> --
> Thanks,
> Ewen
>

--
Thanks,
Neha