Hey Jay,

Thanks for the reply. Yes, this should do the job.

We were thinking about something that abstracts this logic away from the
user (e.g. the same way Cassandra handles its PK definitions in CQL -
"hiding" the row key and optional clustering key behind the concept of a
"primary key"), but introducing such a design obviously has some pros/cons
and non-trivial implications, so if using the partitioner interface is the
way to go in Kafka - we'll use it :-)
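
For reference, here is a minimal sketch of the logic such a partitioner
could apply, assuming the composite key is serialized as a single String
"K1|K2". The class name, the "|" separator and the hash function are all
hypothetical choices made for illustration; none of them come from Kafka.
In a real setup this computation would sit inside the partition() method
of a class implementing the Partitioner interface, and the hash here is
only a stand-in, not the one Kafka's default partitioner uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Kafka: partitions on K1 only, while the
// full "K1|K2" string stays the message key that log compaction sees.
final class CompositeKeys {
    // Assumed separator between K1 and K2; it must never occur inside K1.
    static final char SEPARATOR = '|';

    private CompositeKeys() {}

    // Builds the message key; K2 is optional, so a null K2 yields plain K1.
    static String compose(String k1, String k2) {
        return k2 == null ? k1 : k1 + SEPARATOR + k2;
    }

    // Extracts K1, the part of the key that decides the partition.
    static String partitionPart(String compositeKey) {
        int idx = compositeKey.indexOf(SEPARATOR);
        return idx < 0 ? compositeKey : compositeKey.substring(0, idx);
    }

    // Maps K1 to a partition in [0, numPartitions).
    // Arrays.hashCode is a stand-in hash chosen for this sketch only.
    static int partitionFor(String compositeKey, int numPartitions) {
        byte[] k1 = partitionPart(compositeKey).getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(k1), numPartitions);
    }
}
```

With this, keys such as compose(productId, "media") and
compose(productId, "review") map to the same partition but remain distinct
message keys, so compaction would treat them separately.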

Thanks,
Michał


On 5 October 2017 at 15:22, Jay Kreps <j...@confluent.io> wrote:

> I think you can do this now by using a custom partitioner, no?
>
> https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/producer/Partitioner.html
>
> -Jay
>
> On Mon, Oct 2, 2017 at 6:29 AM Michal Michalski <michal.michal...@zalando.ie> wrote:
>
> > Hi,
> >
> > TL;DR: I'd love to be able to make log compaction more "granular" than
> > just per-partition-key, so I was thinking about the concept of a
> > "composite key", where partitioning logic is using one part of the key,
> > while compaction uses the whole key - is this something desirable /
> > doable / worth a KIP?
> >
> > Longer story / use case:
> >
> > I'm currently a member of a team working on a project that uses a bunch
> > of applications to ingest data into the system (one "entity type" per
> > app). Once ingested by each application, since the entities refer to
> > each other, they're all published to a single topic to ensure ordering
> > for later processing stages. Because of the nature of the data, for a
> > given set of related entities there's always a single "master" /
> > "parent" entity, whose ID we're using as the partition key. To give an
> > example: let's say you have a "product" entity which can have things
> > like "media", "reviews", "stocks" etc. associated with it - the product
> > ID will be the partition key for *all* these entities. However, with
> > this approach we simply cannot use log compaction, because having e.g.
> > "product", "media" and "review" events, all with the same partition key
> > "X", means that the compaction process will at some point delete all but
> > one of them, causing data loss - only a single entity with key "X" will
> > remain (and that's absolutely correct - Kafka doesn't "understand" what
> > the message contains).
> >
> > We were thinking about introducing something we internally called a
> > "composite key". The idea is to have a key that's not just a single
> > String K, but a pair of Strings: (K1, K2). For specifying the partition
> > that the message should be sent to, K1 would be used; however, for log
> > compaction purposes, the whole (K1, K2) would be used instead. This way,
> > referring to the example above, different entities "belonging" to the
> > same "master entity" (product) could be published to that topic with
> > composite keys: (productId, "product"), (productId, "media") and
> > (productId, "review"), so they all end up in a single partition
> > (specified by K1, which is always productId), but they won't get
> > compacted together, because the K2 part is different for them, making
> > the whole "composite key" (K1, K2) different. Of course K2 would be
> > optional, so for someone who only needs the default behaviour nothing
> > would change.
> >
> > Since I'm not a Kafka developer and I don't know its internals that
> > well, I can't say if this idea is technically feasible or not, but I'd
> > think it is - I'd be more afraid of the complexity around backwards
> > compatibility etc. and the potential performance implications of such a
> > change.
> >
> > I know that similar behaviour is achievable by using the producer API
> > that allows explicitly specifying the partition ID (and the key), but I
> > think it's a bit "clunky" (for each message, generate a key that this
> > message should normally be using [productId] and somehow "map" that key
> > into a partition X; then send that message to this partition X, *but*
> > use the "compaction" key instead [productId, entity type] as the message
> > key) and it's something that could be abstracted away from the user.
> >
> > Thoughts?
> >
> > Question to Kafka users: Is this something that anyone here would find
> > useful? Is anyone here dealing with a similar problem?
> >
> > Question to Kafka maintainers: Is this something that you could
> > potentially consider a useful feature? Would it be worth a KIP? Is
> > something like this (technically) doable at all?
> >
> > --
> > Kind regards,
> > Michał Michalski
> >
>
