
Could you maybe post your final solution here in this thread? I'm curious
to hear about which way you finally went with this.


> Thanks Jim. I like the fact that the offset management will not require us
> to customize Kafka. I will think more on this. Maybe a time-based seek will
> just work... I think the math you proposed requires the partition setup to be
> exactly the same as the original, and the partitioner to map each message to
> the same partition ID (hopefully that is always true; I haven't verified).
> BTW, any concern with the codec approach apart from the customization needed
> to make the codec pluggable?
> Thanks,
> Josh
> For the offset, at the start of topic (and perhaps periodically in the
> topic), the script could make a note of the corresponding offset in the
> previous topic.  The consumer could then see the correspondence between
> the current topic offsets and the previous topic offsets and do some math
> to get to where they left off.  That's just the start of idea for a
> possible approach; it would have to be thought through more carefully.
> Not sure, but you may need to handle cases where messages get re-ordered.
> -- Jim
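
One way to read the offset math described above, as a minimal sketch. It assumes the copy script records `(old_offset, new_offset)` pairs per partition, that partitioning is identical in both topics, and that messages are copied in order with no gaps. The `translate_offset` helper and the checkpoint format are hypothetical and not part of Kafka.

```python
from bisect import bisect_right

def translate_offset(checkpoints, old_offset, conservative=False):
    """Map a committed offset in the old topic to an offset in the new topic.

    checkpoints: (old_offset, new_offset) pairs for ONE partition, sorted by
    old_offset, as noted by the hypothetical copy script.
    Assumes identical partitioning and in-order copying.
    """
    if not checkpoints:
        raise ValueError("no checkpoints recorded for this partition")
    old_marks = [old for old, _ in checkpoints]
    i = bisect_right(old_marks, old_offset) - 1
    if i < 0:
        # Committed offset predates the first checkpoint: start at the
        # first copied message.
        return checkpoints[0][1]
    base_old, base_new = checkpoints[i]
    if conservative:
        # Replay from the checkpoint itself; safer if messages were dropped
        # or re-ordered, at the cost of some duplicates.
        return base_new
    # Dense, in-order copy assumed: the distance past the checkpoint carries over.
    return base_new + (old_offset - base_old)

# Example: the old topic had been trimmed, so its first copied offset was 5000.
checkpoints = [(5000, 0), (6000, 1000), (7000, 2000)]
print(translate_offset(checkpoints, 6500))
print(translate_offset(checkpoints, 6500, conservative=True))
```

The conservative mode trades duplicate deliveries for safety, which only works if the consumers are idempotent over the span between two checkpoints.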
> >Jim,
> >So I guess the problem with copying to a different topic (or rather
> >having a replicated cluster) is: when existing consumers "switch" to the
> >new topic, how is the offset set correctly so they don't replay the
> >whole thing again? While we can certainly make the consumers idempotent,
> >they are not going to handle the volume from the beginning on a regular
> >basis (BTW, the key rotation will be on a regular basis). Maybe implement
> >custom offset storage somewhere else?
> >grade user moving to cloud and good to see interests here. It is
> >currently a blocker for us. While Kafka 0.9 made progress with SASL on the
> >communication front, I see less talk around encryption at rest and
> >support for live/in-place key rotation and re-encryption. It does seem to
> >me easy to implement by just exposing codec pluggability and a JMX hook for
> >cleaner-thread invocation, so that things are taken care of transparently?
> >
> >You could do this with (I expect) reasonable efficiency and with no
> >changes to Kafka code by using multiple topics.
> >
> >You can have a script that in a streaming manner reads out all messages in
> >a topic, decrypts them with the old key, encrypts them with the new key,
> >and adds them to a new topic.  At the time you need to re-encrypt all the
> >messages in a topic, invoke the script.  You would probably have one topic
> >per encryption key.  The producers would know the right topic to write to
> >since they know which version of the encryption key they use.  The consumers would
> >have to have some logic to switch over.  Perhaps the last message in a
> >particular topic could tell the consumers to switch to a different topic.
> >
> >If you invoke the script from someplace close to the Kafka brokers, the
> >overhead should, I think, not be too much higher than with the custom codec approach
> >you were talking about.  Just some local network traffic, with the amount
> >depending on the size of the topic and your replication factor.
> >
> >-- Jim
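
A rough sketch of the streaming re-encryption script described above. Plain lists stand in for the old and new topics, and `toy_xor` is a placeholder that is NOT real encryption; a real script would read from a Kafka consumer, write to a producer, and use a proper cipher. The `topic_for` naming scheme and the `__switch__` control record are assumptions made for illustration only.

```python
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[bytes, bytes]          # (message key, encrypted value)
Cipher = Callable[[bytes], bytes]

def toy_xor(secret: bytes) -> Cipher:
    """Placeholder cipher for the demo only; NOT secure. XOR undoes itself."""
    def apply(data: bytes) -> bytes:
        return bytes(b ^ secret[i % len(secret)] for i, b in enumerate(data))
    return apply

def topic_for(base: str, key_version: int) -> str:
    """Hypothetical naming scheme: one topic per encryption key version."""
    return f"{base}.key-v{key_version}"

def reencrypt(records: Iterable[Record], decrypt_old: Cipher,
              encrypt_new: Cipher) -> Iterator[Record]:
    """Stream records one at a time: decrypt with the old key, encrypt with the new."""
    for key, value in records:
        yield key, encrypt_new(decrypt_old(value))

old_key, new_key = toy_xor(b"old-secret"), toy_xor(b"new-secret")

# Lists stand in for the old and new topics.
old_topic = [(b"acct-1", old_key(b"debit 10")), (b"acct-2", old_key(b"credit 5"))]
new_topic = list(reencrypt(old_topic, decrypt_old=old_key, encrypt_new=new_key))

# Final control record telling consumers where to go next (hypothetical format).
old_topic.append((b"__switch__", topic_for("payments", 2).encode()))

# Values in the new topic decrypt with the new key.
print([new_key(value) for _, value in new_topic])
```

Because `reencrypt` is a generator, only one record is held in memory at a time, which matches the "streaming manner" of the proposal.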
> >
> >>Hi Jens,
> >>I got your point, but some of our use cases cannot just rely on TTL. We try
> >>to have a long expiry for messages and rather compact them (dedup) so we
> >>can replay messages as a system of record. When a key is lost, we will
> >>invalidate the old key, so messages encrypted with the old key can no
> >>longer be decrypted unless we also re-encrypt them.
> >>
> >>Josh
> >>Kafka will/can expire message logs after a certain TTL. Can't you simply
> >>rely on expiration for key rotation? That is, you start to produce
> >>messages with a different key while your consumer temporarily handles the
> >>overlap of keys for the duration of the TTL.
> >>
> >>Just an idea,
> >>
> >>Jens
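
A small sketch of the key-overlap idea above: the consumer keeps a retired key usable until the TTL has passed, then drops it. It assumes each message carries an identifier for the key that encrypted it (for example in the payload), which the thread does not specify. The `Keyring` class is hypothetical.

```python
class Keyring:
    """Holds the current key plus retired keys still inside the TTL window."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._keys = {}  # key_id -> (secret, retired_at or None)

    def add(self, key_id, secret):
        self._keys[key_id] = (secret, None)

    def retire(self, key_id, now):
        """Mark a key as no longer used by producers as of time `now`."""
        secret, _ = self._keys[key_id]
        self._keys[key_id] = (secret, now)

    def lookup(self, key_id, now):
        """Return the secret for key_id, or raise KeyError if expired or unknown."""
        secret, retired_at = self._keys[key_id]
        if retired_at is not None and now - retired_at > self.ttl_seconds:
            # Messages written with this key should have expired from the log.
            del self._keys[key_id]
            raise KeyError(key_id)
        return secret

ring = Keyring(ttl_seconds=7 * 24 * 3600)
ring.add("v1", b"old-secret")
ring.add("v2", b"new-secret")   # producers switch to v2 at t=0
ring.retire("v1", now=0)

# One hour after rotation the old key is still usable for older messages.
print(ring.lookup("v1", now=3600) == b"old-secret")
```

This only covers topics with time-based retention; compacted topics with a long expiry would keep old-key messages around indefinitely.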
> >>
> >>> We are trying to deploy Kafka into EC2, and one of the requirements from
> >>>infosec is to have Kafka encryption at rest (stored with encrypted
> >>>values). We also need to be able to rotate encryption keys and re-encrypt
> >>>all the messages on a regular basis since we are a financial company. The
> >>>re-encryption feels challenging since Kafka messages are immutable from
> >>>the client side (producer and consumer). Some ideas are floating around
> >>>to have a replicated cluster, but then it will mess up all the consumer
> >>>offsets, and switching is complicated from an operational perspective.
> >>> One idea we have to achieve this is to plug in our own "compression"
> >>>codec which deals with both compression and encryption logic, and to
> >>>leverage the compaction cycle to rewrite all the messages by calling
> >>>decompress and compress into a new file. It feels like this approach can
> >>>also have zero impact on the consumer/producer if they are using the same
> >>>"codec" for compression, since the offsets will be intact.
> >>> My current understanding is that the codecs are hardcoded right now (we
> >>>are using 0.9), so it will require us to customize Kafka. Also, compaction
> >>>cannot be triggered on demand, which is needed in case of key loss.
> >>>So before we take on customization of Kafka, I am just wondering if our
> >>>thinking is on the right track.
> >>> I hope some of the committers from Confluent/Hortonworks/Cloudera can
> >>>comment on that and on the road map to support encryption at rest and key
> >>>rotation, or otherwise on alternatives to what is proposed. Also please
> >>>let me know if my question/problem is not clear.
> >>> Thanks,
> >>> Josh
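
To make the data flow of the combined "compression" codec concrete, here is a minimal sketch. It is not Kafka code (the broker runs on the JVM and its codecs are not pluggable in 0.9, as noted above); it only illustrates compress-then-encrypt and the decode/encode pass a compaction-style rewrite would apply. `toy_xor` is a placeholder and NOT real encryption, and all function names are hypothetical.

```python
import zlib

def toy_xor(data: bytes, secret: bytes) -> bytes:
    """Placeholder cipher for the demo only; NOT secure. XOR undoes itself."""
    return bytes(b ^ secret[i % len(secret)] for i, b in enumerate(data))

def encode(plaintext: bytes, secret: bytes) -> bytes:
    # Compress first: well-encrypted data has no redundancy left to compress.
    return toy_xor(zlib.compress(plaintext), secret)

def decode(stored: bytes, secret: bytes) -> bytes:
    return zlib.decompress(toy_xor(stored, secret))

def rewrite(stored: bytes, old_secret: bytes, new_secret: bytes) -> bytes:
    """What a compaction-style pass would do to each stored payload."""
    return encode(decode(stored, old_secret), new_secret)

payload = b"balance=100;" * 20
stored_old = encode(payload, b"old-secret")
stored_new = rewrite(stored_old, b"old-secret", b"new-secret")

# The rewritten payload round-trips with the new key.
print(decode(stored_new, b"new-secret") == payload)
```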