Hi Navneeth,

What I wrote is that the *local state stores* are not affected when the topic configs cleanup.policy and retention.ms are passed to the state store. The *changelog topics*, on the other hand, do respect these configs, so the brokers will remove data from them as specified.
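To make the broker-side effect of cleanup.policy=compact,delete concrete, here is a toy Java model of which changelog records survive cleanup. It is my own simplification, not broker code: it works record by record, while real brokers delete and compact whole log segments, so in practice records can live somewhat longer than retention.ms. The names ChangelogRecord and CompactDeleteModel are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type for this toy model of a changelog topic.
class ChangelogRecord {
    final long offset;
    final String key;
    final String value;
    final long timestampMs;

    ChangelogRecord(long offset, String key, String value, long timestampMs) {
        this.offset = offset;
        this.key = key;
        this.value = value;
        this.timestampMs = timestampMs;
    }
}

// Toy model of cleanup.policy=compact,delete (record-level simplification;
// real brokers operate on whole log segments).
class CompactDeleteModel {
    // Returns the surviving records, ordered by the offset of each key's latest record.
    static List<ChangelogRecord> cleanup(List<ChangelogRecord> log, long nowMs, long retentionMs) {
        Map<String, ChangelogRecord> latestPerKey = new LinkedHashMap<>();
        for (ChangelogRecord r : log) {
            // "delete": skip records older than the retention period
            if (r.timestampMs >= nowMs - retentionMs) {
                // "compact": keep only the latest record per key;
                // remove + put moves the key to the end to preserve offset order
                latestPerKey.remove(r.key);
                latestPerKey.put(r.key, r);
            }
        }
        return new ArrayList<>(latestPerKey.values());
    }
}
```

For example, with records a@t=0, b@t=10 and a@t=20, now = 25 and retention.ms = 10, only the second record for a survives: the older a is compacted away and b is deleted by retention, even though a local store that already holds b keeps it.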

In the case of a state migration to another instance, it depends on whether the other instance already has some local state for the given store.

- If the most recent offset of the local state on the instance is still within the range of offsets of the changelog topic on the brokers, the state is replayed from the local offset up to the most recent offset on the brokers. Data removed on the brokers might still exist locally.

- If the most recent offset of the local state on the instance is before the range of offsets of the changelog topic on the brokers, the state is replayed from the beginning of the changelog on the brokers. Data removed from the changelog topic cannot be replayed, because the beginning of the changelog has moved past it.

- If no state exists on the instance, the state is replayed from the beginning of the changelog, and, as in the previous case, removed data is not replayed.
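The three cases can be summarized in a small Java model. It is only a sketch of the behaviour described above, not Kafka Streams' restoration code: RestoreModel and Rec are hypothetical names, localNextOffset stands for the first changelog offset the local state has not applied yet, and the sketch assumes that a local copy that is too old is discarded rather than reused. A null value is treated as a tombstone (delete).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of restoring a key-value store from its changelog after a migration.
// Hypothetical sketch; NOT the Kafka Streams implementation.
class RestoreModel {
    // Hypothetical changelog record: offset, key and value (null = tombstone).
    static final class Rec {
        final long offset;
        final String key;
        final String value;

        Rec(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    // localState: null if the instance has no local state for the store.
    // localNextOffset: first changelog offset not yet applied to localState.
    // logStartOffset: first offset still available on the brokers.
    static Map<String, String> restore(Map<String, String> localState, long localNextOffset,
                                       long logStartOffset, List<Rec> changelog) {
        Map<String, String> state;
        long from;
        if (localState != null && localNextOffset >= logStartOffset) {
            // Case 1: resume from the local position; keys already removed
            // on the brokers may survive in the local copy.
            state = new HashMap<>(localState);
            from = localNextOffset;
        } else {
            // Cases 2 and 3 (assumption: a too-old local copy is discarded):
            // replay everything that is left on the brokers.
            state = new HashMap<>();
            from = logStartOffset;
        }
        for (Rec r : changelog) {
            if (r.offset < from) {
                continue; // already reflected in the local state
            }
            if (r.value == null) {
                state.remove(r.key); // tombstone
            } else {
                state.put(r.key, r.value);
            }
        }
        return state;
    }
}
```

With a changelog that only retains offset 2 (key a), an instance that resumes at offset 2 keeps a key b that has already been removed on the brokers, whereas an instance with no local state or a too-old local state ends up with only a.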

I hope that helps.

Best,
Bruno

On 08.05.21 00:59, Navneeth Krishnan wrote:
Hi Bruno/All,

I have a follow-up question regarding the same topic. As you had
mentioned, there will be no impact on key-value stores even when retention.ms
and cleanup.policy are provided. Does that mean the changelog topic will
not clear the data in the broker even after the retention period is over?

I agree the local state stores will not be able to delete the data, but when
a reallocation happens, the state restoration would only have to replay
the data for the given retention time. Is this understanding correct?

Thanks

On Mon, Apr 19, 2021 at 1:57 AM Bruno Cadonna <cado...@apache.org> wrote:

Hi Upesh,

The answers to your questions are:

1.
The configs cleanup.policy and retention.ms are topic configs. Hence,
they only affect the changelog of a state store, not the local state
store in a Kafka Streams client.

Locally, window and session stores remove data they do not need anymore.
Window and session stores are segmented stores. That means they consist
of segments that are ordered by the windows they contain. Once the
segment that contains the oldest windows is not needed anymore, i.e.,
all of its data has exceeded the retention time of the state store, the
segment is removed.
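A toy Java model of such a segmented store may help. It is not the Kafka Streams implementation: SegmentedStoreModel and its segmentIntervalMs parameter are hypothetical, stream time is approximated as the largest window start seen so far, and a segment is dropped only once every window it can hold is older than stream time minus the retention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a segmented window store (hypothetical; NOT the Kafka Streams code).
class SegmentedStoreModel {
    private final long retentionMs;
    private final long segmentIntervalMs; // hypothetical parameter for illustration
    // segment id -> window start times stored in that segment, ordered by segment id
    private final TreeMap<Long, List<Long>> segments = new TreeMap<>();
    // approximation of stream time: the largest window start seen so far
    private long streamTimeMs = Long.MIN_VALUE;

    SegmentedStoreModel(long retentionMs, long segmentIntervalMs) {
        this.retentionMs = retentionMs;
        this.segmentIntervalMs = segmentIntervalMs;
    }

    private long segmentId(long timestampMs) {
        return Math.floorDiv(timestampMs, segmentIntervalMs);
    }

    void put(long windowStartMs) {
        streamTimeMs = Math.max(streamTimeMs, windowStartMs);
        long minLiveMs = streamTimeMs - retentionMs;
        if (windowStartMs < minLiveMs) {
            return; // window already expired: not stored at all
        }
        segments.computeIfAbsent(segmentId(windowStartMs), id -> new ArrayList<>()).add(windowStartMs);
        // Remove whole segments that only contain windows older than minLiveMs.
        segments.headMap(segmentId(minLiveMs)).clear();
    }

    boolean containsWindow(long windowStartMs) {
        List<Long> segment = segments.get(segmentId(windowStartMs));
        return segment != null && segment.contains(windowStartMs);
    }

    int segmentCount() {
        return segments.size();
    }
}
```

Because expiry happens per segment, a window can remain in this model for up to one segment interval beyond the retention time before its segment is dropped.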

Non-windowed state stores, i.e., key-value stores, will not remove data.

Worth noting here: If you change retention.ms directly on the brokers,
it will not affect the behavior of local state stores.

2.
Yes, this behavior is the same for in-memory state stores and persistent
state stores.

3.
Window and session state stores do remove data.


Best,
Bruno



On 18.04.21 18:18, Upesh Desai wrote:
Hello, I have not been able to find a concrete answer on if/how state
stores on a running Kafka Streams instance remove data once it is older
than the configured retention.ms. So, a couple of clarification
questions:

  1. If the stores are configured with cleanup.policy=compact,delete AND
     retention.ms=N, will the stores on the running Streams instance
     remove data automatically over time?
  2. Is this behavior the same for in-memory stores and persistent
     RocksDB stores?
  3. If they do not remove data that has passed the retention.ms period,
     is there a different way to periodically remove old data from the
     stores?

I’m using Kafka 2.7.0 components across the board (broker, Connect,
etc.).

Thanks in advance,
Upesh

Upesh Desai
Senior Software Developer

*ude...@itrsgroup.com* <mailto:ude...@itrsgroup.com>
*www.itrsgroup.com* <https://www.itrsgroup.com/>
