Hi Navneeth,
I wrote that the *local state stores* are not affected when the topic
configs cleanup.policy and retention.ms are passed to the state store.
The *changelog topics* do honor these configs, and the brokers remove
data from them as the configs specify.
In the case of a state migration to another instance, it depends on
whether the other instance already has some of that state locally.
- If the most recent offset on the instance is still within the range of
offsets of the changelog topic on the brokers, the state will be
replayed from the local offset to the most recent offset on the brokers.
Data removed on the brokers might still exist locally.
- If the most recent offset on the instance is before the range of
offsets of the changelog topic on the brokers, the state will be
replayed from the beginning of the changelog on the brokers. Data
removed from the changelog topic cannot be replayed, because the
beginning of the changelog has moved past this data.
- If the state does not exist on the instance, the state will be
replayed from the beginning of the changelog, and removed data is not
replayed, as in the previous case.
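To make the three cases above concrete, here is a minimal sketch in
plain Java. It is an illustration only, not Kafka Streams code: the
class ReplayPlanner and the method restoreStartOffset are hypothetical
names, and the offsets are passed in as plain numbers. The thread does
not cover a local offset beyond the end of the changelog, so falling
back to the beginning in that case is an assumption of this sketch.

```java
import java.util.OptionalLong;

public class ReplayPlanner {

    /**
     * Returns the changelog offset from which a replay would start.
     *
     * localOffset    most recent offset the instance has locally, empty if no local state
     * logStartOffset first offset still available in the changelog on the brokers
     * logEndOffset   most recent offset of the changelog on the brokers
     */
    public static long restoreStartOffset(OptionalLong localOffset,
                                          long logStartOffset,
                                          long logEndOffset) {
        if (localOffset.isPresent()) {
            long local = localOffset.getAsLong();
            // Case 1: the local offset is still inside the changelog's offset
            // range, so only the tail after it has to be replayed.
            if (local >= logStartOffset && local <= logEndOffset) {
                return local;
            }
        }
        // Case 2 (local offset before the range), case 3 (no local state),
        // and, as an assumption of this sketch, a local offset past the end:
        // replay from the beginning of what the brokers still hold.
        return logStartOffset;
    }

    public static void main(String[] args) {
        // The changelog on the brokers currently covers offsets 100..200.
        System.out.println(restoreStartOffset(OptionalLong.of(150), 100, 200));  // 150
        System.out.println(restoreStartOffset(OptionalLong.of(50), 100, 200));   // 100
        System.out.println(restoreStartOffset(OptionalLong.empty(), 100, 200));  // 100
    }
}
```

In the second and third call, everything before offset 100 has already
been removed on the brokers and cannot be replayed.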
I hope that helps.
Best,
Bruno
On 08.05.21 00:59, Navneeth Krishnan wrote:
Hi Bruno/All,
I have a follow-up question on the same topic. As you mentioned, there
will be no impact on key-value stores even when retention.ms and
cleanup.policy are provided. Does that mean the changelog topic will
not clear the data on the broker even after the retention period is over?
I agree the local state stores will not delete the data, but when there
is a reallocation, the state replay would only have to replay the data
within the given retention time. Is this understanding correct?
Thanks
On Mon, Apr 19, 2021 at 1:57 AM Bruno Cadonna <cado...@apache.org> wrote:
Hi Upesh,
The answers to your questions are:
1.
The configs cleanup.policy and retention.ms are topic configs. Hence,
they only affect the changelog of a state store, not the local state
store in a Kafka Streams client.
Locally, window and session stores remove data they do not need anymore.
Window and session stores are segmented stores. That means they consist
of segments that are ordered by the windows they contain. Once the
segment that contains the oldest windows is not needed anymore, i.e.,
the data exceeded the retention time of the state store, the segment is
removed.
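The segment mechanism described above can be sketched in plain Java as
follows. This is a simplified model, not the Kafka Streams
implementation: the class SegmentedStoreSketch and its methods are
hypothetical, timestamps are assumed to be non-negative, and "stream
time" is assumed to be the highest window start seen so far.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SegmentedStoreSketch {
    private final long retentionMs;
    private final long segmentIntervalMs;
    // segment id -> (window start -> value); the TreeMap keeps segments
    // ordered by the windows they contain.
    private final TreeMap<Long, Map<Long, String>> segments = new TreeMap<>();
    private long streamTimeMs = 0L;

    public SegmentedStoreSketch(long retentionMs, long segmentIntervalMs) {
        this.retentionMs = retentionMs;
        this.segmentIntervalMs = segmentIntervalMs;
    }

    public void put(long windowStartMs, String value) {
        streamTimeMs = Math.max(streamTimeMs, windowStartMs);
        long segmentId = windowStartMs / segmentIntervalMs;
        segments.computeIfAbsent(segmentId, id -> new TreeMap<>())
                .put(windowStartMs, value);
        dropExpiredSegments();
    }

    // Drops every segment whose windows all exceed the retention time.
    // Data is removed a whole segment at a time, never window by window.
    private void dropExpiredSegments() {
        long minLiveSegment = (streamTimeMs - retentionMs) / segmentIntervalMs;
        // headMap(k) is a live view of all entries with a key strictly
        // below k, so clearing it removes those segments from the store.
        segments.headMap(minLiveSegment).clear();
    }

    public List<Long> segmentIds() {
        return new ArrayList<>(segments.keySet());
    }

    public static void main(String[] args) {
        // Retention of 100 ms, segments of 50 ms each.
        SegmentedStoreSketch store = new SegmentedStoreSketch(100, 50);
        store.put(0, "a");
        store.put(60, "b");
        store.put(120, "c");
        System.out.println(store.segmentIds()); // [0, 1, 2]
        store.put(160, "d");                    // segment 0 is now fully expired
        System.out.println(store.segmentIds()); // [1, 2, 3]
    }
}
```

Note that nothing here depends on cleanup.policy or retention.ms: the
removal is driven only by the store's own retention time.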
Non-windowed state stores do not remove data.
Worth noting here: If you change retention.ms directly on the brokers,
it will not affect the behavior of local state stores.
2.
Yes, this behavior is the same for in-memory state stores and persistent
state stores.
3.
Window and session state stores do remove data.
Best,
Bruno
On 18.04.21 18:18, Upesh Desai wrote:
Hello, I have not been able to find a concrete answer on if/how state
stores on a running Kafka Streams instance remove data once it has
passed the configured retention.ms. So, a couple of clarifying
questions:
1. If the stores are configured with cleanup.policy=compact,delete AND
retention.ms=N, will the stores in the running Streams instance
remove data automatically over time?
2. Is this behavior the same for in-memory stores and persistent
RocksDB stores?
3. If they do not remove data that has passed the retention.ms period,
is there a different way to periodically remove old data from the
stores?
I’m using Kafka 2.7.0 components across the board (broker, connect,
etc.).
Thanks in advance,
Upesh
Upesh Desai
Senior Software Developer