Hi Sandor, thanks again for your reply.

> If you have a non-log-compacted topic, after `retention.ms` the message
> (along with the PII) gets deleted from the Kafka message store without any
> further action, which should satisfy GDPR requirements:
> - you are handling PII in Kafka for a limited amount of time
> - you are processing the data for the given purpose it was given
> - the data will automatically be deleted without any further steps
> If you have a downstream system, you should also be able to publish a
> message through Kafka so that the downstream system executes its delete
> processes - if required. We implemented a similar process where we
> published an AnonymizeOrder event, which instructed downstream systems to
> anonymize the order data in their own data store.

Our problem is that the data may have been published shortly before the system
receives a delete order from the "coordinator", because the data might have
been mutated and the update had to be propagated to consumer systems. With a
retention period of days, the subsequent systems in the coordinated chain
could only proceed after too long a delay. An even shorter retention period
would be problematic.
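
To make the delay concrete, here is a rough back-of-the-envelope sketch in
Python. It assumes time-based retention on a topic with cleanup.policy=delete,
where a segment only becomes eligible for deletion once its newest record is
older than retention.ms, and where the broker looks for such segments every
log.retention.check.interval.ms. The function name and the 3-day retention are
made up for illustration; this is a simplified model, not a statement of what
the broker guarantees.

```python
DAY_MS = 24 * 60 * 60 * 1000


def worst_case_deletion_delay_ms(retention_ms, segment_ms, check_interval_ms):
    """Rough upper bound on how long a record may stay on disk.

    Simplified model (assumption): the record is the first one in its
    segment, the segment keeps receiving records for up to segment_ms, and
    the segment is deleted at the first retention check after its newest
    record has become older than retention_ms.
    """
    return segment_ms + retention_ms + check_interval_ms


delay = worst_case_deletion_delay_ms(
    retention_ms=3 * DAY_MS,          # hypothetical "retention of days"
    segment_ms=7 * DAY_MS,            # default segment.ms (7 days)
    check_interval_ms=5 * 60 * 1000,  # default log.retention.check.interval.ms
)
print(f"worst case: about {delay / DAY_MS:.2f} days")
```

Under these assumptions the delay is dominated by segment.ms plus
retention.ms, so lowering retention.ms alone does not bring it down to the
"immediate" deletion we need.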

> If you have a log-compacted topic:
> - yes, I have the same understanding as you have on the active segment.
> - You can set the segment.ms
> <https://kafka.apache.org/documentation/#segment.ms> property to force the
> compaction to occur within an expected timeframe.
>
> In general what I understand is true in both cases that Kafka gives you
> good enough guarantees to either remove the old message after retention.ms
> milliseconds or execute the topic compaction after segment.ms time that it
> is unnecessary to try to figure out more specifically in what exact moment
> the data is deleted. Setting these configurations should give you enough
> guarantee that the data removal will occur - if not, that imo should be
> considered a bug and reported back to the project.

We investigated the max.compaction.lag.ms parameter, which was introduced in
KIP-354. From our understanding its intent is exactly what we'd like to
accomplish, but unless we missed something, we have noticed that new segments
are rolled only when new messages are appended. If the topic has very low
activity, it can happen that no new message is appended and the segment stays
active indefinitely, which means cleaning of that segment might also remain
stalled indefinitely. We are unsure whether our understanding is correct and
whether this is a bug.
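
The behaviour we think we are seeing can be illustrated with a toy model in
Python. It is not Kafka code: it simply assumes that the age of the active
segment is checked only when a record is appended, which is exactly the point
we are unsure about. The function name and the timestamps are made up for
illustration.

```python
HOUR_MS = 60 * 60 * 1000


def roll_times(append_times_ms, segment_ms):
    """Return the times at which a new segment is rolled.

    Toy model (assumption): the segment's age is checked only on append,
    so a roll can never happen between two appends.
    """
    rolls = []
    segment_start = None  # timestamp of the first record in the active segment
    for t in sorted(append_times_ms):
        if segment_start is None:
            segment_start = t
        elif t - segment_start >= segment_ms:
            rolls.append(t)    # this append triggers the roll ...
            segment_start = t  # ... and is the first record of the new segment
    return rolls


# A tombstone is written 30 minutes in; the next append arrives at hour 100.
appends = [0, HOUR_MS // 2, 100 * HOUR_MS]
rolls = roll_times(appends, segment_ms=1 * HOUR_MS)
print([t / HOUR_MS for t in rolls])
```

In this model the only roll happens at the append at hour 100, so the
tombstone written after 30 minutes sits in the active segment, out of reach of
the cleaner, for almost 100 hours even though segment.ms is one hour. Without
the last append, no roll would happen at all.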

In general, I think part of the issue is that the system receives the delete
order at the moment the deletion has to be performed. We don't handle the
required waiting periods ourselves; that happens in the "coordinator system".
The system holding the data to be deleted receives the order and has to
perform the deletion immediately.

Kind regards,

 -- 
 Christian Apolloni


