Hi Sandor,

thanks again for your reply.

> If you have a non-log-compacted topic, after `retention.ms` the message
> (along with the PII) gets deleted from the Kafka message store without any
> further action, which should satisfy GDPR requirements:
>
> - you are handling PII in Kafka for a limited amount of time
> - you are processing the data for the given purpose it was given
> - the data will automatically be deleted without any further steps
>
> If you have a downstream system, you should also be able to publish a
> message through Kafka so that the downstream system executes its delete
> processes - if required. We implemented a similar process where we
> published an AnonymizeOrder event, which instructed downstream systems to
> anonymize the order data in their own data store.

Our problem is that the data could have been published shortly before the system receives a delete order from the "coordinator", because the data might have been mutated and the update needs to be propagated to consumer systems. With a retention period of days, we could only proceed with the subsequent systems in the coordinated chain after too long a delay. Going with an even shorter retention would be problematic.

> If you have a log-compacted topic:
>
> - yes, I have the same understanding as you have on the active segment.
> - You can set the segment.ms
> <https://kafka.apache.org/documentation/#segment.ms> property to force the
> compaction to occur within an expected timeframe.
>
> In general what I understand is true in both cases that Kafka gives you
> good enough guarantees to either remove the old message after retention.ms
> milliseconds or execute the topic compaction after segment.ms time that it
> is unnecessary to try to figure out more specifically in what exact moment
> the data is deleted. Setting these configurations should give you enough
> guarantee that the data removal will occur - if not, that imo should be
> considered a bug and reported back to the project.

We investigated the max.compaction.lag.ms parameter, which was introduced in KIP-354. From our understanding, its intent is exactly what we'd like to accomplish. However, unless we missed something, we have noticed that new segments are rolled only when new messages are appended. If the topic has very low activity, it can happen that no new message is appended and the segment stays active indefinitely, which means the cleaning of that segment might also remain stalled indefinitely. We are unsure whether our understanding is correct and whether this is a bug or not.

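To make the suspected behaviour concrete, here is a small toy model in Python. It is not Kafka code and only encodes our assumption that the roll check runs exclusively as part of an append; the class and method names (ToyLog, append, cleanable_records) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Toy model of a log whose roll check runs only inside append().

    This mirrors our assumption about the observed behaviour; it is
    not taken from the Kafka code base.
    """
    segment_ms: int
    active_created_at: int = 0
    active: list = field(default_factory=list)
    closed: list = field(default_factory=list)

    def append(self, now_ms, record):
        # The only place where the age of the active segment is checked.
        if self.active and now_ms - self.active_created_at >= self.segment_ms:
            self.closed.append(self.active)
            self.active = []
        if not self.active:
            self.active_created_at = now_ms
        self.active.append(record)

    def cleanable_records(self):
        # In this model only closed segments are eligible for cleaning.
        return [r for segment in self.closed for r in segment]


log = ToyLog(segment_ms=1_000)
log.append(0, ("user-1", "some PII"))
log.append(10, ("user-1", None))  # tombstone for the same key

# However much wall-clock time passes, nothing changes without an append.
stalled = log.cleanable_records()

# Only a later, unrelated append rolls the old segment.
log.append(10_000_000, ("user-2", "other data"))
after_append = log.cleanable_records()
```

In this model `stalled` is empty and `after_append` contains both records for user-1, which is the pattern we think we are seeing on low-traffic topics.
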
In general, I think part of the issue is that the system receives the delete order at the time that it has to be performed: we don't deal with the processing of the required waiting periods, that's what happens in the "coordinator system". The system with the data to be deleted receives the order and has to perform the deletion immediately.

Kind regards,

--
Christian Apolloni