Hello,
we (actually Dave) recently found that the S3 offloader did not delete
the objects.

This is a quick fix for the problem:
https://github.com/apache/pulsar/pull/14694

I am working on a set of integration tests to prevent these kinds of
problems in the future.

But I would like to start a discussion about how to recover from the
side effects of this bug:
Users have old blobs that must be deleted, both to save on storage
costs and because some jurisdictions have strict rules that require
deleting data when it is no longer needed (for instance the European
GDPR).

I am working on a tool that scans existing objects on Tiered Storage
in the background and then cleans them up.

The basic version of this "sweeper" tool:
- searches for orphan objects
- logs them ("dry run" mode) or deletes them

This tool may be launched manually, or it can run as a k8s job periodically.
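
To make the idea concrete, here is a rough sketch of the two steps
above (find orphans, then log or delete). It is not the actual tool:
the names find_orphans and sweep are hypothetical, the bucket is
modelled as a plain in-memory dict, and the set of keys still
referenced by ledger metadata is assumed to be already known, complete
and stable while the sweep runs.

```python
import logging
from typing import Dict, Iterable, List, Set

log = logging.getLogger("sweeper")


def find_orphans(stored_keys: Iterable[str], referenced_keys: Set[str]) -> List[str]:
    """Return the stored object keys that nothing refers to anymore."""
    # Materialize into a sorted list so callers can delete while iterating.
    return sorted(k for k in stored_keys if k not in referenced_keys)


def sweep(bucket: Dict[str, bytes], referenced_keys: Set[str], dry_run: bool = True) -> List[str]:
    """Log orphan objects; delete them only when dry_run is False.

    `bucket` is a hypothetical stand-in for the object store (key -> payload).
    """
    orphans = find_orphans(bucket.keys(), referenced_keys)
    for key in orphans:
        if dry_run:
            log.info("orphan object (dry run, not deleted): %s", key)
        else:
            log.info("deleting orphan object: %s", key)
            del bucket[key]
    return orphans
```

Defaulting to dry_run=True means a first run only reports what it
would remove, so the list can be reviewed before anything is deleted.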

Thoughts?

Enrico
