Hi Kyle,

There’s no minimum amount to keep. The repository always keeps everything that it can, up to the configured maximums. So in your case, once you hit 10 GB of provenance events, it will age off the oldest events to make room for newer ones. Anything older than 30 days will also be removed, for compliance-type reasons.

Note that the provenance repository only stores information related to lineage and FlowFile attributes. It does not store FlowFile content. The content is stored in the Content Repository, and the amount of data kept available for viewing through provenance is controlled by the “nifi.content.repository.archive.max.retention.period” and “nifi.content.repository.archive.max.usage.percentage” properties.
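To make the age-off behavior concrete, here is a rough Python sketch of the two limits working together. It is a simplified model, not NiFi's actual implementation: the `Event` class, the `age_off` function, the processor names, and the scaled-down numbers (100 KB standing in for the 10 GB cap) are all made up for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    processor: str         # hypothetical processor name
    timestamp_days: float  # when the event was recorded, in days
    size_bytes: int        # storage the event occupies


def age_off(events, now_days, max_age_days, max_bytes):
    """Return the events that survive both limits, oldest first.

    Simplified model: drop anything older than max_age_days, then drop
    the oldest remaining events until the total size fits in max_bytes.
    """
    kept = deque(
        e for e in sorted(events, key=lambda e: e.timestamp_days)
        if now_days - e.timestamp_days <= max_age_days
    )
    total = sum(e.size_bytes for e in kept)
    while kept and total > max_bytes:
        total -= kept.popleft().size_bytes
    return list(kept)


# Hypothetical workload: one weekly event, then a busy processor
# that emits 500 events over the following days.
events = [Event("weekly-report", 0.0, 1_000)]
events += [Event("busy-ingest", 1.0 + i * 0.01, 1_000) for i in range(500)]

survivors = age_off(events, now_days=6.0, max_age_days=30, max_bytes=100_000)
surviving_processors = {e.processor for e in survivors}
```

In this model the weekly event is gone after six days even though the time limit is 30 days, because the size cap is reached first and the oldest events are the ones dropped.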
Thanks
-Mark

On Mar 4, 2025, at 10:53 AM, Nguyen, Kyle <kyle.ngu...@mlp.com.INVALID> wrote:

Hi.

Is there a “minimum provenance to keep” setting for a given processor?

Use-case is:
1. User asks us to check provenance on one of their “weekly” workflows (for data errors, etc.)
2. The “weekly” task’s provenance got deleted, perhaps from other, more frequently executed processors, due to our quota/limit settings.
3. So it would be nice if there was a “minimum provenance to keep” setting.

We’re aware of and currently have these properties in `nifi.properties`.

# Provenance Repository Properties
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository

# Persistent Provenance Repository Properties
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=10 GB
nifi.provenance.repository.rollover.time=10 mins
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
# Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
# FlowFile Attributes that should be indexed and made searchable. Some examples to consider are filename, uuid, mime.type
nifi.provenance.repository.indexed.attributes=
# Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
# but should provide better performance
nifi.provenance.repository.index.shard.size=500 MB
# Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from
# the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved.
nifi.provenance.repository.max.attribute.length=65536
nifi.provenance.repository.concurrent.merge.threads=2

# Volatile Provenance Repository Properties
nifi.provenance.repository.buffer.size=100000

Kyle Nguyen
Corporate Technology, Software Engineer
Millennium Management LLC