Hi Kyle,

There’s no minimum amount to keep. The repository always keeps everything it 
can, up to the configured maximums. So in your case, once you hit 10 GB of 
provenance events, it will age off the oldest events to make room for newer 
ones. Anything older than 30 days will also be removed, mainly for compliance 
reasons. Note that the provenance repository only stores information related 
to lineage and FlowFile attributes. It does not store FlowFile content. The 
content is stored in the Content Repository, and the amount of data kept 
available for viewing through provenance is controlled by the 
“nifi.content.repository.archive.max.retention.period” and 
“nifi.content.repository.archive.max.usage.percentage” properties.
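
As a rough sketch, the content archive entries in `nifi.properties` look 
something like the following. The values shown are only illustrative 
assumptions, not a recommendation, so check what your own install has:

```properties
# Content Repository archive settings (illustrative values, assumed for this example)
# Archiving must be enabled for content to stay viewable from provenance events
nifi.content.repository.archive.enabled=true
# Archived content older than this is eligible for removal
nifi.content.repository.archive.max.retention.period=12 hours
# Archived content is also removed once the content repository disk usage passes this level
nifi.content.repository.archive.max.usage.percentage=50%
```

With settings like these, a provenance event can still be present in the 
provenance repository while its content is no longer available to view or 
replay.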

Thanks
-Mark

On Mar 4, 2025, at 10:53 AM, Nguyen, Kyle <kyle.ngu...@mlp.com.INVALID> wrote:

Hi. Is there a “minimum provenance to keep” setting for a given processor?  
The use case is:


  1.  User asks us to check provenance on one of their “weekly” workflows (for 
data errors, etc.)
  2.  The “weekly” task’s provenance got deleted, perhaps pushed out by other, more 
frequently executed processors, due to our quota/limit settings.
  3.  So it would be nice if there was a “minimum provenance to keep” setting.


We’re aware of and currently have these properties in `nifi.properties`.

# Provenance Repository Properties
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository

# Persistent Provenance Repository Properties
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=10 GB
nifi.provenance.repository.rollover.time=10 mins
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
# Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
# FlowFile Attributes that should be indexed and made searchable.  Some examples to consider are filename, uuid, mime.type
nifi.provenance.repository.indexed.attributes=
# Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
# but should provide better performance
nifi.provenance.repository.index.shard.size=500 MB
# Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from
# the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved.
nifi.provenance.repository.max.attribute.length=65536
nifi.provenance.repository.concurrent.merge.threads=2


# Volatile Provenance Repository Properties
nifi.provenance.repository.buffer.size=100000

Kyle Nguyen
Corporate Technology, Software Engineer

Millennium Management LLC
399 Park Avenue  |  New York, NY 10022
📞 +1.212.708.1366  | 📱 +1.929.837.1788
mlp.com<https://www.mlp.com/home/>


