Hi, I am reaching back regarding this topic, as any insight would be very helpful. Would it be possible to implement such processor state, which updates would be atomic with that of the flowfile repository, in light of the way that NiFi is currently architected? And if so, would you have any hints on how it could be achieved?
Thank you very much! Le dim. 26 janv. 2025 à 12:37, Corentin Régent <corentin.reg...@gmail.com> a écrit : > Hi NiFi team, > > The ConsumeKafka > <https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-2-6-nar/stable/org.apache.nifi.processors.kafka.pubsub.ConsumeKafka_2_6/index.html> > processor > currently offers at-least-once semantics through asynchronous offset > commits to Kafka, meaning that the data can be duplicated for example if a > cluster restart occurs before the offsets are acknowledged. One way to > guarantee exactly-once semantics (assuming no data loss in NiFi from > hardware failures) instead could be to persist some processor state within > NiFi, tracking the current offsets. But, this state management has to be > atomic with operations performed on the flowfile repository, therefore the > StateManager > <https://nifi.apache.org/nifi-docs/developer-guide.html#state_manager> is > unsuitable. > > I was wondering whether it would be feasible, in a reliable way, to > retrieve such state from the latest flowfiles' attributes, presumably from > the provenance repository? Which I assume is atomic with / recovered from > the flowfile repository? I am not familiar with Lucene; I may be looking > for something similar to > WriteAheadProvenanceRepository::getLatestCachedEvents > <https://github.com/apache/nifi/blob/main/nifi-framework-bundle/nifi-framework-extensions/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/WriteAheadProvenanceRepository.java#L260>, > but that would leverage up-to-date persisted information, rather than an > in-memory map that is lost upon cluster restart. > > Thanks, >