Hello NiFi team,

I am trying to monitor NiFi data ingestion from Kafka to HDFS by tracing
message offsets to make sure none are lost. My idea is to aggregate the
Kafka provenance attributes as the messages are merged into a single file,
then atomically save them as HDFS extended attributes when writing the
output file.

So I *may* (if I find enough time and if it is worth the effort) wish to
contribute and:
- Implement a new attribute strategy for the MergeContent processor that
would let the user merge the attributes of the flowfiles. I only need a
"join-like" operation, so I probably won't be creating a fully-fledged
aggregation expression language.
- Add the ability to write extended file attributes in the PutHDFS
processor (based on flowfile attributes). I saw that ticket NIFI-10524
<https://issues.apache.org/jira/browse/NIFI-10524> touches on this subject
a little, though it does not really address it.
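To make the first point concrete, here is a minimal sketch of the
"join-like" merge I have in mind, as plain Java: it collapses the
kafka.partition / kafka.offset attributes that ConsumeKafka writes into one
min-max range per partition, which is enough to detect gaps downstream. The
class and method names are invented for illustration, not an actual NiFi
API:

```java
import java.util.*;

// Sketch of a "join-like" attribute merge for MergeContent bins:
// collapse each flowfile's kafka.partition / kafka.offset attributes
// into one "min-max" offset range per partition. The merged flowfile
// would then carry these ranges, to be written as HDFS extended
// attributes by PutHDFS. Names here are hypothetical.
public class OffsetRangeMerge {

    /** Returns one "min-max" offset range per partition, e.g. {"0": "100-101"}. */
    public static Map<String, String> mergeOffsets(List<Map<String, String>> flowFileAttrs) {
        Map<String, long[]> ranges = new HashMap<>();
        for (Map<String, String> attrs : flowFileAttrs) {
            String partition = attrs.get("kafka.partition");
            long offset = Long.parseLong(attrs.get("kafka.offset"));
            // Keep the running [min, max] offset seen for this partition.
            ranges.merge(partition, new long[]{offset, offset},
                    (a, b) -> new long[]{Math.min(a[0], b[0]), Math.max(a[1], b[1])});
        }
        Map<String, String> merged = new TreeMap<>();
        ranges.forEach((p, r) -> merged.put(p, r[0] + "-" + r[1]));
        return merged;
    }
}
```

A consumer could then compare consecutive ranges per partition: any hole
between one file's max and the next file's min would indicate a lost
message.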

I am unfamiliar with NiFi / Apache development processes. First of all,
would such features be of interest to the project? (I make no promises,
though.)

Also, at my company we are using NiFi 1.x through Cloudera Data Flow, so I
was wondering whether new features are still being added to NiFi 1.x. I
assume this would mean making one PR for NiFi 1 and one PR for NiFi 2 for
each feature? And how long (approximately) could I expect to wait for a new
NiFi release after the (potential) merge? Finally, if you are still closely
involved with Cloudera, how long does it usually take for a new NiFi
version to be supported in CDF?

Thank you for your work and support,

Corentin Régent
