Mark Payne created NIFI-7940:
--------------------------------
Summary: Create a ScriptedPartitionRecord
Key: NIFI-7940
URL: https://issues.apache.org/jira/browse/NIFI-7940
Project: Apache NiFi
Issue Type: Bug
Components: Extensions
Reporter: Mark Payne
In 1.12.0, we introduced the ScriptedTransformRecord. This has worked very well
for many different transformations that are simple in code but very difficult
with the DSL's that NiFi supports. In addition to transforming records, another
use case that can be made dramatically easier with scripting is partitioning
records.
The PartitionRecord processor is very powerful and easy to use, but RecordPath
is somewhat limited in the functions that it provides. For example, recently in
the Apache Slack channel, we had someone asking about how to route data based
on whether or not the "timestamp" field matches a given regular expression.
Attempts were made using QueryRecord with RPATH but that didn't work because
the timestamp field is a top-level field, not a Record. Tried using
UpdateRecord but that failed because the matchesRegex function of RecordPath is
a predicate so can't be used to partition on. Eventually a pattern was found
with QueryRecord using the `SIMILAR TO` but that function does not support for
regular expressions.
A Scripted processor would likely make this far more trivial to handle. This
processor should be focused around making it dead simple to partition (and
subsequently route based on added attributes) records with a scripting
language. So, it will be important, like ScriptedTransformRecord, to make the
processor geared more toward ease of use than being given the full power of
FlowFiles, sessions, etc.
The script should have the same bindings as ScriptedTransformRecord:
* attributes
* log
* record
* recordIndex
Unlike ScriptedTransformRecord, though, the ScriptedPartitionRecord should
return one of three things:
* A string (or a primitive value such as an int, that can be turned into a
String) representing the partition for the Record
* A collection of strings/primitives representing multiple partitions that the
Record should go to (indicating that the Record should be added to multiple
Record Writers)
* A null value or an empty collection indicating that the Record should be
dropped.
The processor should then write the Record to a FlowFile for each of the
Partitions returned. For each outbound FlowFile, an attribute should be added
indicating the partition for that FlowFile for easy follow-on routing via
RouteOnAttribute, etc.
The processor should keep a counter for how many Records were dropped.
The processor should keep counters for how many Records were routed to each
Partition.
The processor should include additionalDetails.html to provide sufficient
documentation. Similar to ScriptedTransformRecord, the documentation should
include several examples, spanning at least Groovy and Python (since those are
by far the most often used languages we see used in script processors). Each
example provided in the additionalDetails.html should also have an accompanying
unit test to verify the behavior.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)