simonbence commented on a change in pull request #4824:
URL: https://github.com/apache/nifi/pull/4824#discussion_r578386400
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -46,10 +46,21 @@ <h3>Unacknowledged and undetermined cases</h3>
includes unsuccessful or ongoing indexing and unknown acknowledgement
identifiers. In order to avoid infinite tries,
QuerySplunkIndexingStatus gives user the possibility to set a "Maximum
waiting time". Results with value of false from Splunk
within the specified waiting time will be handled as "undetermined" and
are transferred to the "undetermined" relationship.
- Flow files outside of this time range will be transferred to the
"unacknowledged" relationship next time the processor is
- triggered. In order to determine if the indexing of a given event is
within the waiting time, the Unix Epoch of the original
- Splunk response is stored in the attribute "splunk.responded.at". Setting
"Maximum waiting time" too low might
- result some false negative result as in case under higher load, Splunk
server might index slower than it is expected.
+</p>
+
+<p>
+ Acknowledgment status of flow files being in the "undetermined"
relationship are checked with Splunk and regardless of the result
Review comment:
Flow files at this point are not transferred to the undetermined
relationship (or any other). In a nutshell, when they are transferred into any
relationship, the leave the scope of the processor. I would suggest to phrase
like: every flow file, after the first acknowledgement poll query towards the
Splunk will be extended with the attribute "ack.checked.at.splunk" considering
the Splunk server responded. The attribute serves as a flag for later
processing to determine if the acknowledgement was checked at least once.
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -46,10 +46,21 @@ <h3>Unacknowledged and undetermined cases</h3>
includes unsuccessful or ongoing indexing and unknown acknowledgement
identifiers. In order to avoid infinite tries,
QuerySplunkIndexingStatus gives user the possibility to set a "Maximum
waiting time". Results with value of false from Splunk
within the specified waiting time will be handled as "undetermined" and
are transferred to the "undetermined" relationship.
- Flow files outside of this time range will be transferred to the
"unacknowledged" relationship next time the processor is
- triggered. In order to determine if the indexing of a given event is
within the waiting time, the Unix Epoch of the original
- Splunk response is stored in the attribute "splunk.responded.at". Setting
"Maximum waiting time" too low might
- result some false negative result as in case under higher load, Splunk
server might index slower than it is expected.
+</p>
+
+<p>
+ Acknowledgment status of flow files being in the "undetermined"
relationship are checked with Splunk and regardless of the result
+ an "ack.checked.at.splunk" attribute with true value will be put on them
ensuring at least one acknowledgment check is happening
+ even outside of "Maximum waiting time".
+ Flow files having both their "ack.checked.at.splunk" attribute equals to
true and being outside of the time range will be
Review comment:
I would approach this from the other side: flow files are timed out but
not having the "ack.checked.at.splunk" are prevented from being transferred to
the "unacknowledged" relationship but instead will be handled as "undetermined"
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/java/org/apache/nifi/processors/splunk/QuerySplunkIndexingStatus.java
##########
@@ -54,6 +52,9 @@
@ReadsAttributes({
@ReadsAttribute(attribute = "splunk.acknowledgement.id", description =
"The indexing acknowledgement id provided by Splunk."),
@ReadsAttribute(attribute = "splunk.responded.at", description = "The
time of the response of put request for Splunk.")})
+@WritesAttributes({
+ @WritesAttribute(attribute = "ack.checked.at.splunk", description =
"Identifying whether Splunk acknowledgement check has happened.")
Review comment:
Please extend the description with information about when the attribute
is expected to be set (it is not set when not already checked, etc.)
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
batch but the processor might execute the query with less number of
undetermined events.
</p>
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
+
+<table>
+ <tbody>
+ <tr>
+ <th>Flow file id</th>
+ <th>Splunk acknowledgement id</th>
+ <th>Splunk responded at</th>
+ <th>Ack checked at Splunk</th>
+ <th>Relationship</th>
+ </tr>
+ <tr>
+ <td>flow file 1</td>
+ <td>null</td>
+ <td>10:46:02</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 2</td>
+ <td>0</td>
+ <td>10:46:18</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 3</td>
+ <td>1</td>
+ <td>10:47:04</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 4</td>
+ <td>2</td>
+ <td>10:47:23</td>
+ <td>No value set</td>
+ </tr>
+ </tbody>
+</table>
+
+<h4>First iteration</h4>
+
+The current time is 10:48:05 so all of our files are within TTL. Flow file 1
does not have a correct "splunk.acknowledgment.id"
+so will be directed to the "Failure" relationship. The remaining files will be
sent for Splunk ack check.
+
+With the following Splunk response "acks:{[0:true, 1:false, 2:false]}" flow
file 2 will be directed to "Success" relationship
+and the rest of the files to the "Undetermined" relationship and penalized for
30 seconds.
+
+<table>
+ <tbody>
+ <tr>
+ <th>Flow file id</th>
+ <th>Splunk acknowledgement id</th>
+ <th>Splunk responded at</th>
+ <th>Ack checked at Splunk</th>
+ <th>Relationship</th>
+ </tr>
+ <tr>
+ <td>flow file 1</td>
+ <td>null</td>
+ <td>10:46:02</td>
+ <td>No value set</td>
+ <td>Failure</td>
+ </tr>
+ <tr>
+ <td>flow file 2</td>
+ <td>0</td>
+ <td>10:46:18</td>
+ <td>true</td>
+ <td>Success</td>
+
+ </tr>
+ <tr>
+ <td>flow file 3</td>
+ <td>1</td>
+ <td>10:47:04</td>
+ <td>true</td>
+ <td>Undetermined</td>
+ </tr>
+ <tr>
+ <td>flow file 4</td>
+ <td>2</td>
+ <td>10:47:23</td>
+ <td>true</td>
+ <td>Undetermined</td>
+ </tr>
+ </tbody>
+</table>
+
+<h4>Second iteration</h4>
Review comment:
Just for keeping the focus, you can remove the FFs from the later tables
which are already determined
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
batch but the processor might execute the query with less number of
undetermined events.
</p>
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
+
+<table>
+ <tbody>
+ <tr>
+ <th>Flow file id</th>
+ <th>Splunk acknowledgement id</th>
+ <th>Splunk responded at</th>
+ <th>Ack checked at Splunk</th>
+ <th>Relationship</th>
+ </tr>
+ <tr>
+ <td>flow file 1</td>
+ <td>null</td>
+ <td>10:46:02</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 2</td>
+ <td>0</td>
+ <td>10:46:18</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 3</td>
+ <td>1</td>
+ <td>10:47:04</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 4</td>
+ <td>2</td>
+ <td>10:47:23</td>
+ <td>No value set</td>
+ </tr>
+ </tbody>
+</table>
+
+<h4>First iteration</h4>
+
+The current time is 10:48:05 so all of our files are within TTL. Flow file 1
does not have a correct "splunk.acknowledgment.id"
+so will be directed to the "Failure" relationship. The remaining files will be
sent for Splunk ack check.
+
+With the following Splunk response "acks:{[0:true, 1:false, 2:false]}" flow
file 2 will be directed to "Success" relationship
+and the rest of the files to the "Undetermined" relationship and penalized for
30 seconds.
+
+<table>
+ <tbody>
+ <tr>
+ <th>Flow file id</th>
+ <th>Splunk acknowledgement id</th>
+ <th>Splunk responded at</th>
+ <th>Ack checked at Splunk</th>
+ <th>Relationship</th>
+ </tr>
+ <tr>
+ <td>flow file 1</td>
+ <td>null</td>
+ <td>10:46:02</td>
+ <td>No value set</td>
+ <td>Failure</td>
+ </tr>
+ <tr>
+ <td>flow file 2</td>
+ <td>0</td>
+ <td>10:46:18</td>
+ <td>true</td>
+ <td>Success</td>
+
+ </tr>
+ <tr>
+ <td>flow file 3</td>
+ <td>1</td>
+ <td>10:47:04</td>
+ <td>true</td>
+ <td>Undetermined</td>
+ </tr>
+ <tr>
+ <td>flow file 4</td>
+ <td>2</td>
+ <td>10:47:23</td>
+ <td>true</td>
+ <td>Undetermined</td>
+ </tr>
+ </tbody>
+</table>
+
+<h4>Second iteration</h4>
+
+The current time is 10:49:18 so both flow file 3 and flow file 4 are within
TTL and they will be sent for Splunk ack check.
+
+With the following Splunk response "acks:{[1:false, 2:true]}" flow file 4 will
be directed to "Success" relationship and
+flow file 3 to the "Undetermined" relationship and penalized for 30 seconds.
+In the meantime flow file 1 missing "Splunk acknowledgment id" has been
corrected.
Review comment:
I think, this is something we can skip: we do not provide any mechanism
to correct this attribute so this situation is too hypothetical.
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
batch but the processor might execute the query with less number of
undetermined events.
</p>
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
+
+<table>
+ <tbody>
+ <tr>
+ <th>Flow file id</th>
+ <th>Splunk acknowledgement id</th>
+ <th>Splunk responded at</th>
+ <th>Ack checked at Splunk</th>
+ <th>Relationship</th>
+ </tr>
+ <tr>
+ <td>flow file 1</td>
+ <td>null</td>
+ <td>10:46:02</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 2</td>
+ <td>0</td>
+ <td>10:46:18</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 3</td>
+ <td>1</td>
+ <td>10:47:04</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 4</td>
+ <td>2</td>
+ <td>10:47:23</td>
+ <td>No value set</td>
+ </tr>
+ </tbody>
+</table>
+
+<h4>First iteration</h4>
+
+The current time is 10:48:05 so all of our files are within TTL. Flow file 1
does not have a correct "splunk.acknowledgment.id"
+so will be directed to the "Failure" relationship. The remaining files will be
sent for Splunk ack check.
+
+With the following Splunk response "acks:{[0:true, 1:false, 2:false]}" flow
file 2 will be directed to "Success" relationship
+and the rest of the files to the "Undetermined" relationship and penalized for
30 seconds.
+
+<table>
+ <tbody>
+ <tr>
+ <th>Flow file id</th>
+ <th>Splunk acknowledgement id</th>
+ <th>Splunk responded at</th>
+ <th>Ack checked at Splunk</th>
+ <th>Relationship</th>
+ </tr>
+ <tr>
+ <td>flow file 1</td>
+ <td>null</td>
+ <td>10:46:02</td>
+ <td>No value set</td>
+ <td>Failure</td>
+ </tr>
+ <tr>
+ <td>flow file 2</td>
+ <td>0</td>
+ <td>10:46:18</td>
+ <td>true</td>
+ <td>Success</td>
+
+ </tr>
+ <tr>
+ <td>flow file 3</td>
+ <td>1</td>
+ <td>10:47:04</td>
+ <td>true</td>
+ <td>Undetermined</td>
+ </tr>
+ <tr>
+ <td>flow file 4</td>
+ <td>2</td>
+ <td>10:47:23</td>
+ <td>true</td>
+ <td>Undetermined</td>
+ </tr>
+ </tbody>
+</table>
+
+<h4>Second iteration</h4>
+
+The current time is 10:49:18 so both flow file 3 and flow file 4 are within
TTL and they will be sent for Splunk ack check.
Review comment:
As your example is pretty exhaustive, it might worth mentioning that
during the timeframe of penalise, the processor might be triggered. In the case
no further flow files arrived, it will abruptly finishes it's activation, in
case there are new FFs are arrived, those will be processed (in this case:
before the penalised ones)
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -46,10 +46,21 @@ <h3>Unacknowledged and undetermined cases</h3>
includes unsuccessful or ongoing indexing and unknown acknowledgement
identifiers. In order to avoid infinite tries,
QuerySplunkIndexingStatus gives user the possibility to set a "Maximum
waiting time". Results with value of false from Splunk
within the specified waiting time will be handled as "undetermined" and
are transferred to the "undetermined" relationship.
- Flow files outside of this time range will be transferred to the
"unacknowledged" relationship next time the processor is
- triggered. In order to determine if the indexing of a given event is
within the waiting time, the Unix Epoch of the original
- Splunk response is stored in the attribute "splunk.responded.at". Setting
"Maximum waiting time" too low might
- result some false negative result as in case under higher load, Splunk
server might index slower than it is expected.
+</p>
+
+<p>
+ Acknowledgment status of flow files being in the "undetermined"
relationship are checked with Splunk and regardless of the result
+ an "ack.checked.at.splunk" attribute with true value will be put on them
ensuring at least one acknowledgment check is happening
+ even outside of "Maximum waiting time".
+ Flow files having both their "ack.checked.at.splunk" attribute equals to
true and being outside of the time range will be
+ transferred to the "unacknowledged" relationship next time the processor
is triggered.
+</p>
+
+<p>
+ In order to determine if the indexing of
+ a given event is within the waiting time, the Unix Epoch of the original
Splunk response is stored in the attribute "splunk.responded.at".
Review comment:
I would go with: the arrival time of the original Splunk response is
stored in the attribute "splunk.responded.at" in Unix Epoch format.
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
batch but the processor might execute the query with less number of
undetermined events.
</p>
+<h2>Example</h2>
Review comment:
Great idea to add an example! 👍
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
batch but the processor might execute the query with less number of
undetermined events.
</p>
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum
waiting time" (referred as TTL from now on) is set for 3 minutes
Review comment:
I think "will be presented" would communicate the intent better
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
batch but the processor might execute the query with less number of
undetermined events.
</p>
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
+
+<table>
+ <tbody>
+ <tr>
+ <th>Flow file id</th>
+ <th>Splunk acknowledgement id</th>
+ <th>Splunk responded at</th>
+ <th>Ack checked at Splunk</th>
+ <th>Relationship</th>
+ </tr>
+ <tr>
+ <td>flow file 1</td>
+ <td>null</td>
+ <td>10:46:02</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 2</td>
+ <td>0</td>
+ <td>10:46:18</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 3</td>
+ <td>1</td>
+ <td>10:47:04</td>
+ <td>No value set</td>
+ </tr>
+ <tr>
+ <td>flow file 4</td>
+ <td>2</td>
+ <td>10:47:23</td>
+ <td>No value set</td>
+ </tr>
+ </tbody>
+</table>
+
+<h4>First iteration</h4>
+
+The current time is 10:48:05 so all of our files are within TTL. Flow file 1
does not have a correct "splunk.acknowledgment.id"
Review comment:
I think it is a bit challenging to read flow file 1, within a
paragraph. I would suggest either highlight it with some way (italian, bold) or
instead refer into the flow files as #1, #2 and so.
A minor clarification: there is no validation on the value of
"splunk.acknowledgment.id", thus, setting it to null is considered as "no
attribute is set". Just to be sure, I made a quick test, and when the ackId is
extracted, having a null will result an Optional#empty just like when it is not
set at all. As a result I would suggest to rephrase it to not set.
##########
File path:
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
batch but the processor might execute the query with less number of
undetermined events.
</p>
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
Review comment:
Minor: looped back sounds more natural
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]