[GitHub] [nifi] simonbence commented on a change in pull request #4824: NIFI-8176 adding ackChecked flag to QuerySplunkIndexingStatus.onTrigg…

GitBox Thu, 18 Feb 2021 05:08:09 -0800


simonbence commented on a change in pull request #4824:
URL: https://github.com/apache/nifi/pull/4824#discussion_r578386400




##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -46,10 +46,21 @@ <h3>Unacknowledged and undetermined cases</h3>
     includes unsuccessful or ongoing indexing and unknown acknowledgement 
identifiers. In order to avoid infinite tries,
     QuerySplunkIndexingStatus gives user the possibility to set a "Maximum 
waiting time". Results with value of false from Splunk
     within the specified waiting time will be handled as "undetermined" and 
are transferred to the "undetermined" relationship.
-    Flow files outside of this time range will be transferred to the 
"unacknowledged" relationship next time the processor is
-    triggered. In order to determine if the indexing of a given event is 
within the waiting time, the Unix Epoch of the original
-    Splunk response is stored in the attribute "splunk.responded.at". Setting 
"Maximum waiting time" too low might
-    result some false negative result as in case under higher load, Splunk 
server might index slower than it is expected.
+</p>
+
+<p>
+    Acknowledgment status of flow files being in the "undetermined" 
relationship are checked with Splunk and regardless of the result

Review comment:
       At this point, flow files are not transferred to the undetermined 
relationship (or any other). In a nutshell, once they are transferred to any 
relationship, they leave the scope of the processor. I would suggest phrasing 
it like this: after the first acknowledgement poll query towards Splunk, every 
flow file is extended with the attribute "ack.checked.at.splunk", provided 
the Splunk server responded. The attribute serves as a flag for later 
processing to determine whether the acknowledgement was checked at least once.

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -46,10 +46,21 @@ <h3>Unacknowledged and undetermined cases</h3>
     includes unsuccessful or ongoing indexing and unknown acknowledgement 
identifiers. In order to avoid infinite tries,
     QuerySplunkIndexingStatus gives user the possibility to set a "Maximum 
waiting time". Results with value of false from Splunk
     within the specified waiting time will be handled as "undetermined" and 
are transferred to the "undetermined" relationship.
-    Flow files outside of this time range will be transferred to the 
"unacknowledged" relationship next time the processor is
-    triggered. In order to determine if the indexing of a given event is 
within the waiting time, the Unix Epoch of the original
-    Splunk response is stored in the attribute "splunk.responded.at". Setting 
"Maximum waiting time" too low might
-    result some false negative result as in case under higher load, Splunk 
server might index slower than it is expected.
+</p>
+
+<p>
+    Acknowledgment status of flow files being in the "undetermined" 
relationship are checked with Splunk and regardless of the result
+    an "ack.checked.at.splunk" attribute with true value will be put on them 
ensuring at least one acknowledgment check is happening
+    even outside of "Maximum waiting time".
+    Flow files having both their "ack.checked.at.splunk" attribute equals to 
true and being outside of the time range will be

Review comment:
       I would approach this from the other side: flow files that are timed out 
but do not have the "ack.checked.at.splunk" attribute are prevented from being 
transferred to the "unacknowledged" relationship and are handled as 
"undetermined" instead.
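
   To illustrate the rule I have in mind, here is a minimal sketch. It is not 
the processor's actual code: the class, the enum and the `route` helper are 
hypothetical names, and the three booleans stand in for the Splunk response, 
the "Maximum waiting time" check and the presence of "ack.checked.at.splunk".

```java
public class AckRoutingSketch {

    // Hypothetical stand-ins for the processor's relationships.
    enum Relationship { SUCCESS, UNDETERMINED, UNACKNOWLEDGED }

    /**
     * Hypothetical helper: decides where a flow file should go.
     *
     * @param acknowledged true if Splunk reported the event as indexed
     * @param timedOut     true if the flow file is outside "Maximum waiting time"
     * @param ackChecked   true if "ack.checked.at.splunk" is already set
     */
    static Relationship route(boolean acknowledged, boolean timedOut, boolean ackChecked) {
        if (acknowledged) {
            return Relationship.SUCCESS;
        }
        // Only give up on a flow file if it was checked at least once.
        if (timedOut && ackChecked) {
            return Relationship.UNACKNOWLEDGED;
        }
        // Timed out but never checked, or still within the waiting time.
        return Relationship.UNDETERMINED;
    }

    public static void main(String[] args) {
        System.out.println(route(false, true, false)); // timed out, never checked
        System.out.println(route(false, true, true));  // timed out, already checked
    }
}
```

   The point of the sketch is only the middle branch: a timeout alone is not 
enough to reach "unacknowledged".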

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/java/org/apache/nifi/processors/splunk/QuerySplunkIndexingStatus.java
##########
@@ -54,6 +52,9 @@
 @ReadsAttributes({
         @ReadsAttribute(attribute = "splunk.acknowledgement.id", description = 
"The indexing acknowledgement id provided by Splunk."),
         @ReadsAttribute(attribute = "splunk.responded.at", description = "The 
time of the response of put request for Splunk.")})
+@WritesAttributes({
+        @WritesAttribute(attribute = "ack.checked.at.splunk", description = 
"Identifying whether Splunk acknowledgement check has happened.")

Review comment:
       Please extend the description with information about when the attribute 
is expected to be set (for example, it is not set if the acknowledgement has 
not been checked yet).

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
     batch but the processor might execute the query with less number of 
undetermined events.
 </p>
 
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the 
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum 
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized 
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
+
+<table>
+    <tbody>
+    <tr>
+        <th>Flow file id</th>
+        <th>Splunk acknowledgement id</th>
+        <th>Splunk responded at</th>
+        <th>Ack checked at Splunk</th>
+        <th>Relationship</th>
+    </tr>
+    <tr>
+        <td>flow file 1</td>
+        <td>null</td>
+        <td>10:46:02</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 2</td>
+        <td>0</td>
+        <td>10:46:18</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 3</td>
+        <td>1</td>
+        <td>10:47:04</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 4</td>
+        <td>2</td>
+        <td>10:47:23</td>
+        <td>No value set</td>
+    </tr>
+    </tbody>
+</table>
+
+<h4>First iteration</h4>
+
+The current time is 10:48:05 so all of our files are within TTL. Flow file 1 
does not have a correct "splunk.acknowledgment.id"
+so will be directed to the "Failure" relationship. The remaining files will be 
sent for Splunk ack check.
+
+With the following Splunk response "acks:{[0:true, 1:false, 2:false]}" flow 
file 2 will be directed to "Success" relationship
+and the rest of the files to the "Undetermined" relationship and penalized for 
30 seconds.
+
+<table>
+    <tbody>
+    <tr>
+        <th>Flow file id</th>
+        <th>Splunk acknowledgement id</th>
+        <th>Splunk responded at</th>
+        <th>Ack checked at Splunk</th>
+        <th>Relationship</th>
+    </tr>
+    <tr>
+        <td>flow file 1</td>
+        <td>null</td>
+        <td>10:46:02</td>
+        <td>No value set</td>
+        <td>Failure</td>
+    </tr>
+    <tr>
+        <td>flow file 2</td>
+        <td>0</td>
+        <td>10:46:18</td>
+        <td>true</td>
+        <td>Success</td>
+
+    </tr>
+    <tr>
+        <td>flow file 3</td>
+        <td>1</td>
+        <td>10:47:04</td>
+        <td>true</td>
+        <td>Undetermined</td>
+    </tr>
+    <tr>
+        <td>flow file 4</td>
+        <td>2</td>
+        <td>10:47:23</td>
+        <td>true</td>
+        <td>Undetermined</td>
+    </tr>
+    </tbody>
+</table>
+
+<h4>Second iteration</h4>

Review comment:
       To keep the focus, you can remove the FFs that are already determined 
from the later tables.

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
     batch but the processor might execute the query with less number of 
undetermined events.
 </p>
 
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the 
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum 
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized 
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
+
+<table>
+    <tbody>
+    <tr>
+        <th>Flow file id</th>
+        <th>Splunk acknowledgement id</th>
+        <th>Splunk responded at</th>
+        <th>Ack checked at Splunk</th>
+        <th>Relationship</th>
+    </tr>
+    <tr>
+        <td>flow file 1</td>
+        <td>null</td>
+        <td>10:46:02</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 2</td>
+        <td>0</td>
+        <td>10:46:18</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 3</td>
+        <td>1</td>
+        <td>10:47:04</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 4</td>
+        <td>2</td>
+        <td>10:47:23</td>
+        <td>No value set</td>
+    </tr>
+    </tbody>
+</table>
+
+<h4>First iteration</h4>
+
+The current time is 10:48:05 so all of our files are within TTL. Flow file 1 
does not have a correct "splunk.acknowledgment.id"
+so will be directed to the "Failure" relationship. The remaining files will be 
sent for Splunk ack check.
+
+With the following Splunk response "acks:{[0:true, 1:false, 2:false]}" flow 
file 2 will be directed to "Success" relationship
+and the rest of the files to the "Undetermined" relationship and penalized for 
30 seconds.
+
+<table>
+    <tbody>
+    <tr>
+        <th>Flow file id</th>
+        <th>Splunk acknowledgement id</th>
+        <th>Splunk responded at</th>
+        <th>Ack checked at Splunk</th>
+        <th>Relationship</th>
+    </tr>
+    <tr>
+        <td>flow file 1</td>
+        <td>null</td>
+        <td>10:46:02</td>
+        <td>No value set</td>
+        <td>Failure</td>
+    </tr>
+    <tr>
+        <td>flow file 2</td>
+        <td>0</td>
+        <td>10:46:18</td>
+        <td>true</td>
+        <td>Success</td>
+
+    </tr>
+    <tr>
+        <td>flow file 3</td>
+        <td>1</td>
+        <td>10:47:04</td>
+        <td>true</td>
+        <td>Undetermined</td>
+    </tr>
+    <tr>
+        <td>flow file 4</td>
+        <td>2</td>
+        <td>10:47:23</td>
+        <td>true</td>
+        <td>Undetermined</td>
+    </tr>
+    </tbody>
+</table>
+
+<h4>Second iteration</h4>
+
+The current time is 10:49:18 so both flow file 3 and flow file 4 are within 
TTL and they will be sent for Splunk ack check.
+
+With the following Splunk response "acks:{[1:false, 2:true]}" flow file 4 will 
be directed to "Success" relationship and
+flow file 3 to the "Undetermined" relationship and penalized for 30 seconds.
+In the meantime flow file 1 missing "Splunk acknowledgment id" has been 
corrected.

Review comment:
       I think this is something we can skip: we do not provide any mechanism 
to correct this attribute, so this situation is too hypothetical.

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
     batch but the processor might execute the query with less number of 
undetermined events.
 </p>
 
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the 
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum 
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized 
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
+
+<table>
+    <tbody>
+    <tr>
+        <th>Flow file id</th>
+        <th>Splunk acknowledgement id</th>
+        <th>Splunk responded at</th>
+        <th>Ack checked at Splunk</th>
+        <th>Relationship</th>
+    </tr>
+    <tr>
+        <td>flow file 1</td>
+        <td>null</td>
+        <td>10:46:02</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 2</td>
+        <td>0</td>
+        <td>10:46:18</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 3</td>
+        <td>1</td>
+        <td>10:47:04</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 4</td>
+        <td>2</td>
+        <td>10:47:23</td>
+        <td>No value set</td>
+    </tr>
+    </tbody>
+</table>
+
+<h4>First iteration</h4>
+
+The current time is 10:48:05 so all of our files are within TTL. Flow file 1 
does not have a correct "splunk.acknowledgment.id"
+so will be directed to the "Failure" relationship. The remaining files will be 
sent for Splunk ack check.
+
+With the following Splunk response "acks:{[0:true, 1:false, 2:false]}" flow 
file 2 will be directed to "Success" relationship
+and the rest of the files to the "Undetermined" relationship and penalized for 
30 seconds.
+
+<table>
+    <tbody>
+    <tr>
+        <th>Flow file id</th>
+        <th>Splunk acknowledgement id</th>
+        <th>Splunk responded at</th>
+        <th>Ack checked at Splunk</th>
+        <th>Relationship</th>
+    </tr>
+    <tr>
+        <td>flow file 1</td>
+        <td>null</td>
+        <td>10:46:02</td>
+        <td>No value set</td>
+        <td>Failure</td>
+    </tr>
+    <tr>
+        <td>flow file 2</td>
+        <td>0</td>
+        <td>10:46:18</td>
+        <td>true</td>
+        <td>Success</td>
+
+    </tr>
+    <tr>
+        <td>flow file 3</td>
+        <td>1</td>
+        <td>10:47:04</td>
+        <td>true</td>
+        <td>Undetermined</td>
+    </tr>
+    <tr>
+        <td>flow file 4</td>
+        <td>2</td>
+        <td>10:47:23</td>
+        <td>true</td>
+        <td>Undetermined</td>
+    </tr>
+    </tbody>
+</table>
+
+<h4>Second iteration</h4>
+
+The current time is 10:49:18 so both flow file 3 and flow file 4 are within 
TTL and they will be sent for Splunk ack check.

Review comment:
       As your example is pretty exhaustive, it might be worth mentioning that 
the processor might be triggered during the penalty period. If no further flow 
files have arrived, it finishes its activation right away; if new FFs have 
arrived, those will be processed (in this case, before the penalised ones).

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -46,10 +46,21 @@ <h3>Unacknowledged and undetermined cases</h3>
     includes unsuccessful or ongoing indexing and unknown acknowledgement 
identifiers. In order to avoid infinite tries,
     QuerySplunkIndexingStatus gives user the possibility to set a "Maximum 
waiting time". Results with value of false from Splunk
     within the specified waiting time will be handled as "undetermined" and 
are transferred to the "undetermined" relationship.
-    Flow files outside of this time range will be transferred to the 
"unacknowledged" relationship next time the processor is
-    triggered. In order to determine if the indexing of a given event is 
within the waiting time, the Unix Epoch of the original
-    Splunk response is stored in the attribute "splunk.responded.at". Setting 
"Maximum waiting time" too low might
-    result some false negative result as in case under higher load, Splunk 
server might index slower than it is expected.
+</p>
+
+<p>
+    Acknowledgment status of flow files being in the "undetermined" 
relationship are checked with Splunk and regardless of the result
+    an "ack.checked.at.splunk" attribute with true value will be put on them 
ensuring at least one acknowledgment check is happening
+    even outside of "Maximum waiting time".
+    Flow files having both their "ack.checked.at.splunk" attribute equals to 
true and being outside of the time range will be
+    transferred to the "unacknowledged" relationship next time the processor 
is triggered.
+</p>
+
+<p>
+    In order to determine if the indexing of
+    a given event is within the waiting time, the Unix Epoch of the original 
Splunk response is stored in the attribute "splunk.responded.at".

Review comment:
       I would go with: the arrival time of the original Splunk response is 
stored in the attribute "splunk.responded.at" in Unix Epoch format.
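
   For reference, a minimal sketch of how the waiting-time check relates to an 
epoch value. This is only an illustration: the class and method names are 
hypothetical, and I am assuming here that the attribute holds epoch 
milliseconds.

```java
import java.time.Duration;
import java.time.Instant;

public class WaitingTimeSketch {

    /**
     * Hypothetical helper: true if the elapsed time since the Splunk response
     * does not exceed the configured "Maximum waiting time".
     * Assumption: both timestamps are Unix Epoch values in milliseconds.
     */
    static boolean isWithinWaitingTime(long respondedAtEpochMillis,
                                       long nowEpochMillis,
                                       Duration maxWaitingTime) {
        return nowEpochMillis - respondedAtEpochMillis <= maxWaitingTime.toMillis();
    }

    public static void main(String[] args) {
        // Times taken from the example tables; the date itself is arbitrary.
        long respondedAt = Instant.parse("2021-02-18T10:47:04Z").toEpochMilli();
        long now = Instant.parse("2021-02-18T10:48:05Z").toEpochMilli();
        System.out.println(isWithinWaitingTime(respondedAt, now, Duration.ofMinutes(3)));
    }
}
```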

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
     batch but the processor might execute the query with less number of 
undetermined events.
 </p>
 
+<h2>Example</h2>

Review comment:
       Great idea to add an example! 👍 

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
     batch but the processor might execute the query with less number of 
undetermined events.
 </p>
 
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the 
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum 
waiting time" (referred as TTL from now on) is set for 3 minutes

Review comment:
       I think "will be presented" would communicate the intent better

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
     batch but the processor might execute the query with less number of 
undetermined events.
 </p>
 
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the 
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum 
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized 
for 30 seconds. "Undetermined" relationship is loop backed to the processor.
+
+<table>
+    <tbody>
+    <tr>
+        <th>Flow file id</th>
+        <th>Splunk acknowledgement id</th>
+        <th>Splunk responded at</th>
+        <th>Ack checked at Splunk</th>
+        <th>Relationship</th>
+    </tr>
+    <tr>
+        <td>flow file 1</td>
+        <td>null</td>
+        <td>10:46:02</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 2</td>
+        <td>0</td>
+        <td>10:46:18</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 3</td>
+        <td>1</td>
+        <td>10:47:04</td>
+        <td>No value set</td>
+    </tr>
+    <tr>
+        <td>flow file 4</td>
+        <td>2</td>
+        <td>10:47:23</td>
+        <td>No value set</td>
+    </tr>
+    </tbody>
+</table>
+
+<h4>First iteration</h4>
+
+The current time is 10:48:05 so all of our files are within TTL. Flow file 1 
does not have a correct "splunk.acknowledgment.id"

Review comment:
       I think it is a bit challenging to read "flow file 1" within a 
paragraph. I would suggest either highlighting it in some way (italic, bold) 
or referring to the flow files as #1, #2 and so on.
   
   A minor clarification: there is no validation on the value of 
"splunk.acknowledgment.id", thus setting it to null is treated the same as 
"no attribute is set". Just to be sure, I made a quick test: when the ackId is 
extracted, a null results in an Optional#empty, just like when the attribute 
is not set at all. As a result, I would suggest rephrasing it to "not set".
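
   The behaviour can be reproduced with plain Java. This is a sketch under my 
own assumptions, not the processor's extraction code: the class and method 
names are hypothetical, a plain map stands in for the flow file attributes, 
and the attribute name is the one from the @ReadsAttribute annotation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class AckIdExtractionSketch {

    // Attribute name as spelled in the @ReadsAttribute annotation.
    static final String ACK_ID_ATTRIBUTE = "splunk.acknowledgement.id";

    /**
     * Hypothetical helper: reads the acknowledgement id from an attribute map.
     * A missing key and a null value both end up as Optional.empty().
     */
    static Optional<Long> extractAckId(Map<String, String> attributes) {
        return Optional.ofNullable(attributes.get(ACK_ID_ATTRIBUTE)).map(Long::valueOf);
    }

    public static void main(String[] args) {
        Map<String, String> withNull = new HashMap<>();
        withNull.put(ACK_ID_ATTRIBUTE, null);

        Map<String, String> withValue = new HashMap<>();
        withValue.put(ACK_ID_ATTRIBUTE, "2");

        System.out.println(extractAckId(new HashMap<>()).isPresent()); // key missing
        System.out.println(extractAckId(withNull).isPresent());        // key present, null value
        System.out.println(extractAckId(withValue).isPresent());       // key present, value "2"
    }
}
```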

##########
File path: 
nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/resources/docs/org.apache.nifi.processors.splunk.QuerySplunkIndexingStatus/additionalDetails.html
##########
@@ -72,5 +83,195 @@ <h3>Performance</h3>
     batch but the processor might execute the query with less number of 
undetermined events.
 </p>
 
+<h2>Example</h2>
+
+The following table represents four flow files being batch processed by the 
processor. For simplicity reasons "splunk.responded.at" time
+will be provided in HH:MM:SS format instead of Epoch time. The "Maximum 
waiting time" (referred as TTL from now on) is set for 3 minutes
+and the processor is scheduled to run every minute. The files are penalized 
for 30 seconds. "Undetermined" relationship is loop backed to the processor.

Review comment:
       Minor: looped back sounds more natural




