[
https://issues.apache.org/jira/browse/NIP-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936320#comment-17936320
]
Joe Witt commented on NIP-6:
----------------------------
Gotta love the humor in the alternatives considered. Save that forever.
I am a strong +1 here and it is fascinating we've not solved this sooner.
What I'd suggest we consider though is not including 'partially engaged' yet
unless that is completely free to determine that at least one but not all
upstream connections are backpressured.
In Flow Based Programming designs this is why there were 'named input ports'
because then you could decide if backpressure for a given input was applied
THAT might mean something. We don't currently have that concept so I'm not
sure what we'd use this for (partial).
Dont feel strongly - just a thought.
> Allow Processors to understand state of incoming connections' backpressure
> --------------------------------------------------------------------------
>
> Key: NIP-6
> URL: https://issues.apache.org/jira/browse/NIP-6
> Project: NiFi Improvement Proposal
> Issue Type: New Feature
> Reporter: Mark Payne
> Priority: Major
>
> *Motivation and description*
> There are cases when we need to know whether or not backpressure is applied
> on incoming connections. For example, in MergeContent, when a Correlation
> Attribute is used, it is very easy to have a situation in which the Min
> Number of Entries or Min Bin Size is not met, but in which backpressure is
> applied upstream. In this case, we have to just wait for the Max Bin Age to
> elapse in order for MergeContent to progress. This introduces significant
> artificial delays into our dataflow. And while Max Bin Age is always
> recommended, it's not required, so the flow could potentially block
> indefinitely it not properly configured.
> Instead, it would make sense for MergeContent to determine "all upstream
> connections have backpressure applied" and as a result immediately merge the
> "most full bin" in order to make progress. This information is not currently
> available, though. We need to surface this information via the
> {{{}ProcessContext{}}}.
>
> *Scope*
> Implementing this will require that the following method be added to
> {{{}ProcessContext{}}}:
> {code:java}
> BackpressureEngagement getBackpressureEngagement(); {code}
> where {{BackpressureEngagement}} is defined as:
> {code:java}
> public enum BackpressureEngagement {
> // No upstream connections have backpressure engaged
> BACKPRESSURE_NOT_ENGAGED,
> // At least one upstream connection has backpressure engaged but not all
> BACKPRESSURE_PARTIALLY_ENGAGED,
> // All upstream connections have backpressure engaged
> BACKPRESSURE_ENGAGED;
> } {code}
> This, naturally, means that the the method must also be implemented in the
> {{StandardProcessContext}} as well as {{ConnectableProcessContext}} and
> {{{}MockProcessContext{}}}.
> In order to allow proper testing of components, we must also update the
> {{TestRunner}} interface and {{StandardProcessorTestRunner}} implementation
> to accommodate this with a setter method:
> {code:java}
> public void setBackpressureEngagement(BackpressureEngagement engagement);
> {code}
> The default value in {{StandardProcessorTestRunner}} should be
> {{{}BackpressureEngagement.BACKPRESSURE_NOT_ENGAGED{}}}.
>
> *Alternatives Considered*
> As outlined in NIFI-14366, in addition to the prescribed values for
> {{BackpressureEngagement}} the following alternatives were considered:
> {code:java}
> - SINGLE
> - IN_A_RELATIONSHIP
> - ITS_COMPLICATED
> - ENGAGED {code}
> Ultimately, I think we should avoid this nomenclature for the following
> reasons:
> * May lead to confusion. I fear we would also need to introduce a value of
> "GHOSTED" which would be confusing because it means something very different
> from when a Processor or Controller Service is Ghosted.
> * Backpressure is tied to a Connection, and Connections are already often
> confused with Processor Relationships. This may well further blur that line
> and lead to more confusion.
> * I'm concerned about the complexity of implementing `ITS_COMPLICATED`. Once
> we start factoring in the emotional state of each connection, well... it gets
> complicated.
> * I'd be concerned about "API sprawl" - for instance, will someone then want
> to introduce a `breakup()` method to set the status to SINGLE? The error
> handling there could be a nightmare.
> It was also suggested that we name the method {{amITheProblem}} with
> potential return values of {{YES_YOU_ARE_SLOW}} or {{{}NO_ITS_ME{}}}.
> My main concern here is that the value can potentially change rather quickly.
> Imagine the logs, indicating why backpressure is engaged:
> {code:java}
> Backpressure engaged? YES_YOU_ARE_SLOW
> Backpressure engaged? NO_ITS_ME
> Backpressure engaged? YES_YOU_ARE_SLOW
> Backpressure engaged? NO_ITS_ME
> Backpressure engaged? YES_YOU_ARE_SLOW
> Backpressure engaged? NO_ITS_ME
> Backpressure engaged? YES_YOU_ARE_SLOW
> Backpressure engaged? NO_ITS_ME{code}
> We don't need people thinking of our api is trapped in a loop of self-doubt
> and finger-pointing.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)