[
https://issues.apache.org/jira/browse/NIFI-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pierre Villard resolved NIFI-3564.
----------------------------------
Resolution: Feedback Received
Apache NiFi 1.x is no longer maintained and no new release is planned on the
1.x release line. Marking as resolved as part of a cleanup operation. Please
open a new one with an updated description if this is still relevant for NiFi
2.x.
> Deadlock on startup
> -------------------
>
> Key: NIFI-3564
> URL: https://issues.apache.org/jira/browse/NIFI-3564
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 0.7.1, 1.1.1
> Reporter: Brandon Rhys DeVries
> Priority: Major
>
> We have uncovered an issue in the way that ControllerServices and Processors
> are started that can result in a deadlock. Basically, a ControllerService
> that is reported by the framework as ENABLING might not actually have begun
> enabling. This is because of how services are scheduled to be started in
> StandardControllerServiceNode.enable()\[1]. That method changes the state from
> DISABLED to ENABLING, and *then* actually schedules the OnEnabled method to
> be called. However, it is scheduled with a ScheduledExecutorService that is
> limited to 8 threads\[2], and is *also used to start Processors*\[3].
> The situation that exposed the bug was a Processor that attempted to wait for
> a ControllerService to become ENABLED in its customValidate() method. The
> ControllerService must be at least in the ENABLING state to pass framework
> validation, and since the ControllerService was necessary to do the custom
> validation, waiting for it to become ENABLED seems reasonable. However,
> there were several (more than 8) instances of this custom Processor on the
> graph, and the ControllerService being waited on was one of dozens. This led
> to the situation where all 8 of the executor threads were held by our
> Processor's customValidate() method waiting for a service that will never
> transition from ENABLING to ENABLED because to do so it needs one of those
> same 8 threads. This deadlocks the instance, preventing startup.
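The starvation described above can be reproduced in miniature with plain java.util.concurrent, outside NiFi. The sketch below is only an illustration under stated assumptions: the class PoolStarvationDemo and its run method are hypothetical names, a CountDownLatch stands in for the ENABLING to ENABLED transition, and the waiters use a timeout so the demo terminates instead of hanging the way the real instance does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the scenario in the issue; not NiFi code.
class PoolStarvationDemo {

    /**
     * Runs `waiters` tasks that block until a "service" is enabled, then submits
     * the enabling task to the same bounded pool. Returns how many waiters gave
     * up after timeoutMillis without ever seeing the service enabled.
     */
    static int run(int poolSize, int waiters, long timeoutMillis) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch enabled = new CountDownLatch(1);   // models ENABLING -> ENABLED
        CountDownLatch occupied = new CountDownLatch(Math.min(poolSize, waiters));
        AtomicInteger timedOut = new AtomicInteger();
        try {
            for (int i = 0; i < waiters; i++) {
                // Models customValidate() blocking on the service inside a pool thread.
                pool.execute(() -> {
                    occupied.countDown();
                    try {
                        if (!enabled.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                            timedOut.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Wait until the waiters hold as many pool threads as they can get.
            occupied.await();
            // Models the OnEnabled call: it needs a thread from the same pool.
            pool.execute(enabled::countDown);
            pool.shutdown();
            pool.awaitTermination(2 * timeoutMillis + 1000, TimeUnit.MILLISECONDS);
            return timedOut.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a pool of 2 threads and 2 waiters, run(2, 2, 200) returns 2: the enabling task sits in the queue until both waiters give up. With one spare thread, run(3, 2, 200) returns 0. Without the timeout, the first case would never finish, which matches the startup hang reported here.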
> My first thought as to a fix was to not set the ENABLING state until the
> OnEnabled method was actually being called (as opposed to scheduled to be
> called). However, this could result in a Processor attempting to start with
> a dependent ControllerService in a DISABLED state (even though the
> ControllerService will eventually be ENABLED), which would cause the
> processor to not start\[4] (as opposed to being retried, as is the case when
> OnScheduled throws an Exception). My feeling is that ultimately we're going
> to need to wait for all ControllerServices to be ENABLED before moving on to
> Processors, possibly using schedule(Callable) instead of execute(Runnable).
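One way to read the suggestion above is as a two-phase startup: submit every enable call with schedule(Callable), block on the returned futures from the calling thread (not from a pool thread), and only then submit the processors. The sketch below assumes that reading. TwoPhaseStartup, start, and the string event log are hypothetical names for illustration and do not correspond to NiFi's scheduler code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of "enable all services, then start processors"; not NiFi code.
class TwoPhaseStartup {

    /** Returns the order in which services were enabled and processors started. */
    static List<String> start(int poolSize, List<String> services, List<String> processors) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        try {
            // Phase 1: schedule every enable call and keep the futures.
            awaitAll(submit(pool, services, "enabled:", events));
            // Phase 2: processors are submitted only after all services finished enabling.
            awaitAll(submit(pool, processors, "started:", events));
            return new ArrayList<>(events);
        } finally {
            pool.shutdownNow();
        }
    }

    private static List<ScheduledFuture<String>> submit(ScheduledExecutorService pool,
            List<String> names, String prefix, List<String> events) {
        List<ScheduledFuture<String>> futures = new ArrayList<>();
        for (String name : names) {
            Callable<String> task = () -> {
                events.add(prefix + name);
                return name;
            };
            // schedule(Callable, ...) returns a future the caller can wait on.
            futures.add(pool.schedule(task, 0, TimeUnit.MILLISECONDS));
        }
        return futures;
    }

    // The waiting happens on the caller's thread, so no pool thread is tied up.
    private static void awaitAll(List<ScheduledFuture<String>> futures) {
        try {
            for (ScheduledFuture<String> f : futures) {
                f.get(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the barrier between the phases is on the calling thread, a processor that waits for an ENABLED service inside the pool can no longer starve the enable calls, at the cost of delaying every processor until the slowest service has finished enabling.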
> \[1]
> https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/service/StandardControllerServiceNode.java#L299-L304
> \[2]
> https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/scheduling/StandardProcessScheduler.java#L83
> \[3]
> https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/StandardProcessorNode.java#L1219-L1228
> \[4]
> https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/StandardProcessorNode.java#L1221-L1223