Mark Payne created NIFI-4772:
--------------------------------
Summary: If several processors do not return from their
@OnScheduled method, NiFi will stop scheduling any Processors
Key: NIFI-4772
URL: https://issues.apache.org/jira/browse/NIFI-4772
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Reporter: Mark Payne
Assignee: Mark Payne
If a Processor does not properly return from its @OnScheduled method and
several instances of the processor are started, we can get into a state where
no Processors can start. We start seeing log messages like the following:
{code}
2018-01-10 10:16:31,433 WARN [StandardProcessScheduler Thread-1]
o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled
of 'UpdateAttribute' processor to finish. An attempt is made to cancel the task
via Thread.interrupt(). However it does not guarantee that the task will be
canceled since the code inside current OnScheduled operation may have been
written to ignore interrupts which may result in a runaway thread. This could
lead to more issues, eventually requiring NiFi to be restarted. This is usually
a bug in the target Processor
'UpdateAttribute[id=95423ee6-e6a6-1220-83ad-af20577063bd]' that needs to be
documented, reported and eventually fixed.
2018-01-10 10:16:42,937 WARN [StandardProcessScheduler Thread-2]
o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled
of 'PutHDFS' processor to finish. An attempt is made to cancel the task via
Thread.interrupt(). However it does not guarantee that the task will be
canceled since the code inside current OnScheduled operation may have been
written to ignore interrupts which may result in a runaway thread. This could
lead to more issues, eventually requiring NiFi to be restarted. This is usually
a bug in the target Processor
'PutHDFS[id=25e531ec-d873-1dec-acc9-ea745e7869ed]' that needs to be documented,
reported and eventually fixed.
2018-01-10 10:16:46,993 WARN [StandardProcessScheduler Thread-4]
o.a.n.controller.StandardProcessorNode Timed out while waiting for OnScheduled
of 'LogAttribute' processor to finish. An attempt is made to cancel the task
via Thread.interrupt(). However it does not guarantee that the task will be
canceled since the code inside current OnScheduled operation may have been
written to ignore interrupts which may result in a runaway thread. This could
lead to more issues, eventually requiring NiFi to be restarted. This is usually
a bug in the target Processor
'LogAttribute[id=9a683a06-aa24-19b5-ffff-ffff944a0216]' that needs to be
documented, reported and eventually fixed.
{code}
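The warning above notes that Thread.interrupt() does not guarantee cancellation. The following is a minimal, hypothetical sketch of that behaviour using plain java.util.concurrent classes. It is not NiFi's actual code; the class name, method name and timeouts are assumptions made for illustration only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not part of NiFi.
public class IgnoredInterruptSketch {

    // Returns true if the task is still running after the caller has timed
    // out waiting for it and tried to cancel it via an interrupt.
    public static boolean stillRunningAfterCancel(boolean honorInterrupt) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean stop = new AtomicBoolean(false);     // test-only escape hatch
        CountDownLatch finished = new CountDownLatch(1);

        // Stand-in for an @OnScheduled method that takes too long.
        Future<?> future = pool.submit(() -> {
            try {
                while (!stop.get()) {
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        if (honorInterrupt) {
                            return;                        // well-behaved: give up the thread
                        }
                        // misbehaving: swallow the interrupt and keep going
                    }
                }
            } finally {
                finished.countDown();
            }
        });

        try {
            future.get(100, TimeUnit.MILLISECONDS);        // wait with a timeout
        } catch (TimeoutException e) {
            future.cancel(true);                           // delivers Thread.interrupt()
        }

        boolean exited = finished.await(300, TimeUnit.MILLISECONDS);
        stop.set(true);                                    // let the runaway loop end
        pool.shutdownNow();
        return !exited;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("honors interrupt, still running: " + stillRunningAfterCancel(true));
        System.out.println("ignores interrupt, still running: " + stillRunningAfterCancel(false));
    }
}
```

In this sketch the task that swallows the interrupt keeps occupying its thread after cancel(true), which corresponds to the "runaway thread" that the log message warns about.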
While we should avoid having misbehaving Processors to begin with, the
framework must also be tolerant of this and should not allow one misbehaving
Processor to affect other Processors.
We can "approximate" this issue by following these steps:
1. Create 1 DebugFlow Processor. Auto-terminate its success & failure
relationships. Set the "@OnScheduled Pause Time" property to "2 mins"
2. Copy & paste this DebugFlow Processor so that there are at least 8 of them.
3. Create a GenerateFlowFile Processor and an UpdateAttribute Processor. Send
success of GenerateFlowFile to UpdateAttribute.
4. Start all of the DebugFlow Processors.
5. Start the GenerateFlowFile and UpdateAttribute Processors.
In this scenario, we will not see the above log messages, because after 1
minute the DebugFlow Processor is interrupted and the @OnScheduled method
completes exceptionally. However, we do see that GenerateFlowFile and
UpdateAttribute do not start running until after the 2-minute time window has
elapsed. If DebugFlow instead did not complete exceptionally, then
GenerateFlowFile and UpdateAttribute would never start running and we would see
the above log messages.
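The starvation described above can be sketched outside NiFi with a fixed-size thread pool. This is a hypothetical illustration, not the framework's scheduler: the class and method names are invented, and the pool size of 8 is an assumption chosen only to mirror the "at least 8" DebugFlow Processors in the steps, not a confirmed NiFi setting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of NiFi.
public class SchedulerStarvationSketch {

    // Returns true if a well-behaved task gets to run while `blockedTasks`
    // misbehaving tasks are stuck on a pool of `poolSize` threads.
    public static boolean wellBehavedTaskStarts(int poolSize, int blockedTasks)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);  // held until the check is done
        CountDownLatch started = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedTasks; i++) {
                // Stand-in for an @OnScheduled method that does not return.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Stand-in for starting an unrelated, well-behaved Processor.
            pool.execute(started::countDown);
            return started.await(500, TimeUnit.MILLISECONDS);
        } finally {
            release.countDown();   // unblock the stuck tasks so the JVM can exit
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("8 threads, 8 blocked: " + wellBehavedTaskStarts(8, 8));
        System.out.println("8 threads, 7 blocked: " + wellBehavedTaskStarts(8, 7));
    }
}
```

When every thread in the pool is held by a blocked task, the well-behaved task only sits in the queue and the first call reports false; with one thread left free, the second call reports true.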