[
https://issues.apache.org/jira/browse/NIFI-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422674#comment-17422674
]
Matthieu RÉ commented on NIFI-8760:
-----------------------------------
I currently have two simple fixes that perform equivalently (tested with
GenerateFlowFile, MergeRecord, SplitJson and QueryRecord):
* The first follows [the idea of the first
implementation|https://github.com/apache/nifi/blob/528fce2407d092d4ced1a58fcc14d0bc6e660b89/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/VolatileContentRepository.java#L473],
in which a read of a ResourceClaim looks up the corresponding ContentClaim at
offset 0. That lookup fails when the stored ContentClaim has a length, because
the ContentClaim equality check ("equalsTo") takes the length into account,
and the constructor called by read(ResourceClaim) initializes the length to
-1. A fix could be to search the map for the ContentClaim that matches the
ResourceClaim at offset 0.
This implementation may look poor, since it does not take advantage of the
structure of the Map of Comparable keys to find a ContentClaim, but in my
tests it performs about the same as the second fix.
* The second is simply to treat the VolatileContentRepository as incompatible
with read(ResourceClaim) and to allow only read(ContentClaim), as is the
case for the EncryptedFileSystemRepository.
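A minimal sketch of why the direct lookup fails and how the first fix would
behave. The classes below (SketchResourceClaim, SketchContentClaim,
VolatileLookupSketch) are hypothetical, simplified stand-ins and not NiFi's
real types; the only behaviour assumed from the description above is that
claim equality includes the length and that the lookup key is built with
length -1.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a ResourceClaim, identified only by an id.
final class SketchResourceClaim {
    final String id;

    SketchResourceClaim(String id) { this.id = id; }

    @Override public boolean equals(Object o) {
        return o instanceof SketchResourceClaim && ((SketchResourceClaim) o).id.equals(id);
    }

    @Override public int hashCode() { return id.hashCode(); }
}

// Hypothetical stand-in for a ContentClaim whose equality includes the length.
final class SketchContentClaim {
    final SketchResourceClaim resourceClaim;
    final long offset;
    final long length; // -1 means "length not known"

    SketchContentClaim(SketchResourceClaim resourceClaim, long offset, long length) {
        this.resourceClaim = resourceClaim;
        this.offset = offset;
        this.length = length;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SketchContentClaim)) return false;
        SketchContentClaim other = (SketchContentClaim) o;
        return offset == other.offset && length == other.length
                && resourceClaim.equals(other.resourceClaim);
    }

    @Override public int hashCode() { return Objects.hash(resourceClaim, offset, length); }
}

// Hypothetical in-memory store keyed by content claim.
final class VolatileLookupSketch {
    private final Map<SketchContentClaim, byte[]> claimMap = new ConcurrentHashMap<>();

    void put(SketchContentClaim claim, byte[] content) { claimMap.put(claim, content); }

    // Direct key lookup with length -1: misses any stored claim that has a real length.
    Optional<byte[]> readByKey(SketchResourceClaim rc) {
        return Optional.ofNullable(claimMap.get(new SketchContentClaim(rc, 0L, -1L)));
    }

    // First fix: linear scan for the claim matching the resource claim at offset 0.
    Optional<byte[]> readByScan(SketchResourceClaim rc) {
        return claimMap.entrySet().stream()
                .filter(e -> e.getKey().resourceClaim.equals(rc) && e.getKey().offset == 0L)
                .map(Map.Entry::getValue)
                .findFirst();
    }
}
```

With a claim stored as (id=6, offset=0, length=5655), readByKey finds
nothing while readByScan returns the content. The scan is linear in the
number of stored claims, which is the trade-off discussed in this comment.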
Since the data in this implementation is stored as
Map<ContentClaim, ContentBlock>, I lack the experience to answer these
questions:
* Does it make sense to use the ResourceClaim to look up ContentBlock(s)
in a VolatileContentRepository?
* If so, would there be a benefit in returning the ContentBlocks at every
offset matching the ResourceClaim, instead of only offset 0 as was intended?
* If not, the second fix is probably the right one.
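For comparison, the second fix could look roughly like the sketch below. It
is not the attached patch: ClaimReaderSketch, ContentClaimOnlySketch and
SessionReadSketch are hypothetical names, and the capability flag
supportsResourceClaimRead() is an assumption made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, reduced repository contract; NiFi's real interface is larger.
interface ClaimReaderSketch {
    boolean supportsResourceClaimRead();
    byte[] readByResourceClaim(String resourceClaimId);
    byte[] readByContentClaim(String resourceClaimId, long offset, long length);
}

// Repository that only serves reads addressed by a full content claim.
final class ContentClaimOnlySketch implements ClaimReaderSketch {
    private final Map<String, byte[]> blocks = new HashMap<>();

    private static String key(String id, long offset, long length) {
        return id + ":" + offset + ":" + length;
    }

    void store(String id, long offset, byte[] data) {
        blocks.put(key(id, offset, data.length), data);
    }

    @Override public boolean supportsResourceClaimRead() { return false; }

    @Override public byte[] readByResourceClaim(String id) {
        // Reads by resource claim alone are declared unsupported.
        throw new UnsupportedOperationException("read by resource claim is not supported");
    }

    @Override public byte[] readByContentClaim(String id, long offset, long length) {
        byte[] data = blocks.get(key(id, offset, length));
        if (data == null) {
            throw new IllegalStateException("Could not find content for " + key(id, offset, length));
        }
        return data;
    }
}

// Caller that falls back to the content claim when resource reads are unsupported.
final class SessionReadSketch {
    static byte[] open(ClaimReaderSketch repo, String id, long offset, long length) {
        return repo.supportsResourceClaimRead()
                ? repo.readByResourceClaim(id)
                : repo.readByContentClaim(id, offset, length);
    }
}
```

In this sketch the caller always has the full claim (id, offset, length) at
hand, so the lookup key matches the stored key exactly and no scan is needed.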
Please don't hesitate to correct me if I am wrong or have misunderstood
something.
For now, I am attaching the second fix as a Git patch here:
[^0001-fix-2-set-VolatileContentRepository-as-non-supportiv.patch], to help
anyone who needs a fix.
> VolatileContentRepository fails to retrieve content from claims with several
> processors
> ---------------------------------------------------------------------------------------
>
> Key: NIFI-8760
> URL: https://issues.apache.org/jira/browse/NIFI-8760
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.13.1, 1.13.2
> Reporter: Matthieu RÉ
> Priority: Major
> Labels: content-repository, volatile
> Attachments:
> 0001-fix-2-set-VolatileContentRepository-as-non-supportiv.patch, flow.xml.gz,
> nifi.properties
>
>
> For several processors, such as MergeRecord, QueryRecord and SplitJson,
> using the VolatileContentRepository implementation causes errors while
> retrieving FlowFiles from claims. The following logs were generated using
> NiFi 1.13.1 from Docker with the attached flow.xml.gz and nifi.properties
> files.
> MergeRecord (with JsonTreeReader, JsonRecordSetWriter with default
> configuration):
> {{2021-07-06 10:15:09,170 ERROR [Timer-Driven Process Thread-1]
> o.a.nifi.processors.standard.MergeRecord
> MergeRecord[id=7b425cff-017a-1000-6a20-58c4e064df3d] Failed to bin
> StandardFlowFileRecord[uuid=3e894a96-883a-4ac2-8121-b8200964cf20,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=6, container=in-memory,
> section=section], offset=0,
> length=5655],offset=0,name=b2c7cf61-b421-477d-902e-daeb2ed58f0d,size=5655]
> due to org.apache.nifi.controller.repository.ContentNotFoundException: Could
> not find content for StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=6, container=in-memory,
> section=section], offset=0, length=-1]:
> org.apache.nifi.controller.repository.ContentNotFoundException: Could not
> find content for StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=6, container=in-memory,
> section=section], offset=0, length=-1]}}
> {{org.apache.nifi.controller.repository.ContentNotFoundException: Could not
> find content for StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=6, container=in-memory,
> section=section], offset=0, length=-1]}}
> {{at
> org.apache.nifi.controller.repository.VolatileContentRepository.getContent(VolatileContentRepository.java:445)}}
> {{at
> org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:468)}}
> {{at
> org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:473)}}
> {{at
> org.apache.nifi.controller.repository.StandardProcessSession.getInputStream(StandardProcessSession.java:2302)}}
> {{at
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2409)}}
> {{at
> org.apache.nifi.processors.standard.MergeRecord.binFlowFile(MergeRecord.java:383)}}
> {{at
> org.apache.nifi.processors.standard.MergeRecord.onTrigger(MergeRecord.java:346)}}
> {{at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)}}
> {{at
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)}}
> {{at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)}}
> {{at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)}}
> {{at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)}}
> {{at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)}}
> {{at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)}}
> {{at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{at java.lang.Thread.run(Thread.java:748)}}
> QueryRecord:
> {{2021-07-06 10:15:09,174 ERROR [Timer-Driven Process Thread-4]
> o.a.nifi.processors.standard.QueryRecord
> QueryRecord[id=673fe9f6-017a-1000-8041-dfde9d02d976] Failed to determine
> Record Schema from
> StandardFlowFileRecord[uuid=090e3058-67e6-4436-bea9-d511132848e3,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=2, container=in-memory,
> section=section], offset=0,
> length=5655],offset=0,name=090e3058-67e6-4436-bea9-d511132848e3,size=5655];
> routing to failure:
> org.apache.nifi.controller.repository.ContentNotFoundException: Could not
> find content for StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=2, container=in-memory,
> section=section], offset=0, length=-1]}}
> {{org.apache.nifi.controller.repository.ContentNotFoundException: Could not
> find content for StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=2, container=in-memory,
> section=section], offset=0, length=-1]}}
> {{at
> org.apache.nifi.controller.repository.VolatileContentRepository.getContent(VolatileContentRepository.java:445)}}
> {{at
> org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:468)}}
> {{at
> org.apache.nifi.controller.repository.VolatileContentRepository.read(VolatileContentRepository.java:473)}}
> {{at
> org.apache.nifi.controller.repository.StandardProcessSession.getInputStream(StandardProcessSession.java:2302)}}
> {{at
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2409)}}
> {{at
> org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:294)}}
> {{at
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)}}
> {{at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)}}
> {{at
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)}}
> {{at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)}}
> {{at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)}}
> {{at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)}}
> {{at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)}}
> {{at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)}}
> {{at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{at java.lang.Thread.run(Thread.java:748)}}
> SplitJson:
> {{2021-07-06 10:15:10,178 ERROR [Timer-Driven Process Thread-5]
> o.a.nifi.processors.standard.SplitJson
> SplitJson[id=7b411bdc-017a-1000-0f48-53d6a2ad5ee9]
> SplitJson[id=7b411bdc-017a-1000-0f48-53d6a2ad5ee9] failed to process session
> due to java.lang.NullPointerException; Processor Administratively Yielded for
> 1 sec: java.lang.NullPointerException}}
> {{java.lang.NullPointerException: null}}
> {{at
> org.apache.nifi.processors.standard.SplitJson.onTrigger(SplitJson.java:199)}}
> {{at
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)}}
> {{at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)}}
> {{at
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)}}
> {{at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)}}
> {{at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)}}
> {{at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)}}
> {{at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)}}
> {{at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)}}
> {{at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{at java.lang.Thread.run(Thread.java:748)}}
> I cannot reproduce this issue on 1.13.0, so it may be related to the
> commit
> [528fce2407d092d4ced1a58fcc14d0bc6e660b89|https://github.com/apache/nifi/commit/528fce2407d092d4ced1a58fcc14d0bc6e660b89].
> With some support I would be glad to help investigate and solve the issue.