[
https://issues.apache.org/jira/browse/NIFI-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906844#comment-17906844
]
Filip Maretić commented on NIFI-14095:
--------------------------------------
[~joewitt] we are in an unfortunate situation where the content repo is
co-located with the other repos, so something like this can bring a node down.
One of my colleagues innocently used GetFile in production to ingest a single
20 GB file (with the KeepSourceFile option enabled so as not to lose the
original file), and the content repo filled up almost instantly. Just out of
curiosity, can you please explain why the DefaultSchedulingStrategy would not
apply here?
> GetFile - "KeepSourceFile" set to true can fill up content repository
> ---------------------------------------------------------------------
>
> Key: NIFI-14095
> URL: https://issues.apache.org/jira/browse/NIFI-14095
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Configuration
> Affects Versions: 2.0.0, 1.28.1
> Reporter: Filip Maretić
> Priority: Major
> Labels: GetFile, ListFile
> Fix For: 2.1.0
>
>
> Just setting the *KeepSourceFile* property to *true* can cause the same file
> to be ingested into NiFi continuously. If the file is big (e.g. 20 GB), this
> can fill the content repository (e.g. 400 GB in size) in an instant. This
> renders the NiFi node unusable, and a cleanup is needed. There is no reason
> for this to happen: the flow should at least have enough time to process a
> chunk of such a huge file before attempting to load the same file again.
> A quick solution would be just to add
> {code:java}
> @DefaultSchedule(strategy = SchedulingStrategy.TIMER_DRIVEN, period = "1 min")
> {code}
> This annotation is already present on the ListFile processor, so why not add
> it here as well? If users really want to set the schedule to 0 seconds, they
> should be aware of the consequences.
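
The numbers in the description can be turned into a rough estimate. The sketch below is not NiFi code; the class and method names are hypothetical. It assumes the content repository starts empty, that nothing downstream frees content, that every trigger ingests one full copy of the file, and that the run schedule acts as a minimum gap between consecutive triggers. Copy throughput is ignored, so the result is only the delay imposed by the schedule itself.

```java
import java.time.Duration;

/**
 * Back-of-the-envelope estimate (hypothetical helper, not part of NiFi):
 * how many copies of a kept source file fit into the content repository,
 * and how much delay the run schedule alone adds before it is full.
 */
final class RepoFillEstimate {

    /** Number of full copies of the file that fit into the repository. */
    static long copiesUntilFull(long repoCapacityGb, long fileSizeGb) {
        if (repoCapacityGb <= 0 || fileSizeGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return repoCapacityGb / fileSizeGb;
    }

    /**
     * Minimum time between the first trigger and the trigger that writes the
     * last copy that still fits. Assumes one copy per trigger and treats the
     * run schedule as the minimum gap between triggers.
     */
    static Duration scheduleDelayUntilFull(long repoCapacityGb, long fileSizeGb,
                                           Duration runSchedule) {
        long copies = copiesUntilFull(repoCapacityGb, fileSizeGb);
        long gaps = Math.max(copies - 1, 0); // n triggers have n - 1 gaps
        return runSchedule.multipliedBy(gaps);
    }

    public static void main(String[] args) {
        long repoGb = 400; // example repository size from the issue
        long fileGb = 20;  // example file size from the issue

        System.out.println("copies until full: " + copiesUntilFull(repoGb, fileGb));
        System.out.println("delay with 0 sec schedule: "
                + scheduleDelayUntilFull(repoGb, fileGb, Duration.ZERO));
        System.out.println("delay with 1 min schedule: "
                + scheduleDelayUntilFull(repoGb, fileGb, Duration.ofMinutes(1)));
    }
}
```

Under these assumptions, 20 copies fill the repository. A 0 second schedule adds no delay at all, so only I/O speed limits how fast that happens, while a 1 minute schedule spreads the same 20 triggers over at least 19 minutes, which may leave time for the flow to process the file or for someone to notice.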
--
This message was sent by Atlassian Jira
(v8.20.10#820010)