[ https://issues.apache.org/jira/browse/NIFI-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filip Maretić updated NIFI-14095:
---------------------------------
    Description: 
Just setting the *KeepSourceFile* property to *true* can cause continuous 
ingestion of the same files into NiFi. If a file is big (e.g. 20 GB), this can 
fill the content repository (e.g. 400 GB in size) almost instantly. This 
renders the NiFi node unusable, and a cleanup is needed. There is no reason for 
this to happen: the flow should at least have enough time to process a chunk of 
such a huge file before attempting to load the same file again.

A quick solution would be just to add
{code:java}
@DefaultSchedule(strategy = SchedulingStrategy.TIMER_DRIVEN, period = "1 min")
{code}

This is present on the ListFile processor anyway, so why not add it here as 
well? If a user really wants to set this to 0 seconds, I guess they should be 
aware of the consequences.
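
As a rough back-of-the-envelope illustration of the numbers above, here is a small standalone sketch. It is not NiFi code: the class and method names are hypothetical, and it assumes that every trigger re-reads the whole file and that nothing is removed from the content repository in the meantime.

```java
import java.time.Duration;

// Hypothetical helper, not part of NiFi: estimates how quickly repeated
// ingestion of one kept source file can exhaust the content repository.
class RepoFillSketch {

    // How many full copies of the file fit into the repository.
    static long copiesUntilFull(long repoGb, long fileGb) {
        return repoGb / fileGb;
    }

    // Scheduling delay accumulated before the last copy that fits is read.
    // The first trigger fires at t = 0, the k-th at (k - 1) * period.
    static Duration schedulingDelayUntilFull(long repoGb, long fileGb, Duration period) {
        long copies = copiesUntilFull(repoGb, fileGb);
        return period.multipliedBy(Math.max(0, copies - 1));
    }

    public static void main(String[] args) {
        long repoGb = 400; // example repository size from the description
        long fileGb = 20;  // example file size from the description

        System.out.println("Copies until full: " + copiesUntilFull(repoGb, fileGb));
        System.out.println("Delay with a 0 sec period: "
                + schedulingDelayUntilFull(repoGb, fileGb, Duration.ZERO));
        System.out.println("Delay with a 1 min period: "
                + schedulingDelayUntilFull(repoGb, fileGb, Duration.ofMinutes(1)));
    }
}
```

Under these assumptions, 20 copies fill the repository. With a 0 second period the scheduler adds no delay at all, so only disk throughput limits how fast that happens, while a 1 minute period spreads the same 20 reads over 19 minutes and gives the flow time to work on the file.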

  was:
Just setting the *KeepSourceFile* property to *true* can cause continuous 
ingestion of files into NiFi. If the file is big (e.g. 20 GB) this can cause 
the content repository (e.g. size of 400 GB) to be filled in an instant. This 
renders the NiFi node unusable and a cleanup is needed. There is no reason for 
this to happen, the flow should at least have enough time to process a chunk of 
such a huge file before attempting to load the same file again.

A quick solution would be just to add
{code:java}
@DefaultSchedule(strategy = SchedulingStrategy.TIMER_DRIVEN, period = "1 min")
{code}

This is anyone present on the ListFile processor, so why not to add it here 
also?


> GetFile - "KeepSourceFile" set to true can fill up content repository
> ---------------------------------------------------------------------
>
>                 Key: NIFI-14095
>                 URL: https://issues.apache.org/jira/browse/NIFI-14095
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Configuration
>    Affects Versions: 2.0.0, 1.28.1
>            Reporter: Filip Maretić
>            Priority: Major
>              Labels: GetFile, ListFile
>             Fix For: 2.1.0
>
>
> Just setting the *KeepSourceFile* property to *true* can cause continuous 
> ingestion of the same files into NiFi. If a file is big (e.g. 20 GB), this can 
> fill the content repository (e.g. 400 GB in size) almost instantly. This 
> renders the NiFi node unusable, and a cleanup is needed. There is no reason 
> for this to happen: the flow should at least have enough time to process a 
> chunk of such a huge file before attempting to load the same file again.
> A quick solution would be just to add
> {code:java}
> @DefaultSchedule(strategy = SchedulingStrategy.TIMER_DRIVEN, period = "1 min")
> {code}
> This is present on the ListFile processor anyway, so why not add it here as 
> well? If a user really wants to set this to 0 seconds, I guess they should be 
> aware of the consequences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
