ncover21 commented on PR #9784:
URL: https://github.com/apache/nifi/pull/9784#issuecomment-2707056816

   > Given the way this processor is likely going to be used, I think we should 
be prescriptive here and avoid generating one flow file per listed file in the 
configured folder. Instead I would generate a single FlowFile with JSON content 
with an array of records, each record containing the metadata information of 
the listed files.
   > 
   > Assuming a folder with 50k files, that will avoid generating 50k flowfiles 
in one execution of the processor. By generating one single flowfile, a user 
could then use a first SplitRecord processor configured with 1000 records 
split, then a second SplitRecord with 1 record split, and finally a ForkRecord 
with path(s) for the fields that should be moved into flowfile attributes. This 
way the backpressure would do its job.
   > 
   > Thoughts?
   
   Thanks for the review. Yes, I think that makes more sense for folders 
with large numbers of files in them. I've adjusted the logic to add a writer 
and write the listing as records instead. I've also added batch-based 
writing in case of large numbers of files.
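
   To illustrate the batching idea only (this is a rough sketch, not the code 
in the PR): the listing is consumed lazily and emitted as JSON arrays of at 
most `batch_size` records each. The function name `batched_json`, the field 
names `filename` and `size`, and the batch size of 1000 are all assumptions 
made up for this example.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def batched_json(entries: Iterable[dict], batch_size: int) -> Iterator[str]:
    """Yield JSON arrays, each holding at most batch_size metadata records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(entries)
    while True:
        # Pull the next chunk without materializing the whole listing.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield json.dumps(batch)


# Hypothetical listing: 50,000 entries with made-up field names.
listing = ({"filename": f"file-{i}.csv", "size": i} for i in range(50_000))
payloads = list(batched_json(listing, batch_size=1000))
print(len(payloads))  # 50
```

   With 50k listed files and a batch size of 1000, this produces 50 payloads 
instead of 50k, so only one batch of records needs to be held in memory at 
a time.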


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
