ncover21 commented on PR #9784: URL: https://github.com/apache/nifi/pull/9784#issuecomment-2707056816
> Given the way this processor is likely going to be used, I think we should be prescriptive here and avoid generating one FlowFile per listed file in the configured folder. Instead, I would generate a single FlowFile with JSON content holding an array of records, each record containing the metadata of one listed file.
>
> Assuming a folder with 50k files, that will avoid generating 50k FlowFiles in one execution of the processor. With a single FlowFile, a user could then use a first SplitRecord processor configured to split into 1000-record chunks, then a second SplitRecord configured to split into single records, and finally a ForkRecord with path(s) for the fields that should be moved into FlowFile attributes. This way the backpressure would do its job.
>
> Thoughts?

Thanks for the review. Yes, I think that makes more sense for folders containing large numbers of files. I've adjusted the logic to add a record writer and write the listing out as records instead. I've also added batch-based writing in case of large numbers of files.
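
To illustrate the batching idea discussed above, here is a minimal sketch in Python. It is not the code in this PR and does not use any NiFi API: the function name `batch_listing`, the metadata field names, and the batch size of 1000 are all assumptions made for the example. It groups file metadata into JSON arrays of at most `batch_size` records, so a large listing produces a handful of outputs instead of one per file.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batch_listing(records: Iterable[Dict[str, object]], batch_size: int) -> Iterator[str]:
    """Yield JSON arrays, each holding at most batch_size metadata records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull the next chunk lazily so the full listing is never held in memory.
        batch: List[Dict[str, object]] = list(islice(iterator, batch_size))
        if not batch:
            return
        yield json.dumps(batch)


# Hypothetical metadata for a 50k-file folder; the field names are illustrative only.
listing = ({"filename": f"file-{i}.txt", "size": i} for i in range(50_000))
payloads = list(batch_listing(listing, batch_size=1000))
print(len(payloads))  # 50 batches rather than 50,000 individual outputs
```

Under these assumptions, each payload plays the role of one record-oriented FlowFile, which downstream splitting can then break down further at whatever rate backpressure allows.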
