Hi Edward,

thank you for your input. I didn't know about the copy-on-write semantics, that's really useful. I'll check out the in-depth guide for sure!

In my case, the content of the flow file changes heavily from one processor to the next, so I doubt copy-on-write would help here.

Best,
Lars

On Wed, 2019-07-31 at 12:13 +0100, Edward Armes wrote:
> Hi Lars,
>
> In short, depending on how a FlowFile is duplicated, the content shouldn't
> be duplicated as well.
>
> In general, content is only duplicated when it has been deemed to have been
> changed (copy-on-write semantics). For the most part (unless a FlowFile has
> a large number of attributes) a FlowFile is actually quite small and
> therefore the waste is minimal, which is why they can be held in memory and
> passed through a Flow.
>
> The best way to branch/clone a flow file is to add another output from the
> processor you want to log the output from, and the Framework that surrounds
> a Processor will handle the rest. This does create a duplicate FlowFile but
> doesn't create a copy of the content. In the provenance repository this is
> marked as a CLONE event for the original FlowFile, and the new FlowFile gets
> treated as its own unique FlowFile with a reference to the original
> content.
>
> This is quite a short explanation; a better and more in-depth explanation
> can be found here, and I think it covers all the scenarios you're thinking
> about: https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
>
> Edward
>
> On Wed, Jul 31, 2019 at 11:47 AM Lars Winderling <[email protected]> wrote:
> > Dear NiFi community,
> >
> > I often face the use case where I import flow files with content of
> > order O(1 GB) or O(10 GB), already compressed.
> >
> > Let's say I need to branch off of a flow where the actual flow file
> > should be processed further, and on some side branch I just want to do
> > some kind of logging or whatever without accessing the flow file's
> > contents. Thus it's clearly wasteful to duplicate the flow file including
> > its content.
> >
> > For this case I wrote a processor defining 2 relationships: "original"
> > and "attributes only", so the flow file attributes can be accessed
> > separately from the content.
> >
> > I will gladly prepare a PR if anyone finds that worth incorporating into
> > NiFi.
> >
> > The only remaining question for me would be: use an individual processor
> > to that end, or add it to e.g. the DuplicateFlowFile processor. The
> > former seems cleaner to me. Proposed names would be something like
> > ForkProcessor (no better idea yet).
> >
> > Thanks in advance!
> >
> > Best,
> > Lars
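
For illustration only, the copy-on-write behaviour described in the thread can be sketched as a small toy model in plain Java. This is not the NiFi API: `CowSketch`, `ToyFlowFile`, `ContentClaim`, `cloneFlowFile` and `write` are hypothetical names made up for this sketch. It assumes a flow file is a small attribute map plus a reference to separately stored content, so a clone copies only the attributes, while a write allocates new content.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these types belong to NiFi.
final class CowSketch {

    /** Hypothetical stand-in for a stored, immutable content blob. */
    static final class ContentClaim {
        private final byte[] bytes;

        ContentClaim(byte[] bytes) {
            this.bytes = bytes.clone();
        }

        int size() {
            return bytes.length;
        }
    }

    /** Hypothetical flow file: a small attribute map plus a content reference. */
    static final class ToyFlowFile {
        final Map<String, String> attributes;
        final ContentClaim claim;

        ToyFlowFile(Map<String, String> attributes, ContentClaim claim) {
            this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
            this.claim = claim;
        }

        /** Cloning copies the attributes but shares the content reference. */
        ToyFlowFile cloneFlowFile() {
            return new ToyFlowFile(attributes, claim);
        }

        /** Writing allocates a new claim; the old one stays untouched. */
        ToyFlowFile write(byte[] newContent) {
            return new ToyFlowFile(attributes, new ContentClaim(newContent));
        }
    }

    public static void main(String[] args) {
        ToyFlowFile original =
                new ToyFlowFile(Map.of("filename", "dump.gz"), new ContentClaim(new byte[1024]));
        ToyFlowFile sideBranch = original.cloneFlowFile();
        ToyFlowFile rewritten = original.write(new byte[2048]);

        // prints "clone shares content: true"
        System.out.println("clone shares content: " + (sideBranch.claim == original.claim));
        // prints "rewrite shares content: false"
        System.out.println("rewrite shares content: " + (rewritten.claim == original.claim));
    }
}
```

Under these assumptions, a logging side branch that only reads attributes costs one extra attribute map, whereas a main branch that rewrites the content at every step still allocates new content each time, which matches the concern raised in the reply.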
