Hi Edward,
thank you for your input. I didn't know about the copy-on-write (COW)
semantics; that's really useful. I'll check out the in-depth guide for sure!

In my case, the content of the flow file changes heavily from one processor
to the next, so I doubt copy-on-write would help here.
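
To make the copy-on-write point concrete, here is a minimal sketch in Java. It is a toy model and not NiFi's API: ContentClaim, SketchFlowFile, cloneFlowFile and write are hypothetical names made up for illustration. It only shows the idea that a clone shares the content reference, and that new content storage is allocated only when a processor rewrites the content.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a stored piece of content (not a NiFi class). */
final class ContentClaim {
    final byte[] bytes;

    ContentClaim(byte[] bytes) {
        this.bytes = bytes;
    }
}

/** Hypothetical flow file: a small attribute map plus a reference to content. */
final class SketchFlowFile {
    final Map<String, String> attributes;
    final ContentClaim claim;

    SketchFlowFile(Map<String, String> attributes, ContentClaim claim) {
        // Attributes are small, so copying them is cheap.
        this.attributes = new HashMap<>(attributes);
        this.claim = claim;
    }

    /** Cloning copies the attributes but shares the same content reference. */
    SketchFlowFile cloneFlowFile() {
        return new SketchFlowFile(attributes, claim);
    }

    /** Writing new content allocates a new claim; the old one stays untouched. */
    SketchFlowFile write(byte[] newContent) {
        return new SketchFlowFile(attributes, new ContentClaim(newContent.clone()));
    }
}

class CowSketch {
    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("filename", "dump.gz");

        SketchFlowFile original = new SketchFlowFile(attrs, new ContentClaim(new byte[] {1, 2, 3}));
        SketchFlowFile sideBranch = original.cloneFlowFile();
        SketchFlowFile rewritten = original.write(new byte[] {9, 9});

        // The clone points at the very same content object.
        System.out.println("clone shares content: " + (sideBranch.claim == original.claim));
        // A rewrite gets its own content object.
        System.out.println("rewrite shares content: " + (rewritten.claim == original.claim));
    }
}
```

In this model a side branch that only reads attributes never costs a second copy of the content, whereas a chain of processors that each rewrite the content, as in my case, allocates new content at every step.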

Best,
Lars

On Wed, 2019-07-31 at 12:13 +0100, Edward Armes wrote:
> Hi Lars,
> In short, depending on how a FlowFile is duplicated, the content shouldn't
> be duplicated as well.
> In general, content is only duplicated when it has been deemed to have
> been changed (copy-on-write semantics). For the most part (unless a FlowFile
> has a large number of attributes) a FlowFile is actually quite small, and
> therefore the waste is minimal, which is why they can be held in memory and
> passed through a Flow.
> The best way to branch/clone a flow file is to add another output from the
> processor you want to log the output from, and the Framework that surrounds
> a Processor will handle the rest. This does create a duplicate FlowFile but
> doesn't create a copy of the content. In the provenance repository this is
> marked as a CLONE event for the original FlowFile, and the new FlowFile gets
> treated as its own unique FlowFile with a reference to the original content.
> This is quite a short explanation; a better and more in-depth explanation
> can be found here, and I think it covers all the scenarios you're thinking
> about: https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
> 
> Edward
> On Wed, Jul 31, 2019 at 11:47 AM Lars Winderling
> <[email protected]> wrote:
> > Dear NiFi community,
> > I often face the use case where I import flow files with content of
> > order O(1 GB) or O(10 GB), already compressed. Let's say I need to branch
> > off of a flow where the actual flow file should be processed further, and
> > on some side branch I just want to do some kind of logging or whatever
> > without accessing the flow file's contents. Thus it's clearly wasteful to
> > duplicate the flow file including content.
> > For this case I wrote a processor defining 2 relationships: "original"
> > and "attributes only", so the flow file attributes can be accessed
> > separately from the content.
> > I will gladly prepare a PR if anyone finds that worth incorporating into
> > NiFi.
> > The only remaining question for me would be: use an individual processor
> > to that end, or add it to e.g. the DuplicateFlowFile processor. The
> > former seems cleaner to me. Proposed names would be something like
> > ForkProcessor (no better idea yet).
> > Thanks in advance!
> > Best,
> > Lars
