Re: [DISCUSS] Streaming or "lazy" mode for `CompressContent`

Edward Armes Tue, 30 Jul 2019 09:34:11 -0700

So while I agree with this in principle, and it is a good idea on paper, I do
have a reservation.

My concern is that this starts to add a bolt-on bloat problem. The NiFi
processors as they stand do, in general, follow the Unix philosophy (do one
thing, and do it well). While this could be just a case of adding a wrapper,
it then becomes an ask to add the same wrapper to other processors to provide
similar functionality. This starts to cause a technical debt problem and
potentially a detrimental experience for the user. I have mentioned some of
this in the previous thread about restructuring the NiFi core.

The reason I suggest doing it either at the repo level or as the
InputStream is handed over to the processor from the core is that it then
becomes a global piece of functionality, which every processor that handles
data that compresses well could benefit from. Ideally it would be nice to
see it as a "per-flow" setting, but I suspect that would add more
complexity than is actually needed.
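
To make the idea a bit more concrete, here is a minimal sketch of what
wrapping the stream at the framework level might look like. It uses only
`java.util.zip` from the JDK. The class and method names are hypothetical
and are not part of the NiFi API; the byte arrays merely stand in for the
content repo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: none of these names exist in NiFi.
public class TransparentCompressionSketch {

    // What the framework might do when content is written to the repo:
    // compress on the way in, so only compressed bytes hit the disk.
    static byte[] store(byte[] content) throws IOException {
        ByteArrayOutputStream repo = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(repo)) {
            out.write(content);
        }
        return repo.toByteArray();
    }

    // What the framework might do just before handing the InputStream to a
    // processor: wrap it so the processor only ever sees plain bytes.
    static InputStream openForProcessor(byte[] stored) throws IOException {
        return new GZIPInputStream(new ByteArrayInputStream(stored));
    }

    // Stand-in for a processor that reads its input without knowing
    // anything about compression.
    static byte[] readAsProcessor(byte[] stored) throws IOException {
        ByteArrayOutputStream seen = new ByteArrayOutputStream();
        try (InputStream in = openForProcessor(stored)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                seen.write(buffer, 0, n);
            }
        }
        return seen.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive content, i.e. data that compresses well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("id,name,value\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] stored = store(original);
        byte[] seen = readAsProcessor(stored);

        System.out.println("original bytes: " + original.length);
        System.out.println("stored bytes:   " + stored.length);
        System.out.println("processor saw:  " + seen.length);
    }
}
```

The point of the sketch is that the processor code stays untouched: the
trade-off between disk usage and CPU is made once, in one place, rather than
per processor.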

I have seen an issue where, over time, the content repo took up quite a
chunk of disk on a multi-tenanted cluster that performed lots of small
changes on lots of FlowFiles. While the hosts were under-resourced,
being able to compress the content, at the cost of some speed of data
through the flow, might have helped that situation quite a bit.

Edward

On Tue, Jul 30, 2019 at 4:21 PM Joe Witt <[email protected]> wrote:

> Malthe
>
> I do see value in having the Record readers/writers understand and handle
> compression directly as it will avoid the extra disk hit of decompress,
> read, compress cycles using existing processes and further there are cases
> where the compression is record specific and not just holistic block
> encryption.
>
> I think Koji offered a great description of how to start thinking about
> this.
>
> Thanks
>
> On Tue, Jul 30, 2019 at 10:47 AM Malthe <[email protected]> wrote:
>
> > In reference to NIFI-6496 [1], I'd like to open a discussion on adding
> > compression support to flow files such that a processor such as
> > `CompressContent` might function in a streaming or "lazy" mode.
> >
> > Context, more details and initial feedback can be found in the ticket
> > referenced below as well as in a related SO entry [2].
> >
> > [1] https://issues.apache.org/jira/browse/NIFI-6496
> > [2] https://stackoverflow.com/questions/57005564/using-convertrecord-on-compressed-input
> >
>
