Joe,

I think it might not be necessary or desirable to expose this outside
of the `CompressContent` processor. Whether the processor operates in
a lazy mode (as proposed here) or the current eager mode shouldn't
change the behavior of the flow. The next processor (or processors)
will not notice the difference.

The benefit of this approach is that it really introduces no new
concepts and is basically just a way for people to optimize existing
and already working flows.
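
For illustration, here is a minimal sketch of the "lazy" idea: the
compressed bytes stay in the content repository as-is, and the
decompressing view is only constructed when a downstream consumer
actually reads the stream. The class and method names are hypothetical,
not the NiFi API; this just uses plain `java.util.zip` streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class LazyDecompressDemo {

    // Hypothetical sketch: present stored gzip content as a plain-text
    // stream, decompressing only as bytes are pulled, instead of an
    // eager decompress/rewrite pass against the content repository.
    static InputStream lazyPlainView(byte[] storedGzipContent) throws IOException {
        return new GZIPInputStream(new ByteArrayInputStream(storedGzipContent));
    }

    // Helper for the demo: gzip some plain bytes in memory.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = gzip("hello flowfile".getBytes("UTF-8"));
        // Downstream consumer reads plain bytes; it cannot tell whether
        // the stored content was compressed or not.
        InputStream in = lazyPlainView(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        System.out.println(out.toString("UTF-8"));
    }
}
```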

Thanks

On Tue, 30 Jul 2019 at 16:42, Joe Witt <[email protected]> wrote:
>
> Edward,
>
> I like your point/comment regarding separation of concerns/cohesion.  I
> think we could/should consider automatically decompressing data on the fly
> for processors in general in the event we know a given set of data to be
> compressed but being accessed for plaintext purposes.  For general block
> compression types this is probably fair game and could be quite compelling
> particularly to avoid the extra read/write/content repo hits involved.
>
> That said, I think for the case of record readers/writers I'm not sure we
> can avoid having a specific solution.  Some compression types can be
> concatted together and some cannot.  Some record types would be
> tolerant/still valid and some would not.
>
> Thanks
> Joe
>
> On Tue, Jul 30, 2019 at 12:34 PM Edward Armes <[email protected]>
> wrote:
>
> > So while I agree with this in principle, and it is a good idea on paper,
> > my concern is that this starts to add a bolt-on bloat problem. The NiFi
> > processors as they stand generally follow the Unix philosophy (do one
> > thing, and do it well). While it might start as just adding a wrapper to
> > one processor, it then becomes a request to add the same wrapper to other
> > processors for similar functionality. This starts to cause a technical
> > debt problem and can also create a detrimental experience for the user.
> > Some of this I mentioned in the previous thread about restructuring the
> > NiFi core.
> >
> > The reason why I suggest doing it either at the repo level or as the
> > InputStream is handed over to the processor from the core is that it adds
> > it as a global piece of functionality, which every processor that handles
> > data that compresses well could benefit from. Ideally it would be nice to
> > see it as a "per-flow" setting, but I suspect that would add more
> > complexity than is actually needed.
> >
> > I have seen an issue where, over time, the content repo took up quite a
> > chunk of disk for a multi-tenanted cluster that performed lots of small
> > changes on lots of FlowFiles. While the hosts were under-resourced, being
> > able to compress the content, trading disk space for the speed of data
> > through the flow, might have helped that situation quite a bit.
> >
> > Edward
> >
> > On Tue, Jul 30, 2019 at 4:21 PM Joe Witt <[email protected]> wrote:
> >
> > > Malthe
> > >
> > > I do see value in having the Record readers/writers understand and handle
> > > compression directly as it will avoid the extra disk hit of decompress,
> > > read, compress cycles using existing processes and further there are
> > cases
> > > where the compression is record specific and not just holistic block
> > > compression.
> > >
> > > I think Koji offered a great description of how to start thinking about
> > > this.
> > >
> > > Thanks
> > >
> > > On Tue, Jul 30, 2019 at 10:47 AM Malthe <[email protected]> wrote:
> > >
> > > > In reference to NIFI-6496 [1], I'd like to open a discussion on adding
> > > > compression support to flow files such that a processor such as
> > > > `CompressContent` might function in a streaming or "lazy" mode.
> > > >
> > > > Context, more details and initial feedback can be found in the ticket
> > > > referenced below as well as in a related SO entry [2].
> > > >
> > > > [1] https://issues.apache.org/jira/browse/NIFI-6496
> > > > [2] https://stackoverflow.com/questions/57005564/using-convertrecord-on-compressed-input
