James,
It sounds like your encoded data is in a binary format. If so, there should
be marker bytes separating the header from the encoded data. Read the
content into a byte array, decode only the header bytes as UTF-8, and apply
your regex to that string. Then build a new array from the changed header's
bytes followed by the original payload bytes, and write that array to the
output stream as raw bytes, not as UTF-8 text. Decoding the whole file as
UTF-8 replaces any byte sequences that are not valid UTF-8, which is what
mangles the payload.
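
Here is a rough sketch of those steps in Groovy, in case it helps. The
helper names (indexOf, rewriteHeader) and the marker argument are my own
placeholders, not anything from NiFi, and I am assuming the header ends at
the first occurrence of some known marker byte sequence. You will need to
substitute whatever your format actually uses.

```groovy
import java.nio.charset.StandardCharsets

// Returns the index of the first occurrence of marker in data, or -1 if absent.
int indexOf(byte[] data, byte[] marker) {
    if (marker.length == 0) return -1
    for (int i = 0; i <= data.length - marker.length; i++) {
        boolean match = true
        for (int j = 0; j < marker.length; j++) {
            if (data[i + j] != marker[j]) { match = false; break }
        }
        if (match) return i
    }
    return -1
}

// Rewrites only the header text. Everything from the marker onward
// (marker included) is copied through byte for byte, never decoded.
// ASSUMPTION: the header is UTF-8 text and ends where the marker begins.
byte[] rewriteHeader(byte[] content, byte[] marker, Closure edit) {
    int split = indexOf(content, marker)
    if (split < 0) {
        throw new IllegalArgumentException('header/payload marker not found')
    }
    // Decode the header bytes only.
    String header = new String(content, 0, split, StandardCharsets.UTF_8)
    // The edit closure is where the regex manipulations would go.
    byte[] newHeader = edit(header).toString().getBytes(StandardCharsets.UTF_8)
    def out = new ByteArrayOutputStream(newHeader.length + content.length - split)
    out.write(newHeader, 0, newHeader.length)
    out.write(content, split, content.length - split)
    return out.toByteArray()
}
```

Inside your session.write callback you would read the stream with
IOUtils.toByteArray(inputStream), pass the result to rewriteHeader along
with your marker and a closure holding the regex changes, and hand the
returned array to outputStream.write. This holds the whole flowfile in
memory, which is also what your current script does.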

On Thu, Aug 19, 2021 at 8:47 AM James McMahon <[email protected]> wrote:

> I have a series of data files that have two sections: headers that are text
> which I want to alter using regex patterns, and a payload following the
> text portion that is encoded data.
>
> I call a groovy script from an ExecuteScript processor, employing an
> inputStream and outputStream callback to make my change to the header
> content. The problem is that I seem to be mangling the nontextual payload
> after the header, because I operate on the entire flowfile stream.
>
> I suspect what I need to do is somehow read only the header portion of each
> flowfile into my stream, make my changes to that, and write that back out
> to the flowfile stream without disturbing the rest of the flowfile. I don't
> know how to do that. I'm hoping someone can help.
>
> Here is my Groovy script thus far (business logic removed). It works as
> desired on flowfiles that are all text, but does not work for flowfiles
> that have text in the header portion and encoded data following that.
>
> import java.util.regex.Pattern
> import org.apache.commons.io.IOUtils
> import java.nio.charset.StandardCharsets
>
> flowFileList = session.get(1000)
> if (!flowFileList.isEmpty()) {
>      flowFileList.each { flowFile ->
>           try {
>                flowFile = session.write(flowFile, { inputStream, outputStream ->
>                     text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>
>                     // regex pattern manipulations in the header text content happen here
>
>                     outputStream.write(text.getBytes(StandardCharsets.UTF_8))
>                } as StreamCallback)
>                session.transfer(flowFile, REL_SUCCESS)
>           } catch (e) {
>                // error logging here
>                session.transfer(flowFile, REL_FAILURE)
>           }
>      }
> }
>
> I posted to dev rather than users because of the nature of the question. My
> apologies if I should have done otherwise.
>
