James,

It sounds like your encoded data is in a binary format. If so, there should be marker bytes separating the header from the encoded data. Decoding the whole flowfile as UTF-8 and re-encoding it is what corrupts the payload, because byte sequences that are not valid UTF-8 do not survive that round trip. Instead, read the content into a byte array, decode only the header bytes as UTF-8, apply your regex, build a new array by joining the changed header bytes with the encoded data's bytes, and write that array to the output stream as raw bytes, not as UTF-8 text.
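
A rough sketch of those steps in Groovy, in case it helps. The marker bytes are an assumption: I don't know what your format uses to end the header, so pass in whatever actually separates the two sections. The helper names (indexOf, rewriteHeader) are made up for illustration and are not part of NiFi.

```groovy
import java.nio.charset.StandardCharsets

// Return the index of the first occurrence of marker in data, or -1 if absent.
int indexOf(byte[] data, byte[] marker) {
    if (marker.length == 0) return -1
    for (int i = 0; i <= data.length - marker.length; i++) {
        boolean match = true
        for (int j = 0; j < marker.length; j++) {
            if (data[i + j] != marker[j]) { match = false; break }
        }
        if (match) return i
    }
    return -1
}

// Decode only the bytes before the marker, let editHeader change that text,
// then append the marker and payload bytes exactly as they were read.
// 'marker' is hypothetical: it depends on the actual file format.
byte[] rewriteHeader(byte[] content, byte[] marker, Closure editHeader) {
    int split = indexOf(content, marker)
    if (split < 0) {
        throw new IllegalArgumentException('header/payload marker not found')
    }
    String header = new String(content, 0, split, StandardCharsets.UTF_8)
    byte[] newHeader = editHeader(header).toString().getBytes(StandardCharsets.UTF_8)

    def out = new ByteArrayOutputStream(newHeader.length + content.length - split)
    out.write(newHeader, 0, newHeader.length)
    out.write(content, split, content.length - split)  // marker + payload, untouched
    return out.toByteArray()
}
```

Inside your StreamCallback, IOUtils.toByteArray(inputStream) would supply the content argument, your regex work would go in the closure, and the returned array would be passed straight to outputStream.write(...). Note this reads the whole flowfile into memory, which is the same as what your current script does.
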
On Thu, Aug 19, 2021 at 8:47 AM James McMahon <[email protected]> wrote:

> I have a series of data files that have two sections: headers that are text
> which I want to alter using regex patterns, and a payload following the
> text portion that is encoded data.
>
> I call a groovy script from an ExecuteScript processor, employing an
> inputStream and outputStream callback to make my change to the header
> content. Problem is, I seem to be mangling the nontextual payload after the
> header operating on the entire flowfile stream.
>
> I suspect what I need to do is somehow read only the data portion of each
> flowfile into my stream, make my changes to that, and write that back out
> to the flowfile stream without disturbing the rest of the flowfile. I don't
> know how to do that. I'm hoping someone can help.
>
> Here is my Groovy script thus far (business logic removed). It works as
> desired on flowfiles that are all text, but does not work for flowfiles
> that have text in the header portion and encoded data following that.
>
> import java.util.regex.Pattern
> import org.apache.commons.io.IOUtils
> import java.nio.charset.StandardCharsets
>
> flowFileList = session.get(1000)
> if (!flowFileList.isEmpty()) {
>     flowFileList.each { flowFile ->
>         try {
>             flowFile = session.write(flowFile, {inputStream, outputStream ->
>                 text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>
>                 //regex pattern manipulations in the header text content happen here
>
>                 outputStream.write(text.getBytes(StandardCharsets.UTF_8))
>             } as StreamCallback)
>             session.transfer(flowFile, REL_SUCCESS)
>         } catch (e) {
>             // error logging here
>             session.transfer(flowFile, REL_FAILURE)
>         }
>     }
> }
>
> I posted to dev rather than users because of the nature of the question. My
> apologies if I should have done otherwise.
