[ 
https://issues.apache.org/jira/browse/NIFI-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-3648:
-----------------------------
    Fix Version/s: 1.6.0
           Status: Patch Available  (was: Open)

> Address Excessive Garbage Collection
> ------------------------------------
>
>                 Key: NIFI-3648
>                 URL: https://issues.apache.org/jira/browse/NIFI-3648
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 1.6.0
>
>
> We have a lot of places in the codebase where we generate lots of unnecessary 
> garbage - especially byte arrays. We need to clean this up to in order to 
> relieve stress on the garbage collector.
> Specific points that I've found create unnecessary garbage:
> Provenance CompressableRecordWriter creates a new BufferedOutputStream for 
> each 'compression block' that it creates. Each one has a 64 KB byte[]. This 
> is very wasteful. We should instead subclass BufferedOutputStream so that we 
> are able to provide a byte[] to use instead of an int that indicates the 
> size. This way, we can just keep re-using the same byte[] that we create for 
> each writer. This saves about 32,000 of these 64 KB byte[] for each writer 
> that we create. And we create more than 1 of these per minute.
> EvaluateJsonPath uses a BufferedInputStream but it is not necessary, because 
> the underlying library will also buffer data. So we are unnecessarily 
> creating a lot of byte[]'s
> CompressContent uses Buffered Input AND Output. And uses 64 KB byte[]. And 
> doesn't need them at all, because it reads and writes with its own byte[] 
> buffer via StreamUtils.copy
> Site-to-site uses CompressionInputStream. This stream creates a new byte[] in 
> the readChunkHeader() method continually. We should instead only create a new 
> byte[] if we need a bigger buffer and otherwise just use an offset & length 
> variable.
> Right now, SplitText uses TextLineDemarcator. The fill() method increases the 
> size of the internal byte[] by 8 KB each time. When dealing with a large 
> chunk of data, this is VERY expensive on GC because we continually create a 
> byte[] and then discard it to create a new one. Take for example an 800 KB 
> chunk. We would do this 100,000 times. If we instead double the size we would 
> only have to create 8 of these.
> Other Processors that use Buffered streams unnecessarily:
> ConvertJSONToSQL
> ExecuteProcess
> ExecuteStreamCommand
> AttributesToJSON
> EvaluateJsonPath (when writing to content)
> ExtractGrok
> JmsConsumer



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to