OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
Hi, I have an issue with string concatenation. You can see my code below.[1] I have a step in my Dataflow job that concatenates strings, but when I use that step, my job starts getting JVM restart errors: Shutting down JVM after 8 consecutive periods of measured GC thrashing. > Memory is u…

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Ning Kang
Here is a question answered on StackOverflow: https://stackoverflow.com/questions/27221292/when-should-i-use-javas-stringwriter Could you try using StringBuilder instead, since this usage is not appropriate for a StringWriter?
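A minimal sketch of the swap being suggested, assuming the job simply joins fields with commas (the original code [1] is not shown here). `JoinFieldsDemo` and its methods are hypothetical names. Both versions produce the same text. `StringWriter` is a `Writer` backed by an internal `StringBuffer`, while `StringBuilder` is the plain, unsynchronized builder meant for in-memory concatenation.

```java
import java.io.StringWriter;

// Hypothetical demo: the same comma-joining logic written with StringWriter
// and with StringBuilder. Both return identical strings.
final class JoinFieldsDemo {

    // Joins fields with commas through the Writer API (backed by a StringBuffer).
    static String withStringWriter(String[] fields) {
        StringWriter writer = new StringWriter();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                writer.write(',');
            }
            writer.write(fields[i]);
        }
        return writer.toString();
    }

    // Same logic using StringBuilder, which avoids the Writer layer entirely.
    static String withStringBuilder(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"a", "b", "c"};
        System.out.println(withStringWriter(fields));  // a,b,c
        System.out.println(withStringBuilder(fields)); // a,b,c
    }
}
```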

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
Sorry for the wrong import. As you can see in the code, I am using StringBuilder.

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Kyle Weaver
You can try scoping the string builder instance to processElement, instead of making it a member of your DoFn. The same DoFn instance can be used for a bundle of many elements, or possibly even across multiple bundles. https://beam.apache.org/releases/javadoc/2.23.0/org/apache/beam/sdk/transforms/
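Here is a minimal plain-Java sketch of that suggestion, written without any Beam dependency. The hypothetical `rowToCsvLine` stands in for the per-element work done in `processElement` (or in a helper such as `beamRow2CsvLine`), and a `List<Object>` stands in for a Beam `Row`. Because the builder is a local variable, nothing carries over from one element to the next. The output is not escaped the way a real CSV writer would escape it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a DoFn's per-element conversion; List<Object>
// replaces a Beam Row so the sketch runs without Beam on the classpath.
final class PerElementBuilder {

    // The builder is created per call, so no state is shared between elements
    // and the buffer is eligible for GC once the line has been returned.
    static String rowToCsvLine(List<Object> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values.get(i)); // append(Object) uses String.valueOf
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rowToCsvLine(Arrays.asList("id-1", 42, true)));  // id-1,42,true
        System.out.println(rowToCsvLine(Arrays.asList("id-2", 7, false))); // id-2,7,false
    }
}
```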

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Brian Hulette
That error isn't exactly an OOM; it indicates the JVM is spending a significant amount of time in garbage collection. It looks like `writer.setLength(0)` may actually allocate a new buffer, and then the buffer may also need to be resized as the String grows, so you could be creating a lot of orpha…
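One way to limit the resizing mentioned above is to give the builder an initial capacity close to the expected line length, so appends rarely need to grow and copy the internal buffer. This is a sketch under an assumption: `EXPECTED_LINE_LENGTH` is a hypothetical estimate and does not come from the thread, and `PresizedBuilder` is an illustrative name.

```java
// Sketch: pre-size the StringBuilder so typical lines fit without the
// internal buffer being grown (and the old buffer orphaned) mid-append.
final class PresizedBuilder {

    // Hypothetical estimate of a typical output line length, in chars.
    static final int EXPECTED_LINE_LENGTH = 1024;

    static String buildLine(String[] fields) {
        StringBuilder sb = new StringBuilder(EXPECTED_LINE_LENGTH);
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildLine(new String[] {"a", "b"})); // a,b

        // 500 chars fit inside the initial 1024-char buffer, so no growth occurs.
        StringBuilder sb = new StringBuilder(EXPECTED_LINE_LENGTH);
        for (int i = 0; i < 500; i++) {
            sb.append('x');
        }
        System.out.println(sb.capacity()); // 1024
    }
}
```

If lines are much longer than the estimate, the builder still grows as usual. The initial capacity only sets where growth starts.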

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Kyle Weaver
> It looks like `writer.setLength(0)` may actually allocate a new buffer, and then the buffer may also need to be resized as the String grows, so you could be creating a lot of orphaned buffers very quickly. I'm not that familiar with StringBuilder; is there a way to reset it and re-use the existin…
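`setLength(0)` and `delete(0, length())` are the two standard `StringBuilder` calls for emptying a builder. Whether either one reallocates can be checked directly on a given JDK, because `capacity()` reports the size of the internal buffer. The sketch below is only a check to run, not a claim about every JVM. `ResetCheck` and its helpers are hypothetical names.

```java
// Sketch: empty a StringBuilder two ways and report its capacity afterwards,
// so one can see on their own JDK whether the existing buffer is kept.
final class ResetCheck {

    // Resets via setLength(0) and returns the capacity afterwards.
    static int capacityAfterSetLength(StringBuilder sb) {
        sb.setLength(0);
        return sb.capacity();
    }

    // Resets via delete(0, length()) and returns the capacity afterwards.
    static int capacityAfterDelete(StringBuilder sb) {
        sb.delete(0, sb.length());
        return sb.capacity();
    }

    public static void main(String[] args) {
        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            a.append('x');
            b.append('x');
        }
        int capA = a.capacity();
        int capB = b.capacity();
        System.out.println("setLength keeps buffer: " + (capacityAfterSetLength(a) == capA));
        System.out.println("delete keeps buffer: " + (capacityAfterDelete(b) == capB));
        System.out.println(a.length() + " " + b.length()); // 0 0
    }
}
```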

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
> You can try scoping the string builder instance to processElement, instead of making it a member of your DoFn. I tried creating a StringBuilder in the beamRow2CsvLine function too, but it has a similar issue. I put the StringBuilder in Setup to reuse the same object per bundle to reduce object re…

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
> If I'm understanding Talat's logic correctly, it's not necessary to reuse the string builder at all in this case. Yes, I tried that too, but the Dataflow job has the same issue.
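If the builder is not reused at all, each line can be built with a fresh `java.util.StringJoiner` (Java 8+), which handles separators without manual index checks. This is a minimal sketch. `NoReuse.toCsvLine` is a hypothetical helper, and it does not quote or escape values the way a real CSV writer would.

```java
import java.util.StringJoiner;

// Sketch of the "no reuse" option: a new StringJoiner per line, so no
// builder state survives from one element to the next.
final class NoReuse {

    static String toCsvLine(Object... values) {
        StringJoiner joiner = new StringJoiner(",");
        for (Object v : values) {
            joiner.add(String.valueOf(v)); // null becomes "null"
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine("id-1", 42, true)); // id-1,42,true
    }
}
```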

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
I also tried Brian's suggestion to clear the StringBuilder by calling delete with its length. No luck; I am still getting the same error message. Do you have any suggestions? Thanks

Re: Clearing states and timers in a Stateful Fn with Global Windows

2020-09-02 Thread Gökhan Imral
Thanks for the quick response. I tried a build with the fix applied and can see that memory is much more stable. Gokhan > On 2 Sep 2020, at 12:51 PM, Jan Lukavský wrote: > Hi Gokhan, this is related to [1], which is just going to be fixed. Jan [1] https://github.com/apache/beam/pu