Is it possible that you only need more memory per worker?

Have you tried using a `workerMachineType` with more memory [1] and lowering
`numberOfWorkerHarnessThreads` [2]?

[1]
https://cloud.google.com/compute/docs/machine-types#n1_standard_machine_types
[2]
https://cloud.google.com/dataflow/docs/guides/specifying-exec-params#setting-other-cloud-dataflow-pipeline-options
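
In case it helps, here is a minimal sketch of how those options, together with
the heap dump flags Brian mentions below, could be assembled as command-line
arguments. The flag names come from this thread; the helper class, the
`n1-highmem-4` machine type, the thread count, and the bucket path are only
placeholders I picked for illustration, so adjust them to your job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: only builds the argument strings, it does not launch a job.
final class DataflowArgsSketch {

    // Returns pipeline arguments that trade parallelism for memory per thread
    // and ask the worker to save a heap dump when it runs out of memory.
    static String[] memoryTuningArgs(String machineType, int harnessThreads, String heapDumpGcsPath) {
        List<String> args = new ArrayList<>();
        args.add("--workerMachineType=" + machineType);               // more memory per worker
        args.add("--numberOfWorkerHarnessThreads=" + harnessThreads); // fewer concurrent bundles
        args.add("--dumpHeapOnOOM=true");                             // from Brian's message
        args.add("--saveHeapDumpsToGcsPath=" + heapDumpGcsPath);      // placeholder GCS path
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // All three values are assumptions for illustration only.
        for (String arg : memoryTuningArgs("n1-highmem-4", 4, "gs://example-bucket/heap-dumps")) {
            System.out.println(arg);
        }
    }
}
```

The resulting array is what you would normally hand to
`PipelineOptionsFactory.fromArgs(...)` along with your other job arguments.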

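On the `writer.setLength(0)` question quoted below: as far as I can tell from
the OpenJDK implementation, neither `setLength(0)` nor `delete(0, length())`
allocates a new buffer. Both only reset the length and keep the existing
capacity, which would be consistent with the error not changing between the
variants that were tried. A small sketch to check this locally (the class and
method names are made up for the example, and this is not the code from the
dpaste link):

```java
// Hypothetical demo class; it only exercises java.lang.StringBuilder.
final class StringBuilderReuseDemo {

    // Joins the fields with a delimiter, reusing the caller-supplied builder.
    static String join(StringBuilder sb, String[] fields, char delimiter) {
        sb.setLength(0); // resets the length; the backing array is kept
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(fields[i]);
        }
        return sb.toString(); // copies the characters into a new String
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(1024);
        int initial = sb.capacity();

        String joined = join(sb, new String[] {"a", "b", "c"}, ',');
        sb.setLength(0);
        int afterSetLength = sb.capacity();

        sb.append("xyz");
        sb.delete(0, sb.length());
        int afterDelete = sb.capacity();

        System.out.println(joined);
        System.out.println(initial + " " + afterSetLength + " " + afterDelete);
    }
}
```

Note that `toString()` still creates a new String per element, so reusing the
builder only saves the builder's own buffer, not the per-record output.
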
On Thu, Sep 3, 2020 at 11:46 AM Brian Hulette <bhule...@google.com> wrote:

> You may be able to get some additional insight if you configure Dataflow
> to save a heap dump before killing the JVM
> (--dumpHeapOnOOM, --saveHeapDumpsToGcsPath) and inspect the dump. There
> are directions for that (and a lot of other advice about memory issues) at
> [1].
>
>
> Another question - is this a batch pipeline? There's an open Jira about
> thrashing detection in Dataflow's batch worker [2]. The fact that we shut
> down a worker's JVM after sustained periods of thrashing is part of a
> larger system for dealing with memory pressure intended for use in
> streaming pipelines. We may want to make it opt-in for batch pipelines. I
> wrote a PR that made the JVM shutdown opt-in for all Dataflow pipelines
> earlier this year, but I closed it when I realized it's an important
> feature in streaming [3]. I could revisit that PR and make the feature
> opt-in for batch, opt-out for streaming, but that wouldn't help you until
> the next Beam release.
>
> Brian
>
> [1]
> https://cloud.google.com/community/tutorials/dataflow-debug-oom-conditions
> [2] https://issues.apache.org/jira/browse/BEAM-9049
> [3] https://github.com/apache/beam/pull/10499#issuecomment-570743842
>
>
> On Thu, Sep 3, 2020 at 10:48 AM Talat Uyarer <tuya...@paloaltonetworks.com>
> wrote:
>
>> Hi,
>>
>> One more update. Sorry, there was a mistake in the code sample I shared: I
>> put the StringBuilder in the setup function there, but in my actual code it
>> is in the startBundle function. So far I have tested the scenarios below:
>> - with StringWriter, constructing the object on every processElement call
>> - with StringBuilder, constructing the object on every processElement call
>> - with StringBuilder, constructing the object on every startBundle call (I
>> also tried setLength(0) and delete(0, sb.length()) to clear the StringBuilder)
>>
>> None of these cases prevents the DF job from hitting the error below.
>>
>>> Shutting down JVM after 8 consecutive periods of measured GC thrashing.
>>> Memory is used/total/max = 4112/5994/5994 MB, GC last/max = 97.36/97.36 %,
>>> #pushbacks=3, gc thrashing=true. Heap dump not written.
>>
>>
>> Also, my processing rate is 4kps per instance. I would like to hear your
>> suggestions if you have any.
>>
>> Thanks
>>
>> On Wed, Sep 2, 2020 at 6:22 PM Talat Uyarer <tuya...@paloaltonetworks.com>
>> wrote:
>>
>>> I also tried Brian's suggestion to clear the StringBuilder by calling delete
>>> with the StringBuilder's length. No luck; I am still getting the same error
>>> message. Do you have any suggestions?
>>>
>>> Thanks
>>>
>>> On Wed, Sep 2, 2020 at 3:33 PM Talat Uyarer <
>>> tuya...@paloaltonetworks.com> wrote:
>>>
>>>>> If I'm understanding Talat's logic correctly, it's not necessary to
>>>>> reuse the string builder at all in this case.
>>>>
>>>> Yes, I tried that too, but the DF job has the same issue.
>>>>
>>>>
>>>> On Wed, Sep 2, 2020 at 3:17 PM Kyle Weaver <kcwea...@google.com> wrote:
>>>>
>>>>> > It looks like `writer.setLength(0)` may actually allocate a new
>>>>> > buffer, and then the buffer may also need to be resized as the String
>>>>> > grows, so you could be creating a lot of orphaned buffers very quickly.
>>>>> > I'm not that familiar with StringBuilder, is there a way to reset it and
>>>>> > re-use the existing capacity? Maybe `writer.delete(0, writer.length())`
>>>>> > [1]?
>>>>>
>>>>> If I'm understanding Talat's logic correctly, it's not necessary to
>>>>> reuse the string builder at all in this case.
>>>>>
>>>>> On Wed, Sep 2, 2020 at 3:11 PM Brian Hulette <bhule...@google.com>
>>>>> wrote:
>>>>>
>>>>>> That error isn't exactly an OOM; it indicates the JVM is spending a
>>>>>> significant amount of time in garbage collection.
>>>>>>
>>>>>> It looks like `writer.setLength(0)` may actually allocate a new
>>>>>> buffer, and then the buffer may also need to be resized as the String
>>>>>> grows, so you could be creating a lot of orphaned buffers very quickly.
>>>>>> I'm not that familiar with StringBuilder, is there a way to reset it and
>>>>>> re-use the existing capacity? Maybe `writer.delete(0, writer.length())`
>>>>>> [1]?
>>>>>>
>>>>>> [1]
>>>>>> https://stackoverflow.com/questions/242438/is-it-better-to-reuse-a-stringbuilder-in-a-loop
>>>>>>
>>>>>> On Wed, Sep 2, 2020 at 3:02 PM Talat Uyarer <
>>>>>> tuya...@paloaltonetworks.com> wrote:
>>>>>>
>>>>>>> Sorry for the wrong import. You can see in the code that I am using
>>>>>>> StringBuilder.
>>>>>>>
>>>>>>> On Wed, Sep 2, 2020 at 2:55 PM Ning Kang <ni...@google.com> wrote:
>>>>>>>
>>>>>>>> Here is a question answered on StackOverflow:
>>>>>>>> https://stackoverflow.com/questions/27221292/when-should-i-use-javas-stringwriter
>>>>>>>>
>>>>>>>> Could you try using StringBuilder instead since the usage is not
>>>>>>>> appropriate for a StringWriter?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 2, 2020 at 2:49 PM Talat Uyarer <
>>>>>>>> tuya...@paloaltonetworks.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have an issue with string concatenation. You can see my code
>>>>>>>>> below [1]. One step in my DF job concatenates strings, and when I
>>>>>>>>> use that step the job starts getting JVM restart errors:
>>>>>>>>>
>>>>>>>>>> Shutting down JVM after 8 consecutive periods of measured GC
>>>>>>>>>> thrashing. Memory is used/total/max = 4112/5994/5994 MB, GC last/max =
>>>>>>>>>> 97.36/97.36 %, #pushbacks=3, gc thrashing=true. Heap dump not written.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also tried using Avro rather than String. With Avro, it works
>>>>>>>>> fine without any issues. Do you have any suggestions?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> [1] https://dpaste.com/7RTV86WQC
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
