Re: Checkpointing with RocksDB as statebackend

vinay patil Mon, 20 Feb 2017 11:41:39 -0800

Hi Stephan,

I just saw your mail while I was writing up the answers to your earlier
questions. I have attached some more screenshots, taken from the latest
run today.
Yes, I will try setting it to a higher value and check whether performance
improves.
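
For reference, a minimal sketch of why the minimum pause matters, using the 6 s interval and 5 s minimum pause quoted further down in this thread. It is a simplified model, not Flink's actual scheduler: it assumes at most one checkpoint in flight, and the class name, method names, and checkpoint durations are hypothetical. In a real job the value is set via CheckpointConfig#setMinPauseBetweenCheckpoints.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (NOT Flink's actual scheduler): with at most one checkpoint
// in flight, the next checkpoint starts at the later of
//   (a) the previous trigger time plus the checkpoint interval, and
//   (b) the previous completion time plus the minimum pause.
final class CheckpointPauseModel {

    /** Returns the trigger time (ms) of each checkpoint, given how long each one takes. */
    static List<Long> triggerTimes(long intervalMs, long minPauseMs, long[] durationsMs) {
        List<Long> triggers = new ArrayList<>();
        long trigger = 0L;
        for (long duration : durationsMs) {
            triggers.add(trigger);
            long completed = trigger + duration;
            trigger = Math.max(trigger + intervalMs, completed + minPauseMs);
        }
        return triggers;
    }

    /** Time (ms) between the end of checkpoint i and the start of checkpoint i + 1. */
    static long gapAfter(List<Long> triggers, long[] durationsMs, int i) {
        return triggers.get(i + 1) - (triggers.get(i) + durationsMs[i]);
    }

    public static void main(String[] args) {
        long[] durations = {2_000L, 20_000L, 3_000L}; // hypothetical checkpoint durations
        List<Long> small = triggerTimes(6_000L, 5_000L, durations);  // settings from this thread
        List<Long> large = triggerTimes(6_000L, 60_000L, durations); // hypothetical larger pause
        System.out.println("gap after slow checkpoint, 5 s pause:  " + gapAfter(small, durations, 1) + " ms");
        System.out.println("gap after slow checkpoint, 60 s pause: " + gapAfter(large, durations, 1) + " ms");
    }
}
```

In this model the time left for working off backlog after a slow checkpoint is exactly the minimum pause, which is why raising it gives the workers more room before the next barrier arrives.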


Let me know your thoughts

Regards,
Vinay Patil

On Tue, Feb 21, 2017 at 12:51 AM, Stephan Ewen [via Apache Flink User
Mailing List archive.] <ml-node+s2336050n11758...@n4.nabble.com> wrote:

> @Vinay!
>
> Just saw the screenshot you attached to the first mail. The checkpoint
> that failed came after one that had an incredibly heavy alignment phase (14
> GB).
> I think that working that off threw off the next checkpoint, because the
> workers were still working off the alignment backlog.
>
> I think you can for now fix this by setting the minimum pause between
> checkpoints a bit higher (it is probably set a bit too small for the state
> of your application).
>
> Also, can you describe what your sources are (Kafka / Kinesis or file
> system)?
>
> BTW: We are currently working on
>   - incremental RocksDB checkpoints
>   - the network stack to allow in the future for a new way of doing the
> alignment
>
> Both of these should help make the program more resilient to these
> situations.
>
> Best,
> Stephan
>
>
>
> On Mon, Feb 20, 2017 at 7:51 PM, Stephan Ewen <[hidden email]> wrote:
>
>> Hi Vinay!
>>
>> Can you start by giving us a bit of an environment spec?
>>
>>   - What Flink version are you using?
>>   - What is your rough topology (what operations does the program use)?
>>   - Where is the state (windows, keyBy)?
>>   - What is the rough size of your checkpoints and where does the time
>> go? Can you attach a screenshot from
>> https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/checkpoint_monitoring.html
>>   - What is the size of the JVM?
>>
>> Those things would be helpful to know...
>>
>> Best,
>> Stephan
>>
>>
>> On Mon, Feb 20, 2017 at 7:04 PM, vinay patil <[hidden email]> wrote:
>>
>>> Hi Xiaogang,
>>>
>>> Thank you for your inputs.
>>>
>>> Yes, I have already tried setting MaxBackgroundFlushes and
>>> MaxBackgroundCompactions to higher values (tried 2, 4, and 8), but I am
>>> still not getting the expected results.
>>>
>>> System.getProperty("java.io.tmpdir") points to /tmp, but I could not find
>>> the RocksDB logs there. Can you please let me know where I can find them?
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>> On Mon, Feb 20, 2017 at 7:32 AM, xiaogang.sxg [via Apache Flink User
>>> Mailing List archive.] <[hidden email]> wrote:
>>>
>>>> Hi Vinay
>>>>
>>>> Can you provide the LOG file from RocksDB? It helps a lot in figuring out
>>>> the problem because it records the options and the events that happened
>>>> during execution. Unless configured otherwise, it should be located at the
>>>> path given by System.getProperty("java.io.tmpdir").
>>>>
>>>> Typically, a large amount of memory is consumed by RocksDB to store
>>>> necessary indices. To avoid unbounded growth in memory consumption, you
>>>> can put these indices into the block cache (set CacheIndexAndFilterBlocks
>>>> to true) and set the block cache size appropriately.
>>>>
>>>> You can also increase the number of background threads to improve the
>>>> performance of flushes and compactions (via MaxBackgroundFlushes and
>>>> MaxBackgroundCompactions).
>>>>
>>>> In YARN clusters, task managers will be killed if their memory
>>>> utilization exceeds the allocation size. Currently Flink does not count the
>>>> memory used by RocksDB in the allocation. We are working on fine-grained
>>>> resource allocation (see FLINK-5131). It may help to avoid such problems.
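
A back-of-the-envelope sketch of the arithmetic behind the points above, to help size the YARN container with headroom for RocksDB. It assumes one RocksDB instance per keyed operator subtask, one column family per registered state, and a separate block cache per column family (count the cache once if it is shared). The class and method names and all numbers except the 128 MB write buffer are hypothetical.

```java
// Rough estimate only: real RocksDB usage also includes index/filter blocks
// held outside the block cache, iterators, and allocator overhead.
final class RocksDbMemoryEstimate {

    static final long MB = 1024L * 1024L;

    /** Upper bound on memtable memory for one column family. */
    static long memtableBytes(long writeBufferBytes, int maxWriteBuffers) {
        return writeBufferBytes * maxWriteBuffers;
    }

    /**
     * Rough native-memory footprint per task manager. Assumes each column
     * family has its own block cache and that every RocksDB instance on the
     * task manager holds the same number of column families.
     */
    static long perTaskManagerBytes(long writeBufferBytes, int maxWriteBuffers,
                                    long blockCacheBytes, int columnFamilies,
                                    int rocksDbInstances) {
        long perColumnFamily = memtableBytes(writeBufferBytes, maxWriteBuffers) + blockCacheBytes;
        return perColumnFamily * columnFamilies * rocksDbInstances;
    }

    public static void main(String[] args) {
        // 128 MB write buffer is one of the values tried in this thread;
        // the remaining values are hypothetical.
        long total = perTaskManagerBytes(128 * MB, 3, 256 * MB, 2, 4);
        System.out.println("estimated RocksDB memory per task manager: " + (total / MB) + " MB");
    }
}
```

Because this memory is native and not counted in the Flink allocation, an estimate like this would have to fit into the gap between the JVM heap and the container size.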
>>>>
>>>> Hope this information helps.
>>>>
>>>> Regards,
>>>> Xiaogang
>>>>
>>>>
>>>> ------------------------------------------------------------------
>>>> From: Vinay Patil <[hidden email]>
>>>> Sent: Friday, February 17, 2017, 21:19
>>>> To: user <[hidden email]>
>>>> Subject: Re: Checkpointing with RocksDB as statebackend
>>>>
>>>> Hi Guys,
>>>>
>>>> There seems to be some issue with RocksDB memory utilization.
>>>>
>>>> Within a few minutes of the job running, physical memory usage increases
>>>> by 4-5 GB, and it keeps on increasing.
>>>> I have tried different values for Max Buffer Size (30MB, 64MB, 128MB,
>>>> 512MB) with Min Buffer to Merge set to 2, but physical memory keeps on
>>>> increasing.
>>>>
>>>> According to RocksDB documentation, these are the main options on which
>>>> flushing to storage is based.
>>>>
>>>> Can you please point out what I am doing wrong? I have tried different
>>>> configuration options, but each time the Task Manager gets killed after
>>>> some time :)
>>>>
>>>> Regards,
>>>> Vinay Patil
>>>>
>>>> On Thu, Feb 16, 2017 at 6:02 PM, Vinay Patil <[hidden email]> wrote:
>>>> I think it is more related to RocksDB. I am not familiar with RocksDB
>>>> either, but I am reading the tuning guide to understand the important
>>>> values that can be set.
>>>>
>>>> Regards,
>>>> Vinay Patil
>>>>
>>>> On Thu, Feb 16, 2017 at 5:48 PM, Stefan Richter [via Apache Flink User
>>>> Mailing List archive.] <[hidden email]> wrote:
>>>> What kind of problem are we talking about: S3 related or RocksDB
>>>> related? I am not aware of problems with RocksDB per se. I think seeing
>>>> logs for this would be very helpful.
>>>>
>>>> On 16.02.2017 at 11:56, Aljoscha Krettek <[hidden email]> wrote:
>>>>
>>>> [hidden email] and [hidden email], could this be the same problem that
>>>> you recently saw when working with other people?
>>>>
>>>> On Wed, 15 Feb 2017 at 17:23 Vinay Patil <[hidden email]> wrote:
>>>> Hi Guys,
>>>>
>>>> Can anyone please help me with this issue?
>>>>
>>>> Regards,
>>>> Vinay Patil
>>>>
>>>> On Wed, Feb 15, 2017 at 6:17 PM, Vinay Patil <[hidden email]> wrote:
>>>> Hi Ted,
>>>>
>>>> I have 3 boxes in my pipeline: the 1st and 2nd boxes contain a source and
>>>> an S3 sink, and the 3rd box is a window operator followed by chained
>>>> operators and an S3 sink.
>>>>
>>>> In the details link section I can see that the S3 sink is taking time
>>>> for the acknowledgement, and it is not even getting to the window
>>>> operator chain.
>>>>
>>>> But as shown in the snapshot, checkpoint id 19 did not get any
>>>> acknowledgement. I am not sure what is causing the issue.
>>>>
>>>> Regards,
>>>> Vinay Patil
>>>>
>>>> On Wed, Feb 15, 2017 at 5:51 PM, Ted Yu [via Apache Flink User Mailing
>>>> List archive.] <[hidden email]> wrote:
>>>> What did the More Details link say?
>>>>
>>>> Thanks
>>>>
>>>> > On Feb 15, 2017, at 3:11 AM, vinay patil <[hidden email]> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I have set the checkpointing interval to 6 secs and the minimum pause
>>>> > between checkpoints to 5 secs. While testing the pipeline I observed
>>>> > that some checkpoints take a long time. As you can see in the attached
>>>> > snapshot, checkpoint id 19 took the maximum time before it failed,
>>>> > although it had not received any acknowledgements. During these 10
>>>> > minutes the entire pipeline did not make any progress and no data was
>>>> > processed. (For example: in 13 minutes 20M records were processed, and
>>>> > when the checkpoint took time there was no progress for the next 10
>>>> > minutes.)
>>>> >
>>>> > I have even tried setting the max checkpoint timeout to 3 min, but in
>>>> > that case as well multiple checkpoints failed.
>>>> >
>>>> > I have set RocksDB FLASH_SSD_OPTION.
>>>> > What could be the issue?
>>>> >
>>>> > P.S. I am writing to 3 S3 sinks.
>>>> >
>>>> > checkpointing_issue.PNG
>>>> > <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11640/checkpointing_issue.PNG>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpointing-with-RocksDB-as-statebackend-tp11640.html
>>>> > Sent from the Apache Flink User Mailing List archive. mailing list
>>>> > archive at Nabble.com.
>>>> ------------------------------
>>>> If you reply to this email, your message will be added to the
>>>> discussion below:
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpointing-with-RocksDB-as-statebackend-tp11640p11641.html
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
>
>
>




--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Checkpointing-with-RocksDB-as-statebackend-tp11752p11760.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.
