Hi Cliff,

It would be really helpful if you could share your RocksDB configuration.

I am also running on c3.4xlarge EC2 instances backed by SSDs. I tried the
FLASH_SSD_OPTIMIZED predefined option, which works great at first, but
somehow the pipeline stalls partway through and the overall processing
time increases. I also tried setting different values as mentioned in the
video below, but somehow I am not getting it right: the TaskManagers are
getting killed after some time.
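For reference, this is roughly how I am enabling it (a minimal sketch,
not my full job; the checkpoint URI is a placeholder):

import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SsdBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder checkpoint URI: point this at your own durable storage.
        RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints");

        // Pre-canned RocksDB settings (write buffers, compaction style,
        // bloom filters) tuned for flash storage.
        backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

        env.setStateBackend(backend);
    }
}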
Regards,
Vinay Patil

On Thu, Dec 8, 2016 at 10:19 PM, Cliff Resnick [via Apache Flink User
Mailing List archive.] <ml-node+s2336050n10537...@n4.nabble.com> wrote:

> It turns out that most of the time in RocksDBFoldingState was spent on
> serialization/deserialization. RocksDB read/write was performing well.
> By moving from Kryo to custom serialization we were able to increase
> throughput dramatically. Load is now where it should be.
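(Cliff doesn't share the implementation, but for anyone hitting the same
Kryo overhead, here is a sketch of one lightweight way to take control of
serialization for a hot type. MyEvent and MyEventSerializer are made-up
names.)

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SerializationSetup {

    // Hypothetical event type with a handful of primitive fields.
    public static class MyEvent {
        public String id;
        public int count;
        public long ts;
    }

    // Hand-written serializer: writes the fields directly instead of
    // letting Kryo reflect over the class on every access.
    public static class MyEventSerializer extends Serializer<MyEvent> {
        @Override
        public void write(Kryo kryo, Output output, MyEvent e) {
            output.writeString(e.id);
            output.writeInt(e.count);
            output.writeLong(e.ts);
        }

        @Override
        public MyEvent read(Kryo kryo, Input input, Class<MyEvent> type) {
            MyEvent e = new MyEvent();
            e.id = input.readString();
            e.count = input.readInt();
            e.ts = input.readLong();
            return e;
        }
    }

    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Use the hand-written serializer for the hot type instead of
        // Kryo's generic reflective path.
        env.getConfig().registerTypeWithKryoSerializer(
                MyEvent.class, MyEventSerializer.class);
    }
}

A full custom TypeSerializer goes further in the same direction (and is
presumably closer to what Cliff did); env.getConfig().disableGenericTypes()
is also handy for failing fast whenever a type would silently fall back
to Kryo.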
> On Mon, Dec 5, 2016 at 1:15 PM, Robert Metzger <[hidden email]> wrote:
>
>> Another Flink user running RocksDB with large state on SSDs recently
>> posted this video on optimizing the performance of RocksDB on SSDs:
>> https://www.youtube.com/watch?v=pvUqbIeoPzM
>> That could be relevant for you.
>>
>> For how long did you look at iotop? It could be that the IO access
>> happens in bursts, depending on how data is cached.
>>
>> I'll also add Stefan Richter to the conversation; maybe he has some
>> more ideas about what we can do here.
>>
>> On Mon, Dec 5, 2016 at 6:19 PM, Cliff Resnick <[hidden email]> wrote:
>>
>>> Hi Robert,
>>>
>>> We're following 1.2-SNAPSHOT, using event time. I have tried "iotop",
>>> and I usually see less than 1% IO. The most I've seen was a quick
>>> flash here or there of something substantial (e.g. 19%, 52%), then
>>> back to nothing. I also assumed we were disk-bound, but to use your
>>> metaphor, I'm having trouble finding any smoke. However, I'm not very
>>> experienced in sussing out IO issues, so perhaps there is something
>>> else I'm missing.
>>>
>>> I'll keep investigating. If I continue to come up empty, then I guess
>>> my next step may be to stage some independent tests directly against
>>> RocksDB.
>>>
>>> -Cliff
>>>
>>> On Mon, Dec 5, 2016 at 5:52 AM, Robert Metzger <[hidden email]> wrote:
>>>
>>>> Hi Cliff,
>>>>
>>>> Which Flink version are you using? Are you using event-time or
>>>> processing-time windows?
>>>>
>>>> I suspect that your disks are "burning" (= your job is IO bound).
>>>> Can you check with a tool like "iotop" how much disk IO Flink is
>>>> producing? Then I would set this number in relation to the
>>>> theoretical maximum of your SSDs (a good rough estimate is to use
>>>> dd for that).
>>>>
>>>> If you find that your disk bandwidth is saturated by Flink, you
>>>> could look into tuning the RocksDB settings so that it uses more
>>>> memory for caching.
>>>>
>>>> Regards,
>>>> Robert
>>>>
>>>> On Fri, Dec 2, 2016 at 11:34 PM, Cliff Resnick <[hidden email]> wrote:
>>>>
>>>>> In tests comparing RocksDB to the fs state backend we observe much
>>>>> lower throughput, around 10x slower. While the lowered throughput
>>>>> is expected, what's perplexing is that machine load is also very
>>>>> low with RocksDB, typically falling to < 25% CPU and negligible IO
>>>>> wait (around 0.1%). Our test instances are EC2 c3.xlarge, which
>>>>> have 4 virtual CPUs and 7.5G RAM, each running a single TaskManager
>>>>> in YARN, with 6.5G of memory allocated per TaskManager. The
>>>>> instances also have 2x40G attached SSDs, which we have mapped to
>>>>> `taskmanager.tmp.dir`.
>>>>>
>>>>> With FS state and 4 slots per TM, we easily max out, with a load
>>>>> average around 5 or 6, so we actually need to throttle the slots
>>>>> down to 3. With RocksDB using Flink's SSD-configured options, we
>>>>> see a load average around 1. Also, load (and actual throughput)
>>>>> remains more or less constant no matter how many slots we use. The
>>>>> weak load is spread over all CPUs.
>>>>>
>>>>> Here is a sample top:
>>>>>
>>>>> Cpu0 : 20.5%us, 0.0%sy, 0.0%ni, 79.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>>>> Cpu1 : 18.5%us, 0.0%sy, 0.0%ni, 81.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>>>> Cpu2 : 11.6%us, 0.7%sy, 0.0%ni, 87.0%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
>>>>> Cpu3 : 12.5%us, 0.3%sy, 0.0%ni, 86.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
>>>>>
>>>>> Our pipeline uses tumbling windows, each with a ValueState keyed to
>>>>> a 3-tuple of one string and two ints. Each ValueState comprises a
>>>>> small set of tuples of around 5-7 fields each. The WindowFunction
>>>>> simply diffs against the set and updates state if there is a diff.
>>>>>
>>>>> Any ideas as to what the bottleneck is here? Any suggestions
>>>>> welcomed!
>>>>>
>>>>> -Cliff
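Also, regarding Robert's suggestion above about giving RocksDB more
memory for caching: here is a minimal sketch of that kind of tuning
through the state backend's OptionsFactory (the thread count and byte
sizes are illustrative placeholders, not recommendations).

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class RocksDbTuning {
    public static void applyTuning(RocksDBStateBackend backend) {
        backend.setOptions(new OptionsFactory() {
            @Override
            public DBOptions createDBOptions(DBOptions currentOptions) {
                // More background threads for flushes and compactions.
                return currentOptions.setIncreaseParallelism(4);
            }

            @Override
            public ColumnFamilyOptions createColumnOptions(
                    ColumnFamilyOptions currentOptions) {
                // Bigger memtables and block cache mean more memory for
                // caching and fewer disk reads.
                return currentOptions
                        .setWriteBufferSize(64 * 1024 * 1024)   // 64 MB memtable
                        .setMaxWriteBufferNumber(4)
                        .setTableFormatConfig(new BlockBasedTableConfig()
                                .setBlockCacheSize(256 * 1024 * 1024)); // 256 MB cache
            }
        });
    }
}

The factory is handed the options produced by whatever predefined profile
is set, so these tweaks should layer on top of FLASH_SSD_OPTIMIZED rather
than replace it.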