Hi Cliff,

It would be really helpful if you could share your RocksDB configuration.

I am also running on c3.4xlarge EC2 instances backed by SSDs. I tried the
FLASH_SSD_OPTIMIZED predefined option, which works great at first, but
somehow the pipeline stalls partway through and the overall processing
time increases. I also tried setting different values as mentioned in the
video below, but somehow I am not getting it right: the TaskManagers are
getting killed after some time.
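For reference, this is roughly how I am enabling it (a minimal sketch,
not my full job; the checkpoint URI is a placeholder):

import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SsdBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder checkpoint URI: point this at your own durable storage.
        RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints");

        // Pre-canned RocksDB settings (write buffers, compaction style,
        // bloom filters) tuned for flash storage.
        backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

        env.setStateBackend(backend);
    }
}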
Regards,
Vinay Patil

On Thu, Dec 8, 2016 at 10:19 PM, Cliff Resnick [via Apache Flink User
Mailing List archive.] <ml-node+s2336050n10537...@n4.nabble.com> wrote:

> It turns out that most of the time in RocksDBFoldingState was spent on
> serialization/deserialization. RocksDB read/write was performing well.
> By moving from Kryo to custom serialization we were able to increase
> throughput dramatically. Load is now where it should be.
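(Cliff doesn't share the implementation, but for anyone hitting the same
Kryo overhead, here is a sketch of one lightweight way to take control of
serialization for a hot type. MyEvent and MyEventSerializer are made-up
names.)

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SerializationSetup {

    // Hypothetical event type with a handful of primitive fields.
    public static class MyEvent {
        public String id;
        public int count;
        public long ts;
    }

    // Hand-written serializer: writes the fields directly instead of
    // letting Kryo reflect over the class on every access.
    public static class MyEventSerializer extends Serializer<MyEvent> {
        @Override
        public void write(Kryo kryo, Output output, MyEvent e) {
            output.writeString(e.id);
            output.writeInt(e.count);
            output.writeLong(e.ts);
        }

        @Override
        public MyEvent read(Kryo kryo, Input input, Class<MyEvent> type) {
            MyEvent e = new MyEvent();
            e.id = input.readString();
            e.count = input.readInt();
            e.ts = input.readLong();
            return e;
        }
    }

    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Use the hand-written serializer for the hot type instead of
        // Kryo's generic reflective path.
        env.getConfig().registerTypeWithKryoSerializer(
                MyEvent.class, MyEventSerializer.class);
    }
}

A full custom TypeSerializer goes further in the same direction (and is
presumably closer to what Cliff did); env.getConfig().disableGenericTypes()
is also handy for failing fast whenever a type would silently fall back
to Kryo.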
> On Mon, Dec 5, 2016 at 1:15 PM, Robert Metzger <[hidden email]> wrote:
>
>> Another Flink user running RocksDB with large state on SSDs recently
>> posted this video on optimizing the performance of RocksDB on SSDs:
>> https://www.youtube.com/watch?v=pvUqbIeoPzM
>> That could be relevant for you.
>>
>> For how long did you look at iotop? It could be that the IO access
>> happens in bursts, depending on how data is cached.
>>
>> I'll also add Stefan Richter to the conversation; maybe he has some
>> more ideas about what we can do here.
>>
>> On Mon, Dec 5, 2016 at 6:19 PM, Cliff Resnick <[hidden email]> wrote:
>>
>>> Hi Robert,
>>>
>>> We're following 1.2-SNAPSHOT, using event time. I have tried "iotop",
>>> and I usually see less than 1% IO. The most I've seen was a quick
>>> flash here or there of something substantial (e.g. 19%, 52%), then
>>> back to nothing. I also assumed we were disk-bound, but to use your
>>> metaphor, I'm having trouble finding any smoke. However, I'm not very
>>> experienced in sussing out IO issues, so perhaps there is something
>>> else I'm missing.
>>>
>>> I'll keep investigating. If I continue to come up empty, then I guess
>>> my next step may be to stage some independent tests directly against
>>> RocksDB.
>>>
>>> -Cliff
>>>
>>> On Mon, Dec 5, 2016 at 5:52 AM, Robert Metzger <[hidden email]> wrote:
>>>
>>>> Hi Cliff,
>>>>
>>>> Which Flink version are you using? Are you using event-time or
>>>> processing-time windows?
>>>>
>>>> I suspect that your disks are "burning" (= your job is IO bound).
>>>> Can you check with a tool like "iotop" how much disk IO Flink is
>>>> producing? Then I would set this number in relation to the
>>>> theoretical maximum of your SSDs (a good rough estimate is to use
>>>> dd for that).
>>>>
>>>> If you find that your disk bandwidth is saturated by Flink, you
>>>> could look into tuning the RocksDB settings so that it uses more
>>>> memory for caching.
>>>>
>>>> Regards,
>>>> Robert
>>>>
>>>> On Fri, Dec 2, 2016 at 11:34 PM, Cliff Resnick <[hidden email]> wrote:
>>>>
>>>>> In tests comparing RocksDB to the fs state backend we observe much
>>>>> lower throughput, around 10x slower. While the lowered throughput
>>>>> is expected, what's perplexing is that machine load is also very
>>>>> low with RocksDB, typically falling to < 25% CPU and negligible IO
>>>>> wait (around 0.1%). Our test instances are EC2 c3.xlarge, which
>>>>> have 4 virtual CPUs and 7.5G RAM, each running a single TaskManager
>>>>> in YARN, with 6.5G of memory allocated per TaskManager. The
>>>>> instances also have 2x40G attached SSDs, which we have mapped to
>>>>> `taskmanager.tmp.dir`.
>>>>>
>>>>> With FS state and 4 slots per TM, we easily max out, with a load
>>>>> average around 5 or 6, so we actually need to throttle the slots
>>>>> down to 3. With RocksDB using Flink's SSD-configured options, we
>>>>> see a load average around 1. Also, load (and actual throughput)
>>>>> remains more or less constant no matter how many slots we use. The
>>>>> weak load is spread over all CPUs.
>>>>>
>>>>> Here is a sample top:
>>>>>
>>>>> Cpu0 : 20.5%us, 0.0%sy, 0.0%ni, 79.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>>>> Cpu1 : 18.5%us, 0.0%sy, 0.0%ni, 81.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>>>> Cpu2 : 11.6%us, 0.7%sy, 0.0%ni, 87.0%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
>>>>> Cpu3 : 12.5%us, 0.3%sy, 0.0%ni, 86.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
>>>>>
>>>>> Our pipeline uses tumbling windows, each with a ValueState keyed to
>>>>> a 3-tuple of one string and two ints. Each ValueState comprises a
>>>>> small set of tuples of around 5-7 fields each. The WindowFunction
>>>>> simply diffs against the set and updates state if there is a diff.
>>>>>
>>>>> Any ideas as to what the bottleneck is here? Any suggestions
>>>>> welcomed!
>>>>>
>>>>> -Cliff
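Also, regarding Robert's suggestion above about giving RocksDB more
memory for caching: here is a minimal sketch of that kind of tuning
through the state backend's OptionsFactory (the thread count and byte
sizes are illustrative placeholders, not recommendations).

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class RocksDbTuning {
    public static void applyTuning(RocksDBStateBackend backend) {
        backend.setOptions(new OptionsFactory() {
            @Override
            public DBOptions createDBOptions(DBOptions currentOptions) {
                // More background threads for flushes and compactions.
                return currentOptions.setIncreaseParallelism(4);
            }

            @Override
            public ColumnFamilyOptions createColumnOptions(
                    ColumnFamilyOptions currentOptions) {
                // Bigger memtables and block cache mean more memory for
                // caching and fewer disk reads.
                return currentOptions
                        .setWriteBufferSize(64 * 1024 * 1024)   // 64 MB memtable
                        .setMaxWriteBufferNumber(4)
                        .setTableFormatConfig(new BlockBasedTableConfig()
                                .setBlockCacheSize(256 * 1024 * 1024)); // 256 MB cache
            }
        });
    }
}

The factory is handed the options produced by whatever predefined profile
is set, so these tweaks should layer on top of FLASH_SSD_OPTIMIZED rather
than replace it.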