Great to hear!

On Fri, 9 Dec 2016 at 01:02 Cliff Resnick <cre...@gmail.com> wrote:
> It turns out that most of the time in RocksDBFoldingState was spent on
> serialization/deserialization. RocksDB read/write was performing well. By
> moving from Kryo to custom serialization we were able to increase
> throughput dramatically. Load is now where it should be.
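For anyone else who lands on this thread: the win here usually comes from writing fields in a fixed order yourself instead of letting Kryo reflect over the object graph for every record. A rough sketch of the idea (the type and field names below are invented for illustration, not Cliff's actual code):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Invented stand-in for one of the small tuples held in state.
    class SmallRecord {
        String name;
        int count;
        long updatedAt;
    }

    public class SmallRecordSerde {

        // Write the fields in a fixed order: no class metadata, no reflection.
        public static void serialize(SmallRecord r, DataOutput out) throws IOException {
            out.writeUTF(r.name);
            out.writeInt(r.count);
            out.writeLong(r.updatedAt);
        }

        // Read them back in exactly the same order they were written.
        public static SmallRecord deserialize(DataInput in) throws IOException {
            SmallRecord r = new SmallRecord();
            r.name = in.readUTF();
            r.count = in.readInt();
            r.updatedAt = in.readLong();
            return r;
        }
    }

Flink's DataOutputView/DataInputView extend java.io.DataOutput/DataInput, so the same pattern carries over directly into a custom TypeSerializer.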
>
> On Mon, Dec 5, 2016 at 1:15 PM, Robert Metzger <rmetz...@apache.org> wrote:
>
> Another Flink user running RocksDB with large state on SSDs recently
> posted this video on optimizing the performance of RocksDB on SSDs:
> https://www.youtube.com/watch?v=pvUqbIeoPzM
> That could be relevant for you.
>
> For how long did you look at iotop? It could be that the IO access
> happens in bursts, depending on how data is cached.
>
> I'll also add Stefan Richter to the conversation; maybe he has some more
> ideas about what we can do here.
>
> On Mon, Dec 5, 2016 at 6:19 PM, Cliff Resnick <cre...@gmail.com> wrote:
>
> Hi Robert,
>
> We're following 1.2-SNAPSHOT, using event time. I have tried "iotop" and
> I usually see less than 1% IO. The most I've seen was a quick flash here
> or there of something substantial (e.g. 19%, 52%), then back to nothing.
> I also assumed we were disk-bound, but to use your metaphor, I'm having
> trouble finding any smoke. However, I'm not very experienced in sussing
> out IO issues, so perhaps there is something else I'm missing.
>
> I'll keep investigating. If I continue to come up empty, I guess my next
> step may be to stage some independent tests directly against RocksDB.
>
> -Cliff
>
> On Mon, Dec 5, 2016 at 5:52 AM, Robert Metzger <rmetz...@apache.org> wrote:
>
> Hi Cliff,
>
> Which Flink version are you using?
> Are you using event time or processing time windows?
>
> I suspect that your disks are "burning" (= your job is IO bound). Can you
> check with a tool like "iotop" how much disk IO Flink is producing? Then,
> I would put that number in relation to the theoretical maximum of your
> SSDs (a good rough estimate is to use dd for that).
>
> If you find that your disk bandwidth is saturated by Flink, you could
> look into tuning the RocksDB settings so that it uses more memory for
> caching.
>
> Regards,
> Robert
>
> On Fri, Dec 2, 2016 at 11:34 PM, Cliff Resnick <cre...@gmail.com> wrote:
>
> In tests comparing RocksDB to the fs state backend, we observe much lower
> throughput, around 10x slower. While the lowered throughput is expected,
> what's perplexing is that machine load is also very low with RocksDB,
> typically falling to < 25% CPU and negligible IO wait (around 0.1%). Our
> test instances are EC2 c3.xlarge, which have 4 virtual CPUs and 7.5G RAM,
> each running a single TaskManager in YARN with 6.5G allocated memory per
> TaskManager. The instances also have 2x40G attached SSDs, which we have
> mapped to `taskmanager.tmp.dir`.
>
> With FS state and 4 slots per TM, we easily max out with a load average
> around 5 or 6, so we actually need to throttle the slots down to 3. With
> RocksDB using the Flink SSD-configured options, we see a load average of
> around 1. Also, both load and actual throughput remain more or less
> constant no matter how many slots we use. The light load is spread across
> all CPUs.
>
> Here is a sample top:
>
> Cpu0 : 20.5%us, 0.0%sy, 0.0%ni, 79.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 18.5%us, 0.0%sy, 0.0%ni, 81.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2 : 11.6%us, 0.7%sy, 0.0%ni, 87.0%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu3 : 12.5%us, 0.3%sy, 0.0%ni, 86.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
>
> Our pipeline uses tumbling windows, each with a ValueState keyed to a
> 3-tuple of one string and two ints. Each ValueState comprises a small set
> of tuples of around 5-7 fields each. The WindowFunction simply diffs
> against the set and updates state if there is a diff.
>
> Any ideas as to what the bottleneck is here? Any suggestions welcome!
>
> -Cliff
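Following up on Robert's suggestion above about giving RocksDB more memory for caching: with the RocksDB state backend this goes through its options hooks. A sketch below, assuming the 1.2-era OptionsFactory interface; the checkpoint URI and the buffer/cache sizes are placeholders to adapt, and the exact option setters should be checked against your rocksdbjni version:

    import org.apache.flink.contrib.streaming.state.OptionsFactory;
    import org.apache.flink.contrib.streaming.state.PredefinedOptions;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.ColumnFamilyOptions;
    import org.rocksdb.DBOptions;

    public class TunedRocksJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder checkpoint URI.
            RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints");

            // Start from the SSD-tuned defaults Cliff already uses...
            backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

            // ...then grant more memory to memtables and the block cache.
            backend.setOptions(new OptionsFactory() {
                @Override
                public DBOptions createDBOptions(DBOptions current) {
                    return current;   // keep DB-level options as-is
                }

                @Override
                public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions current) {
                    return current
                        .setWriteBufferSize(64 * 1024 * 1024)         // 64 MB memtables
                        .setTableFormatConfig(new BlockBasedTableConfig()
                            .setBlockCacheSize(256 * 1024 * 1024));   // 256 MB block cache
                }
            });

            env.setStateBackend(backend);
            // ... build the rest of the pipeline here ...
        }
    }

FLASH_SSD_OPTIMIZED is the predefined SSD profile mentioned earlier in the thread; the OptionsFactory then layers the larger write buffers and block cache on top of it.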
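And for anyone profiling a similar job, here is roughly the shape of the window/state access pattern Cliff describes. All names here (DiffWindow, Measurement, the fields) are invented placeholders, not the real job's types:

    import java.util.HashSet;
    import java.util.Objects;
    import java.util.Set;

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.windowing.RichWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class DiffWindow extends
            RichWindowFunction<DiffWindow.Measurement, Set<DiffWindow.Measurement>,
                    Tuple3<String, Integer, Integer>, TimeWindow> {

        // Invented stand-in for the 5-7 field tuples held in state.
        public static class Measurement {
            public String source;
            public int metric;
            public long value;

            @Override
            public boolean equals(Object o) {
                if (!(o instanceof Measurement)) {
                    return false;
                }
                Measurement m = (Measurement) o;
                return metric == m.metric && value == m.value
                        && Objects.equals(source, m.source);
            }

            @Override
            public int hashCode() {
                return Objects.hash(source, metric, value);
            }
        }

        private transient ValueState<Set<Measurement>> lastSeen;

        @Override
        public void open(Configuration parameters) {
            // Keyed state: one set per (String, int, int) key.
            lastSeen = getRuntimeContext().getState(new ValueStateDescriptor<>(
                    "last-seen",
                    TypeInformation.of(new TypeHint<Set<Measurement>>() {})));
        }

        @Override
        public void apply(Tuple3<String, Integer, Integer> key, TimeWindow window,
                Iterable<Measurement> input, Collector<Set<Measurement>> out)
                throws Exception {
            Set<Measurement> current = new HashSet<>();
            for (Measurement m : input) {
                current.add(m);
            }
            // value() and update() each go through the state backend's
            // serialization path; with RocksDB that is where the Kryo time went.
            Set<Measurement> previous = lastSeen.value();
            if (previous == null || !previous.equals(current)) {
                lastSeen.update(current);
                out.collect(current);
            }
        }
    }

Note that every window firing reads and rewrites the whole set through the state backend, so per-record (de)serialization cost is paid on each access, which lines up with where the time went in Cliff's measurements.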