[ https://issues.apache.org/jira/browse/FLINK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622088#comment-16622088 ]
Piotr Nowojski commented on FLINK-10320:
----------------------------------------

Ok, thank you for providing the motivation :) You could try to set up the benchmark as you described (with some dummy {{TaskManagers}}). Another possible way to provide such benchmarks would be to just spam a JobMaster with fake, pre-made RPC calls, without setting up any TaskManager. I'm not familiar enough with the JobMaster code to predict whether this approach would be easier/better. Regardless of the chosen approach, keep the following things in mind while developing such a benchmark:
* The benchmark should actually stress the thing that you want to test. Use a code profiler to make sure that 90+% of the CPU time is spent in the relevant JobMaster code, not in the TaskManager/ResourceManager mocks or other irrelevant components.
* Use/re-use as much production code as possible. If you are forced to write thousands of lines of code to set up a benchmark, something is wrong (usually it means the code you are trying to benchmark is untestable as well), and such a benchmark will be difficult to maintain.
* Can't we re-use the code from unit/integration tests? Even if there is no TestingTaskExecutor, I'm guessing there must be other places that test the thing you want to benchmark? That's often at least a good place to start - convert existing unit/integration tests into benchmarks.

> Introduce JobMaster schedule micro-benchmark
> --------------------------------------------
>
>                 Key: FLINK-10320
>                 URL: https://issues.apache.org/jira/browse/FLINK-10320
>             Project: Flink
>          Issue Type: Improvement
>          Components: Tests
>            Reporter: 陈梓立
>            Assignee: 陈梓立
>            Priority: Major
>
> Based on the {{org.apache.flink.streaming.runtime.io.benchmark}} package and the
> repo [flink-benchmark|https://github.com/dataArtisans/flink-benchmarks], I
> propose to introduce another micro-benchmark which focuses on {{JobMaster}}
> scheduling performance.
> h3. Target
> Benchmark how long it takes from {{JobMaster}} startup (receiving the {{JobGraph}} and
> initializing) until all tasks are RUNNING. Technically we use a bounded stream and the TM
> finishes tasks as soon as they arrive, so the interval we actually measure ends when all
> tasks are FINISHED.
> h3. Cases
> 1. JobGraph that covers EAGER + PIPELINED edges
> 2. JobGraph that covers LAZY_FROM_SOURCES + PIPELINED edges
> 3. JobGraph that covers LAZY_FROM_SOURCES + BLOCKING edges
> ps: maybe also benchmark the case where the source reads from an {{InputSplit}}?
> h3. Implementation
> Based on the flink-benchmark repo, we ultimately run the benchmark using JMH, so the
> whole test suite is separated into two repos. The testing environment could be
> located in the main repo, maybe under
> flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/benchmark.
> To measure the performance of {{JobMaster}} scheduling, we need to simulate
> an environment that:
> 1. has a real {{JobMaster}}
> 2. has a mock/testing {{ResourceManager}} that has infinite resources and
> reacts immediately.
> 3. has one (or many?) mock/testing {{TaskExecutor}}s that deploy and finish tasks
> immediately.
> [~trohrm...@apache.org] [~GJL] [~pnowojski] could you please review this
> proposal to help clarify the goal and concrete details? Thanks in advance.
> Any suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
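To make the intended measurement concrete, here is a minimal, self-contained sketch (plain JDK only, no Flink or JMH dependencies) of the timing harness the proposal describes: a mock task executor that finishes every deployed task immediately, and a driver that measures the time from "job submitted" until all tasks report FINISHED. All class and method names here ({{MockTaskExecutor}}, {{deploy}}, etc.) are illustrative assumptions, not real Flink APIs; in the actual benchmark the loop body would live inside a JMH {{@Benchmark}} method and the deployments would go through the real {{JobMaster}} RPCs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SchedulingBenchmarkSketch {

    /** Hypothetical stand-in for a testing TaskExecutor: finishes each deployed task immediately. */
    static final class MockTaskExecutor {
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        /** Accept a task deployment and report it finished right away (point 3 of the proposal). */
        void deploy(Runnable onFinished) {
            pool.execute(onFinished);
        }

        void shutdown() {
            pool.shutdown();
        }
    }

    /** Measures nanoseconds from submitting all tasks until every task has reported FINISHED. */
    static long measureScheduleToFinished(int numTasks) throws InterruptedException {
        MockTaskExecutor executor = new MockTaskExecutor();
        CountDownLatch allFinished = new CountDownLatch(numTasks);

        long start = System.nanoTime();
        for (int i = 0; i < numTasks; i++) {
            // Each "task" just counts down the latch, i.e. transitions straight to FINISHED.
            executor.deploy(allFinished::countDown);
        }
        allFinished.await(10, TimeUnit.SECONDS);
        long elapsed = System.nanoTime() - start;

        executor.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        long nanos = measureScheduleToFinished(1_000);
        System.out.println("1000 tasks finished in " + nanos / 1_000_000 + " ms");
    }
}
```

Note how this layout also makes Piotr's profiling point easy to check: since the mock does almost no work, any significant time shown by a profiler inside the mock (rather than in the scheduling path under test) would indicate the benchmark is stressing the wrong component.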