[ https://issues.apache.org/jira/browse/KAFKA-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896321#comment-15896321 ]
ASF GitHub Bot commented on KAFKA-4843:
---------------------------------------

GitHub user enothereska opened a pull request:

    https://github.com/apache/kafka/pull/2643

    KAFKA-4843: More efficient round-robin scheduler

    - Improves streams efficiency by more than 200K requests/second (small 100-byte requests)
    - Gets streams efficiency very close to that of a pure consumer (see results in https://jenkins.confluent.io/job/system-test-kafka-branch-builder/746/console)
    - Maintains the same fairness across tasks
    - Schedules all records in the queue between poll() calls, not just one per task.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/enothereska/kafka minor-schedule-round-robin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2643

----
commit c3f9a1756e7b7f0e2869853bde2e249fb3f1f6d9
Author: Eno Thereska <e...@confluent.io>
Date:   2017-03-05T08:09:51Z

    More efficient round robin

commit 138a491f743d5ed7017c415c9f50f974f16c8567
Author: Eno Thereska <e...@confluent.io>
Date:   2017-03-05T10:05:38Z

    Tighter loop

commit caba483760eb47304e50589f66d396a2afdf0f4e
Author: Eno Thereska <e...@confluent.io>
Date:   2017-03-05T14:16:33Z

    Increased records further

commit aaa14d1c95bfea0f7681f1b5686e0bc6736b13ee
Author: Eno Thereska <e...@confluent.io>
Date:   2017-03-05T15:00:06Z

    Temporary reduce number of tests for quick branch builder turnaround

commit 6c616addbc86f23e9f7311f6ac1cc2e5c92152ee
Author: Eno Thereska <e...@confluent.io>
Date:   2017-03-05T15:24:29Z

    Re-enable full tests

----

> Stream round-robin scheduler is inefficient
> -------------------------------------------
>
>                 Key: KAFKA-4843
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4843
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.2.0
>            Reporter: Eno Thereska
>            Assignee: Eno Thereska
>             Fix For: 0.11.0.0
>
>
> Currently StreamThread.runloop() uses a simple round-robin scheduler: a
> single request is taken from each task for processing, followed by a poll,
> and then the same cycle repeats. For example, for an app that has just 2
> tasks, each with 3 records ready to be processed, we'd have the following
> sequence:
>
> poll() -> process 1 request for task T1 -> process 1 request for task T2 -> poll()
>        -> process 1 request for task T1 -> process 1 request for task T2 -> poll()
>        -> process 1 request for task T1 -> process 1 request for task T2 -> poll()
>
> This is quite inefficient, because every round of one-record-per-task pays
> for an extra poll(). A better round-robin scheduler would instead do:
>
> poll() -> process all 3 requests for task T1 -> process all 3 requests for task T2 -> poll()
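
For illustration, the two schedules in the issue description can be compared with a small simulation. This is only a sketch under stated assumptions: the `Task` class, the method names, and the string trace are hypothetical stand-ins, not the Kafka Streams API and not the code in the pull request. It follows the sequence written in the description (drain T1, then T2), which may differ from how the actual patch interleaves tasks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RoundRobinSketch {
    // Hypothetical stand-in for a stream task: a name plus a queue of buffered records.
    static final class Task {
        final String name;
        final Deque<String> queue = new ArrayDeque<>();

        Task(String name, int records) {
            this.name = name;
            for (int i = 1; i <= records; i++) {
                queue.add(name + "-r" + i);
            }
        }
    }

    // The example from the issue: 2 tasks, each with 3 records ready.
    static List<Task> twoTasksWithThreeRecords() {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task("T1", 3));
        tasks.add(new Task("T2", 3));
        return tasks;
    }

    static boolean hasWork(List<Task> tasks) {
        for (Task t : tasks) {
            if (!t.queue.isEmpty()) return true;
        }
        return false;
    }

    // Schedule described as current behaviour: one record per task, then poll again.
    static List<String> onePerTask(List<Task> tasks) {
        List<String> trace = new ArrayList<>();
        trace.add("poll");
        while (hasWork(tasks)) {
            for (Task t : tasks) {
                if (!t.queue.isEmpty()) trace.add("process " + t.queue.poll());
            }
            trace.add("poll");
        }
        return trace;
    }

    // Schedule proposed in the description: drain every task's queue between two polls.
    static List<String> drainAll(List<Task> tasks) {
        List<String> trace = new ArrayList<>();
        trace.add("poll");
        for (Task t : tasks) {
            while (!t.queue.isEmpty()) trace.add("process " + t.queue.poll());
        }
        trace.add("poll");
        return trace;
    }

    static long countPolls(List<String> trace) {
        return trace.stream().filter("poll"::equals).count();
    }

    public static void main(String[] args) {
        List<String> before = onePerTask(twoTasksWithThreeRecords());
        List<String> after = drainAll(twoTasksWithThreeRecords());
        System.out.println("one per task: " + countPolls(before) + " polls " + before);
        System.out.println("drain all:    " + countPolls(after) + " polls " + after);
    }
}
```

For the 2-task, 3-record example, the first schedule issues 4 polls to process 6 records and the second issues 2, which is the overhead the issue is pointing at.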