I maintain a heavily loaded proxy-like service that serves a huge number of small RPC requests every day. Yesterday I profiled it and found that runtime.netpoll took 8.5% of CPU time (runtime.mcall took 20%).
There is only one global epoll fd in the runtime, but every P calls netpoll. Inside the kernel, each epoll fd has an fd list, an rbtree, and a lock associated with it, so I guess concurrent netpoll calls from many Ps may cause lock contention and poor cache locality. Could we apply the same optimization that was done for timers to the netpoller: give each P its own epoll fd, let each P poll its own epoll fd first, and have it steal ready fds from other Ps when it has no work to do?
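
To make the proposal concrete, here is a rough user-space sketch of the "poll your own first, then steal" idea. It is only a model of the scheduling policy, not runtime code: it uses no real epoll, and every name in it (shard, poller, markReady, take, poll) is hypothetical. Each shard stands in for one P's private epoll fd and has its own lock, so Ps contend only when one of them steals.

```go
package main

import (
	"fmt"
	"sync"
)

// shard stands in for one P's private epoll instance (hypothetical):
// a list of ready fds guarded by its own lock.
type shard struct {
	mu    sync.Mutex
	ready []int
}

// poller is a hypothetical sharded netpoller with one shard per P.
type poller struct {
	shards []shard
}

func newPoller(numP int) *poller {
	return &poller{shards: make([]shard, numP)}
}

// markReady records fd as ready on the shard of the P that owns it.
func (p *poller) markReady(owner, fd int) {
	s := &p.shards[owner]
	s.mu.Lock()
	s.ready = append(s.ready, fd)
	s.mu.Unlock()
}

// take drains shard i and returns whatever was ready on it.
func (p *poller) take(i int) []int {
	s := &p.shards[i]
	s.mu.Lock()
	fds := s.ready
	s.ready = nil
	s.mu.Unlock()
	return fds
}

// poll returns ready fds for P id. It checks the local shard first and
// only then walks the other shards looking for work to steal. The second
// result is the shard the fds came from, or -1 if nothing was ready.
func (p *poller) poll(id int) ([]int, int) {
	if fds := p.take(id); len(fds) > 0 {
		return fds, id
	}
	for off := 1; off < len(p.shards); off++ {
		victim := (id + off) % len(p.shards)
		if fds := p.take(victim); len(fds) > 0 {
			return fds, victim
		}
	}
	return nil, -1
}

func main() {
	p := newPoller(4)
	p.markReady(0, 10)
	p.markReady(2, 20)
	p.markReady(2, 21)

	fmt.Println(p.poll(0)) // served from P 0's own shard
	fmt.Println(p.poll(0)) // local shard is empty, so this steals from P 2
	fmt.Println(p.poll(0)) // nothing ready anywhere
}
```

In the common case a P only touches its own lock and its own ready list, which is where I would expect the contention and cache locality win to come from. Whether that holds for real epoll fds inside the kernel is exactly what I am unsure about.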