I maintain a heavily loaded, proxy-like service that serves a huge 
number of small RPC requests every day. Yesterday I profiled it and found 
that runtime.netpoll took 8.5% of CPU time (runtime.mcall took 20%).

The runtime has only one global epoll fd, but every P calls netpoll. 
Inside the kernel, each epoll fd has an fd list, an rbtree, and a lock 
associated with it, so I guess concurrent netpoll calls from many Ps may 
cause lock contention and poor cache locality.

Could we apply the same optimization that was done for timers to the 
netpoller: give each P its own epoll fd, let each P poll its own epoll fd 
first, and have it steal ready fds from other Ps when it has no work to do?
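
To make the idea concrete, here is a minimal toy model of the polling order I have in mind. It is only a sketch, not runtime code: readyQueue stands in for a per-P epoll instance, and the names readyQueue, poller, push, take and poll are all hypothetical. It makes no real epoll calls and only shows "local first, then steal".

```go
package main

import (
	"fmt"
	"sync"
)

// readyQueue is a hypothetical stand-in for one P's private epoll instance.
// It holds the fds that became ready on that P.
type readyQueue struct {
	mu  sync.Mutex
	fds []int
}

// push records that fd became ready on this queue.
func (q *readyQueue) push(fd int) {
	q.mu.Lock()
	q.fds = append(q.fds, fd)
	q.mu.Unlock()
}

// take removes and returns up to n ready fds from this queue.
func (q *readyQueue) take(n int) []int {
	q.mu.Lock()
	defer q.mu.Unlock()
	if n > len(q.fds) {
		n = len(q.fds)
	}
	out := append([]int(nil), q.fds[:n]...)
	q.fds = q.fds[n:]
	return out
}

// poller models the proposal: one ready queue per P instead of a global one.
type poller struct {
	queues []*readyQueue
}

func newPoller(numP int) *poller {
	pl := &poller{queues: make([]*readyQueue, numP)}
	for i := range pl.queues {
		pl.queues[i] = &readyQueue{}
	}
	return pl
}

// poll returns ready fds for P number p. It checks p's own queue first, so
// the common case touches only a per-P lock. Only when the local queue is
// empty does it scan the other Ps and steal from the first non-empty one.
func (pl *poller) poll(p, max int) (fds []int, stolen bool) {
	if fds = pl.queues[p].take(max); len(fds) > 0 {
		return fds, false
	}
	for i := 1; i < len(pl.queues); i++ {
		victim := pl.queues[(p+i)%len(pl.queues)]
		if fds = victim.take(max); len(fds) > 0 {
			return fds, true
		}
	}
	return nil, false
}

func main() {
	pl := newPoller(4)
	pl.queues[0].push(10)
	pl.queues[0].push(11)
	pl.queues[2].push(20)

	fmt.Println(pl.poll(0, 8)) // P0 finds work in its own queue
	fmt.Println(pl.poll(1, 8)) // P1 is idle locally, so it steals
	fmt.Println(pl.poll(3, 8)) // nothing is left anywhere
}
```

In a real implementation each queue would be an epoll fd owned by one P, and the steal step would be a non-blocking epoll_wait on another P's fd. Whether that actually reduces contention in the kernel is exactly what I am unsure about.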
