Hi, I recently encountered a weird problem.

*Environment*
- Arch: x86_64
- CPU vendor: Hygon (AMD Zen 1 OEM, www.hygon.cn), model Hygon C86 7285
- OS: CentOS 7.6
- Kernel: 3.10.0-957
- Go version: 1.15.6

*What happened*
Our pure Go program, named controllerd, stopped doing any work and instead kept spinning on one CPU core (top showed it consuming one CPU core). Before that, it had been running from mid-April to June 12. Profiling the process with perf shows that it was looping on calls to __vdso_clock_gettime:

    64.20%  controllerd  [vdso]             [.] __vdso_clock_gettime
     8.82%  controllerd  controllerd        [.] runtime.procyield
     8.82%  controllerd  controllerd        [.] runtime.suspendG
     8.01%  controllerd  controllerd        [.] runtime.nanotime1
     1.47%  controllerd  [kernel.kallsyms]  [k] __enqueue_entity
     0.79%  controllerd  [kernel.kallsyms]  [k] system_call_after_swapgs
     0.79%  controllerd  [kernel.kallsyms]  [k] set_next_entity
     0.68%  controllerd  [kernel.kallsyms]  [k] _raw_qspin_lock
     0.68%  controllerd  [kernel.kallsyms]  [k] change_pte_range
     0.65%  controllerd  [kernel.kallsyms]  [k] auditsys
     0.57%  controllerd  [kernel.kallsyms]  [k] update_curr
     0.45%  controllerd  [kernel.kallsyms]  [k] native_sched_clock
     0.45%  controllerd  [kernel.kallsyms]  [k] cpuacct_charge
     0.45%  controllerd  [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
     0.45%  controllerd  [kernel.kallsyms]  [k] __audit_syscall_exit
     0.34%  controllerd  [kernel.kallsyms]  [k] pick_next_task_fair
     0.23%  controllerd  controllerd        [.] runtime.osyield
     0.23%  controllerd  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
     0.23%  controllerd  [kernel.kallsyms]  [k] dput
     0.23%  controllerd  [kernel.kallsyms]  [k] __schedule
     0.23%  controllerd  [kernel.kallsyms]  [k] put_prev_task_fair
     0.23%  controllerd  [kernel.kallsyms]  [k] yield_task_fair
     0.22%  controllerd  [kernel.kallsyms]  [k] sys_sched_yield
     0.11%  controllerd  [kernel.kallsyms]  [k] clear_buddies
     0.11%  controllerd  [kernel.kallsyms]  [k] update_rq_clock.part.78
     0.11%  controllerd  [kernel.kallsyms]  [k] update_min_vruntime
     0.11%  controllerd  [kernel.kallsyms]  [k] tick_do_update_jiffies64
     0.11%  controllerd  [kernel.kallsyms]  [k] system_call
     0.11%  controllerd  [kernel.kallsyms]  [k] rb_next
     0.11%  controllerd  [kernel.kallsyms]  [k] rb_insert_color
     0.00%  controllerd  controllerd        [.] runtime.notesleep
     0.00%  controllerd  [kernel.kallsyms]  [k] load_balance
     0.00%  controllerd  controllerd        [.] runtime.runqgrab
     0.00%  controllerd  controllerd        [.] runtime.findrunnable
     0.00%  controllerd  [kernel.kallsyms]  [k] update_numa_stats
     0.00%  controllerd  [kernel.kallsyms]  [k] __lock_task_sighand
     0.00%  controllerd  [kernel.kallsyms]  [k] idle_cpu
     0.00%  controllerd  [kernel.kallsyms]  [k] kmem_cache_free_bulk
     0.00%  controllerd  [kernel.kallsyms]  [k] task_numa_find_cpu
     0.00%  controllerd  [kernel.kallsyms]  [k] __queue_work
     0.00%  controllerd  [kernel.kallsyms]  [k] __perf_event_task_sched_in
     0.00%  controllerd  [kernel.kallsyms]  [k] finish_task_switch
     0.00%  controllerd  [kernel.kallsyms]  [k] perf_ctx_unlock
     0.00%  controllerd  [kernel.kallsyms]  [k] native_write_msr_safe
     0.00%  controllerd  [kernel.kallsyms]  [k] __perf_event_enable

By the way, the perf command was:

    perf record -o controllerd_perf.data -F 99 -p PID sleep 10

I'm not familiar with the details of Go's runtime, but after looking at the perf data and reading the code of the runtime function suspendG, my guess is that the program was looping in suspendG and failed to find any other goroutine to execute.

*What I tried*
- First, I suspected that the clock source was not set correctly on the system, but it is set to tsc, which is correct.
- Then I sent SIGUSR1 to the program to see whether it could be revived (controllerd flushes its logs when it receives SIGUSR1), but nothing changed.
- Finally, we had to restart the program to recover.

*Ask for help*
1. Does anybody know the cause of this problem?
2. What can I do to get more information if this problem happens again?

Thanks.
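
Regarding question 2, here is a minimal sketch of one option: dumping all goroutine stacks when a signal arrives, similar to how controllerd already reacts to SIGUSR1. The names dumpGoroutines and installDumpHandler and the choice of SIGUSR2 are assumptions made for illustration, not anything taken from controllerd. It uses runtime/pprof from the standard library and is Unix-only.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpGoroutines returns the stacks of all goroutines as text.
// (Hypothetical helper name, chosen for this sketch.)
func dumpGoroutines() string {
	var buf bytes.Buffer
	// debug=2 prints the stacks in the same form as an unrecovered panic.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "goroutine dump failed: " + err.Error()
	}
	return buf.String()
}

// installDumpHandler writes a goroutine dump to w every time sig is received.
// The returned channel gets a value after a dump has been written.
// (Hypothetical helper name, chosen for this sketch.)
func installDumpHandler(sig os.Signal, w io.Writer) <-chan struct{} {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, sig)
	dumped := make(chan struct{}, 1)
	go func() {
		for range sigCh {
			fmt.Fprintln(w, dumpGoroutines())
			select {
			case dumped <- struct{}{}:
			default: // nobody is waiting for the notification
			}
		}
	}()
	return dumped
}

func main() {
	// SIGUSR2 is an assumption; any signal the program does not already use works.
	dumped := installDumpHandler(syscall.SIGUSR2, os.Stderr)

	// Demo only: signal ourselves, as `kill -USR2 PID` would from a shell.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR2); err != nil {
		fmt.Fprintln(os.Stderr, "kill:", err)
		return
	}
	<-dumped
}
```

A caveat: the handler itself runs on a goroutine, so if the scheduler is really stuck (as the unanswered SIGUSR1 suggests), it may never get to run. In that case the runtime's default SIGQUIT behaviour, which prints all goroutine stacks and then exits, is probably the more reliable way to capture the state before restarting.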