This "smells" a lot like https://github.com/golang/go/issues/51654 - though 
the repro is with TLS.

This is very disruptive for us (MinIO) as well, seeing this at several 
customers. Obviously downgrading to Go 1.16.6 is less than optimal for us.

/Klaus

On Tuesday, 12 April 2022 at 14:55:41 UTC+2 Brian Candler wrote:

> Correct: "go run" builds a binary and then fork/execs the binary it has 
> built, to run it.
>
> On Tuesday, 12 April 2022 at 08:19:38 UTC+1 scorr...@gmail.com wrote:
>
>> By the way, the difference that I thought I saw between go run and the 
>> compiled version was because the pid of the process to look at in the below 
>> example should be 63668, the tmp file, not 63548, the go run call.
>>
>> $ ps aux | grep test
>> bill       63548  0.2  0.1 2044984 17588 pts/0   Sl+  09:15   0:00 go run 
>> test.go
>> bill       63668  155  0.0 1891292 12140 pts/0   Sl+  09:15   1:52 
>> /tmp/go-build910547164/b001/exe/test
>> bill       63750  0.0  0.0  11564   652 pts/1    S+   09:16   0:00 grep 
>> --color=auto test
>>
>> El martes, 12 de abril de 2022 a las 9:13:41 UTC+2, Santiago Corredoira 
>> escribió:
>>
>>> Hi, yes, all about the toy program is the same. It was an attempt to get 
>>> to a minimal example that anyone can reproduce so I could ask for help. The 
>>> real problem is with a larger program where I get this 100% CPU usages 
>>> ramdomly when the server is idle.
>>>
>>> I was thinking and with the basic example the nanosleep calls maybe is 
>>> not the scheduller signaling itself because:
>>>
>>>  - GODEBUG=asyncpreemptoff=1 has no effect
>>>  - It doesn't happen with Go 1.16
>>>  - It is random. If it where the normal working of the scheduler, 
>>> wouldn't it be predictable?
>>>
>>>
>>> El martes, 12 de abril de 2022 a las 8:59:06 UTC+2, Brian Candler 
>>> escribió:
>>>
>>>> At the start of the thread, you said "if I compile and run this simple 
>>>> program ..."
>>>>
>>>> Does the problem no longer reproduce with that?
>>>>
>>>> On Monday, 11 April 2022 at 22:59:55 UTC+1 scorr...@gmail.com wrote:
>>>>
>>>>> I tried with a more complex application and already got constant 100% 
>>>>> CPU usage twice. 
>>>>>
>>>>> When I attach strace I see the same as with the minimal demo but with 
>>>>> much more frecuency:
>>>>>
>>>>> [pid 42659] 23:36:49.035120 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 140341803388040) = 0
>>>>> [pid 42659] 23:36:49.035133 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 18704162493558) = 0
>>>>> [pid 42659] 23:36:49.035145 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 140341803388040) = 0
>>>>> [pid 42659] 23:36:49.035158 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 18704162493558) = 0
>>>>> [pid 42659] 23:36:49.035170 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 140341803388040) = 0
>>>>> [pid 42659] 23:36:49.035182 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 18704162493558) = 0
>>>>> [pid 42659] 23:36:49.035195 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 140341803388040) = 0
>>>>> [pid 42659] 23:36:49.035207 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 18704162493558) = 0
>>>>> [pid 42659] 23:36:49.035219 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 140341803388040) = 0
>>>>> [pid 42659] 23:36:49.035232 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 18704162493558) = 0
>>>>> [pid 42659] 23:36:49.035244 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 140341803388040) = 0
>>>>> [pid 42659] 23:36:49.035257 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 18704162493558) = 0
>>>>> [pid 42659] 23:36:49.035269 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 140341803388040) = 0
>>>>> [pid 42659] 23:36:49.035281 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 18704162493558) = 0
>>>>> [pid 42659] 23:36:49.035293 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 140341803388040) = 0
>>>>> [pid 42659] 23:36:49.035306 epoll_pwait(3, [], 128, 0, NULL, 
>>>>> 18704162493558) = 0
>>>>>
>>>>> The program is too large to post here. I have tested it with go race 
>>>>> and it doesn't run into any problem. Any idea of what to look for?
>>>>>
>>>>> Thank you!
>>>>>
>>>>> El lunes, 11 de abril de 2022 a las 17:08:40 UTC+2, Santiago 
>>>>> Corredoira escribió:
>>>>>
>>>>>> I tried with the stripped tmp version of "go run" and after a while I 
>>>>>> was able to reproduce it too. It definitely took me more time but 
>>>>>> probably 
>>>>>> is just random. 
>>>>>>
>>>>>> I tried "$ GODEBUG=asyncpreemptoff=1 ./test" and I kept seeing the 
>>>>>> the calls every 10ms. Is this expected to happen?
>>>>>>
>>>>>> I compiled the test with "$ go1.16 build test.go" and I tried hard to 
>>>>>> reproduce it but I couldn't. It is very strange because this behaviour 
>>>>>> was 
>>>>>> introduced in Go 1.14 and I only see it in Go 1.17. Also 
>>>>>> asyncpreemptoff=1 
>>>>>> seems to make no difference.
>>>>>>
>>>>>> I am going to work again with Go 1.18 by default to see if I get the 
>>>>>> 100% CPU usage.
>>>>>>
>>>>>> El lunes, 11 de abril de 2022 a las 13:28:48 UTC+2, Brian Candler 
>>>>>> escribió:
>>>>>>
>>>>>>> On Monday, 11 April 2022 at 09:26:28 UTC+1 scorr...@gmail.com wrote:
>>>>>>>
>>>>>>>> and the output keeps growing like that with every new curl request 
>>>>>>>> until eventually it starts constantly printing output like this:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> [pid 143818] 10:02:22.697120 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.707609 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.718125 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.728753 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.739195 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.749618 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.760160 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.770669 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.781137 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.791581 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.802060 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.812540 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.822981 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>> [pid 143818] 10:02:22.833468 nanosleep({tv_sec=0, 
>>>>>>>>> tv_nsec=10000000}, NULL) = 0
>>>>>>>>>
>>>>>>>>>
>>>>>>> That is normal, I believe, for a process with multiple runnable 
>>>>>>> goroutines - one alarm/wakeup call every 10ms.
>>>>>>>
>>>>>>> What is odd is the 100% CPU utilisation.  Can you attach an strace 
>>>>>>> when that happens?
>>>>>>>
>>>>>>> As for the binary produced by "go run" - use "ps" to see where it 
>>>>>>> is.  For example:
>>>>>>>
>>>>>>> ==> sleepy.go <==
>>>>>>> package main
>>>>>>>
>>>>>>> import (
>>>>>>>     "sync"
>>>>>>>         "time"
>>>>>>> )
>>>>>>>
>>>>>>> func main() {
>>>>>>>     var wg sync.WaitGroup
>>>>>>>     wg.Add(1)
>>>>>>>         go func() {
>>>>>>>                 time.Sleep(30 * time.Second)
>>>>>>>                 wg.Done()
>>>>>>>         }()
>>>>>>>     wg.Wait()
>>>>>>> }
>>>>>>> ==> end <==
>>>>>>>
>>>>>>> $ go run sleepy.go
>>>>>>>
>>>>>>> ... in another window ...
>>>>>>> $ ps auxwww | grep sleepy
>>>>>>> ubuntu     17743  0.0  0.1 1237424 12428 pts/1   Sl+  11:23   0:00 
>>>>>>> go run sleepy.go
>>>>>>> ubuntu     17782  0.0  0.0 702468  1052 pts/1    Sl+  11:23   0:00 
>>>>>>> /tmp/go-build2834561198/b001/exe/sleepy
>>>>>>>
>>>>>>> You can 'cp' it somewhere to keep it safe.
>>>>>>>
>>>>>>> If I compare this with the output of "go build sleepy.go", I see 
>>>>>>> that the "go build" version is larger and unstripped:
>>>>>>>
>>>>>>> $ cp /tmp/go-build2834561198/b001/exe/sleepy sleepy-run
>>>>>>> $ file sleepy-run
>>>>>>> sleepy-run: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
>>>>>>> statically linked, stripped
>>>>>>> $ go build sleepy.go
>>>>>>> $ file sleepy
>>>>>>> sleepy: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
>>>>>>> statically linked, not stripped
>>>>>>>
>>>>>>> Then "strip sleepy" gives me a binary which is almost, but not 
>>>>>>> quite, the same size as the one from "go run".  So that gives you 
>>>>>>> another 
>>>>>>> way to try to reproduce the problem - although if the difference is 
>>>>>>> just 
>>>>>>> between stripped and unstripped, then it's most likely some sort of 
>>>>>>> edge 
>>>>>>> timing case.
>>>>>>>
>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/e2fa169b-145a-4915-b813-c153e8a0e43an%40googlegroups.com.

Reply via email to