The system has 1GB of swap, and I just tried setting:

sysctl -w vm.overcommit_memory=1
sysctl -w vm.swappiness=1

as suggested in https://stackoverflow.com/questions/35025338/cannot-allocate-memory-error, with no effect.

There's 64GB of RAM, which is not used up at the time of the error:

top - 16:47:14 up 2 days,  4:35,  4 users,  load average: 1.43, 1.35, 1.12
Tasks: 32606 total,   1 running, 531 sleeping,   0 stopped, 32074 zombie
Cpu(s):  0.7%us, 14.0%sy,  0.0%ni, 80.6%id,  3.3%wa,  0.0%hi,  1.4%si,  0.0%st
Mem:  66067872k total, 43270952k used, 22796920k free,     5684k buffers
Swap:  1023996k total,      332k used,  1023664k free, 26004700k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 1401 root      20   0  898m  26m 2936 S 93.6  0.0   9:30.36 pdm
 6494 root      20   0 38280  25m  880 R 20.1  0.0   1:20.34 top
 6145 root      20   0  112m 5468 4428 S  0.0  0.0   0:00.05 sshd
29324 root      20   0  112m 5092 4056 S  0.0  0.0   0:00.60 sshd
 2720 root      20   0  114m 3284  616 S  0.3  0.0   0:43.29 sshd
 2377 nobody    20   0  152m 3148  668 S  0.0  0.0   0:06.24 gmond
 9290 postfix   20   0 79868 2852 1996 S  0.0  0.0   0:00.01 pickup
29336 root      20   0  106m 1916 1432 S  0.0  0.0   0:00.03 bash
 6155 root      20   0  106m 1888 1428 S  0.0  0.0   0:00.02 bash
 2575 root      20   0  379m 1880  684 S  0.0  0.0   0:00.89 automount
 2240 haldaemo  20   0 38224 1744  632 S  0.0  0.0   0:01.58 hald
30608 root      20   0  112m 1672  636 S  0.0  0.0   0:00.24 sshd
 2724 root      20   0  106m 1416  912 S  0.0  0.0   0:00.31 bash
 2467 postfix   20   0 80036 1404  496 S  0.0  0.0   0:00.18 qmgr
30634 root      20   0  106m 1376  868 S  0.0  0.0   0:00.13 bash
 2456 root      20   0 80000 1352  460 S  0.0  0.0   0:00.74 master
 1911 root      20   0  245m 1300  588 S  0.0  0.0   0:00.27 rsyslogd
 2366 ntp       20   0 25440 1160  588 S  0.0  0.0   0:00.24 ntpd
 2323 nscd      20   0  877m 1148  644 S  0.0  0.0   0:02.14 nscd
 2357 root      20   0 90848 1052  392 S  0.0  0.0   0:00.01 sshd
 1628 root      18  -2 11260 1048  232 S  0.0  0.0   0:00.00 udevd
 2468 root      20   0  112m  980  392 S  0.0  0.0   0:00.40 crond

pprof shows around 2MB of memory in use the whole time.
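
One figure in the top header above may be worth tracking: it reports 32074 zombie tasks out of 32606 total. A zombie keeps its PID and process-table entry until its parent waits on it, and the kernel's default PID limit (kernel.pid_max) is 32768 on many systems, so the count is close to that ceiling. Whether this is related to the error is not established in this thread. As a minimal sketch, assuming Linux with /proc mounted, the Go program below tallies process states so the zombie count can be logged over time. The names parseState and countStates are made up for this illustration and are not part of pdm.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// parseState extracts the one-letter process state from the contents of
// /proc/<pid>/stat. The command name is wrapped in parentheses and may itself
// contain spaces or parentheses, so the state is taken as the first field
// after the LAST ')'.
func parseState(stat string) (byte, bool) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 {
		return 0, false
	}
	fields := strings.Fields(stat[i+1:])
	if len(fields) == 0 || len(fields[0]) != 1 {
		return 0, false
	}
	return fields[0][0], true
}

// countStates tallies process states found under procRoot (normally "/proc").
// Hypothetical helper for illustration only.
func countStates(procRoot string) (map[byte]int, error) {
	paths, err := filepath.Glob(filepath.Join(procRoot, "[0-9]*", "stat"))
	if err != nil {
		return nil, err
	}
	counts := make(map[byte]int)
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process may have exited between listing and reading
		}
		if s, ok := parseState(string(data)); ok {
			counts[s]++
		}
	}
	return counts, nil
}

func main() {
	counts, err := countStates("/proc")
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading /proc:", err)
		os.Exit(1)
	}
	// 'Z' is zombie, 'S' is interruptible sleep, 'R' is running.
	fmt.Printf("zombie=%d sleeping=%d running=%d\n",
		counts['Z'], counts['S'], counts['R'])
}
```

Running it every minute or so alongside the app would show whether the zombie count climbs steadily toward the point where the errors start.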

I think the problem is somewhere here, since it goes away when I disable this part of the app:
https://github.com/sdsc/pdm/blob/3b5c7fcef24e9081f3bd0608efde1bdc10a65d17/lustre_backend.go#L73

I also thought I might be creating too many goroutines too fast, so I rewrote this part to use no goroutines or channels and to return a simple slice instead, but that didn't help:
https://github.com/sdsc/pdm/blob/master/lustre_backend.go#L73

What puzzles me is the time it takes to hit the error: very close to 10 minutes every time. It doesn't even depend on the number of workers (I have a setting for that and have tried 1 to 5 workers).

On Thursday, June 1, 2017 at 5:54:51 PM UTC-7, Dave Cheney wrote:
>
> Does the machine (vm, container, etc) you are running this application
> have any swap configured?
>
> What I think is happening is the momentary spike in potential memory usage
> during the fork / exec cycle is causing the linux memory system to refuse
> to permit the fork. There are several open issues for this, search clone or
> vfork on the github.com/golang/go project, but the short version is: add
> swap.
>
> On Friday, 2 June 2017 10:41:10 UTC+10, Dmitry Mishin wrote:
>>
>> Hello all,
>>
>> Trying to fix the out of memory issue in my go app.
>>
>> The app is working fine with no memory usage increasing for 10-15
>> minutes, then suddenly system starts giving "cannot allocate memory" error
>> for any command, same in the app.
>>
>> I attached the trace and heap during the out of memory state
>>
>> The app is finding files and folders in lustre mount, submitting those as
>> amqp messages, and copying the discovered files
>>
>> Thanks for your help!
>>
>

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.