I may have found the root cause of this issue. Many thanks to Igor for the valuable inputs
The issue had something to do with the fact that I was calling nginx -s reload from python subprocess module and I believe the error was coming from the fork in python https://stackoverflow.com/questions/1367373/python-subprocess-popen-oserror-errno-12-cannot-allocate-memory https://stackoverflow.com/questions/5306075/python-memory-allocation-error-using-subprocess-popen As Igor suggested I changed the subprocess call in python to signal (SIGHUP to master binary) and the logs don't have the ENOMEM error anymore, at least in the past 12+ hours The memory usage for ~10k virtual hosts are a bit high though, but things do work as expected and no more errors Thank you all again On Wed, Aug 8, 2018 at 9:03 AM Anoop Alias <[email protected]> wrote: > Hi Igor, > > Yes the server runs other software including httpd with a similar number > of vhost > > # grep "<VirtualHost" /etc/apache2/conf/httpd.conf|wc -l > 5168 > > I haven't found issue with the other softwares in the logs relating to > memory > > Infact httpd (event mpm) use lesser memory to load similar config > > # ps_mem| head -1 && ps_mem |grep httpd > Private + Shared = RAM used Program > 585.6 MiB + 392.0 MiB = 977.6 MiB httpd (63) > > # ps_mem| head -1 && ps_mem |grep nginx > Private + Shared = RAM used Program > 999.8 MiB + 2.5 GiB = 3.5 GiB nginx (3) > > The server is a shared hosting one and runs CloudLinux , but as far as I > know ,CloudLinux applies limits to only user level process and not nginx > > The nginx HUP is needed as this is triggered by changes in apache > configuration and nginx need to reload the new config . For log file reload > SIGUSR1 is used > > > > > > On Tue, Aug 7, 2018 at 5:50 PM Igor A. Ippolitov <[email protected]> > wrote: > >> Anoop, >> >> I don't see any troubles with your configuration. >> Also, if you have 120G of RAM and a single worker - the problem is not in >> nginx. >> Do you have other software running on the host? >> >> Basically, you just run out of memory. >> >> You can optimize your reload though: use "service nginx reload" (or "kill >> -SIGHUP") to reload nginx configuration. >> When you do nginx -s reload - you make nginx parse configuration (and it >> requires memory) and then send a signal to the running master. You can >> avoid this overhead with 'service' command as it uses 'kill' documented in >> the manual page. >> >> On 06.08.2018 22:55, Anoop Alias wrote: >> >> Hi Igor, >> >> Config is reloaded using >> >> /usr/sbin/nginx -s reload >> >> this is invoked from a python/shell script ( Nginx is installed on a web >> control panel ) >> >> The top-level Nginx config is in the gist below >> >> https://gist.github.com/AnoopAlias/ba5ad6749a586c7e267672ee65b32b3a >> >> It further includes ~8k server blocks or more in some servers. Out of >> this 2/3 are server {} blocks with TLS config and 1/3 non-TLS ones >> >> ]# pwd >> /etc/nginx/sites-enabled >> # grep "server {" *|wc -l >> 7886 >> >> And yes most of them are very similar and mostly proxy to upstream httpd >> >> I have tried removing all the loadable modules and even tried an older >> version of nginx and all produce the error >> >> >> # numastat -m >> >> Per-node system memory usage (in MBs): >> Node 0 Node 1 Total >> --------------- --------------- --------------- >> MemTotal 65430.84 65536.00 130966.84 >> MemFree 5491.26 40.89 5532.15 >> MemUsed 59939.58 65495.11 125434.69 >> Active 22295.61 21016.09 43311.70 >> Inactive 8742.76 4662.48 13405.24 >> Active(anon) 16717.10 16572.19 33289.29 >> Inactive(anon) 2931.94 1388.14 4320.08 >> Active(file) 5578.50 4443.91 10022.41 >> Inactive(file) 5810.82 3274.34 9085.16 >> Unevictable 0.00 0.00 0.00 >> Mlocked 0.00 0.00 0.00 >> Dirty 7.04 1.64 8.67 >> Writeback 0.00 0.00 0.00 >> FilePages 18458.93 10413.97 28872.90 >> Mapped 862.14 413.38 1275.52 >> AnonPages 12579.49 15264.37 27843.86 >> Shmem 7069.52 2695.71 9765.23 >> KernelStack 18.34 3.03 21.38 >> PageTables 153.14 107.77 260.90 >> NFS_Unstable 0.00 0.00 0.00 >> Bounce 0.00 0.00 0.00 >> WritebackTmp 0.00 0.00 0.00 >> Slab 4830.68 2254.55 7085.22 >> SReclaimable 2061.05 921.72 2982.77 >> SUnreclaim 2769.62 1332.83 4102.45 >> AnonHugePages 4.00 2.00 6.00 >> HugePages_Total 0.00 0.00 0.00 >> HugePages_Free 0.00 0.00 0.00 >> HugePages_Surp 0.00 0.00 0.00 >> >> >> Thanks, >> >> >> >> >> >> On Mon, Aug 6, 2018 at 6:33 PM Igor A. Ippolitov <[email protected]> >> wrote: >> >>> Anoop, >>> >>> I suppose, most of your 10k servers are very similar, right? >>> Please, post top level configuration and a typical server{}, please. >>> >>> Also, how do you reload configuration? With 'service nginx reload' or >>> may be other commands? >>> >>> It looks like you have a lot of fragmented memory and only 4gb free in >>> the second numa node. >>> So, I'd say this is OK that you are getting errors from allocating a 16k >>> stripes. >>> >>> Could you please post numastat -m output additionally. Just to make sure >>> you have half of the memory for the second CPU. >>> And we'll have a look if memory utilization may be optimized based on >>> your configuration. >>> >>> Regards, >>> Igor. >>> >>> On 04.08.2018 07:54, Anoop Alias wrote: >>> >>> Hi Igor, >>> >>> Setting vm.max_map_count to 20x the normal value did not help >>> >>> The issue happens on a group of servers and among the group, it shows up >>> only in servers which have ~10k server{} blocks >>> >>> On servers that have lower number of server{} blocks , the ENOMEM issue >>> is not there >>> >>> Also, I can find that the RAM usage of the Nginx process is directly >>> proportional to the number of server {} blocks >>> >>> For example on a server having the problem >>> >>> # ps_mem| head -1 && ps_mem |grep nginx >>> Private + Shared = RAM used Program >>> 1.0 GiB + 2.8 GiB = 3.8 GiB nginx (3) >>> >>> >>> That is for a single worker process with 4 threads in thread_pool >>> # pstree|grep nginx >>> |-nginx-+-nginx---4*[{nginx}] >>> | `-nginx >>> >>> Whatever config change I try the memory usage seem to mostly depend on >>> the number of server contexts defined >>> >>> Now the issue mostly happen in nginx reload ,when one more worker >>> process will be active in shutting down mode >>> >>> I believe the memalign error is thrown by the worker being shutdown, >>> this is because the sites work after the error and also the pid mentioned >>> in the error would have gone when I check ps >>> >>> >>> # pmap 948965|grep 16K >>> 00007f2923ff2000 16K r-x-- ngx_http_redis2_module.so >>> 00007f2924fd7000 16K r---- libc-2.17.so >>> 00007f2925431000 16K rw--- [ anon ] >>> 00007f292584a000 16K rw--- [ anon ] >>> >>> Aug 4 05:50:00 b kernel: SysRq : Show Memory >>> Aug 4 05:50:00 b kernel: Mem-Info: >>> Aug 4 05:50:00 b kernel: active_anon:7757394 inactive_anon:1021319 >>> isolated_anon:0#012 active_file:3733324 inactive_file:2136476 isolated_ >>> file:0#012 unevictable:0 dirty:1766 writeback:6 wbtmp:0 unstable:0#012 >>> slab_reclaimable:2003687 slab_unreclaimable:901391#012 mapped:316734 >>> shmem:2381810 pagetables:63163 bounce:0#012 free:4851283 free_pcp:11332 >>> free_cma:0 >>> Aug 4 05:50:00 bravo kernel: Node 0 DMA free:15888kB min:8kB low:8kB >>> high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_ >>> file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB >>> present:15972kB managed:15888kB mlocked:0kB dirty:0kB writeback:0kB >>> mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB >>> kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB >>> local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 >>> all_unreclaimable? yes >>> Aug 4 05:50:00 b kernel: lowmem_reserve[]: 0 1679 64139 64139 >>> >>> # cat /proc/buddyinfo >>> Node 0, zone DMA 0 0 1 0 2 1 1 >>> 0 1 1 3 >>> Node 0, zone DMA32 5284 6753 6677 1083 410 59 1 >>> 0 0 0 0 >>> Node 0, zone Normal 500327 638958 406737 14690 872 106 11 >>> 0 0 0 0 >>> Node 1, zone Normal 584840 291640 188 0 0 0 0 >>> 0 0 0 0 >>> >>> >>> The only correlation I see in having the error is the number of >>> server {} blocks (close to 10k) which then makes the nginx process consume >>> ~ 4GB of mem with a single worker process and then a reload is done >>> >>> >>> >>> >>> On Thu, Aug 2, 2018 at 6:02 PM Igor A. Ippolitov <[email protected]> >>> wrote: >>> >>>> Anoop, >>>> >>>> There are two guesses: either mmap allocations limit is hit or memory >>>> is way too fragmented. >>>> Could you please track amount of mapped regions for a worker with pmap >>>> and amount of 16k areas in Normal zones (it is the third number)? >>>> >>>> You can also set vm.max_map_count to a higher number (like 20 times >>>> higher than default) and look if the error is gone. >>>> >>>> Please, let me know if increasing vm.max_map_count helps you. >>>> >>>> On 02.08.2018 13:06, Anoop Alias wrote: >>>> >>>> Hi Igor, >>>> >>>> The error happens randomly >>>> >>>> 2018/08/02 06:52:42 [emerg] 874514#874514: posix_memalign(16, 16384) >>>> failed (12: Cannot allocate memory) >>>> 2018/08/02 09:42:53 [emerg] 872996#872996: posix_memalign(16, 16384) >>>> failed (12: Cannot allocate memory) >>>> 2018/08/02 10:16:14 [emerg] 877611#877611: posix_memalign(16, 16384) >>>> failed (12: Cannot allocate memory) >>>> 2018/08/02 10:16:48 [emerg] 879410#879410: posix_memalign(16, 16384) >>>> failed (12: Cannot allocate memory) >>>> 2018/08/02 10:17:55 [emerg] 876563#876563: posix_memalign(16, 16384) >>>> failed (12: Cannot allocate memory) >>>> 2018/08/02 10:20:21 [emerg] 879263#879263: posix_memalign(16, 16384) >>>> failed (12: Cannot allocate memory) >>>> 2018/08/02 10:20:51 [emerg] 878991#878991: posix_memalign(16, 16384) >>>> failed (12: Cannot allocate memory) >>>> >>>> # date >>>> Thu Aug 2 10:58:48 BST 2018 >>>> >>>> ------------------------------------------ >>>> # cat /proc/buddyinfo >>>> Node 0, zone DMA 0 0 1 0 2 1 1 >>>> 0 1 1 3 >>>> Node 0, zone DMA32 11722 11057 4663 1647 609 72 10 >>>> 7 1 0 0 >>>> Node 0, zone Normal 755026 710760 398136 21462 1114 18 1 >>>> 0 0 0 0 >>>> Node 1, zone Normal 341295 801810 179604 256 0 0 0 >>>> 0 0 0 0 >>>> ----------------------------------------- >>>> >>>> >>>> slabinfo - version: 2.1 >>>> # name <active_objs> <num_objs> <objsize> <objperslab> >>>> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata >>>> <active_slabs> <num_slabs> <sharedavail> >>>> SCTPv6 21 21 1536 21 8 : tunables 0 0 >>>> 0 : slabdata 1 1 0 >>>> SCTP 0 0 1408 23 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> kcopyd_job 0 0 3312 9 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dm_uevent 0 0 2608 12 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> nf_conntrack_ffffffff81acbb00 14054 14892 320 51 4 : >>>> tunables 0 0 0 : slabdata 292 292 0 >>>> lvp_cache 36 36 224 36 2 : tunables 0 0 >>>> 0 : slabdata 1 1 0 >>>> lve_struct 4140 4140 352 46 4 : tunables 0 0 >>>> 0 : slabdata 90 90 0 >>>> fat_inode_cache 0 0 744 44 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> fat_cache 0 0 40 102 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> isofs_inode_cache 0 0 664 49 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> ext4_inode_cache 30 30 1088 30 8 : tunables 0 0 >>>> 0 : slabdata 1 1 0 >>>> ext4_xattr 0 0 88 46 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> ext4_free_data 0 0 64 64 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> ext4_allocation_context 32 32 128 32 1 : tunables 0 >>>> 0 0 : slabdata 1 1 0 >>>> ext4_io_end 0 0 72 56 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> ext4_extent_status 102 102 40 102 1 : tunables 0 0 >>>> 0 : slabdata 1 1 0 >>>> jbd2_journal_handle 0 0 48 85 1 : tunables 0 >>>> 0 0 : slabdata 0 0 0 >>>> jbd2_journal_head 0 0 112 36 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> jbd2_revoke_table_s 256 256 16 256 1 : tunables 0 >>>> 0 0 : slabdata 1 1 0 >>>> jbd2_revoke_record_s 0 0 32 128 1 : tunables 0 >>>> 0 0 : slabdata 0 0 0 >>>> kvm_async_pf 0 0 136 30 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> kvm_vcpu 0 0 18560 1 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> xfs_dqtrx 992 992 528 31 4 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> xfs_dquot 3264 3264 472 34 4 : tunables 0 0 >>>> 0 : slabdata 96 96 0 >>>> xfs_ili 4342175 4774399 152 53 2 : tunables 0 >>>> 0 0 : slabdata 90083 90083 0 >>>> xfs_inode 4915588 5486076 1088 30 8 : tunables 0 >>>> 0 0 : slabdata 182871 182871 0 >>>> xfs_efd_item 2680 2760 400 40 4 : tunables 0 0 >>>> 0 : slabdata 69 69 0 >>>> xfs_da_state 1088 1088 480 34 4 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> xfs_btree_cur 1248 1248 208 39 2 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> xfs_log_ticket 14874 15048 184 44 2 : tunables 0 0 >>>> 0 : slabdata 342 342 0 >>>> xfs_ioend 12909 13104 104 39 1 : tunables 0 0 >>>> 0 : slabdata 336 336 0 >>>> scsi_cmd_cache 5400 5652 448 36 4 : tunables 0 0 >>>> 0 : slabdata 157 157 0 >>>> ve_struct 0 0 848 38 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> ip6_dst_cache 1152 1152 448 36 4 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> RAWv6 910 910 1216 26 8 : tunables 0 0 >>>> 0 : slabdata 35 35 0 >>>> UDPLITEv6 0 0 1216 26 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> UDPv6 832 832 1216 26 8 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> tw_sock_TCPv6 1152 1376 256 32 2 : tunables 0 0 >>>> 0 : slabdata 43 43 0 >>>> TCPv6 510 510 2176 15 8 : tunables 0 0 >>>> 0 : slabdata 34 34 0 >>>> cfq_queue 3698 5145 232 35 2 : tunables 0 0 >>>> 0 : slabdata 147 147 0 >>>> bsg_cmd 0 0 312 52 4 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> mqueue_inode_cache 136 136 960 34 8 : tunables 0 0 >>>> 0 : slabdata 4 4 0 >>>> hugetlbfs_inode_cache 1632 1632 632 51 8 : tunables 0 >>>> 0 0 : slabdata 32 32 0 >>>> configfs_dir_cache 1472 1472 88 46 1 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> dquot 0 0 256 32 2 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> userfaultfd_ctx_cache 32 32 128 32 1 : tunables 0 >>>> 0 0 : slabdata 1 1 0 >>>> fanotify_event_info 2336 2336 56 73 1 : tunables 0 >>>> 0 0 : slabdata 32 32 0 >>>> dio 6171 6222 640 51 8 : tunables 0 0 >>>> 0 : slabdata 122 122 0 >>>> pid_namespace 42 42 2192 14 8 : tunables 0 0 >>>> 0 : slabdata 3 3 0 >>>> posix_timers_cache 1056 1056 248 33 2 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> UDP-Lite 0 0 1088 30 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> flow_cache 2268 2296 144 28 1 : tunables 0 0 >>>> 0 : slabdata 82 82 0 >>>> xfrm_dst_cache 896 896 576 28 4 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> ip_fib_alias 2720 2720 48 85 1 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> RAW 3977 4224 1024 32 8 : tunables 0 0 >>>> 0 : slabdata 132 132 0 >>>> UDP 4110 4110 1088 30 8 : tunables 0 0 >>>> 0 : slabdata 137 137 0 >>>> tw_sock_TCP 4756 5216 256 32 2 : tunables 0 0 >>>> 0 : slabdata 163 163 0 >>>> TCP 2705 2768 1984 16 8 : tunables 0 0 >>>> 0 : slabdata 173 173 0 >>>> scsi_data_buffer 5440 5440 24 170 1 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> blkdev_queue 154 154 2208 14 8 : tunables 0 0 >>>> 0 : slabdata 11 11 0 >>>> blkdev_requests 4397688 4405884 384 42 4 : tunables 0 >>>> 0 0 : slabdata 104902 104902 0 >>>> blkdev_ioc 11232 11232 112 36 1 : tunables 0 0 >>>> 0 : slabdata 312 312 0 >>>> user_namespace 0 0 1304 25 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> sock_inode_cache 12282 12282 704 46 8 : tunables 0 0 >>>> 0 : slabdata 267 267 0 >>>> file_lock_cache 20056 20960 200 40 2 : tunables 0 0 >>>> 0 : slabdata 524 524 0 >>>> net_namespace 6 6 5056 6 8 : tunables 0 0 >>>> 0 : slabdata 1 1 0 >>>> shmem_inode_cache 16970 18952 712 46 8 : tunables 0 0 >>>> 0 : slabdata 412 412 0 >>>> Acpi-ParseExt 39491 40432 72 56 1 : tunables 0 0 >>>> 0 : slabdata 722 722 0 >>>> Acpi-State 1683 1683 80 51 1 : tunables 0 0 >>>> 0 : slabdata 33 33 0 >>>> Acpi-Namespace 11424 11424 40 102 1 : tunables 0 0 >>>> 0 : slabdata 112 112 0 >>>> task_delay_info 15336 15336 112 36 1 : tunables 0 0 >>>> 0 : slabdata 426 426 0 >>>> taskstats 1568 1568 328 49 4 : tunables 0 0 >>>> 0 : slabdata 32 32 0 >>>> proc_inode_cache 169897 190608 680 48 8 : tunables 0 0 >>>> 0 : slabdata 3971 3971 0 >>>> sigqueue 2208 2208 168 48 2 : tunables 0 0 >>>> 0 : slabdata 46 46 0 >>>> bdev_cache 792 792 896 36 8 : tunables 0 0 >>>> 0 : slabdata 22 22 0 >>>> sysfs_dir_cache 74698 74698 120 34 1 : tunables 0 0 >>>> 0 : slabdata 2197 2197 0 >>>> mnt_cache 163197 163424 256 32 2 : tunables 0 0 >>>> 0 : slabdata 5107 5107 0 >>>> filp 64607 97257 320 51 4 : tunables 0 0 >>>> 0 : slabdata 1907 1907 0 >>>> inode_cache 370744 370947 616 53 8 : tunables 0 0 >>>> 0 : slabdata 6999 6999 0 >>>> dentry 1316262 2139228 192 42 2 : tunables 0 >>>> 0 0 : slabdata 50934 50934 0 >>>> iint_cache 0 0 80 51 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> buffer_head 1441470 2890290 104 39 1 : tunables 0 >>>> 0 0 : slabdata 74110 74110 0 >>>> vm_area_struct 194998 196840 216 37 2 : tunables 0 0 >>>> 0 : slabdata 5320 5320 0 >>>> mm_struct 2679 2760 1600 20 8 : tunables 0 0 >>>> 0 : slabdata 138 138 0 >>>> files_cache 8680 8925 640 51 8 : tunables 0 0 >>>> 0 : slabdata 175 175 0 >>>> signal_cache 3691 3780 1152 28 8 : tunables 0 0 >>>> 0 : slabdata 135 135 0 >>>> sighand_cache 1950 2160 2112 15 8 : tunables 0 0 >>>> 0 : slabdata 144 144 0 >>>> task_xstate 8070 8658 832 39 8 : tunables 0 0 >>>> 0 : slabdata 222 222 0 >>>> task_struct 1913 2088 4080 8 8 : tunables 0 0 >>>> 0 : slabdata 261 261 0 >>>> cred_jar 31699 33936 192 42 2 : tunables 0 0 >>>> 0 : slabdata 808 808 0 >>>> anon_vma_chain 164026 168704 64 64 1 : tunables 0 0 >>>> 0 : slabdata 2636 2636 0 >>>> anon_vma 84104 84594 88 46 1 : tunables 0 0 >>>> 0 : slabdata 1839 1839 0 >>>> pid 11127 12576 128 32 1 : tunables 0 0 >>>> 0 : slabdata 393 393 0 >>>> shared_policy_node 9350 9350 48 85 1 : tunables 0 0 >>>> 0 : slabdata 110 110 0 >>>> numa_policy 62 62 264 31 2 : tunables 0 0 >>>> 0 : slabdata 2 2 0 >>>> radix_tree_node 771778 1194312 584 28 4 : tunables 0 0 >>>> 0 : slabdata 42654 42654 0 >>>> idr_layer_cache 2538 2565 2112 15 8 : tunables 0 0 >>>> 0 : slabdata 171 171 0 >>>> dma-kmalloc-8192 0 0 8192 4 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-4096 0 0 4096 8 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-2048 0 0 2048 16 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-1024 0 0 1024 32 8 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-512 0 0 512 32 4 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-256 0 0 256 32 2 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-192 0 0 192 42 2 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 >>>> 0 : slabdata 0 0 0 >>>> kmalloc-8192 385 388 8192 4 8 : tunables 0 0 >>>> 0 : slabdata 97 97 0 >>>> kmalloc-4096 9296 10088 4096 8 8 : tunables 0 0 >>>> 0 : slabdata 1261 1261 0 >>>> kmalloc-2048 65061 133536 2048 16 8 : tunables 0 0 >>>> 0 : slabdata 8346 8346 0 >>>> kmalloc-1024 11987 21120 1024 32 8 : tunables 0 0 >>>> 0 : slabdata 660 660 0 >>>> kmalloc-512 107510 187072 512 32 4 : tunables 0 0 >>>> 0 : slabdata 5846 5846 0 >>>> kmalloc-256 160498 199104 256 32 2 : tunables 0 0 >>>> 0 : slabdata 6222 6222 0 >>>> kmalloc-192 144975 237426 192 42 2 : tunables 0 0 >>>> 0 : slabdata 5653 5653 0 >>>> kmalloc-128 36799 108096 128 32 1 : tunables 0 0 >>>> 0 : slabdata 3378 3378 0 >>>> kmalloc-96 99510 238896 96 42 1 : tunables 0 0 >>>> 0 : slabdata 5688 5688 0 >>>> kmalloc-64 7978152 8593280 64 64 1 : tunables 0 >>>> 0 0 : slabdata 134270 134270 0 >>>> kmalloc-32 2939882 3089664 32 128 1 : tunables 0 >>>> 0 0 : slabdata 24138 24138 0 >>>> kmalloc-16 172057 172288 16 256 1 : tunables 0 0 >>>> 0 : slabdata 673 673 0 >>>> kmalloc-8 109568 109568 8 512 1 : tunables 0 0 >>>> 0 : slabdata 214 214 0 >>>> kmem_cache_node 893 896 64 64 1 : tunables 0 0 >>>> 0 : slabdata 14 14 0 >>>> kmem_cache 612 612 320 51 4 : tunables 0 0 >>>> 0 : slabdata 12 12 0 >>>> >>>> ------------------------------------------------- >>>> >>>> >>>> # uname -r >>>> 3.10.0-714.10.2.lve1.5.17.1.el7.x86_64 >>>> >>>> -------------------------------------------------------- >>>> >>>> Core part of glances >>>> http://i.imgur.com/La5JbQn.png >>>> ----------------------------------------------------------- >>>> >>>> Thank you very much for looking into this >>>> >>>> >>>> On Thu, Aug 2, 2018 at 12:37 PM Igor A. Ippolitov <[email protected]> >>>> wrote: >>>> >>>>> Anoop, >>>>> >>>>> I doubt this will be the solution, but may we have a look at >>>>> /proc/buddyinfo and /proc/slabinfo the moment when nginx can't allocate >>>>> memory? >>>>> >>>>> On 02.08.2018 08:15, Anoop Alias wrote: >>>>> >>>>> Hi Maxim, >>>>> >>>>> I enabled debug and the memalign call is happening on nginx reloads >>>>> and the ENOMEM happen sometimes on the reload(not on all reloads) >>>>> >>>>> 2018/08/02 05:59:08 [notice] 872052#872052: signal process started >>>>> 2018/08/02 05:59:23 [notice] 871570#871570: signal 1 (SIGHUP) received >>>>> from 872052, reconfiguring >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: wake up, sigio 0 >>>>> 2018/08/02 05:59:23 [notice] 871570#871570: reconfiguring >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 0000000002B0DA00:16384 @16 === > the memalign call on reload >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: malloc: >>>>> 00000000087924D0:4560 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 000000000E442E00:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: malloc: >>>>> 0000000005650850:4096 >>>>> 20 >>>>> >>>>> >>>>> >>>>> >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #71 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #72 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #73 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: bind() xxxx:443 #74 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: add cleanup: >>>>> 000000005340D728 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000024D3260:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000517BAF10:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 0000000053854FC0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 0000000053855FD0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 0000000053856FE0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 0000000053857FF0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: posix_memalign: >>>>> 0000000053859000:16384 @16 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 000000005385D010:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 000000005385E020:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 000000005385F030:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536CD160:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536CE170:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536CF180:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D0190:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D11A0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D21B0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D31C0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D41D0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D51E0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D61F0:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D7200:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D8210:4096 >>>>> 2018/08/02 05:48:49 [debug] 871275#871275: malloc: >>>>> 00000000536D9220:4096 >>>>> >>>>> >>>>> Infact there are lot of such calls during a reload >>>>> >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA17ED00:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA1B0FF0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA1E12C0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA211590:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA243880:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA271B30:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA2A3E20:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA2D20D0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA3063E0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA334690:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA366980:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA396C50:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA3C8F40:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA3F9210:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA4294E0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA45B7D0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA489A80:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA4BBD70:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA4EA020:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA51E330:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA54C5E0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA57E8D0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA5AEBA0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA5DEE70:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA611160:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA641430:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA671700:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA6A29E0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA6D5CE0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA707FD0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA736280:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA768570:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA796820:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA7CAB30:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA7F8DE0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA82B0D0:16384 @16 >>>>> 2018/08/02 05:59:23 [debug] 871570#871570: posix_memalign: >>>>> 00000000BA85B3A0:16384 @16 >>>>> >>>>> >>>>> >>>>> What is perplexing is that the system has enough free (available RAM) >>>>> ############# >>>>> # free -g >>>>> total used free shared buff/cache >>>>> available >>>>> Mem: 125 54 24 8 46 >>>>> 58 >>>>> Swap: 0 0 0 >>>>> ############# >>>>> >>>>> # ulimit -a >>>>> core file size (blocks, -c) 0 >>>>> data seg size (kbytes, -d) unlimited >>>>> scheduling priority (-e) 0 >>>>> file size (blocks, -f) unlimited >>>>> pending signals (-i) 514579 >>>>> max locked memory (kbytes, -l) 64 >>>>> max memory size (kbytes, -m) unlimited >>>>> open files (-n) 1024 >>>>> pipe size (512 bytes, -p) 8 >>>>> POSIX message queues (bytes, -q) 819200 >>>>> real-time priority (-r) 0 >>>>> stack size (kbytes, -s) 8192 >>>>> cpu time (seconds, -t) unlimited >>>>> max user processes (-u) 514579 >>>>> virtual memory (kbytes, -v) unlimited >>>>> file locks (-x) unlimited >>>>> >>>>> ######################################### >>>>> >>>>> There is no other thing limiting memory allocation >>>>> >>>>> Any way to prevent this or probably identify/prevent this >>>>> >>>>> >>>>> On Tue, Jul 31, 2018 at 7:08 PM Maxim Dounin <[email protected]> >>>>> wrote: >>>>> >>>>>> Hello! >>>>>> >>>>>> On Tue, Jul 31, 2018 at 09:52:29AM +0530, Anoop Alias wrote: >>>>>> >>>>>> > I am repeatedly seeing errors like >>>>>> > >>>>>> > ###################### >>>>>> > 2018/07/31 03:46:33 [emerg] 2854560#2854560: posix_memalign(16, >>>>>> 16384) >>>>>> > failed (12: Cannot allocate memory) >>>>>> > 2018/07/31 03:54:09 [emerg] 2890190#2890190: posix_memalign(16, >>>>>> 16384) >>>>>> > failed (12: Cannot allocate memory) >>>>>> > 2018/07/31 04:08:36 [emerg] 2939230#2939230: posix_memalign(16, >>>>>> 16384) >>>>>> > failed (12: Cannot allocate memory) >>>>>> > 2018/07/31 04:24:48 [emerg] 2992650#2992650: posix_memalign(16, >>>>>> 16384) >>>>>> > failed (12: Cannot allocate memory) >>>>>> > 2018/07/31 04:42:09 [emerg] 3053092#3053092: posix_memalign(16, >>>>>> 16384) >>>>>> > failed (12: Cannot allocate memory) >>>>>> > 2018/07/31 04:42:17 [emerg] 3053335#3053335: posix_memalign(16, >>>>>> 16384) >>>>>> > failed (12: Cannot allocate memory) >>>>>> > 2018/07/31 04:42:28 [emerg] 3053937#3053937: posix_memalign(16, >>>>>> 16384) >>>>>> > failed (12: Cannot allocate memory) >>>>>> > 2018/07/31 04:47:54 [emerg] 3070638#3070638: posix_memalign(16, >>>>>> 16384) >>>>>> > failed (12: Cannot allocate memory) >>>>>> > #################### >>>>>> > >>>>>> > on a few servers >>>>>> > >>>>>> > The servers have enough memory free and the swap usage is 0, yet >>>>>> somehow >>>>>> > the kernel denies the posix_memalign with ENOMEM ( this is what I >>>>>> think is >>>>>> > happening!) >>>>>> > >>>>>> > The numbers requested are always 16, 16k . This makes me suspicious >>>>>> > >>>>>> > I have no setting in nginx.conf that reference a 16k >>>>>> > >>>>>> > Is there any chance of finding out what requests this and why this >>>>>> is not >>>>>> > fulfilled >>>>>> >>>>>> There are at least some buffers which default to 16k - for >>>>>> example, ssl_buffer_size (http://nginx.org/r/ssl_buffer_size). >>>>>> >>>>>> You may try debugging log to futher find out where the particular >>>>>> allocation happens, see here for details: >>>>>> >>>>>> http://nginx.org/en/docs/debugging_log.html >>>>>> >>>>>> But I don't really think it worth the effort. The error is pretty >>>>>> clear, and it's better to focus on why these allocations are >>>>>> denied. Likely you are hitting some limit. >>>>>> >>>>>> -- >>>>>> Maxim Dounin >>>>>> http://mdounin.ru/ >>>>>> _______________________________________________ >>>>>> nginx mailing list >>>>>> [email protected] >>>>>> http://mailman.nginx.org/mailman/listinfo/nginx >>>>>> >>>>> >>>>> >>>>> -- >>>>> *Anoop P Alias* >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> nginx mailing >>>>> [email protected]http://mailman.nginx.org/mailman/listinfo/nginx >>>>> >>>>> >>>>> _______________________________________________ >>>>> nginx mailing list >>>>> [email protected] >>>>> http://mailman.nginx.org/mailman/listinfo/nginx >>>> >>>> >>>> >>>> -- >>>> *Anoop P Alias* >>>> >>>> >>>> >>>> _______________________________________________ >>>> nginx mailing >>>> [email protected]http://mailman.nginx.org/mailman/listinfo/nginx >>>> >>>> >>>> _______________________________________________ >>>> nginx mailing list >>>> [email protected] >>>> http://mailman.nginx.org/mailman/listinfo/nginx >>> >>> >>> >>> -- >>> *Anoop P Alias* >>> >>> >>> >>> _______________________________________________ >>> nginx mailing >>> [email protected]http://mailman.nginx.org/mailman/listinfo/nginx >>> >>> >>> _______________________________________________ >>> nginx mailing list >>> [email protected] >>> http://mailman.nginx.org/mailman/listinfo/nginx >> >> >> >> -- >> *Anoop P Alias* >> >> >> >> _______________________________________________ >> nginx mailing >> [email protected]http://mailman.nginx.org/mailman/listinfo/nginx >> >> >> _______________________________________________ >> nginx mailing list >> [email protected] >> http://mailman.nginx.org/mailman/listinfo/nginx > > > > -- > *Anoop P Alias* > > -- *Anoop P Alias*
_______________________________________________ nginx mailing list [email protected] http://mailman.nginx.org/mailman/listinfo/nginx
