Well, this is frustrating. The buffer drops are still occurring - so I thought it worth trying a ramdisk and setting *stats_temp_directory* accordingly.
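For the record, the change was roughly of this shape - the mount point, size, and the use of ALTER SYSTEM are illustrative rather than my exact commands:

    # carve out a small tmpfs for the stats temp files (path and size illustrative)
    mkdir -p /var/run/pg_stats_tmp
    mount -t tmpfs -o size=64M,mode=0700 tmpfs /var/run/pg_stats_tmp
    chown postgres:postgres /var/run/pg_stats_tmp

    # point the collector at it; stats_temp_directory only needs a reload, not a restart
    psql -c "ALTER SYSTEM SET stats_temp_directory = '/var/run/pg_stats_tmp'"
    psql -c "SELECT pg_reload_conf()"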
I've reloaded the instance, and can see that the stats directory is now being
populated in the new location. *Except* - there is one last file
(pgss_query_texts.stat) that continues to be updated in the *old* pg_stat_tmp
path. Is that supposed to happen? It looks fairly similar to this report (but
not quite the same):
https://www.postgresql.org/message-id/d6e71befad7beb4fbcd8ae74fadb1265011bb40fc...@win-8-eml-ex1.eml.local

I can see the packets arriving and being consumed by the collector, and the
collector is indeed updating files in the new stats_temp_directory - just not
that one file.

It also failed to resolve the buffer drops. At this point, I'm not sure I
expected it to. They tend to occur semi-regularly (every 8-13 minutes), but I
can't correlate them with any kind of activity (and if I'm honest, it's
possibly starting to drive me a little bit mad). The sysctl and monitoring
commands I've been poking at are at the foot of this mail, in case someone
spots something obviously wrong.

On Tue, Apr 18, 2017 at 2:53 PM Tim Kane <tim.k...@gmail.com> wrote:

> Okay, so I've run an strace on the collector process during a buffer drop
> event. I can see evidence of a recvfrom loop pulling in a *maximum* of 142 KB.
>
> While I'd already increased rmem_max, it would appear this is not being
> observed by the kernel. rmem_default is set to 124 KB, which would explain
> the above read maxing out just slightly beyond this (presuming a ring buffer
> filling up behind the read).
>
> I'm going to try increasing rmem_default and see if it has any positive
> effect (and then investigate why the kernel doesn't want to consider
> rmem_max).
>
> On Tue, Apr 18, 2017 at 8:05 AM Tim Kane <tim.k...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm seeing sporadic (but frequent) UDP buffer drops on a host, which so
>> far I've not been able to resolve.
>>
>> The drops are originating from postgres processes, and from what I know,
>> the only UDP traffic generated by postgres should be consumed by the
>> statistics collector - but for whatever reason, it's failing to read the
>> packets quickly enough.
>>
>> Interestingly, I'm seeing these drops occur even when the system is idle,
>> but every 15 minutes or so (not consistently enough to isolate any
>> particular activity) we'll see on the order of ~90 packets dropped at a
>> time.
>>
>> I'm running 9.6.2, but the issue was previously occurring on 9.2.4 (on
>> the same hardware).
>>
>> If it's relevant: there are two instances of postgres running (and
>> consequently, two instances of the stats collector process), though one
>> of those instances is definitely idle for most of the day.
>>
>> In an effort to resolve the problem, I've doubled the UDP receive buffer
>> sizes on the host - but it seems to have had no effect.
>>
>> cat /proc/sys/net/core/rmem_max
>> 1677216
>>
>> The following parameters are configured:
>>
>> track_activities on
>> track_counts on
>> track_functions none
>> track_io_timing off
>>
>> There are approximately 80-100 connections at any given time.
>>
>> It seems that the issue started a few weeks ago, around the time of a
>> reboot on the given host... but it's difficult to know what (if anything)
>> has changed, or why :-/
>>
>> Incidentally, the documentation doesn't seem to have any mention of UDP
>> whatsoever. I'm going to use this as an opportunity to dive into the
>> source - but perhaps it's worth improving the documentation around this?
>>
>> My next step is to try disabling track_activities and track_counts to see
>> if they improve matters any, but I wouldn't expect these to generate enough
>> data to flood the UDP buffers :-/
>>
>> Any ideas?
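
P.S. For anyone following along, these are roughly the knobs and checks I've been poking at while chasing this - the sizes are illustrative, not recommendations. My understanding (happy to be corrected) is that rmem_max is only a ceiling on what a process may request via setsockopt(SO_RCVBUF); a socket that never asks for more simply gets rmem_default, which would explain why raising rmem_max alone had no visible effect.

    # current default and ceiling for socket receive buffers (bytes)
    sysctl net.core.rmem_default net.core.rmem_max

    # raise the default as well as the ceiling (sizes illustrative)
    sysctl -w net.core.rmem_default=8388608
    sysctl -w net.core.rmem_max=16777216

    # watch the drop counters; "packet receive errors" / RcvbufErrors should
    # stop climbing if the larger buffer is actually taking effect
    netstat -su

    # per-socket view: the last column of /proc/net/udp is a drops counter,
    # so the collector's socket can be watched directly
    cat /proc/net/udp

    # and the next experiment - switch the stats traffic off at the source
    # (only as a short test: autovacuum depends on track_counts)
    psql -c "ALTER SYSTEM SET track_activities = off"
    psql -c "ALTER SYSTEM SET track_counts = off"
    psql -c "SELECT pg_reload_conf()"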