I don't see the rest of this thread. Can you (re-)post your config?

Rainer

On Wed, Sep 2, 2020 at 3:13, Adam Chalkley via rsyslog (<[email protected]>) wrote:
>
> Unfortunately the system is still having issues.
>
> I enabled debug logging earlier, copied the debug log aside and *then*
> disabled debug-on-demand logging (last time I forgot that not copying the
> file elsewhere first completely wipes the log file).
>
> I have 18 MB worth of captured details for a short timeframe.
>
> Out of that log file I'm seeing a lot of these entries:
>
> 7316.804057780:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> 7316.804062774:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> 7316.804232359:imuxsock.c : imuxsock.c: Message from UNIX socket: #3, size 221
> 7316.804258678:imuxsock.c : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: 'e'
> 7316.804264073:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> 7316.804267610:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> 7317.803939077:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
> 7317.803994833:imrelp.c : imrelp.c: relpTcpSend: send data: 14916 rsp 6 200 OK
> 7317.804025407:imrelp.c : imrelp.c: relpTcpSend: sock 80, lenbuf 19, send returned -1 [errno 32]
> 7317.804038885:imrelp.c : imrelp.c: librelp: generic error: ecode 10014, emsg 'error sending relp: Broken pipe'
> 7317.804049421:imrelp.c : errmsg.c: Called LogMsg, msg: imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended
> 7317.804054526:imrelp.c : operatingstate.c: osf: MSG imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]'
> 7317.804097763:main thread : janitor.c: janitorRun() called
>
> The pattern seems to be that a remote target is interrupted and then the
> main queue (main Q) starts filling up. All omrelp actions have queues
> attached and none of them ever filled (going by the impstats log file,
> which was regularly receiving updates).
>
> The local log file on the primary receiver noted these details:
>
> 2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 1 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 try https://www.rsyslog.com/e/2027 ]
> 2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> 2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> 2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 2 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 forwarding to server woodchuck2.example.com:2514 - suspending [v8.2006.0 try https://www.rsyslog.com/e/2291 ]
> 2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> 2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
>
> According to the impstats log, the main queue started filling up around
> 11:21 am, about the time that several of our mail relays were rebooting
> from maintenance.
>
> This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.
>
> Not sure if it is relevant, but the existing rsyslog packages were not
> swapped out with replacement 18.04-based packages as part of the upgrade
> as expected.
>
> dpkg -l | grep rsyslog
> ii  rsyslog               8.2006.0-0adiscon2xenial1  amd64  a rocket-fast system for log processing
> ii  rsyslog-mmjsonparse   8.2006.0-0adiscon2xenial1  amd64  Parsing/handling of CEE/Lumberjack JSON messages in rsyslog
> ii  rsyslog-mmnormalize   8.2006.0-0adiscon2xenial1  amd64  The rsyslog-mmnormalize package provides log normalization
> ii  rsyslog-mmrm1stspace  8.2006.0-0adiscon2xenial1  amd64  The mmrm1stspace module permits to strip the leading space from
> ii  rsyslog-relp          8.2006.0-0adiscon2xenial1  amd64  RELP protocol support for rsyslog
>
> I was planning to wait for the next rsyslog packages to be released for
> the Ubuntu PPA to see if that was enough to trigger a switchover to
> "bionic" based packages. I suspect this is completely unrelated to the
> problem experienced, but wanted to make sure and note it here.
>
> Recent changes (not *exactly* flailing around, but not very confident)
> from a few days ago (which didn't help today):
>
> * Modified forward queues (attached to actions within a ruleset)
> ** increase queue size from 10K to 100K
> ** increase worker threads from 1 to 4
> ** disable explicit high water mark setting, allow default setting to apply
> ** disable explicit low water mark setting, allow default setting to apply
>
> * Modified 'main Q' (as impstats lists it)
> ** adjust worker threads from 1 to 4
> ** adjust queue size from 10K to 500K
>
> Changes made tonight after restarting rsyslog:
>
> * Modified ruleset used to write out messages to local log files on the
>   receiver
> ** increase queue size to 250K
> ** increase worker threads to 4
>
> * Disable lightDelayMark on main queue (as discussed on this thread)
>
> I'm not sure what's going on. This box was quite stable up until we
> upgraded the OS from Ubuntu 16.04 to 18.04.
>
> Any ideas/suggestions are welcome.
>
> Thanks.
>
> -----Original Message-----
> From: Adam Chalkley
> Sent: Friday, August 28, 2020 4:15 PM
> To: rsyslog-users <[email protected]>
> Cc: [email protected]
> Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
>
> Hi Andre,
>
> Thank you for the additional feedback.
>
> As you suggested, the problem is likely tied back to the TCP probe. We've
> not had it enabled since last Sunday and rsyslog has been running fine
> without making the changes we previously discussed (I'm still interested
> in making them, I've just been pulled in other directions).
>
> Do you happen to know of a safe way to check that the port is open
> remotely without triggering a failure from rsyslog's perspective? I'm
> guessing that a minimal RELP-compatible client would be the best approach.
> Is there such a tool that you're aware of that could be called
> periodically to confirm that an rsyslog receiver (RELP-enabled port) is
> functioning properly?
>
> Just thought I would ask.
>
> Thanks!
>
> -----Original Message-----
> From: Andre Lorbach <[email protected]>
> Sent: Monday, August 24, 2020 4:06 AM
> To: rsyslog-users <[email protected]>
> Cc: Adam Chalkley <[email protected]>
> Subject: AW: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
>
> I think those errors were there all the time but were not reported in
> older librelp versions.
> I reviewed the code and we added this error output about 2 years ago in
> librelp.
> Ubuntu 16.04 most likely is using an older librelp version, so you did
> not see the error there.
>
> The problem is caused by the TCP probe. It may help if you try to receive
> data before you drop the connection.
>
> Best regards,
> Andre Lorbach
> --
> Adiscon GmbH
> Mozartstr. 21
> 97950 Großrinderfeld, Germany
> Ph. +49-9349-9298530
> Geschäftsführer/President: Rainer Gerhards Reg.-Gericht Mannheim, HRB 560610
> Ust.-IDNr.: DE 81 22 04 622
> Web: www.adiscon.com - Mail: [email protected]
>
> Informations regarding your data privacy policy can be found here:
> https://www.adiscon.com/data-privacy-policy/
>
> This e-mail may contain confidential and/or privileged information. If
> you are not the intended recipient or have received this e-mail in error
> please notify the sender immediately and delete this e-mail. Any
> unauthorized copying, disclosure or distribution of the material in this
> e-mail is strictly forbidden.
>
> > -----Original Message-----
> > From: rsyslog <[email protected]> On Behalf Of Adam Chalkley via rsyslog
> > Sent: Wednesday, August 19, 2020 18:38
> > To: rsyslog-users <[email protected]>
> > Cc: Adam Chalkley <[email protected]>
> > Subject: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
> >
> > Hi,
> >
> > We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> > (4.4 kernel) to 18.04 (4.15 kernel).
> >
> > We are using the upstream PPA, so running 8.2006.0 on receivers and
> > endpoints.
> >
> > When we started getting reports from our Nagios instance that the
> > rsyslog forward queues on our endpoints were beginning to fill, we
> > checked our receiver (sawmill1) and saw 94 open TCP connections with 40
> > of them in CLOSE_WAIT from our Nagios server, most of them I suspect
> > from the TCP port connection test performed every 5 minutes.
> >
> > Log samples from the receiver system (which are related to port probes
> > from our Nagios instance):
> >
> > 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> > 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error 'error sending relp: Bad file descriptor', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error 'error sending relp: Bad file descriptor', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> >
> > Log samples from the Nagios instance:
> >
> > 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd: omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required session state, session broken', object 'conn to srvr lincoln.lib.auburn.edu:2514' - action may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd: omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to remote peer', object 'conn to srvr lincoln.lib.auburn.edu:2514' - action may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> >
> > Is there a setting I can apply to rsyslog to help resolve this?
> >
> > Is this a known bug?
> >
> > We didn't have the issue with v8.2006.0 on our receiver when it was
> > running Ubuntu 16.04 (the prior OS release), even though it made the
> > same complaints about the TCP port probes from Nagios.
> >
> > Thanks in advance.
> >
> > _______________________________________________
> > rsyslog mailing list
> > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> > of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> > DON'T LIKE THAT.
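
On the question earlier in the thread about a probe that does not upset imrelp: a bare TCP connect-and-drop is what produces the "session broken" messages, so one option might be a tiny client that opens a RELP session, reads the response, and closes it cleanly. The following is only a rough sketch, not a tested tool. It assumes the RELP frame layout `TXNR SP COMMAND SP DATALEN [SP DATA] LF` (consistent with the `14916 rsp 6 200 OK` frame in the debug log above) and an `open` offer of the form `relp_version=0 / relp_software=... / commands=syslog`. The names `relp_frame`, `parse_rsp_status`, `probe` and the `relp_software` value are made up for illustration, and the single `recv()` call assumes the whole response arrives in one read.

```python
import socket

# Hypothetical offer sent with the "open" command; the software string is made up.
OFFER = b"relp_version=0\nrelp_software=relpprobe,0.1,http://example.invalid\ncommands=syslog"


def relp_frame(txnr: int, command: str, data: bytes = b"") -> bytes:
    """Build one RELP frame: TXNR SP COMMAND SP DATALEN [SP DATA] LF."""
    header = f"{txnr} {command} {len(data)}".encode("ascii")
    if data:
        return header + b" " + data + b"\n"
    return header + b"\n"


def parse_rsp_status(frame: bytes) -> int:
    """Return the numeric status from an 'rsp' frame such as b'1 rsp 6 200 OK\\n'."""
    parts = frame.split(b" ", 3)
    if len(parts) < 4 or parts[1] != b"rsp":
        raise ValueError("not a RELP rsp frame with data")
    return int(parts[3][:3].decode("ascii"))


def probe(host: str, port: int = 2514, timeout: float = 5.0) -> bool:
    """Open a RELP session, check for a 200 response, then close it politely."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(relp_frame(1, "open", OFFER))
        reply = sock.recv(4096)  # assumption: the full response fits in one read
        ok = parse_rsp_status(reply) == 200
        sock.sendall(relp_frame(2, "close"))
        try:
            sock.recv(4096)  # read the close response before dropping the socket
        except OSError:
            pass
        return ok
```

A monitoring check could call `probe("receiver.example.com")` (hostname is a placeholder) and map the boolean to its own OK/CRITICAL exit codes. Reading the response before disconnecting follows Andre's hint that receiving data before dropping the connection may help.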

