all is fine - maybe you start with a snippet of just some of the affected actions.
Rainer

On Thu, Sep 3, 2020 at 15:13, Adam Chalkley (<[email protected]>) wrote:
>
> Rainer,
>
> Thanks for the reply.
>
> I've not shared the config, but I can work on doing so. It's rather long and
> likely overcomplicated, but it would be great to have other eyes on it.
>
> It may take some time to sanitize the contents before sharing, but when I'm
> able, should I share it here as attachments, inline as plaintext or via a
> GitHub repo? I'm leaning towards the latter if you don't have a specific
> preference.
>
> Thank you for offering to take a look at the configuration!
>
> -----Original Message-----
> From: rsyslog <[email protected]> On Behalf Of Rainer Gerhards via rsyslog
> Sent: Wednesday, September 2, 2020 3:14 AM
> To: rsyslog-users <[email protected]>
> Cc: Rainer Gerhards <[email protected]>
> Subject: Re: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, main queue filling up, imrelp-related errors
>
> I don't see the rest of this thread, can you (re?) post your config?
>
> Rainer
>
> On Wed, Sep 2, 2020 at 3:13, Adam Chalkley via rsyslog
> (<[email protected]>) wrote:
> >
> > Unfortunately the system is still having issues.
> >
> > I enabled debug logging earlier, copied the debug log aside and *then*
> > disabled debug-on-demand logging (last time I forgot that not copying the
> > file elsewhere first completely wipes the log file).
> >
> > I have 18 MB worth of captured details for a short timeframe.
> >
> > Out of that log file I'm seeing a lot of these entries:
> >
> > 7316.804057780:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> > 7316.804062774:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> > 7316.804232359:imuxsock.c : imuxsock.c: Message from UNIX socket: #3, size 221
> > 7316.804258678:imuxsock.c : datetime.c: ParseTIMESTAMP3339: invalid year: 0, pszTS: 'e'
> > 7316.804264073:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
> > 7316.804267610:imuxsock.c : imuxsock.c: --------imuxsock calling poll() on 2 fds
> > 7317.803939077:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
> > 7317.803994833:imrelp.c : imrelp.c: relpTcpSend: send data: 14916 rsp 6 200 OK
> >
> > 7317.804025407:imrelp.c : imrelp.c: relpTcpSend: sock 80, lenbuf 19, send returned -1 [errno 32]
> > 7317.804038885:imrelp.c : imrelp.c: librelp: generic error: ecode 10014, emsg 'error sending relp: Broken pipe'
> > 7317.804049421:imrelp.c : errmsg.c: Called LogMsg, msg: imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended
> > 7317.804054526:imrelp.c : operatingstate.c: osf: MSG imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: Broken pipe', object 'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]'
> > 7317.804097763:main thread : janitor.c: janitorRun() called
> >
> > The pattern seems to be that a remote target is interrupted and then the
> > main queue (main Q) starts filling up. All omrelp actions have queues
> > attached and none of them ever filled (going by impstats log file which was
> > regularly receiving updates).
> >
> > Going by the local log file on the primary receiver it noted these details:
> >
> > 2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 1 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> > 2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 try https://www.rsyslog.com/e/2027 ]
> > 2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> > 2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> > 2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high activity - starting 1 additional worker thread(s), currently 2 active worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> > 2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 forwarding to server woodchuck2.example.com:2514 - suspending [v8.2006.0 try https://www.rsyslog.com/e/2291 ]
> > 2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> > 2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try https://www.rsyslog.com/e/2359 ]
> >
> > According to the impstats log, the main queue started filling up around
> > 11:21 am, about the time that several of our mail relays were rebooting
> > from maintenance.
> >
> > This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.
> >
> > Not sure if it is relevant, but the existing rsyslog packages were not
> > swapped out with replacement 18.04-based packages as part of the upgrade as
> > expected.
> >
> > dpkg -l | grep rsyslog
> > ii  rsyslog               8.2006.0-0adiscon2xenial1  amd64  a rocket-fast system for log processing
> > ii  rsyslog-mmjsonparse   8.2006.0-0adiscon2xenial1  amd64  Parsing/handling of CEE/Lumberjack JSON messages in rsyslog
> > ii  rsyslog-mmnormalize   8.2006.0-0adiscon2xenial1  amd64  The rsyslog-mmnormalize package provides log normalization
> > ii  rsyslog-mmrm1stspace  8.2006.0-0adiscon2xenial1  amd64  The mmrm1stspace module permits to strip the leading space from
> > ii  rsyslog-relp          8.2006.0-0adiscon2xenial1  amd64  RELP protocol support for rsyslog
> >
> > I was planning to wait for the next rsyslog packages to be released for the
> > Ubuntu PPA to see if that was enough to trigger a switchover to "bionic"
> > based packages. I suspect this is completely unrelated to the problem
> > experienced, but wanted to make sure and note it here.
> >
> > Recent changes (not *exactly* flailing around, but not very confident) from
> > a few days ago (which didn't help today):
> >
> > * Modified forward queues (attached to actions within a ruleset)
> > ** increase queue size from 10K to 100K
> > ** increase worker threads from 1 to 4
> > ** disable explicit high water mark setting, allow default setting to apply
> > ** disable explicit low water mark setting, allow default setting to apply
> >
> > * Modified 'main Q' (as impstats lists it)
> > ** adjust worker threads from 1 to 4
> > ** adjust queue size from 10K to 500K
> >
> > Changes made tonight after restarting rsyslog:
> >
> > * Modified ruleset used to write out messages to local log files on the
> > receiver
> > ** increase queue size to 250K
> > ** increase worker threads to 4
> >
> > * Disable lightDelayMark on main queue (as discussed on this thread)
> >
> >
> > I'm not sure what's going on. This box was quite stable up until we
> > upgraded the OS from Ubuntu 16.04 to 18.04.
> >
> > Any ideas/suggestions are welcome.
> >
> > Thanks.
> >
> > -----Original Message-----
> > From: Adam Chalkley
> > Sent: Friday, August 28, 2020 4:15 PM
> > To: rsyslog-users <[email protected]>
> > Cc: [email protected]
> > Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
> >
> > Hi Andre,
> >
> > Thank you for the additional feedback.
> >
> > As you suggested, the problem is likely tied back to the TCP probe. We've
> > not had it enabled since last Sunday and rsyslog has been running fine
> > without making the changes we previously discussed (I'm still interested in
> > making them, I've just been pulled in other directions).
> >
> > Do you happen to know of a safe way to check that the port is open remotely
> > without triggering a failure from rsyslog's perspective? I'm guessing that
> > a minimal RELP-compatible client would be the best approach.
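
A minimal RELP-compatible check along those lines could be sketched roughly as follows. This is an illustration only, not an existing tool: the function names (relp_frame, parse_rsp_status, probe_relp), the "relp-probe,0.1" software string and the default port 2514 are assumptions made for the example, it handles plain TCP only (no TLS), and it has not been tested against the imrelp setup discussed in this thread. The idea is to open a RELP session, read the server's "rsp" reply, and then send "close" before disconnecting, rather than connecting and dropping the socket immediately.

```python
import socket


def relp_frame(txnr: int, command: str, data: bytes = b"") -> bytes:
    """Build one RELP frame: TXNR SP COMMAND SP DATALEN [SP DATA] LF."""
    header = f"{txnr} {command} {len(data)}".encode("ascii")
    if data:
        return header + b" " + data + b"\n"
    return header + b"\n"


def parse_rsp_status(raw: bytes) -> int:
    """Return the numeric status code from the start of a RELP 'rsp' frame."""
    parts = raw.split(b" ", 3)
    if len(parts) < 4 or parts[1] != b"rsp":
        raise ValueError(f"not a RELP rsp frame: {raw[:40]!r}")
    # The response data begins with a three-digit code, e.g. "200 OK".
    return int(parts[3][:3])


def probe_relp(host: str, port: int = 2514, timeout: float = 5.0) -> bool:
    """Open a RELP session, check for status 200, then close it politely."""
    # Offers sent with 'open'; the software string is a made-up placeholder.
    offers = b"relp_version=0\nrelp_software=relp-probe,0.1\ncommands=syslog"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(relp_frame(1, "open", offers))
        # Assumes the start of the reply arrives in the first read.
        status = parse_rsp_status(sock.recv(4096))
        # Ask the server to end the session instead of just dropping the socket.
        sock.sendall(relp_frame(2, "close"))
        try:
            sock.recv(4096)  # read the server's answer to 'close', if any
        except OSError:
            pass
    return status == 200
```

A monitoring check could call probe_relp("receiver.example.com") (hostname hypothetical) and treat a False result, an OSError or a ValueError as a failed probe.
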
> > Is there such a tool that you're aware of that could be called periodically
> > to confirm that a rsyslog receiver (RELP-enabled port) is functioning
> > properly?
> >
> > Just thought I would ask.
> >
> > Thanks!
> >
> > -----Original Message-----
> > From: Andre Lorbach <[email protected]>
> > Sent: Monday, August 24, 2020 4:06 AM
> > To: rsyslog-users <[email protected]>
> > Cc: Adam Chalkley <[email protected]>
> > Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
> >
> > I think those errors were there all the time but not reported in older
> > librelp versions.
> > I reviewed the code and we added this error output about 2 years ago in
> > librelp.
> > Ubuntu 16.04 most likely is using an older librelp version, so you did not
> > see the error there.
> >
> > The problem is caused by the TCP probe; it may help if you try to receive
> > data before you drop the connection.
> >
> > Best regards,
> > Andre Lorbach
> > --
> > Adiscon GmbH
> > Mozartstr. 21
> > 97950 Großrinderfeld, Germany
> > Ph. +49-9349-9298530
> > Managing Director/President: Rainer Gerhards, Register Court Mannheim, HRB 560610
> > VAT ID: DE 81 22 04 622
> > Web: www.adiscon.com - Mail: [email protected]
> >
> > Information regarding your data privacy policy can be found here:
> > https://www.adiscon.com/data-privacy-policy/
> >
> > This e-mail may contain confidential and/or privileged information. If you
> > are not the intended recipient or have received this e-mail in error
> > please notify the sender immediately and delete this e-mail. Any
> > unauthorized copying, disclosure or distribution of the material in this
> > e-mail is strictly forbidden.
> >
> >
> > > -----Original Message-----
> > > From: rsyslog <[email protected]> On Behalf Of Adam Chalkley via rsyslog
> > > Sent: Wednesday, August 19, 2020 18:38
> > > To: rsyslog-users <[email protected]>
> > > Cc: Adam Chalkley <[email protected]>
> > > Subject: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver
> > >
> > > Hi,
> > >
> > > We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> > > (4.4 kernel) to 18.04 (4.15 kernel).
> > >
> > > We are using the upstream PPA, so running 8.2006.0 on receivers and
> > > endpoints.
> > >
> > > When we started getting reports from our Nagios instance that the rsyslog
> > > forward queues endpoints were beginning to fill we checked our receiver
> > > (sawmill1) and saw 94 open TCP connections with 40 of them in CLOSE_WAIT
> > > from our Nagios server, most of them I suspect from the TCP port connection
> > > test performed every 5 minutes.
> > >
> > > Log samples from the receiver system (which are related to port probes from
> > > our Nagios instance):
> > >
> > > 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> > > 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error 'error sending relp: Bad file descriptor', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error 'error sending relp: Bad file descriptor', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > >
> > > Log samples from the Nagios instance:
> > >
> > > 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd: omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required session state, session broken', object 'conn to srvr lincoln.lib.auburn.edu:2514' - action may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd: omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to remote peer', object 'conn to srvr lincoln.lib.auburn.edu:2514' - action may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > >
> > > Is there a setting I can apply to rsyslog to help resolve this?
> > >
> > > Is this a known bug?
> > >
> > > We didn't have the issue with v8.2006.0 on our receiver when it was running
> > > Ubuntu 16.04 (the prior OS release), even though it made the same
> > > complaints about the TCP port probes from Nagios.
> > >
> > > Thanks in advance.
> > >
> > > _______________________________________________
> > > rsyslog mailing list
> > > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > > http://www.rsyslog.com/professional-services/
> > > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> > > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> > > LIKE THAT.

