Hi. Long time no see!
A few months ago we deployed an opensearch cluster feeded by rsyslog and let it running without giving it any attention, to be honest. Today we realized it wasn't working and while reviewing things I noticed a weird thing with rsyslog. *Rsyslog is creating MANY (so many a simple "ls" gets stuck) queue.xxxnumber files on disk.* Probably there's something wrong with our configuration, but definitively is not working as I expected. find /nfs/rsyslog/ -type f | wc -l results in *649420* queue.xxx files du -sh /nfs/rsyslog/ results in *10T* (gladly, its an old synology) *rsyslog -v (I should probably upgrade, yeah...)* rsyslogd 8.2302.0 (aka 2023.02) compiled with: PLATFORM: x86_64-pc-linux-gnu PLATFORM (lsb_release -d): FEATURE_REGEXP: Yes GSSAPI Kerberos 5 support: No FEATURE_DEBUG (debug build, slow code): No 32bit Atomic operations supported: Yes 64bit Atomic operations supported: Yes memory allocator: system default Runtime Instrumentation (slow code): No uuid support: Yes systemd support: Yes Config file: /etc/rsyslog.conf PID file: /var/run/rsyslogd.pid Number of Bits in RainerScript integers: 64 *Configuration (I think is better to paste it complete, sorry if I'm wrong):* # https://www.rsyslog.com/doc/master/configuration/modules/omstdout.html #module(load="omstdout") # http://www.rsyslog.com/doc/v8-stable/configuration/modules/impstats.html # http://www.rsyslog.com/rsyslog-statistic-counter/ # http://www.rsyslog.com/impstats-analyzer/ module(load="impstats" log.file="/logs/rsyslog-stats.log" interval="60" log.syslog="off") # http://www.rsyslog.com/doc/master/rainerscript/global.html global( workDirectory="/data" # http://www.rsyslog.com/doc/v8-stable/troubleshooting/debug.html environment=["RSYSLOG_DEBUGLOG=/logs/rsyslog-debug.log", "RSYSLOG_DEBUG=DebugOnDemand NoStdOut"] MaxMessageSize="128k" internalmsg.ratelimit.interval="1" internalmsg.ratelimit.burst="100000" # http://www.rsyslog.com/doc/rsconf1_escapecontrolcharactersonreceive.html parser.escapeControlCharactersOnReceive="off" # This controls whether slashes in the "programname" property are permitted or not. This property bases on a BSD concept, and by BSD syslogd sources, slashes are NOT permitted inside the program name. However, some Linux tools (including most importantly the journal) store slashes as part of the program name inside the syslogtag. In those cases, the programname is truncated at the first slash. If this setting is changed to "on", slashes are permitted and will not terminate programname parsing. #parser.permitSlashInHostname="on" # This controls whether slashes in the static part of the tag are permitted or not. If this setting is off, a value of "app/foo[1234]" in the tag will result in a programname of "app". If an application stores an absolute path name like "/app/foo[1234]", the programname property will become empty (""). If you need to actually store slashes as part of the programname, this setting should be changed to "on" to permit this. Then, a syslogtag of "/app/foo[1234]" will result in programname being "/app/foo". parser.permitSlashInProgramName="on" ) main_queue( queue.filename="main" queue.maxdiskspace="4G" queue.saveonshutdown="on" queue.lowwatermark="10000" queue.highwatermark="60000" queue.size="100000" queue.type="LinkedList" queue.fulldelaymark="90000" queue.lightdelaymark="75000" #TODO queue.dequeuebatchsize="1 #IIUC: writes in disk when memory reaches highwatermark #If disk becomes full, we have 2 options: # - Dont accept any more messages, using fulldelaymark # - Reject messages #WARN: if bigger than fulldelay, will never discard #queue.discardmark="9750" #queue.discardseverity="8" ) #TODO using UDP until RELP works flawlessly #http://www.rsyslog.com/doc/v8-stable/configuration/modules/imudp.html module(load="imudp") input( port="514" type="imudp" name="imudp" ruleset="ruleset" ) module(load="imtcp") input( port="601" type="imtcp" name="imtcp" ruleset="ruleset" ) # http://www.rsyslog.com/doc/master/configuration/modules/imrelp.html module(load="imrelp") input( port="20514" type="imrelp" name="imrelp" ruleset="ruleset" ) # http://www.rsyslog.com/doc/master/configuration/templates.html # Debug: template="RSYSLOG_DebugFormat" # http://www.rsyslog.com/doc/master/configuration/properties.html ruleset( name="ruleset" queue.filename="queue" # Memory queue will grow until _highwatermark_ # when reached, it will start writing in disk [DA] # it will stop if it goes down to _lowwatermark_ # Memory queue will start discarding if reaches _size_ # # Disk queue will grow up to _size_ (or _maxdiskspace_?) # Tested so far: # It grows a bit over _size_ but doesn't reach _maxdiskspace_ # however it stopped growing, so i don't know if that second statement is true # To me, its not clear if disk queue discards or overwrites and some messages are lost queue.maxdiskspace="1G" queue.saveonshutdown="on" queue.lowwatermark="10000" queue.highwatermark="60000" queue.size="100000" queue.type="LinkedList" queue.fulldelaymark="90000" queue.lightdelaymark="75000" ) { action(type="omfwd" Target="mytarget" Port="10514" Protocol="tcp") } *Statistics (please, notice ruleser[DA size])* Fri Apr 19 11:15:02 2024: imudp(w0): origin=imudp called.recvmmsg=0 called.recvmsg=0 msgs.received=0 Fri Apr 19 11:16:02 2024: global: origin=dynstats Fri Apr 19 11:16:02 2024: global: origin=percentile Fri Apr 19 11:16:02 2024: TCP-mytarget-10514: origin=omfwd bytes.sent=0 Fri Apr 19 11:16:02 2024: action-0-builtin:omfwd: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Fri Apr 19 11:16:02 2024: imudp(*/514/IPv4): origin=imudp submitted=0 disallowed=0 Fri Apr 19 11:16:02 2024: imtcp(601): origin=imtcp submitted=0 Fri Apr 19 11:16:02 2024: imrelp(20514): origin=imrelp submitted=0 Fri Apr 19 11:16:02 2024: resource-usage: origin=impstats utime=27080 stime=262330 maxrss=7876 minflt=576 majflt=142 inblock=24824 oublock=15496 nvcsw=1798 nivcsw=266 openfiles=15 *Fri Apr 19 11:16:02 2024: ruleset[DA]: origin=core.queue size=-1917883149 enqueued=0 full=0 discarded.full=0 discarded.nf <http://discarded.nf>=0 maxqsize=0 * Fri Apr 19 11:16:02 2024: ruleset: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Fri Apr 19 11:16:02 2024: main Q[DA]: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Fri Apr 19 11:16:02 2024: main Q: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Fri Apr 19 11:16:02 2024: imudp(w0): origin=imudp called.recvmmsg=0 called.recvmsg=0 msgs.received=0 Going back a few days, we can see there's a moment queue size crash: *Mon Apr 1 16:52:46 2024: ruleset[DA]: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf <http://discarded.nf>=0 maxqsize=0 * Mon Apr 1 16:52:46 2024: ruleset: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Mon Apr 1 16:52:46 2024: main Q[DA]: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Mon Apr 1 16:52:46 2024: main Q: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0 Mon Apr 1 16:52:46 2024: imudp(w0): origin=imudp called.recvmmsg=0 called.recvmsg=0 msgs.received=0 Mon Apr 1 16:55:53 2024: global: origin=dynstats Mon Apr 1 16:55:53 2024: global: origin=percentile Mon Apr 1 16:55:53 2024: TCP-mytarget-10514: origin=omfwd bytes.sent=0 Mon Apr 1 16:55:53 2024: action-0-builtin:omfwd: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Mon Apr 1 16:55:53 2024: imudp(*/514/IPv4): origin=imudp submitted=0 disallowed=0 Mon Apr 1 16:55:53 2024: imtcp(601): origin=imtcp submitted=0 Mon Apr 1 16:55:53 2024: imrelp(20514): origin=imrelp submitted=0 Mon Apr 1 16:55:53 2024: resource-usage: origin=impstats utime=3259 stime=19555 maxrss=7936 minflt=558 majflt=102 inblock=19024 oublock=8 nvcsw=148 nivcsw=35 openfiles=14 *Mon Apr 1 16:55:53 2024: ruleset[DA]: origin=core.queue size=-2025213502 enqueued=0 full=0 discarded.full=0 discarded.nf <http://discarded.nf>=0 maxqsize=0* Seems rsyslog queue counter overflowed or whatever, but its also failing to measure disk-size usage (queue.maxdiskspace="1G" vs 10T). Apart from upgrading rsyslog version, do you see anything that could be improved? Perhaps stats each 60s are too much, maybe we didn't setup discard following best practices... Thanks in advance _______________________________________________ rsyslog mailing list https://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.