Levi Wilbert wrote:
Here's the updated config. I moved the queues to within the action()'s for the
ELK stack and syslog-db servers.
good, I will review it later.
I'm trying to figure out how best to create
output queues for the local file copies of syslog messages, as there is an
action for messages, secure, mail, etc etc and the best I can figure out is to
update each of those omfile outputs to the new Rainer format, and assign each
a queue individually, but it seems like a bit overkill and honestly that
particular output isn't essential for us anyway, it's more of a temporary
"cache" we can easily peek into to verify things are working as expected.
What I would suggest is to put them all in a ruleset (using the existing syntax
for the actions, but the new syntax for the ruleset) and put the queue on the
ruleset.
normally putting a queue on local filesystem writes actually hurts overall
performance, but if you want to be sure logs keep getting sent remotely, even if
the local filesystem fills up, it can be a reasonable thing to do.
personally, I eliminate most, if not all of the separate files that the logs are
split into and just put them in /var/log/messages
David Lang
________________________________
From: David Lang <da...@lang.hm>
could you send me your full configs again?
start rsyslog with the option -o /path/to/file
and rsyslog will combine all the config files into that file (which is much
easier to look at and understand than having a bunch of separate file and having
to reason how they are combined)
you are correct that the (DA) files are representing the disk file.
I suspect that you don't have the config that you think you have (I mentioned
the need to replace all the $foo queue related things that you had in the config
you posted before with parameters inside the action() statement)
David Lang
On Fri, 14 Feb 2025, Levi Wilbert wrote:
I appreciate all the help you've given us! Thanks so much for putting in the
time to help!
I had a follow up question. In rsyslog-stats, I see lines from ommysql "queue" and ommysql
"queue[DA]". I believe the "queue[DA]" messages are the disk-assisted queue, however, I'm a little
confused on the meaning of the "queue" messages.
It looks like the "queue" lines are for the in-memory queue for that output, as it's constantly
enqueueing new messages throughout the day, and the "queue[DA]" only increases during when our DB
backups are happening. However, the queue size seems to be limited to around 1000 messages for
"queue".
I have the disk-assisted highwatermark set to 1000000 messages, and the low
watermark set to 100000, but it seems to kick on once the in-memory queue gets
up to 1000. My question is, if the disk-assisted queue isn't supposed to kick
in until the high watermark is reached, when the in-memory queue hits 1000,
where are messages going before the high watermark is reached?
I've attached our rsyslog-stats for the prior day that demonstrates this
happening.
Levi Wilbert
HPC & Linux Systems Administrator
ARCC - Division of Research and Economic Development
Information Technology Ctr 226
1000 E. University Avenue, Laramie, WY 82071-200
________________________________
From: David Lang <da...@lang.hm>
Sent: Wednesday, February 12, 2025 1:31 PM
To: Levi Wilbert <lwilb...@uwyo.edu>
Cc: David Lang <da...@lang.hm>; Levi Wilbert via rsyslog
<rsyslog@lists.adiscon.com>
Subject: Re: [rsyslog] Rsyslog Losing Messages
Levi Wilbert wrote:
OK, I think we've identified the primary cause of our syslog message loss.
On our DB server, we have backups taken around 8pm which include a DB dump. In
addition to this, the DB server runs within a VM, mounts the Syslog database
over NFS, and between all of this we think its simply being overloaded when our
syslog server sends syslogs over to MariaDB.
that sounds very plausible. you can increase the size of the main queue or add a
queue to the database action so that rsyslog doesn't fill it's main queue and
stop accepting new messages.
David Lang
I've upped the impstats to run 3 times over 10 minutes. I've included the
rsyslog-stats from last evening. In this, I've disabled our ELK Stack output,
so logs are only going to the local disk and our DB server.
Right after 8pm, I can see the main queue filling up and things stop
processing. I haven't added in the dynacache or output queues quite yet, right
now just getting some baseline measurements.
Levi Wilbert
HPC & Linux Systems Administrator
ARCC - Division of Research and Economic Development
Information Technology Ctr 226
1000 E. University Avenue, Laramie, WY 82071-200
________________________________
From: David Lang <da...@lang.hm>
Sent: Friday, February 7, 2025 4:27 PM
To: Levi Wilbert <lwilb...@uwyo.edu>
Cc: David Lang <da...@lang.hm>; Levi Wilbert via rsyslog
<rsyslog@lists.adiscon.com>
Subject: Re: [rsyslog] Rsyslog Losing Messages
Levi Wilbert wrote:
Awesome, thank you this is very helpful!
For the dynacache, we have around 600 or so hosts we collect logs from on this
particular server. I read this shouldn't practically be set above 1000 unless
it's for very particular situations, I'm thinking of starting it around 100-200
in that area, would you agree/disagree?
No, if you expect to be writing to 600 files during one rotation cycle, you need
to have the cache size >= 600
strawman example, if you get 600 hosts sending you one message per
minute (and assume in the same orderand have your cache size set to 599.
the first 599 messages that arrive will open a file and write one message to it.
Then the message from host 600 arrives. Rsyslog will flush the buffer and close
the file for host 1, open the file for host 600, write the log to the buffer.
Then the next message from host 1 arrives. Rsyslog will flush the buffer and
close the file for hsot 2, hoepn the file for host 1, write the log to the
buffer.
and continue to have to flush, close, and open the files for every log message
that arrives.
(now, in practice, the logs don't arrive exactly in the same order from every
host perfectly interleaved, but this is a simplified, worst case example)
you should ALWAYS have the dynafilecachesize larger than the number of files you
expect to be writing to. HAving the number be higher just means you have more
files open (make sure you have your max file limits set large enough, for many
distros 1000 files is the default limit, which is why it's mentioned in the
docs)
Just to clarify some on the output of rsyslog-stats, is the 'enqueued' counter
on the 'main Q' the number of messages that have been queued and then dequeued
(i.e., sent to their out/handled)? And within the same lines, the 'size' is
the current queue size?
it's the number added to the queue, enqueued - size is the number that have been
processed. You can set it to zero all stats each time it reports, or leave
things like enqueued, processed, etc as running totals (there is a race
condition with zeroing the stats in a multi-threaded environment that can cause
them to be slightly off, but I tend to zero them as it's easier to understand.
David Lang
I'm doing further testing by disabling our outputs temporarily, to what is
causing the blocking. I'll update as discover more!
Levi Wilbert
HPC & Linux Systems Administrator
ARCC - Division of Research and Economic Development
Information Technology Ctr 226
1000 E. University Avenue, Laramie, WY 82071-200
________________________________
From: David Lang <da...@lang.hm>
Sent: Wednesday, February 5, 2025 3:25 PM
To: Levi Wilbert <lwilb...@uwyo.edu>
Cc: David Lang <da...@lang.hm>; Levi Wilbert via rsyslog
<rsyslog@lists.adiscon.com>
Subject: Re: [rsyslog] Rsyslog Losing Messages
Ok, one thing I see is that you have a dynafilecachesize that's too small
Tue Feb 4 22:10:36 2025: dynafile cache host_messages: origin=omfile
requests=4316034 level0=3811573 missed=380153 evicted=380143 maxused=10
closetimeouts=0
This has the system spending a lot of time closing a file, opening a new one,
writing a log to it, and then repeating the process a LOT. That can drastically
slow down your log processing. Since maxused is 10, that indicates defaults. Set
the cach size to something larger than the number of files you expect to have
open
another one is
Tue Feb 4 22:30:36 2025: action-7-ommysql: origin=core.action
processed=2270931 failed=0 suspended=0 suspended.duration=0 resumed=0
It looks like many of your outputs are not seeing the processed number increase,
so something is blocking, causing the queues to fill and then everything stops.
take a look at https://www.rsyslog.com/doc/whitepapers/queues_analogy.html and
consider adding some queues to some of the actions that could block if the
remote system pauses. you do have a queue on action 8 (looks like the elk
forward) and that queue is not filling up
try setting the pstats to output more frequently than once every 10 min and have
it running before the time that logs stop processing, it may give a little more
detail. Look for a time when the queue size goes from near zero (normal
operations) to something much larger, hopefully it happens slowly enough for us
to see some stats.
I surprised we aren't seeing any failed or syspended (other than open failed for
inbound tcp connections)
I don't think this section is doing what you expect:
$ActionQueueType LinkedList # use asynchronous processing
$ActionQueueFileName syslogdb1 # set file name, also enables disk mode
$ActionResumeRetryCount -1 # infinite retries on insert failure
$ActionQueueSaveOnShutdown on # save in-memory data if rsyslog shuts down
$ActionQueueHighWatermark 51200000 # Switch to disk mode when queue =>
~4.76GiB ( 5120000000 x 512 bytes avg msg size)
$ActionQueueLowWatermark 25600000 # Exit disk mode when queue < ~1.86GiB
$ActionQueueMaxDiskSpace 30G
This is setting up a queue for the next action in the config that is not using
the action() format (I think this is the writes to
/var/log/rsyslog/%HOSTNAME%/messages but it's really hard to be sure). This is
one of the biggest reasons why this format is depreiciated, it's hard even for
experts to keep track of what's happening, especially when the config is spread
across multiple files
David Lang
On Wed, 5 Feb 2025, Levi Wilbert wrote:
Date: Wed, 5 Feb 2025 20:52:18 +0000
From: Levi Wilbert <lwilb...@uwyo.edu>
To: David Lang <da...@lang.hm>
Cc: Levi Wilbert via rsyslog <rsyslog@lists.adiscon.com>
Subject: Re: [rsyslog] Rsyslog Losing Messages
In fact, I've enabled rsyslog-stats already. It does look like the queue has
been getting hammered, so much that the main queue is full, with millions of
messages enqueued. I've attached an hours worth of rsyslog-stats during the
impacted time.
Levi Wilbert
HPC & Linux Systems Administrator
ARCC - Division of Research and Economic Development
Information Technology Ctr 226
1000 E. University Avenue, Laramie, WY 82071-200
________________________________
From: David Lang <da...@lang.hm>
Sent: Wednesday, February 5, 2025 1:39 PM
To: Levi Wilbert <lwilb...@uwyo.edu>
Cc: David Lang <da...@lang.hm>; Levi Wilbert via rsyslog
<rsyslog@lists.adiscon.com>
Subject: Re: [rsyslog] Rsyslog Losing Messages
I forgot to mention the other reason you may not see logs, if an output blocks,
all log processing halts (unless you have action/ruleset queues in place)
In your configs, you send logs to elasticsearch using omfwd over TCP, if
eleasticsearch can't handle the message, all log processing will pause. The logs
will get queued up (to the limits of your queues) and be delivered later. But if
your queues get overrun, you may lose logs.
I would suggest that you enable impstats, writing to a local file, and see what
it shows during the time that log processing isn't happening.
David Lang
On Wed, 5 Feb 2025, Levi
Wilbert wrote:
Date: Wed, 5 Feb 2025 20:18:34 +0000
From: Levi Wilbert <lwilb...@uwyo.edu>
To: David Lang <da...@lang.hm>,
Levi Wilbert via rsyslog <rsyslog@lists.adiscon.com>
Subject: Re: [rsyslog] Rsyslog Losing Messages
1. your VM server gets overloaded and stops scheduling your VM for a chunk of
time
*
The VM itself appears fine. I can SSH to it, open 514/tcp w/ telnet from a
remote host, cd, view files, etc.
2. you have a network problem (overload, interupption) that causes the packets
to not get through.
*
It looks like packets are hitting the syslog server. I tested with a VM on
which I generated logs during the impacted period, and could see traffic
incoming into the syslog server using tcpdump. Granted, visually looking at it
I can't guarantee every log is getting through, but I can see syslog traffic
from that host, and no logs ever appear in the any of output destinations.
3. you have a flood of messages that are arriving faster than they can be
processed and your network buffers on your OS are overflowing (from your
description, this doesn't seem likely)
for the network problem, this doesn't have to be on your VM server.
For example, if you are sending the logs via UDP and have a router that gets
overloaded with a nightly backup, it is allowed to drop UDP packets.
*
this is something I can look into, but as I mentioned in #2, I tested w/ a VM
sending syslogs during the impacted times, could see syslog traffic arriving on
the syslog server port, but those logs never appear to get passed along.
So look at what else is going on on the network/systems around the time of your
log outage? is that when a nightly backup runs somewhere? other big batch job?
Additionally, I can share our config (included: /etc/rsyslog.conf,
etc/rsyslog.d/*.conf).
Levi Wilbert
HPC & Linux Systems Administrator
ARCC - Division of Research and Economic Development
Information Technology Ctr 226
1000 E. University Avenue, Laramie, WY 82071-200
________________________________
From: David Lang <da...@lang.hm>
Sent: Wednesday, February 5, 2025 10:55 AM
To: Levi Wilbert via rsyslog <rsyslog@lists.adiscon.com>
Cc: Levi Wilbert <lwilb...@uwyo.edu>
Subject: Re: [rsyslog] Rsyslog Losing Messages
◆ This message was sent from a non-UWYO address. Please exercise caution when
clicking links or opening attachments from external sources.
we would need to see your config to have any idea what's going on (are you
sending via UDP, TCP, RELP, other??)
But there are a few basic things that can go wrong
1. your VM server gets overloaded and stops scheduling your VM for a chunk of
time
2. you have a network problem (overload, interupption) that causes the packets
to not get through.
3. you have a flood of messages that are arriving faster than they can be
processed and your network buffers on your OS are overflowing (from your
description, this doesn't seem likely)
for the network problem, this doesn't have to be on your VM server.
For example, if you are sending the logs via UDP and have a router that gets
overloaded with a nightly backup, it is allowed to drop UDP packets.
So look at what else is going on on the network/systems around the time of your
log outage? is that when a nightly backup runs somewhere? other big batch job?
David Lang
On Wed, 5 Feb 2025, Levi Wilbert via rsyslog wrote:
Date: Wed, 5 Feb 2025 16:18:15 +0000
From: Levi Wilbert via rsyslog <rsyslog@lists.adiscon.com>
To: "rsyslog@lists.adiscon.com" <rsyslog@lists.adiscon.com>
Cc: Levi Wilbert <lwilb...@uwyo.edu>
Subject: [rsyslog] Rsyslog Losing Messages
Greetings,
We are using Rsyslog on RHEL9.3 to gather logs in an environment of around 600
or so servers. All of these servers forward directly to our single syslog
server, which then forwards the logs along to a mysql db (runs on a separate
server), ELK stack, and to file locally on the system.
I've noticed at around the same time each night, rsyslog begins dropping most
of the incoming logs, and there is a gap where almost all logs simply aren't
recorded.
Network connectivity seems ok, as I am able to connect to the syslog server,
connect to 514/TCP port (we also use UDP), and I can see logs hitting the
server using tcpdump on the NIC.
The syslog server is a virtualized server w/ 4 CPU cores and 8G of RAM.
Would anyone have any idea on how to tune rsyslog to avoid these periods of log
loss?
Levi Wilbert
HPC & Linux Systems Administrator
ARCC - Division of Research and Economic Development
Information Technology Ctr 226
1000 E. University Avenue, Laramie, WY 82071-200
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.
## full conf created by rsyslog version 8.2102.0-117.el9 at 2025-02-17 13:57:07
##
##### BEGIN CONFIG: /etc/rsyslog.conf
# rsyslog configuration file
# For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
# or latest version online at http://www.rsyslog.com/doc/rsyslog_conf.html
# If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html
#### GLOBAL DIRECTIVES ####
# Log statistics
module(load="impstats"
interval="200"
severity="7"
log.syslog="off"
log.file="/var/log/rsyslog-stats"
)
# to actually gather the data:
#syslog.=debug /var/log/rsyslog-stats
# Where to place auxiliary files
global(workDirectory="/var/lib/rsyslog")
# Use default timestamp format
module(
load="builtin:omfile"
Template="RSYSLOG_TraditionalFileFormat"
)
#### MODULES ####
module(load="imuxsock"
SysSock.Use="off") # Turn off message reception via local log socket;
# local messages are retrieved through imjournal now.
module(load="imjournal"
UsePid="system"
StateFile="imjournal.state") # File to store the position in the journal
#module(load="imklog") # reads kernel messages (the same are read from journald)
#module(load="immark") # provides --MARK-- message capability
module(load="ompgsql") # postgres plugin
module(load="ommysql") # mysql/mariadb plugin
# Include all config files in /etc/rsyslog.d/
include(file="/etc/rsyslog.d/*.conf" mode="optional")
##### BEGIN CONFIG: /etc/rsyslog.d/80-elk-syslog.conf
##### BEGIN CONFIG: /etc/rsyslog.d/70-syslog-db1.conf
##### BEGIN CONFIG: /etc/rsyslog.d/20-remote-logging.conf
##### BEGIN CONFIG: /etc/rsyslog.d/01-json-template.conf
template(name="json-template"
type="list") {
constant(value="{")
constant(value="\"@timestamp\":\"") property(name="timereported"
dateFormat="rfc3339")
constant(value="\",\"@version\":\"1")
constant(value="\",\"message\":\"") property(name="msg" format="json")
constant(value="\",\"sysloghost\":\"") property(name="hostname")
constant(value="\",\"severity\":\"")
property(name="syslogseverity-text")
constant(value="\",\"facility\":\"")
property(name="syslogfacility-text")
constant(value="\",\"programname\":\"") property(name="programname")
constant(value="\",\"procid\":\"") property(name="procid")
constant(value="\"}\n")
}
##### END CONFIG: /etc/rsyslog.d/01-json-template.conf
# Name: 20-remote-logging.conf
# Author: Levi Wilbert
# Date: 2023-12-06
# Desc: This file processes remote rsyslog client logs and filters logs into
# their own discrete folders, without affecting the local host.
# Notes:
# 2025-02-12 - lwilbert: Added in filtering for kern.* messages, since we
want to record these messages.
# Create templates to define log locations
template(name="host_messages" type="string"
string="/var/log/rsyslog/%HOSTNAME%/messages")
template(name="host_secure" type="string"
string="/var/log/rsyslog/%HOSTNAME%/secure")
template(name="host_maillog" type="string"
string="/var/log/rsyslog/%HOSTNAME%/maillog")
template(name="host_cron" type="string"
string="/var/log/rsyslog/%HOSTNAME%/cron")
template(name="host_spooler" type="string"
string="/var/log/rsyslog/%HOSTNAME%/spooler")
template(name="host_bootlog" type="string"
string="/var/log/rsyslog/%HOSTNAME%/boot.log")
template(name="host_console" type="string"
string="/var/log/rsyslog/%HOSTNAME%/console.log")
#### RULES ####
# Match all logs from hosts that are not the local host
if ($hostname != "syslog1") then {
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
kern.* ?host_console
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none ?host_messages
# The authpriv file has restricted access.
authpriv.* ?host_secure
# Log all the mail messages in one place.
mail.* -?host_maillog
# Log cron stuff
cron.* ?host_cron
# Everybody gets emergency messages
#*.emerg :omusrmsg:*
# Save news errors of level crit and higher in a special file.
uucp,news.crit ?host_spooler
# Save boot messages also to boot.log
local7.* ?host_bootlog
}
##### END CONFIG: /etc/rsyslog.d/20-remote-logging.conf
# Standard Setup
# template
$template tpl,"insert into SystemEvents (Message, Facility, FromHost, Priority,
DeviceReportedTime, ReceivedAt, InfoUnitID, SysLogTag) values ('%msg%',
%syslogfacility%, '%HOSTNAME%', %syslogpriority%,
'%timereported:::date-mysql%', '%timegenerated:::date-mysql%', %iut%,
'%syslogtag%')",SQL
# certificate files
#$DefaultNetstreamDriverCAFile
/etc/pki/tls/certs/syslog-db1_arcc_domain.edu_interm.cer
$DefaultNetstreamDriverCAFile /etc/pki/tls/certs/arccdb1_2024-05-08.crt
# make gtls driver the default
$DefaultNetStreamDriver gtls
# TLS stuff
$ActionSendStreamDriverMode 1 # run driver in TLS-only mode
$ActionSendStreamDriverAuthMode x509/name
$ActionSendStreamDriverPermittedPeer 10.10.11.22 # syslog-db1 as of
2024-03-04
# Send syslogs w/ severity WARN (4) thru EMERG (0) to MariaDB on syslog-db1
(10.99.5.41)
if ( $syslogseverity <= 4 ) then {
action(type="ommysql"
server="10.10.11.22"
serverport="3306"
db="Syslog"
uid="rsyslog"
pwd="<secret>"
MySQLConfig.File="/etc/my.cnf.d/mariadb-client-tls.cnf"
MySQLConfig.Section="client-mariadb"
queue.filename="syslogdb1"
queue.maxdiskspace="10G"
queue.saveonshutdown="on"
queue.type="LinkedList"
queue.highwatermark="1000000"
queue.lowwatermark="100000"
)
}
##### END CONFIG: /etc/rsyslog.d/70-syslog-db1.conf
# Name: 80-elk-syslog.conf
# Date: 2023-12-06
# Author: Levi Wilbert
# Desc: Forwards all logs to ELK Logging server
# Notes:
# 2025-02-13 - lwilbert - Added queue
# New syntax
action(
type="omfwd"
queue.filename="herdUW"
queue.maxdiskspace="5g"
queue.saveonshutdown="on"
queue.type="LinkedList"
queue.highwatermark="1000000"
queue.lowwatermark="100000"
Target="elk-syslog.domain.edu"
Port="514"
Protocol="tcp"
template="json-template"
)
##### END CONFIG: /etc/rsyslog.d/80-elk-syslog.conf
# Provides UDP syslog reception
# for parameters see http://www.rsyslog.com/doc/imudp.html
#module(load="imudp" ) # needs to be done just once
module(load="imudp" BatchSize="128" threads="4" ) # needs to be done just once
input(type="imudp" port="514" )
# Provides TCP syslog reception
# for parameters see http://www.rsyslog.com/doc/imtcp.html
module(load="imtcp") # needs to be done just once
input(type="imtcp" port="514")
#module(load="imptcp" threads="4" ) # tcp performance testing - lwilbert
#input(type="imptcp" port="514") # tcp performance testing - lwilbert
#### RULES ####
# Only perform the following filters for the local host
if ($hostname == "syslog1") then {
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* /dev/console
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none
/var/log/messages
# The authpriv file has restricted access.
authpriv.* /var/log/secure
# Log all the mail messages in one place.
mail.*
-/var/log/maillog
# Log cron stuff
cron.* /var/log/cron
# Everybody gets emergency messages
*.emerg :omusrmsg:*
# Save news errors of level crit and higher in a special file.
uucp,news.crit /var/log/spooler
# Save boot messages also to boot.log
local7.*
/var/log/boot.log
}
# ### sample forwarding rule ###
#action(type="omfwd"
# # An on-disk queue is created for this action. If the remote host is
# # down, messages are spooled to disk and sent when it is up again.
#queue.filename="fwdRule1" # unique name prefix for spool files
#queue.maxdiskspace="1g" # 1gb space limit (use as much as possible)
#queue.saveonshutdown="on" # save messages to disk on shutdown
#queue.type="LinkedList" # run asynchronously
#action.resumeRetryCount="-1" # infinite retries if host is down
# # Remote Logging (we use TCP for reliable delivery)
# # remote_host is: name/ip, e.g. 192.168.0.1, port optional e.g. 10514
#Target="remote_host" Port="XXX" Protocol="tcp")
##### END CONFIG: /etc/rsyslog.conf
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.