Re: [lopsa-tech] Server Overload and Log Processing

David Lang Mon, 24 Aug 2015 09:15:06 -0700

On Mon, 24 Aug 2015, Page, Jeremy wrote:

Sorry, was not saying don't look at logs, just saying logs are only reactive and only see things you're logging (if the server crashes you may log nada, but that's definitely an issue!). I also personally find correlation easier when I have graphic data, but something like the ELK stack could help here. I have checks that look at ELK and then alert when they find pertinent data (they could also watch logs, but this way they're in a single place and can also look for negatives, i.e. no one has logged in for 15 minutes is an error even if everything else is "green").

the absence of logs is also detectable by event correlation engines. I have a very simple ruleset in SEC that alerts me when things stop logging.

ELK and Splunk are great for exploring your data and doing correlations manually. But after you figure out what you are interested in, they are horribly inefficient for the ongoing monitoring and alerting compared to tools that aren't database driven.


https://www.usenix.org/publications/login/feb14/logging-reports-dashboards

https://www.usenix.org/publications/login/april14/lang (splunk tuning, most of which is applicable to ElasticSearch with some terminology changes)

"looking at logs is 100% accurate at detecting logged problems :-)" - I'm stealing this.

go ahead. It can be a positive statement or a negative statement, depending on the problem :-)

In this context, the point is that applications usually log internal problems, and when they do, it's far more accurate to react to the log messages than to try to detect the same problem from the application's response behavior.


David Lang

my sec config: It sends an alert when something stops logging, and again every 4 hours until it comes back. I have rsyslog configured to pass it a single value (unless it's the disable heartbeat alert message), which is usually the hostname, but is sometimes a specific application/instance.


type=single
ptype=regexp
pattern= disable heartbeat alert (\S+)
context=[!SEC_INTERNAL_EVENT]
desc=clear_heartbeat_$1
action=delete heartbeat_$1

type=single
ptype=regexp
pattern= setup extended logging outage alert for (\S+)
context=[!SEC_INTERNAL_EVENT]
desc=long_heartbeat_$1
action=create heartbeat_$1 14400 (shellcmd /usr/local/bin/sec/notify.sh $1 '4+hours'; udgram /dev/log " sec-alert: setup extended logging outage alert for $1");


type=single
ptype=regexp
pattern=(\S+)
context=[!SEC_INTERNAL_EVENT]
desc=heartbeat_$1
action=create heartbeat_$1 310 (shellcmd /usr/local/bin/sec/notify.sh $1 '5min'; udgram /dev/log " sec-alert: setup extended logging outage alert for $1 ")
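
For readers who don't use SEC, the timeout idea behind the rules above can be approximated in plain shell. This is only a rough sketch of the concept, not how SEC works internally: the input format ("source epoch-seconds", one line per log message seen) and the function name stale_sources are assumptions made up for illustration. It reports every source whose newest message is older than the timeout (310 seconds in the last rule).

```shell
#!/bin/sh
# Hypothetical helper, not part of SEC or the ruleset above.
# usage: stale_sources NOW TIMEOUT < lines of "source epoch_seconds"
# Prints each source whose newest timestamp is more than TIMEOUT
# seconds older than NOW.
stale_sources() {
    awk -v now="$1" -v timeout="$2" '
        # remember the newest timestamp seen for each source
        NF >= 2 && (!($1 in last) || $2 > last[$1]) { last[$1] = $2 }
        END {
            for (s in last)
                if (now - last[s] > timeout)
                    print s
        }
    ' | sort
}

# web01 was last seen 350s before "now" (stale), db01 200s before (fine)
printf '%s\n' 'web01 1000' 'db01 1400' 'web01 1250' | stale_sources 1600 310
# prints: web01
```

Unlike the SEC rules, which react as each message arrives and repeat the alert every four hours, this only answers the question for one point in time.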


# cat /usr/local/bin/sec/notify.sh
#!/bin/sh

(
echo "From: `hostname`@company.com"
echo "To: m...@company.com"
echo "Subject: $1 stopped reporting"
echo
echo "System $1 was generating logs, but has not generated any logs in the last $2"
echo
echo "If this system continues to fail to log, an additional message will be generated every four hours. To disable this, create a log 'disable heartbeat alert $1'"
echo
echo "for example:"
echo "logger -t manual disable heartbeat alert $1"
#) >/var/log/alerts.notsent
) |sendmail -t

_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
