Dirk,
There are a couple of examples of the problem:
1. Servers Alive hangs (hasn't happened in a while, thankfully)
2. The machine Servers Alive is running on loses its network connection. All
of a sudden, you have hundreds of checks failing, and after a server reboot,
everything is normal.
As far as the example goes, yes, we would want to see each instance of an
outage. This report is used in our weekly team meeting where we discuss
problems from the past week. The report is called up during the meeting. The
report we have now includes records where the laststatuschange column shows a
value within the last 7 days of the time the report was run.
We log the entries to an Oracle database and have two views set up to facilitate
the reporting.
This view shows entries that have gone down and come back up in the last week:
DROP VIEW APPDEV.SA_STATS_V;
/* Formatted on 2008/08/29 09:05 (Formatter Plus v4.8.8) */
CREATE OR REPLACE FORCE VIEW appdev.sa_stats_v (HOST, downat, upat)
AS
SELECT HOST,
TO_DATE (previousstatuschange, 'MM/DD/YYYY hh:mi:ss AM') AS downat,
TO_DATE (laststatuschange, 'MM/DD/YYYY hh:mi:ss AM') AS upat
FROM sa_stats_t s, sa_codes_t s1, sa_codes_t s2
WHERE TO_DATE (s.laststatuschange, 'MM/DD/YYYY hh:mi:ss AM') > SYSDATE - 7
AND s.status = s1.status
AND s.previousstatus = s2.status
--and s1.description not in('Maintenance', 'Unavailable')
AND s2.description NOT IN ('Maintenance', 'Unavailable')
-- Start permanently don't want entries
AND s.HOST NOT IN
('Good Morning', 'CalgServer4 Server', 'ServerA.agrium.com',
'ServerB.agrium.com', 'ServerC.agrium.com', 'ServerD.agrium.com')
-- End permanently don't want entries
-- Start temporary don't want entries
AND s.HOST NOT IN
('CalgServer2 LIMSCP Status Monitor Service',
'CalgServer2 LIMSRP Event Monitor Service',
'CalgServer2 LIMSCP Event Monitor Service',
'CalgServer2 LIMSRP Status Monitor Service',
'CalgServer2 Redwater LIMS Prod Database')
-- End temporary don't want entries
AND s1.description = 'Up'
UNION
SELECT s.HOST, h.laststatuschange AS downat,
TO_DATE (NULL, 'MM/DD/YYYY hh:mi:ss AM') AS upat
FROM sa_stats_t s,
(SELECT HOST,
MAX
(TO_DATE (laststatuschange, 'MM/DD/YYYY hh:mi:ss AM')
) AS laststatuschange
FROM sa_stats_t
-- where to_date(laststatuschange, 'MM/DD/YYYY hh:mi:ss AM') > sysdate -30
GROUP BY HOST) h
WHERE s.status = 1
AND TO_DATE (s.laststatuschange, 'MM/DD/YYYY hh:mi:ss AM') =
h.laststatuschange
-- Start permanently don't want entries
AND s.HOST NOT IN
('Good Morning', 'CalgServer4 Server', 'ServerA.agrium.com',
'ServerB.agrium.com', 'ServerC.agrium.com', 'ServerD.agrium.com')
-- End permanently don't want entries
-- Start temporary don't want entries
AND s.HOST NOT IN
('CalgServer2 LIMSCP Status Monitor Service',
'CalgServer2 LIMSRP Event Monitor Service',
'CalgServer2 LIMSCP Event Monitor Service',
'CalgServer2 LIMSRP Status Monitor Service',
'CalgServer2 Redwater LIMS Prod Database')
-- End temporary don't want entries
AND s.HOST = h.HOST
GROUP BY s.HOST, h.laststatuschange;
Here is the other view, which shows entries currently down:
DROP VIEW APPDEV.SA_STATS_CURRENTLY_DOWN_V;
/* Formatted on 2008/08/29 09:09 (Formatter Plus v4.8.8) */
CREATE OR REPLACE FORCE VIEW appdev.sa_stats_currently_down_v (HOST, downat)
AS
SELECT s.HOST, h.laststatuschange AS downat
FROM sa_stats_t s,
(SELECT HOST,
MAX
(TO_DATE (laststatuschange, 'MM/DD/YYYY hh:mi:ss AM')
) AS laststatuschange
FROM sa_stats_t
-- where to_date(laststatuschange, 'MM/DD/YYYY hh:mi:ss AM') > sysdate -30
GROUP BY HOST) h
WHERE s.status = 1
AND TO_DATE (s.laststatuschange, 'MM/DD/YYYY hh:mi:ss AM') =
h.laststatuschange
-- Start permanently don't want entries
AND s.HOST NOT IN
('Good Morning', 'CalgServer4 Server', 'ServerA.agrium.com',
'ServerB.agrium.com', 'ServerC.agrium.com', 'ServerD.agrium.com')
-- End permanently don't want entries
-- Start temporary don't want entries
AND s.HOST NOT IN
('CalgServer2 LIMSCP Status Monitor Service',
'CalgServer2 LIMSRP Event Monitor Service',
'CalgServer2 LIMSCP Event Monitor Service',
'CalgServer2 LIMSRP Status Monitor Service',
'CalgServer2 Redwater LIMS Prod Database')
-- End temporary don't want entries
AND s.HOST = h.HOST
GROUP BY s.HOST, h.laststatuschange
ORDER BY h.laststatuschange;
I hope this helps,
Brett Hanson
>>> "Dirk" <[EMAIL PROTECTED]> 8/29/2008 8:50 AM >>>
If SA isn't running then it can't detect that change. Not much we can do about
that.
Would this also mean that you would want (example):
Check Name    Down At          Up At            Outage Duration
LDAP Server   27-Aug 4:06 PM   27-Aug 4:32 PM   26 Minutes
LDAP Server   27-Aug 7:06 PM   27-Aug 7:30 PM   24 Minutes
LDAP Server   28-Aug 4:06 AM   28-Aug 7:30 AM   204 Minutes
if the entry went down several times in the last week?
Also, for the week starting on Aug 24th (at midnight), how would you handle the
fact that the entry went down on the 23rd at 11:58pm? Should it be marked as
down from the 23rd at 11:58pm, or from the 24th at midnight?
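To make the two options concrete, here is a minimal sketch of the "start at midnight" choice, i.e. clipping an outage to the reporting window. The function name and the half-open window convention are assumptions for illustration only, not anything Servers Alive does today. The other option would simply report the original down time unchanged.

```python
from datetime import datetime


def clip_to_window(down_at, up_at, window_start, window_end):
    """Return the part of an outage inside [window_start, window_end).

    Returns a (start, end) tuple, or None if the outage does not overlap
    the window at all. For a check that is still down, the caller is
    assumed to pass the report time as up_at.
    """
    start = max(down_at, window_start)
    end = min(up_at, window_end)
    if start >= end:
        return None
    return start, end


# The case from the question: down on the 23rd at 11:58pm, week starts on the 24th.
week_start = datetime(2008, 8, 24)
week_end = datetime(2008, 8, 31)
print(clip_to_window(datetime(2008, 8, 23, 23, 58),
                     datetime(2008, 8, 24, 0, 10),
                     week_start, week_end))
```

With clipping, the two minutes before midnight count against the previous week and only the remainder counts against this one, so weekly totals never exceed the length of the week.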
Dirk Bulinckx.
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of
Brett Hanson
Sent: Friday, August 29, 2008 4:41 PM
To: Servers Alive Discussion List
Subject: Re: [SA-list] Servers Alive and reporting
One report we've been struggling with for the last couple of years is an activity
report. We'd like to see a report that shows outages detected by Servers Alive
in the past 7 days. This report would show the check name, when the check went
down, when it came back up and the duration. Any checks down when the report
is run would show as 'Currently Down'.
For example:
Check Name    Down At          Up At            Outage Duration
LDAP Server   27-Aug 4:06 PM   27-Aug 4:32 PM   26 Minutes
Service A     29-Aug 3:10 AM   Currently Down   5 Hours 17 Minutes
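As a rough sketch of how rows like these could be produced from logged outages, something along the following lines would work. The input shape (a list of name, down time, up time tuples, with None for a check that is still down) and all function names are assumptions for illustration; they do not correspond to our actual tables or views.

```python
from datetime import datetime, timedelta

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def fmt(dt):
    """Format a datetime like '27-Aug 4:06 PM' (no leading zero on the hour)."""
    hour12 = dt.hour % 12 or 12
    ampm = "AM" if dt.hour < 12 else "PM"
    return f"{dt.day}-{MONTHS[dt.month - 1]} {hour12}:{dt.minute:02d} {ampm}"


def format_duration(delta):
    """Render a timedelta as 'H Hours M Minutes', or 'M Minutes' under an hour."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    if hours:
        return f"{hours} Hours {minutes} Minutes"
    return f"{minutes} Minutes"


def outage_rows(outages, now, days=7):
    """Build report rows for outages that ended (or are ongoing) in the window.

    outages: iterable of (check_name, down_at, up_at); up_at is None if still down.
    Returns a list of (check_name, down_text, up_text, duration_text) tuples.
    """
    cutoff = now - timedelta(days=days)
    rows = []
    for name, down_at, up_at in sorted(outages, key=lambda o: o[1]):
        end = up_at if up_at is not None else now
        if end < cutoff:
            continue  # outage was over before the reporting window began
        up_text = fmt(up_at) if up_at is not None else "Currently Down"
        rows.append((name, fmt(down_at), up_text, format_duration(end - down_at)))
    return rows
```

For a check that is still down, the duration is measured up to the time the report is run, which is how the 'Currently Down' row above gets its 5 Hours 17 Minutes.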
Our biggest issue is problems with accuracy that result when Servers Alive is
restarted - a check that was down before Servers Alive was shut down and was up
when Servers Alive started again is not detected as a status change, and no
database record showing the transition exists.
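One way the gap could be closed, purely as a sketch: remember the last known status of every check at shutdown and compare it with the first result after startup, then log any difference as a status change. This is a hypothetical approach, not something Servers Alive is known to do, and the function name and dictionary shape below are made up for illustration.

```python
def missed_transitions(before_shutdown, after_startup):
    """Find checks whose status differs across a restart.

    Both arguments map check name -> status string (for example 'Up' or 'Down').
    Only checks present in both snapshots are compared.
    Returns a list of (check_name, old_status, new_status), sorted by name.
    """
    changes = []
    for name in sorted(before_shutdown.keys() & after_startup.keys()):
        old, new = before_shutdown[name], after_startup[name]
        if old != new:
            changes.append((name, old, new))
    return changes


# Example: LDAP Server was down at shutdown and is up after the restart.
print(missed_transitions(
    {"LDAP Server": "Down", "Service A": "Up"},
    {"LDAP Server": "Up", "Service A": "Up"},
))
```

The real time of the transition would still be unknown; all that can be said is that it happened somewhere between the shutdown and the first check after startup.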
Regards,
Brett Hanson
Systems Analyst
Agrium
>>> "Dirk" <[EMAIL PROTECTED]> 8/29/2008 3:50 AM >>>
One of the frequently recurring questions about Servers Alive is whether it can
do reporting. We always point to the HTML template based output and to the DB
logging (using a 3rd party report writer). Still, it seems that this is not what
people are looking for.
That's why we would like you to help us with some brainstorming around that
reporting feature.
I'll start by giving my own idea on it.
* it's based on the HTML template based output
* it can be set to be executed (generated) once a day and you can select
what "entries" go on it
* you can of course have several outputs, each for a different set of entries
* additional parameters are needed like
% up cycles
% down cycles
% maintenance cycles
and this per DAY, WEEK, MONTH, YEAR
with "easy" access to the current (day/week/...) and the previous
(day/week/...) and also access to other days/weeks/months/year.
Example:
<sa_stats_up_week{previous}%> gives the up% of the previous week
<sa_stats_up_week082008%> gives the up% of week 8 of 2008
<sa_stats_down_month082008%> gives the down% of month 8 of 2008
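To illustrate what a tag like the week ones above would have to compute, here is a minimal sketch that turns per-cycle results into percentages per week. The input shape (a date plus a status per check cycle), the function name, and the use of ISO week numbers are all assumptions; the proposal itself does not say how weeks are numbered.

```python
from collections import Counter
from datetime import date


def weekly_percentages(cycles):
    """Compute the percentage of cycles per status for each week.

    cycles: iterable of (day, status), where day is a datetime.date and
    status is a string such as 'up', 'down' or 'maintenance'.
    Returns {(iso_year, iso_week): {status: percent}}.
    """
    counts = {}
    for day, status in cycles:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        key = (iso[0], iso[1])
        counts.setdefault(key, Counter())[status] += 1

    result = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        result[key] = {s: 100.0 * n / total for s, n in counter.items()}
    return result


# Four cycles in ISO week 8 of 2008: three up, one down.
stats = weekly_percentages([
    (date(2008, 2, 18), "up"),
    (date(2008, 2, 19), "up"),
    (date(2008, 2, 20), "down"),
    (date(2008, 2, 21), "up"),
])
print(stats[(2008, 8)])
```

A status that never occurred in a week has no entry in that week's dictionary, so a template engine would need to treat a missing value as 0%.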
All ideas/comments/additions are MORE THAN WELCOME
Dirk Bulinckx.
To unsubscribe send a message with UNSUBSCRIBE in the subject line to
[email protected]
If you use auto-responders (like out-of-the-office messages), make sure that
they are not sent to the list nor to individual members. Doing so will cause
you to be automatically removed from the list.
_____________________________________________________________
IMPORTANT NOTICE !
This E-Mail transmission and any accompanying attachments may contain
confidential information intended only for the use of the individual or entity
named above. Any dissemination, distribution, copying or action taken in
reliance on the contents of this E-Mail by anyone other than the intended
recipient is strictly prohibited and is not intended to, in anyway, waive
privilege or confidentiality. If you have received this E-Mail in error please
immediately delete it and notify sender at the above E-Mail address.
Agrium uses state of the art anti-virus technology on all incoming and outgoing
E-Mail. We encourage and promote the use of safe E-Mail management practices
and recommend you check this, and all other E-Mail and attachments you receive
for the presence of viruses. The sender and Agrium accept no liability for any
damage caused by a virus or otherwise by the transmittal of this E-Mail.
_____________________________________________________________