Rob,
One comment on the real-time search capability of Splunk 4: per recent
conversations with Splunk, real-time search is not going to be integrated
with past data.
In other words, you can say 'do this search on data that arrives after
now', but you cannot say 'do this search on data that arrived/arrives
after 5 min ago'.
David Lang
On Mon, 1 Mar 2010, Rob Das wrote:
Date: Mon, 1 Mar 2010 10:26:38 -0800
From: Rob Das <rob...@gmail.com>
To: discuss@lopsa.org
Subject: [lopsa-discuss] splunk alternatives
First, please forgive me if this email is overly long.
Yes, SEC and Splunk are different in many ways - both useful in the right
context. I have a few questions. How much data per day are you talking
about? Are you interested in looking at historical data and comparing it
against current data? Do you need any sort of roll-ups or more advanced
aggregations/analytics on your data? You may not now, but will you ever be
interested in gathering events that cannot be captured via syslog (extra
large, application or multi-line events for example)? Do you want different
people to have access to different types of data? Do you want different
roles of users to see different views? Do you foresee that the data volumes
will grow over time? Are your 20 users really concurrent or will they be
searching randomly throughout the day?
First of all, the new version of Splunk (version 4.1), which will be out very
soon, includes real-time support. What this means is that searches can
optionally be executed at data input time as the data is acquired. If
incoming events match a search, alerts can be triggered.
Furthermore, Splunk's dashboards, graphs and tables will update in real time
as the data comes in, effectively providing a "heartbeat".
If you need to "find the needle in the haystack", you can't find a better
tool.
Simple stuff like "Tell me the top ten logins by IP address over the last 24
hours or month" can't be done with SEC without writing code. Splunk handles
this via its GUI, and graphs like this can be placed on dashboards which
update in real-time. Splunk can easily filter out data that you are not
interested in or keep it for as long as you like - your choice.
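To give a sense of what "writing code" means for the top-ten-logins example, here is a minimal sketch in Python. It is neither SEC nor Splunk; the log line format, the function name `top_logins_by_ip`, and the sample data are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical log line format (an assumption, not taken from any real product):
#   2010-03-01T09:00:00 login user=alice ip=10.0.0.5
LINE_RE = re.compile(r"^(\S+) login user=(\S+) ip=(\S+)$")


def top_logins_by_ip(lines, now, window=timedelta(hours=24), n=10):
    """Return the n most frequent source IPs for login events inside the window."""
    cutoff = now - window
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a login event in the assumed format
        timestamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if cutoff <= timestamp <= now:
            counts[match.group(3)] += 1
    return counts.most_common(n)


# Made-up sample data for illustration only.
sample = [
    "2010-03-01T09:00:00 login user=alice ip=10.0.0.5",
    "2010-03-01T09:05:00 login user=bob ip=10.0.0.7",
    "2010-03-01T09:10:00 login user=alice ip=10.0.0.5",
    "2010-02-20T09:00:00 login user=carol ip=10.0.0.9",  # older than 24 hours
    "2010-03-01T09:11:00 sshd restarted",                # not a login line
]
now = datetime(2010, 3, 1, 10, 26, 38)
result = top_logins_by_ip(sample, now)
```

Changing `window` to `timedelta(days=30)` would cover the "or month" case; every new question of this kind needs another small script, which is the cost being described.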
Splunk provides role-based access controls that can optionally filter data
at search time depending on who is allowed to see what.
One of the most important concepts is that Splunk doesn't require or impose
any structure on the incoming data. You can apply structure at search time,
which means that as data changes in your data center (because new versions
of software/hardware are installed, etc), you will not need to re-do any
regular expressions.
Depending on daily data volumes, Splunk will run very well on commodity-type
hardware. As your business grows, it can scale to handle it (to
terabytes/day). If your daily volume doesn't exceed 500 MB/day, you can use
the free version of the software.
SEC is a low-level tool written in Perl that requires you to create regular
expressions that match patterns in your data. It also requires quite a bit
of scripting to make it work in many environments. As things change, you
will need to update your regular expressions or things will break.
SEC implements a state machine that operates over incoming data. There are
many cool things you can do with it, but as David L says, it keeps all of its
state in memory. Splunk does not currently implement a state machine in the
same way as SEC. However, Splunk's search language, which is extremely
robust, can handle many of the same use cases - especially with the
introduction of real-time searching in version 4.1.
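As a rough illustration of what "a state machine over incoming data with all state in memory" looks like, here is a minimal Python sketch. It is not SEC's rule syntax or implementation; the class name `FailedLoginCorrelator`, the threshold, and the event fields are hypothetical.

```python
from collections import defaultdict, deque


class FailedLoginCorrelator:
    """Alert when one IP produces `threshold` failures within `window` seconds."""

    def __init__(self, threshold=3, window=60):
        self.threshold = threshold
        self.window = window
        # All correlation state lives in this dict, so it is lost on restart.
        self.recent = defaultdict(deque)

    def feed(self, timestamp, ip, failed):
        """Process one event; return an alert string, or None if nothing fired."""
        if not failed:
            return None
        times = self.recent[ip]
        times.append(timestamp)
        # Drop failures that have aged out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so the same burst alerts only once
            return f"ALERT: {self.threshold} failed logins from {ip} in {self.window}s"
        return None


# Made-up events: (seconds since start, source IP, login failed?)
correlator = FailedLoginCorrelator(threshold=3, window=60)
alerts = [
    correlator.feed(0, "10.0.0.5", True),
    correlator.feed(10, "10.0.0.5", True),
    correlator.feed(15, "10.0.0.7", False),
    correlator.feed(20, "10.0.0.5", True),
]
```

Each event is examined once as it arrives, which is why this style is cheap for alerting but cannot answer questions about data it has already discarded.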
I have not taken a look at logsurfer, so I can't comment on it. I'll check
it out.
I am more than happy to field questions directly if you wish.
Rob Das
r...@splunk.com
Co-founder / Chief Architect
Splunk, Inc.
Paul DiSciascio wrote:
I'm looking for a good way to share log files on a centralized syslog
server with about 10-20 people/developers who are familiar with the log
formats but not very much with unix tools. They want an easy way to
dig thru the logs and filter out junk they're not interested in, but
still have near realtime visibility. Obviously, splunk can do this,
but it's pricey and their documentation seems to indicate that 20
concurrent users would be a lot to ask for without a lot of hardware.
I really only need an interface capable of some rudimentary filtering,
and if possible the ability to save those searches or filters. Does
anyone have any suggestions short of writing this myself?
You might be interested in SEC (simple event correlator) for this
purpose. But, if you just want a presentation interface, logsurfer might
be more what you are looking for. SEC is much more like splunk while
logsurfer is more of a realtime filtering monitor.
I'm not sure what you have seen of splunk, but it and SEC have very little
in common.
Splunk allows for arbitrary search queries against your past log data (and
indexes it like crazy to make the search fairly efficient).
SEC watches for patterns (or combinations of patterns) to appear in the
logs and generates alerts.
Splunk can simulate SEC's functionality by doing repeated queries against
the logs, but that's fairly inefficient.
As for the original question, the answer depends a lot on the amount of
data that you are working with.
If you can fit it all in RAM on a machine, then there are a lot of things
that you can use to query it. The problem comes when you can no longer fit
it in RAM and have to go to disk; at that point you need an application
that does a lot of indexing (and/or spreads the load across multiple
machines, depending on how much data you have and how fast you want your
answers).
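To illustrate why indexing matters once a full scan gets too slow, here is a minimal sketch of an inverted index in Python. It is a toy that keeps everything in memory and is not how Splunk or any particular tool stores its index; the function names `build_index` and `search` and the sample lines are assumptions.

```python
from collections import defaultdict


def build_index(lines):
    """Map each lowercase token to the set of line numbers that contain it."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for token in line.lower().split():
            index[token].add(lineno)
    return index


def search(index, lines, *terms):
    """Return lines containing all terms, using the index instead of a full scan."""
    if not terms:
        return []
    # One posting set per term; a term that was never seen yields an empty set.
    postings = [index.get(term.lower(), set()) for term in terms]
    hits = set.intersection(*postings)
    return [lines[i] for i in sorted(hits)]


# Made-up log lines for illustration only.
logs = [
    "sshd failed login from 10.0.0.5",
    "cron job started",
    "sshd accepted login from 10.0.0.7",
]
index = build_index(logs)
matches = search(index, logs, "sshd", "login")
```

A real tool would keep the index on disk and only read the matching lines, which is the point of paying the indexing cost up front.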
You say that your users are not familiar with Unix tools; are they
familiar with using SQL for queries?
David Lang
_______________________________________________
Discuss mailing list
Discuss@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/