On Fri, Apr 01, 2016 at 10:08:47AM PDT, N.J. Thomas spake thusly:
> * Tracy Reed <tr...@ultraviolet.org> [2016-03-31 12:15:55-0700]:
> > I have done a lot of work with Splunk also and have seriously mixed
> > feelings about it.
> 
> Apart from the cost, can you expand on that?

Their demo is awesome and it is theoretically possible to do really cool stuff.
But SPL (the Search Processing Language, their query language) makes the simple
stuff simple (it's like Google for your logs!) and the slightly more
sophisticated stuff nearly impossible. Massive amounts of time go into figuring
out how to make a reasonably complex Splunk query work, and massive effort went
into making all of those slick queries you will see in a demo. You aren't
likely to be able to do that yourself.

http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutcalculatingstatistics

Our Splunk indexers would occasionally just mysteriously fall over and stop
receiving logs. We had a dozen datacenters, and a few times one of them stopped
sending logs entirely. There is no good way to have an external process verify
that Splunk is still receiving logs. So how do you monitor for this? The only
real option is Splunk alerts, i.e. trying to get Splunk to monitor itself. So I
cooked up a Splunk query which would alert if log volume for any particular
datacenter ever fell below a certain level. I couldn't actually cause a
datacenter outage to test it, but I could adjust the alerting threshold, and as
I dialed it up and down it alerted below a certain level as expected. Later we
had another outage and my query/alert didn't work. It turns out that when a
datacenter stopped sending data it didn't appear in the flow of SPL logic as
datacenter1=0; datacenter1 simply didn't appear anywhere, so no alert was
generated. In hindsight this is understandable (though it means an entirely new
approach to the problem must be found), but it also shows how difficult it is
to reason about SPL. Give me an imperative or even a proper functional
programming language any day. You cannot easily save state from one query to
another. I would like to just run one query to generate a list of all of the
datacenters and then iterate over that list, ensuring that each datacenter has
more than zero log events. Not only can you not really save state between
queries, but iteration isn't a thing.
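(For what it's worth, the usual SPL workaround I eventually learned about is to
keep a lookup file of the datacenters you expect and left-join the live counts
against it, so a silent datacenter shows up as zero instead of vanishing. This
is just a hedged sketch, not the query I ran; the lookup file name and field
names are hypothetical:

```
| inputlookup expected_datacenters.csv
| join type=left datacenter
    [ search index=* earliest=-15m | stats count by datacenter ]
| fillnull value=0 count
| where count = 0
```

Note that this only works because the list of datacenters now lives outside the
event stream, in a lookup you have to maintain by hand.)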

There are workarounds for all of this in SPL, but they are quite complicated.
You can sort of save state by sending your query results to a summary index,
where they land just like log events, which you then have to query and parse as
if they were log data: a real kludge. I've been away from Splunk for nearly two
years, and it seems they now also have a key-value store you can query somehow,
which might help, but you still have to query it with SPL.
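(Roughly what that summary-index kludge looks like, sketched from memory with a
hypothetical index name: one scheduled search collects the counts, and a second
search later reads them back.

```
index=* earliest=-15m
| stats count by datacenter
| collect index=dc_log_volume
```

```
index=dc_log_volume
| stats latest(count) as count by datacenter
| where count = 0
```

Your "saved state" is now just more events to search through.)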

Meanwhile, with Logstash I can write a Python program and save state in a file
or however I want between invocations and iterate over search results to my
heart's content.
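A minimal sketch of what I mean, with a hypothetical state file name and
assuming the per-datacenter counts have already been pulled out of a search
result: remember every datacenter you have ever seen in a file, then flag any
known datacenter whose current count is zero.

```python
import json
import os

STATE_FILE = "known_datacenters.json"  # hypothetical state file path

def load_known(path=STATE_FILE):
    """Load the set of datacenters we have ever seen logs from."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()

def save_known(known, path=STATE_FILE):
    """Persist the set of known datacenters between invocations."""
    with open(path, "w") as f:
        json.dump(sorted(known), f)

def silent_datacenters(current_counts, path=STATE_FILE):
    """Return known datacenters that sent no logs this interval.

    current_counts: dict of datacenter name -> event count from this
    interval's search results (an absent key means zero events).
    """
    known = load_known(path)
    silent = {dc for dc in known if current_counts.get(dc, 0) == 0}
    save_known(known | set(current_counts), path)
    return silent
```

Unlike the pure-SPL approach, a datacenter that disappears from the results
entirely still trips the alert, because the state file remembers it.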

SPL is the ONLY way to query Splunk. There is no real programmatic API. Well,
there is a REST API, but you are just passing it SPL queries. It isn't like the
Logstash API. SPL is slightly reminiscent of SQL married to a UNIX shell, with
data being piped from one command to the next, but it definitely isn't UNIX
shell or SQL.

Splunk "apps"...oh how I wish they had just used Ruby on Rails or Django or the
like instead of cooking up their own XML-based web design language. I never got
the hang of it. And even if I had, it's a proprietary, app-specific technology:
I would be investing a bunch of time learning something with no use away from
Splunk, as opposed to Rails, Django, etc. They would probably have gone with
Django, as a lot of Splunk is Python.

Splunk is quite difficult to configure. There are a number of different places
to configure things and it is hard to know or discover what goes where. Does
this particular setting go in inputs.conf or props.conf or...? And which
version of each of these files does it go in? The system file, the local file,
the app-specific file? Does it go in that file on the indexer, on the search
head, or in the forwarder on the remote machine actually doing the logging?

http://docs.splunk.com/Documentation/Splunk/latest/admin/Wheretofindtheconfigurationfiles
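To give a flavor of it (made-up sourcetype name and paths; a sketch of the
layout, not a recipe): even just monitoring one log file touches at least two
files on two different machines.

```
# inputs.conf, on the forwarder
# (e.g. $SPLUNK_HOME/etc/system/local/ or an app's local/ directory)
[monitor:///var/log/myapp/app.log]
sourcetype = myapp_logs

# props.conf, on the indexer (or search head, depending on the setting)
[myapp_logs]
TIME_FORMAT = %Y-%m-%dT%H:%M:%S
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
```

And each file can exist in system, app, and local flavors, with precedence
rules deciding which copy wins.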

I could go on... But Splunk looks nice. If you let the Splunk folks demo it for
your CTO, I'm sure he will love it (aside from the price). But then you get
saddled with actually making it work. Good luck! I know some people have
managed, but we had half a dozen people bounce off of Splunk at my very large
shop over the course of nearly five years, and we've done barely anything with
it beyond the bare minimum needed to say we have a centralized logging solution
for compliance purposes.

I would much rather have spent the several million dollars that went to Splunk
licensing fees on paying good people to make Logstash do what we want, getting
much better results instead of paying for people *and* licenses, but that
wasn't my call.

-- 
Tracy Reed


_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
