On Fri, Apr 01, 2016 at 10:08:47AM PDT, N.J. Thomas spake thusly:
> * Tracy Reed <tr...@ultraviolet.org> [2016-03-31 12:15:55-0700]:
> > I have done a lot of work with Splunk also and have seriously mixed
> > feelings about it.
>
> Apart from the cost, can you expand on that?
Their demo is awesome, and it is theoretically possible to do really cool stuff with it. But the Splunk Processing Language (aka SPL, their query language) makes the simple stuff simple (it's like Google for your logs!) and the slightly more sophisticated stuff nearly impossible. Massive time goes into figuring out how to make a reasonably complex Splunk query work. Massive effort was put into making all of those slick queries you see in a demo; you aren't likely to be able to do that yourself.

http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutcalculatingstatistics

We had Splunk indexers just mysteriously fall over occasionally: they simply stopped sending logs. We had a dozen datacenters, and a few times one of them stopped sending logs. There is no good way to have some external process verify that Splunk is still receiving logs. So how do you monitor for this? The only way is with Splunk alerts, i.e. trying to get Splunk to monitor itself.

So I cooked up a Splunk query which would alert if log volume for any particular datacenter ever fell below a certain level. I couldn't actually cause a datacenter outage to test it, but I could adjust the alerting threshold, and as I dialed it up and down it alerted below a certain level as expected. Later we had another outage, and my query/alert didn't work. It turns out that when a datacenter stopped sending data, it didn't appear in the flow of SPL logic as datacenter1=0; rather, datacenter1 didn't appear anywhere, so no alert was generated. In hindsight this is understandable (though it also means an entirely new approach to the problem must be found), but it shows how difficult it is to reason about SPL. Give me an imperative or even a proper functional programming language any day.

You cannot easily save state from one query to another.
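The trap is easy to demonstrate outside SPL. Here is a minimal Python sketch (the datacenter names, events, and threshold are all invented for illustration): a group-by produces rows only for groups that actually emitted data, so a per-datacenter threshold check can never fire for a datacenter that vanished entirely; catching that requires comparing against remembered state.

```python
from collections import Counter

# Invented sample events for one search window: (datacenter, message) pairs.
# In real life these would come from a log search, not a hard-coded list.
events = [
    ("dc1", "ok"), ("dc1", "ok"), ("dc2", "ok"),
    # dc3 has gone completely silent: it contributes no events at all.
]

counts = Counter(dc for dc, _ in events)  # {'dc1': 2, 'dc2': 1}

# The SPL-style check: alert on any datacenter whose volume is below a
# threshold. A silent datacenter never appears in `counts`, so the
# comparison never even sees it.
threshold = 2
spl_style_alerts = [dc for dc, n in counts.items() if n < threshold]
print(spl_style_alerts)  # ['dc2'] -- dc3 is invisible, so no alert for it

# The check I actually wanted: compare against a remembered list of
# datacenters from previous runs -- i.e. saved state.
known_datacenters = {"dc1", "dc2", "dc3"}
missing = sorted(known_datacenters - set(counts))
print(missing)  # ['dc3'] -- the silent datacenter is caught
```

The second check is trivial here precisely because ordinary code can keep the `known_datacenters` set around between runs, which is exactly what SPL makes awkward.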
I would like to just run a query generating a list of all of the datacenters and then iterate over that list, ensuring that each datacenter has more than zero log events. But not only can you not really save state between queries, iteration isn't a thing either. There are workarounds for all of this in SPL, but they are quite complicated: you can sort of save state by sending your query results to a special index, just like a log event, which you then have to query and parse as if it were log data, a real kludge. I've been away from Splunk for nearly two years, and it seems they now also have a key-value store you can query somehow, which might help, but you still have to query it with SPL. Meanwhile, with Logstash I can write a Python program, save state in a file (or however I want) between invocations, and iterate over search results to my heart's content.

SPL is the ONLY way to query Splunk. There is no real programmatic API. Well, there is a REST API, but you are just passing it SPL queries; it isn't like the Logstash API. SPL is slightly reminiscent of SQL married to UNIX shell, with data piped from one command to the next, but it is definitely neither UNIX shell nor SQL.

Splunk "apps"... oh, how I wish they had just used Ruby on Rails or Django or the like instead of cooking up their own XML-based web design language. I never got the hang of it. And even if I had, it's a proprietary, app-specific technology: I would have invested a bunch of time learning something with no use outside Splunk, as opposed to Rails, Django, etc. (They would probably go for Django, as a lot of Splunk is Python.)

Splunk is quite difficult to configure. There are a number of different places to configure things, and it is hard to know or discover what goes where. Does this particular setting go in inputs.conf or props.conf or...? And which version of each of these files does it go in? The system file, the local file, the app-specific file?
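To give a flavor of the split, here is a hedged sketch (the paths, sourcetype name, and values are hypothetical, not from any real deployment) of how even a single monitored log file can end up with stanzas in different files on different machines:

```ini
# On the forwarder: $SPLUNK_HOME/etc/system/local/inputs.conf
[monitor:///var/log/app/app.log]
sourcetype = app_log

# On the indexer: $SPLUNK_HOME/etc/system/local/props.conf
[app_log]
TIME_FORMAT = %Y-%m-%d %H:%M:%S
```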
Does it go in that file on the indexer, on the search head, or on the forwarder machine actually doing the logging?

http://docs.splunk.com/Documentation/Splunk/latest/admin/Wheretofindtheconfigurationfiles

I could go on... But Splunk looks nice. If you let the Splunk folks demo it for your CTO, I'm sure he will love it (aside from the price). But then you get saddled with actually making it work. Good luck! I know some people have managed it, but we had half a dozen people bounce off of Splunk at my very large shop over the course of nearly 5 years, and we've done barely anything beyond the bare minimum needed to say we have a centralized logging solution for compliance purposes. I would much rather have spent the several million dollars we have spent on Splunk licensing fees on paying good people to make Logstash do what we want, getting much better results instead of paying people *and* licensing fees, but that wasn't my call.

--
Tracy Reed
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/