One more quick comment: when you search, it's faster to use filters than queries. Have a look here: http://www.elasticsearch.org/guide/reference/query-dsl/
That is, unless you need relevance scoring, which you probably won't for logs, since most of the time you'd want to sort them by date. Filters are faster because they don't calculate the score and are cached.

2013/6/18 Radu Gheorghe <[email protected]>

> 2013/6/18 Mahesh V <[email protected]>
>
>> Hi,
>>
>> performance seems to be good with elastic search
>> 2 minutes and change for 5 lakh entries
>>
>
> Glad to hear that it's good :) I guess there's a typo there so I don't get
> your statement completely.
>
>>
>> However, i have a problem.
>> I can enter the messages into elasticsearch only when rsyslogd is running
>> in foreground (rsyslogd -n running in command line)
>> not sure why this is so.
>>
>
> Hmm, I've got a similar issue:
> http://www.gossamer-threads.com/lists/rsyslog/users/9463#9463
>
> The only solution I found so far was to move the init script (ie: mv
> /etc/init.d/rsyslog /etc/init.d/rsyslog-new). I'm not sure if your issue is
> the same, but you can check by changing the init script to do only
> "rsyslogd -dn > /tmp/logfile &". Then start it and check the log.
>
>>
>> Secondly, my requirement would be to query based on part of message.
>> i.e my message may look like
>> <date><time>ip=x.x.x.x name=abcd loglevel=3 <actual log message>
>> Is it possible to query using curl all messages that have ip address as
>> y.y.y.y ?
>>
>
> Ah, that's a problem, because the standard analyzer
> <http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer/>
> would break your log into "words" (terms). And each number in an IP would
> be a term.
>
> The same analyzer would be applied by default when you search (each number
> from the IP is a term, and ES will look for any of the numbers in your
> logs).
>
> There are some things you can do:
> - the query you run with "q=msg:127.0.0.1" for example, does a string query
> <http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query/>.
> If you specify your query in JSON, you can change the default operator from
> OR to AND. Like:
>
> curl 'localhost:9200/_search?pretty' -d '{
>   "query": {
>     "query_string": {
>       "query": "127.0.0.1",
>       "default_operator": "AND"
>     }
>   }
> }'
>
> This would match only the logs that contain 127, 0 and 1.
>
> If you need more precise matches, the best way to go is to parse your logs
> and insert each value in its own field. Like:
> {"ip":"x.x.x.x", "name":"abcd"}
>
> This way you can search in the ip field directly. You can also search in
> all fields by using the default "_all" field. That said, to match only the
> exact IP, you need to make sure ES doesn't analyze your "ip" field and break
> it into terms. You'd do that by setting "index" to "not_analyzed" in your
> mapping <http://www.elasticsearch.org/guide/reference/mapping/>. For
> example:
>
> curl -XPUT 'http://localhost:9200/system/events/_mapping' -d '{
>   "events" : {
>     "properties" : {
>       "ip" : {"type" : "string", "index" : "not_analyzed"}
>     }
>   }
> }'
>
> You need to put your mapping
> <http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/>
> before you start indexing that field, otherwise it will get detected
> automatically.
>
>>
>> Is 4000 (500000 entries / 125 odd seconds) the max I can get per second in
>> my system or can I get some more tuning parameters.
>>
>
> You can definitely get more. I've sent you some links a while ago, I'll
> paste them here again:
> http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html
> http://www.elasticsearch.org/tutorials/using-elasticsearch-for-logs/
> --> you can ignore the "compression" advice from there, it's compressed by
> default in 0.90+
> http://wiki.rsyslog.com/index.php/Queues_on_v6_with_omelasticsearch
>
> The first thing you can do is to enable bulks by adding bulkmode="on" to
> your action line. I see you already set ActionQueueDequeueBatchSize.
> Not sure if it works, I usually set it as queue.dequeuebatchsize="1000" in
> the action line.
>
>>
>> My rsyslog.conf has the following lines
>> ------------------------------------------------------------
>> $ActionQueueDequeueBatchSize 1000
>>
>> template (name="apsimTemplate" type="list" option.json="on") {
>>   constant(value="{")
>>   constant(value="\"@message\":\"")
>>   property(name="msg")
>>   constant(value="\"}")
>> }
>>
>> *.* action(type="omelasticsearch" template="apsimTemplate"
>> server="localhost" serverport="9200")
>>
>>
>> My elasticsearch.yml has the following lines
>> ------------------------------------------------------------
>> cluster:
>>   name: APSIM
>>
>> network:
>>   host: localhost
>>
>> [root@localhost rsyslog]# date; ./a.out ; date
>> Mon Jun 17 23:33:29 IST 2013
>> openlog: Success
>> Mon Jun 17 23:35:32 IST 2013
>>
>> [root@localhost rsyslog]#
>> [root@localhost rsyslog]# curl 'http://localhost:9200/_search?pretty=true' -d '
>> {
>>   "from" : 0, "size" : 1000000,
>>   "query" : {
>>     "matchAll" : {}
>>   }
>> }' > e
>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>                                  Dload  Upload   Total   Spent    Left  Speed
>> 100 1688k  100 1688k    0    86  19.1M    998 --:--:-- --:--:-- --:--:-- 19.6M
>>
>>
>> [root@localhost rsyslog]# cat e | grep "this is a test" | wc -l
>> 500000
>>
>
> Note that you can simply go like:
> curl 'localhost:9200/_search?size=0'
>
> And watch the hits.total field for the number of hits. If you want to get
> serious with the performance test, fetching all your docs would be
> expensive.
>
> Best regards,
> Radu

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
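
To make two suggestions from this thread concrete (one field per key, and a filter instead of a scored query), here is a minimal Python sketch. The helper names parse_log_fields and ip_filter_query are hypothetical, the sample message is made up, and the "filtered" wrapper assumes the 0.90-era query DSL mentioned in the thread. It only builds the JSON and does not contact Elasticsearch.

```python
import json
import re

# One leading "key=value" pair followed by optional whitespace.
_PAIR = re.compile(r"(\w+)=(\S+)\s*")


def parse_log_fields(message):
    """Split leading key=value pairs into fields (hypothetical helper).

    Whatever follows the last leading pair is kept under "@message",
    the field name used by the rsyslog template in this thread.
    """
    fields = {}
    pos = 0
    while True:
        match = _PAIR.match(message, pos)
        if match is None:
            break
        fields[match.group(1)] = match.group(2)
        pos = match.end()
    fields["@message"] = message[pos:]
    return fields


def ip_filter_query(ip):
    """Build a search body that matches an exact IP with a term filter.

    Assumes the "ip" field is mapped as not_analyzed, so the whole
    address is stored as a single term.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"ip": ip}},
            }
        }
    }


if __name__ == "__main__":
    # Made-up sample line in the ip=... name=... loglevel=... layout.
    doc = parse_log_fields("ip=10.0.0.5 name=abcd loglevel=3 disk is full")
    print(json.dumps(doc))
    print(json.dumps(ip_filter_query("10.0.0.5"), indent=2))
```

The first dict is the kind of document you would index instead of one big message string; the second is a body you could pass to curl with -d against /_search. Note that the parser treats every leading key=value token as a field, so a free-text message that itself starts with such a token would be split too.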

