Thanks a lot, Radu.

Here is what I did using the curl command line in Linux (below).
However, if I were to do this with rsyslog, how would it work?
I mean, I need to create the index (apsim in my case) and the type (based
on the IP address, e.g. 2_2_0_1) using rsyslog.
Will the template and action configuration in rsyslog.conf help me create these?

[root@localhost rsyslog]# curl -XGET 10.16.131.8:9200/apsim/2_2_0_1/_search?q=ip:"2.2.0.1"
{"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.30685282,"hits":[{"_index":"apsim","_type":"2_2_0_1","_id":"1","_score":0.30685282,
"_source" :
{
   "ip": "2.2.0.1",
   "name": "ap_00",
   "log": "this is a test log"
}},{"_index":"apsim","_type":"2_2_0_1","_id":"2","_score":0.30685282,
"_source" :
{
   "ip": "2.2.0.1",
   "name": "ap_00",
   "log": "this is a another test log"
}}]}}[root@localhost rsyslog]#

[root@localhost rsyslog]# curl -XGET 10.16.131.8:9200/apsim/2_2_0_1/_search?q=log:"another"
{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.15342641,"hits":[{"_index":"apsim","_type":"2_2_0_1","_id":"2","_score":0.15342641,
"_source" :
{
   "ip": "2.2.0.1",
   "name": "ap_00",
   "log": "this is a another test log"
}}]}}[root@localhost rsyslog]#
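
Coming back to the rsyslog question, this is roughly what I imagine the
action could look like; it's just a guess based on the omelasticsearch
parameter names (searchIndex/searchType/dynSearchType), I have not tried it:

*.* action(type="omelasticsearch" template="apsimTemplate"
           server="localhost" serverport="9200"
           searchIndex="apsim" searchType="2_2_0_1")

Is that the right direction, and can the type be filled in per device, for
example with dynSearchType="on" and a template that yields the IP?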


On Tue, Jun 18, 2013 at 8:28 PM, Radu Gheorghe <[email protected]> wrote:

> One more quick comment: when you search, it's faster to use filters than
> queries; have a look here:
> http://www.elasticsearch.org/guide/reference/query-dsl/
>
> That is, unless you need relevance scoring, which you probably won't for
> logs, because most of the time you'd want to sort them by date anyway.
> Filters are faster because they don't calculate the score and are cached.
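>
> For example, getting all the logs for one IP could be a term filter
> wrapped in a filtered query. A quick sketch (assuming the ip field exists
> and is not_analyzed, as described further down):
>
> curl 'localhost:9200/_search?pretty' -d '{
>   "query": {
>     "filtered": {
>       "query": { "match_all": {} },
>       "filter": { "term": { "ip": "x.x.x.x" } }
>     }
>   }
> }'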
>
>
> 2013/6/18 Radu Gheorghe <[email protected]>
>
> > 2013/6/18 Mahesh V <[email protected]>
> >
> >> Hi,
> >>
> >> performance seems to be good with elastic search
> >>  2 minutes and change for 5 lakh entries
> >>
> >
> > Glad to hear that it's good :) I guess there's a typo there so I don't
> > get your statement completely.
> >
> >
> >>
> >> However, I have a problem.
> >> I can get the messages into Elasticsearch only when rsyslogd is running
> >> in the foreground (rsyslogd -n on the command line).
> >> Not sure why this is so.
> >>
> >
> > Hmm, I've got a similar issue:
> > http://www.gossamer-threads.com/lists/rsyslog/users/9463#9463
> >
> > The only solution I found so far was to move the init script (i.e. mv
> > /etc/init.d/rsyslog /etc/init.d/rsyslog-new). I'm not sure if your issue
> > is the same, but you can check by changing the init script to run only
> > "rsyslogd -dn > /tmp/logfile &". Then start it and check the log.
> >
> >
> >>
> >> Secondly, my requirement would be to query based on part of the message.
> >> I.e. my message may look like:
> >> <date><time>ip=x.x.x.x name=abcd loglevel=3 <actual log message>
> >> Is it possible to query, using curl, all messages that have the IP
> >> address y.y.y.y?
> >>
> >
> > Ah, that's a problem, because the standard analyzer
> > (http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer/)
> > would break your log into "words" (terms). And each number in an IP would
> > be a term.
> >
> > The same analyzer would be applied by default when you search (each
> > number from the IP is a term, and ES will look for any of the numbers in
> > your logs).
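> >
> > By the way, you can see exactly how a line gets broken into terms with the
> > Analyze API; for example (just a way to poke at the standard analyzer,
> > nothing specific to your setup):
> >
> > curl 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'ip=x.x.x.x name=abcd loglevel=3 some log message'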
> >
> > There are some things you can do:
> > - the query you run with "q=msg:127.0.0.1", for example, does a string
> > query
> > (http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query/).
> > If you specify your query in JSON, you can change the default operator
> > from OR to AND. Like:
> > curl localhost:9200/_search?pretty -d '{
> >   "query": {
> >     "query_string": {
> >       "query": "127.0.0.1",
> >       "default_operator": "AND"
> >     }
> >   }
> > }'
> >
> > This would match only the logs that contain 127, 0 and 1.
> >
> > If you need more precise matches, the best way to go is to parse your
> > logs and put each value into its own field. Like:
> > {"ip":"x.x.x.x", "name":"abcd"}
> >
> > This way you can search in the ip field directly. You can also search in
> > all fields by using the default "_all" field. That said, to match only
> > the exact IP, you need to make sure ES doesn't analyze your "ip" field and
> > break it into terms. You'd do that by setting "index" to "not_analyzed" in
> > your mapping (http://www.elasticsearch.org/guide/reference/mapping/). For
> > example:
> >
> > curl -XPUT 'http://localhost:9200/system/events/_mapping' -d '{
> >     "events" : {
> >         "properties" : {
> >             "ip" : {"type" : "string", "index" : "not_analyzed"}
> >         }
> >     }
> > }'
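> >
> > If you want, you can double-check what ended up in the mapping afterwards
> > with something like:
> >
> > curl 'localhost:9200/system/events/_mapping?pretty'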
> >
> > You need to put your mapping
> > (http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/)
> > before you start indexing that field, otherwise it will get detected
> > automatically.
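> >
> > On the rsyslog side, one way to get such separate fields is a list
> > template that pulls the values out of msg with regexes. A rough, untested
> > sketch (the regular expressions are only assumptions about your exact log
> > format):
> >
> > template(name="apsimJson" type="list" option.json="on") {
> >   constant(value="{\"ip\":\"")
> >   property(name="msg" regex.type="ERE" regex.submatch="1"
> >            regex.expression="ip=([0-9.]+)")
> >   constant(value="\",\"name\":\"")
> >   property(name="msg" regex.type="ERE" regex.submatch="1"
> >            regex.expression="name=([^ ]+)")
> >   constant(value="\",\"@message\":\"")
> >   property(name="msg")
> >   constant(value="\"}")
> > }
> >
> > mmnormalize is another option if the regexes get unwieldy.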
> >
> >
> >>
> >> Is 4000 (500000 entries / 125-odd seconds) the max I can get per second
> >> on my system, or can I get some more tuning parameters?
> >>
> >
> > You can definitely get more. I sent you some links a while ago, I'll
> > paste them here again:
> >
> > http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html
> > http://www.elasticsearch.org/tutorials/using-elasticsearch-for-logs/
> > --> you can ignore the "compression" advice from there, it's compressed
> > by default in 0.90+
> > http://wiki.rsyslog.com/index.php/Queues_on_v6_with_omelasticsearch
> >
> > The first thing you can do is to enable bulk indexing by adding
> > bulkmode="on" to your action line. I see you already set
> > ActionQueueDequeueBatchSize; not sure if that works, I usually set it as
> > queue.dequeuebatchsize="1000" in the action line.
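> >
> > Put together, the action could look something like this; a sketch I have
> > not verified on your setup (tune the queue sizes to taste):
> >
> > *.* action(type="omelasticsearch" template="apsimTemplate"
> >            server="localhost" serverport="9200"
> >            bulkmode="on" queue.dequeuebatchsize="1000"
> >            queue.type="linkedlist" queue.size="5000")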
> >
> >
> >>
> >> My rsyslog.conf has the following lines
> >> ------------------------------------------------------------
> >> $ActionQueueDequeueBatchSize  1000
> >>
> >> template (name="apsimTemplate" type="list" option.json="on") {
> >>   constant(value="{")
> >>   constant(value="\"@message\":\"")
> >>   property(name="msg")
> >>   constant(value="\"}")
> >> }
> >>
> >> *.*   action(type="omelasticsearch" template="apsimTemplate"
> >> server="localhost" serverport="9200")
> >>
> >>
> >> My elasticsearch.yml has the following lines
> >> ------------------------------------------------------------
> >> cluster:
> >>    name:   APSIM
> >>
> >> network:
> >>    host:   localhost
> >>
> >> [root@localhost rsyslog]# date; ./a.out ; date
> >> Mon Jun 17 23:33:29 IST 2013
> >> openlog: Success
> >> Mon Jun 17 23:35:32 IST 2013
> >>
> >> [root@localhost rsyslog]#
> >> [root@localhost rsyslog]# curl 'http://localhost:9200/_search?pretty=true' -d '
> >> {
> >>     "from" : 0, "size" : 1000000,
> >>     "query" : {
> >>         "matchAll" : {}
> >>     }
> >> }'  > e
> >>   % Total    % Received % Xferd  Average Speed   Time    Time     Time
> >> Current
> >>                                  Dload  Upload   Total   Spent    Left
> >> Speed
> >> 100 1688k  100 1688k    0    86  19.1M    998 --:--:-- --:--:-- --:--:--
> >> 19.6M
> >>
> >>
> >> [root@localhost rsyslog]# cat e | grep "this is a test" | wc -l
> >> 500000
> >>
> >
> > Note that you can simply go like:
> > curl localhost:9200/_search?size=0
> >
> > And watch the hits.total field for the number of hits. If you want to get
> > serious with the performance tests, fetching all your docs each time would
> > be expensive.
> >
> > Best regards,
> > Radu
> >