Hello Folks,

With rsyslog + mysql configured (rsyslog 7.4, $ActionQueueDequeueBatchSize
1000) and no changes to the mysql config file (/etc/my.cnf), I get a good
insert rate, shown below.

mysql -u root -p<...> -e "use Syslog; select message from SystemEvents;" | wc -l
405330

in about

However, I find that mysqld reaches > 80% CPU usage and rsyslog 20% CPU.
The system is a 4-core x86_64 CentOS system.
I am curious to find out if such high CPU usage for mysqld is normal. I
guess elasticsearch is what I need rather than persisting with mysql. What
kind of performance can I expect with elasticsearch?

The problem in my application is that it has a heartbeat every second with
the device under test. If the CPU usage from other entities is high, the
heartbeat loses its rhythm and hence the connectivity.

thanks,
Mahesh

On Sun, Jun 16, 2013 at 1:50 PM, David Lang <[email protected]> wrote:

> On Sun, 16 Jun 2013, Radu Gheorghe wrote:
>
>> 2013/6/14 David Lang <[email protected]>
>>
>>> On Fri, 14 Jun 2013, Radu Gheorghe wrote:
>>>
>>>> Hi Mahesh,
>>>>
>>>> If you don't need mysql for a specific reason, I'd suggest you try
>>>> throwing your logs in Elasticsearch. Here's a tutorial:
>>>> http://wiki.rsyslog.com/index.php/HOWTO:_rsyslog_%2B_elasticsearch
>>>>
>>>> I assume you'll get way better insert and query performance than you
>>>> can with mysql (ie: with bulks, I get 10-20K logs indexed per second
>>>> on my $500 laptop. Then I can query in 100M-200M logs within a second.
>>>> Depends on your settings). Plus, it's super-easy to scale
>>>> Elasticsearch by adding new nodes.
>>>>
>>>> For querying, there are several tools, the most popular being Kibana:
>>>> http://three.kibana.org/
>>>
>>> Just to note, one of the things that makes MySQL so slow for Mahesh is
>>> its safety features. After each insert, MySQL makes sure the data is
>>> safe on disk before it considers the insert complete.
>>
>> By that, you mean it does a fsync after every transaction? I thought it
>> doesn't do this (at least not by default, with neither MyISAM nor InnoDB).
>> But then again, at least InnoDB does it more often than ES does.
>
> I don't remember the table types, but the newer of the two does do fsync
> after each transaction, which is how it actually properly supports
> transactions. This is why it was such a big deal when MySQL changed the
> default.
>
>>> If the system crashes, the data will be there. There are config options
>>> to override this in MySQL.
>>>
>>> To get the numbers that elasticsearch is getting on your laptop, it's
>>> almost certainly not doing this.
>>
>> I assume you lose some data if the whole system suddenly goes down. But
>> if just ES does (ie: kill -9 the JVM), you shouldn't lose any data.
>>
>> I think ES writes stuff in a very different way than MySQL does. When you
>> index something in ES, it does the indexing in memory and writes the raw
>> data in the transaction log
>> <http://www.elasticsearch.org/guide/reference/index-modules/translog/>.
>> Only after this is done do you get a reply from ES.
>>
>> The transaction log is replayed on startup in case something goes wrong
>> and you lose the data you had in memory. Every once in a while, it writes
>> what it has to disk in the actual Lucene index
>> <http://www.elasticsearch.org/guide/reference/glossary/#shard> where it
>> stores data "permanently".
>>
>> These chunks of data that it writes are segments
>> <https://lucene.apache.org/core/3_6_2/fileformats.html#Segments>, which
>> consist of multiple files. The thing about segments is that they're
>> immutable.
>> And to make sure that you don't end up with a gazillion segments, these
>> are asynchronously merged
>> <http://www.elasticsearch.org/guide/reference/index-modules/merge/> from
>> time to time.
>
> the thing is that if it doesn't do a fsync, you have no guarantee that the
> data is on the disk. And it's very possible for later data to make it to
> the disk before earlier data does.
>
> doing a kill -9 isn't the same as a system crash.
>
> when you do a kill -9 the kernel and filesystem code contain all the data
> that the application wrote, and will present that data if asked, and will
> eventually get it to disk.
>
> But if the system loses power, any data not actually written to disk is
> lost. And (depending on lots of implementation details) it's possible to
> end up with holes in files, or files created that have no content, or even
> files created, with space allocated for them, but stray data from the
> drive in that space, not what the application wrote.
>
> I suspect that what ES does is that it writes the data in long sequential
> writes, and tries to make it so that if there is power loss, logs will be
> lost but not corrupted. It can do that at the data rates that you are
> describing. It's writing hundreds, if not thousands of logs per
> 'transaction'.
>
>>> this is probably acceptable, but you do need to be aware of the tradeoff.
>>
>> Right, there are always trade-offs. I'm sorry if I came across as the
>> "you're using the wrong technology" guy. I hate it when people do that.
>>
>> In this particular case, I understand it's only about aggregating logs and
>> searching them afterwards instead of doing that with straight files. And
>> this is exactly what ES is about, so I thought it would be easier/better
>> to give it a shot. And I don't see write speed as being its strong point,
>> either - that would be the search speed.
>
> I think that you are correct in saying that ES is better than MySQL for
> this, but I was wanting to point out that the reason why MySQL is as slow
> as he was seeing is that it's making sure that each transaction is safe
> before proceeding.
>
> Relaxing this guarantee is the sort of thing that all the No-SQL databases
> do, and most of their performance wins are possible only because they do
> not provide the same guarantees that the traditional SQL databases provide.
>
> David Lang
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
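
To make the fsync trade-off discussed in this thread concrete, here is a
minimal Python sketch. It is not from the thread: the function name
write_records, the record format, and the batch sizes are all made up for
illustration. It appends the same records to a file twice, once calling
fsync after every record (roughly what a store that syncs on each commit
has to do) and once calling fsync per batch of 1000. It does not model
MySQL or Elasticsearch internals; it only shows how batching reduces the
number of fsync calls.

```python
import os
import tempfile
import time


def write_records(path, records, batch_size):
    """Write records to path, calling fsync once per batch_size records.

    Returns the number of fsync calls made. (Hypothetical helper, for
    illustration only.)
    """
    fsyncs = 0
    with open(path, "wb") as f:
        for i, rec in enumerate(records, start=1):
            f.write(rec)
            if i % batch_size == 0:
                # Push Python's buffer to the kernel, then ask the kernel
                # to push its cached data to the storage device.
                f.flush()
                os.fsync(f.fileno())
                fsyncs += 1
        # Sync whatever is left in a final partial batch.
        if len(records) % batch_size != 0:
            f.flush()
            os.fsync(f.fileno())
            fsyncs += 1
    return fsyncs


def timed(path, records, batch_size):
    """Run write_records and return (fsync count, elapsed seconds)."""
    start = time.perf_counter()
    n = write_records(path, records, batch_size)
    return n, time.perf_counter() - start


if __name__ == "__main__":
    # Made-up syslog-like lines; the content does not matter here.
    records = [b"<13>Jun 16 13:50:00 host app: message %d\n" % i
               for i in range(2000)]
    with tempfile.TemporaryDirectory() as d:
        for batch in (1, 1000):
            path = os.path.join(d, "log-%d" % batch)
            n, secs = timed(path, records, batch)
            print("batch=%d fsyncs=%d seconds=%.3f" % (batch, n, secs))
```

The elapsed times depend heavily on the disk and filesystem, so no numbers
are shown here. Anything written since the last fsync is what can be lost
on a power failure, which is the trade-off described above.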

