On Tue, 4 Jun 2013, Roberto Giordani wrote:

Hello David,
following your suggestions, these are my changes:
1) I've removed the ActionQueue on the client
2) added on the server:
$MainMsgQueueType LinkedList
$MainMsgQueueSize 100000
$MainMsgQueueWorkerThreads 3

3) DNS lookups were already disabled with the -x option when starting rsyslogd

4) I can't use UDP because my next step will be to encrypt the messages

and this is the pstats report after 4 hours on the server:
2013-06-04T13:09:26.398752+02:00 fsp01 rsyslogd-pstats: imtcp(10514): submitted=4583409
2013-06-04T13:09:26.773487+02:00 fsp01 rsyslogd-pstats: main Q: size=9969 enqueued=5082093 full=158414 discarded.full=0 discarded.nf=0 maxqsize=10000

Now the server uses 3 of the 4 available CPUs.

I've tested the client locally before forwarding to the server and I don't observe any issue.

I would like your thoughts on these parameters:
a) $MainMsgQueueDequeueBatchSize

b) $InputFilePersistStateInterval

-) Which values are correct for these parameters when using $MainMsgQueueSize 1000000?

you should not need to change these values from default

I didn't receive an answer to my first question:
-) How is it possible to increase the rate of incoming messages and handle them quickly?

lots of things could be the problem, but we need to find out where the bottleneck is in your system

-) how can I retrieve the number of messages sent and the number of messages received?

The best way to do this is to setup a benchmark test where you know the number of messages that are sent and can measure the number received.

-) do you think using more than one listener on the server is better?

only if the listener is the bottleneck and you have multiple senders.
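
If you do end up trying it, opening a second imtcp listener is just an additional $InputTCPServerRun line; a rough sketch (the second port number here is only an example):

$ModLoad imtcp
$InputTCPServerRun 10514
$InputTCPServerRun 10515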

If the server is now using 3 out of 4 CPUs, then it's possible that one of the CPUs is maxing out and that is limiting your overall throughput. With a simple "receive and write to disk" configuration, rsyslog probably only uses a few threads, so it may not need the 4th CPU.

when you look at top with threads enabled (after hitting 'H') on the server, is one of the rsyslogd threads using 100% CPU (or close to it)?
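
For example, something like this should show per-thread CPU usage for the rsyslog process (assuming the binary is named rsyslogd):

top -H -p $(pidof rsyslogd)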

Some of the places you could be having bottlenecks:

1. the receiving thread could be the bottleneck, in which case using imptcp instead of imtcp could be a win (a config sketch for points 1-3 follows this list)

2. activity on the main queue could be the bottleneck, in which case allowing larger batch sizes could help. Disabling the disk fallback could help, and using a FixedArray queue instead of LinkedList could also help.

3. output to disk could be the bottleneck: what filesystem you are using, how large the files get before you rotate them, and what fsync activity is taking place on the filesystem can all cause problems. Dynamic file names also add some overhead.
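
As a rough sketch of what the config changes for points 1-3 could look like (legacy-format directives; the numbers are only starting points, not tuned values):

# point 1: plain-tcp input module instead of imtcp
$ModLoad imptcp
$InputPTCPServerRun 10514

# point 2: in-memory array queue with larger dequeue batches
$MainMsgQueueType FixedArray
$MainMsgQueueSize 100000
$MainMsgQueueDequeueBatchSize 1000
$MainMsgQueueWorkerThreads 3

# point 3: omfile with async writing and a larger I/O buffer (v7 action syntax)
action(type="omfile" File="/rsyslog-data/file1.log"
       asyncWriting="on" ioBufferSize="64k" flushOnTXEnd="off")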

set up a test file that has a few hundred million log messages in it. Then see how long it takes to send that file. If you have enough RAM on the central box, set your max queue size very large (1M or so), repeat the test, looking specifically to see if your sending box is finishing noticeably before the receiving box (if so, the bottleneck is in the output, not the input).
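
A hypothetical way to build and run such a test (the path and line count below are placeholders, and the awk format is just a dummy message):

# append a few million dummy lines to the monitored file on the client
seq 1 10000000 | awk '{print "benchmark message number " $1}' >> /file1.log

# on the server, watch how fast the output file grows
watch -n 5 'wc -l /rsyslog-data/file1.log'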

While I understand that you are looking to add encryption to your connection in the near future, do a test with UDP instead of TCP just to see if TCP is the bottleneck.
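
For a quick UDP test, something along these lines should work (a single '@' selects UDP forwarding; the port is the one you already use):

# client
local6.debug @10.10.1.10:10514

# server
$ModLoad imudp
$UDPServerRun 10514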

Sometimes I find it useful to strace the threads that are using a lot of CPU, as the system calls being made tell me what that particular thread is working on.
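
For example (the thread ID is the PID column of the busy thread in 'top -H'; -c gives a summary of which system calls dominate):

strace -c -p <thread-id>     # let it run for a while, then Ctrl-C to see the summary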

David Lang

Regards,
Roberto.


On 06/04/2013 08:32 AM, David Lang wrote:
On Mon, 3 Jun 2013, Roberto Giordani wrote:


Hello,
I'm working on a project where 20 RHEL 5.8 servers (with rsyslog
5.8.12) each have 20 input files and send about 10,000 messages per
second to one rsyslog 7.2 server.

The network between client --> server is gigabit, and this is the daily
nmon network traffic graph


The first configuration used a disk (file) queue, but it was too slow, so
I've switched to a LinkedList queue.

in my testing memory was the fastest (around 1000x as fast as a full
disk queue)

The current client settings are
$InputFileName /file1.log
$InputFileTag TEST1
$InputFileStateFile file1
$InputFileSeverity debug
$InputFileFacility local6
$InputRunFileMonitor
$InputFilePersistStateInterval 10
....
....
....
############################################
$MaxMessageSize 9000

$MainMsgQueueType LinkedList
$MainMsgQueueSize 1000000
$MainMsgQueueWorkerThreads 20
$MainMsgQueueDequeueBatchSize 5000
$MainMsgQueueSaveOnShutdown on

$ActionQueueType LinkedList
$ActionQueueSize 2000000
$ActionQueueWorkerThreads 50
$ActionQueueDequeueBatchSize 5000
$ActionQueueSaveOnShutdown on
###########################################

$ActionResumeRetryCount -1
$ActionQueueTimeoutEnqueue 1
$ActionSendResendLastMsgOnReconnect on
$ActionQueueCheckpointInterval 1

local6.debug @@10.10.1.10:10514

The server settings are
##############################################
# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 10514


#### GLOBAL DIRECTIVES ####

# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

# File syncing capability is disabled by default. This feature is
# usually not required, not useful and an extreme performance hit
$ActionFileEnableSync off

$umask 0007

if $fromhost-ip == "10.10.1.7" and $syslogfacility-text == "local6"
and $syslogseverity-text == "debug" and $syslogtag == "TEST1" then
action(type="omfile" DirOwner="user1" DirCreateMode="0750"
FileCreateMode="0444" File="/rsyslog-data/file1.log")

if $fromhost-ip == ........

###############################

My questions are:
1) how do I find the right combination of
main queue ---> action queue ---> receiver queue on the rsyslog server?

why do you need an Action Queue? it's only needed if you have some
reason to think that that particular action is going to fall behind and
you want to decouple it from the rest of the outputs.
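
(For example, with the client config above that could simply mean dropping the $ActionQueue* block and leaving the forwarding rule in its default "direct" queue mode:

local6.debug @@10.10.1.10:10514
)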


2) Is it possible to increase incoming messages and handle them quickly?

Yes, I've processed 380K logs/sec (gig-e limit) and others with 10G
networks have processed 1M logs/sec

3) Why is the output log on the centralized rsyslog server still behind
the source log on the client?

lots of possible reasons, but first off what are you looking at that's
saying that the central servers are behind?

start off by seeing if you are having problems reading the data from the
file. Set up a test where you just read from the file and write back out
to a file (I'd suggest starting without any queue directives at all)
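
(A minimal local test along those lines might look roughly like this; the output path is just an example:

$ModLoad imfile
$InputFileName /file1.log
$InputFileTag TEST1
$InputFileStateFile file1
$InputFileFacility local6
$InputFileSeverity debug
$InputRunFileMonitor

local6.debug /var/log/imfile-test.log
)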

then if that's fast enough, change it to send remotely and report what's
happening

I've activated the pstats module to understand the queue status on the
clients and the server.
After 6 hours of running, the client has this report:
.........
2013-06-03T22:23:16.708288+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9327675 full=6831507 maxqsize=2000000
2013-06-03T22:23:16.708297+02:00 app01 rsyslogd-pstats: main Q: size=2 enqueued=9648448 full=0 maxqsize=20395
2013-06-03T22:23:46.708367+02:00 app01 rsyslogd-pstats: imuxsock: submitted=323414 ratelimit.discarded=0 ratelimit.numratelimiters=29974
2013-06-03T22:23:46.708382+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9340578 full=6844410 maxqsize=2000000
2013-06-03T22:23:46.708390+02:00 app01 rsyslogd-pstats: main Q: size=2164 enqueued=9666464 full=0 maxqsize=20395
2013-06-03T22:24:16.708923+02:00 app01 rsyslogd-pstats: imuxsock: submitted=328198 ratelimit.discarded=0 ratelimit.numratelimiters=29986
2013-06-03T22:24:16.708941+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9355649 full=6859481 maxqsize=2000000
2013-06-03T22:24:16.708949+02:00 app01 rsyslogd-pstats: main Q: size=1364 enqueued=9686593 full=0 maxqsize=20395
2013-06-03T22:24:46.709300+02:00 app01 rsyslogd-pstats: imuxsock: submitted=333070 ratelimit.discarded=0 ratelimit.numratelimiters=29997
2013-06-03T22:24:46.709316+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9365276 full=6869108 maxqsize=2000000
2013-06-03T22:24:46.709323+02:00 app01 rsyslogd-pstats: main Q: size=2123 enqueued=9702047 full=0 maxqsize=20395
2013-06-03T22:25:16.709807+02:00 app01 rsyslogd-pstats: imuxsock: submitted=337951 ratelimit.discarded=0 ratelimit.numratelimiters=30009
2013-06-03T22:25:16.709823+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9379492 full=6883324 maxqsize=2000000
2013-06-03T22:25:16.709832+02:00 app01 rsyslogd-pstats: main Q: size=2 enqueued=9719723 full=0 maxqsize=20395
2013-06-03T22:25:46.709942+02:00 app01 rsyslogd-pstats: imuxsock: submitted=343014 ratelimit.discarded=0 ratelimit.numratelimiters=30021
2013-06-03T22:25:46.709980+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9389640 full=6893472 maxqsize=2000000


and the rsyslog server has the following stats:
.......
2013-06-03T23:05:15.898682+02:00 fsp01 rsyslogd-pstats: main Q: size=9978 enqueued=5032165 full=156941 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:06:56.157199+02:00 fsp01 rsyslogd-pstats: main Q: size=9987 enqueued=5065134 full=157971 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:08:30.657673+02:00 fsp01 rsyslogd-pstats: main Q: size=9972 enqueued=5096315 full=158942 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:10:09.895850+02:00 fsp01 rsyslogd-pstats: main Q: size=9986 enqueued=5129162 full=159969 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:11:42.488505+02:00 fsp01 rsyslogd-pstats: main Q: size=9973 enqueued=5159935 full=160933 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:13:23.213800+02:00 fsp01 rsyslogd-pstats: main Q: size=9973 enqueued=5193246 full=161973 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:14:58.833570+02:00 fsp01 rsyslogd-pstats: main Q: size=9970 enqueued=5224922 full=162962 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:16:35.184133+02:00 fsp01 rsyslogd-pstats: main Q: size=9975 enqueued=5256863 full=163960 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:18:13.992958+02:00 fsp01 rsyslogd-pstats: main Q: size=9991 enqueued=5289392 full=164977 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:19:52.464473+02:00 fsp01 rsyslogd-pstats: main Q: size=9942 enqueued=5322013 full=165996 discarded.full=0 discarded.nf=0 maxqsize=10000

4) Why does the enqueued value on the client never decrease, and why is
full= always different from 0?

well, once you have enqueued something, you don't go back in time and
not enqueue it, so unless you restart rsyslog, that number is going to
just keep climbing.

I'm planning to distribute the 20 clients across 2 rsyslog server
processes on different ports on the same machine, but I think I need your
help with the right combination of action queue on the client and main
queue on the server.

In the rsyslog documentation I've found a lot of info about how to handle
the input queue (# of threads, max messages and DequeueBatchSize) but I
don't know how to make rsyslog work better on the receiving server.
I've seen that the rsyslog 7.2 process on the server uses only one CPU at
a time, while the server has 4 CPUs and 12GB RAM on a 64-bit OS.
5) Can some option during the configure step change this behavior?
$ ./configure --prefix=/usr/local/rsyslog7 --enable-gnutls
--enable-imfile --enable-omruleset --enable-imptcp
--enable-mmnormalize --enable-usertools --enable-imdiag
--enable-diagtools --enable-impstats

I think you are trying too hard.

rsyslog will use one thread for each input, and one thread for each
output by default (and possibly some others along the way if it sees a
need). The fact that you aren't seeing it use more than 100% CPU
indicates that you probably have some other bottleneck.

On the receiving side, have you disabled DNS lookups? That has a big
performance hit in many cases.

when you look at top on the central box, hit 'H' to have it display the
different threads as different lines; then you will see how much CPU is
used by each one. After that you can track down which one is the
bottleneck.

David Lang

