On Sat, 15 Jun 2013, Timothy Ehlers wrote:
I posted this on the wrong discussion:We resolved our issue with this by spawning more main Q threads (8) and removed the Action ques. CPU usage for rsyslogd on the 50mb/sec box is sitting steadily at 200% (was 100). Regarding the open files, our disk array seems to be able to handle 2k TPS, the average TPS on sar -b is ~250. Our bottleneck was the CPU usage on a single thread. We also adjusted ulimits and i do not think this is an issue.
it's not just the ulimits, it's how many of these files rsyslog keeps open for writing at one time.
with dynamicly generated filenames, rsyslog has a cache of recently used names. If the name you go to write to is in that cache, rsyslog just writes to it. If the name is not in the cache, it picks the oldest file in the cache, closes it, opens the new file, and then performs the write.
file opens and closes are very expensive calls compared to writes.I don't know the default size of this cache (the documentation on this is bad). doing a lsof and counting the filehandles open for writing would give you a hint. If it's a suspiciously 'round' number you are probably hitting the limit. If this limit is higher than the number of files that you write in a day (since you include the day in the file path), you should be ok.
But if it's smaller than the number of files that you are writing to at once, then you are thrashing the cache. This means that a large percentage of the time when rsyslog goes to write a log message, it will have to first close and open a file.
another way to tell if you are in trouble here is to do a strace of the thread that's doing the writing. look for open calls. If you see a lot of them, or you see them opening the same file repeatedly, then you would benefit from setting the DynaFileCacheSize to a larger value. keep increasing it until the number of opens drops off.
David Lang
We will try and tune the regex as well. On Fri, Jun 14, 2013 at 7:08 PM, David Lang <[email protected]> wrote:The number of threads used is complex. The value you are setting is the max number of threads that can be used, but there are several other things that can affect the number of threads actually used. In addition, not all actions support multithreading, it's possible the dynafile action doesn't allow multiple threads. I could definantly see outputting to one file from multiple threads getting into trouble (or at the very least, requireing locking that would be expensive enough to make it not worth having multiple threads). I don't know the code well enough to know if this output can multithread or not. $Ruleset appLog $ActionQueueType LinkedList $ActionQueueWorkerThreads 64 $**ActionQueueWorkerThreadMinimum**Messages 10000 $ActionQueueSize 400000 *.* ?appLogDynFile;****appLogHadoopTemplate do you have any idea how many different files this is generating? One possible issue could be that rsyslog can't keep that many files open and so is spending a lot of time opening and closing the file. If you can do a strace on the thread for a few seconds and see if it's doing a lot of writes, a lot of open,write,close or something else. Looking in the documentation for the omfile module at http://www.rsyslog.com/doc/**omfile.html<http://www.rsyslog.com/doc/omfile.html> It looks like there is a parameter DynaFileCacheSize that is incorrectly documented as defining the temptate to be used, when instead I believe it allocates memory to keep track of what files are being kept open so they can be written to again. IOBuffersize may be interesting (how large are your messages, how likely is it that you will have multiple messages arriving together that go to the same destination) It would be interesting to enable ASyncWriting, this would show you the cost of writing to disk vs formatting the message. My discussion of trying to simplify the regex was based on the thinking that you are spending most of your time processing the regex. If instead you are spending your time opening/closing files (needing a larger dynafilecachesize) I think that enabling this would show this pretty clearly. David Lang On Fri, 14 Jun 2013, Timothy Ehlers wrote: Well James and i both work on this. We will try and reduce the regex butwe both do not understand why this cannot be multi-threaded. Why does setting an ActionQue to 10 worker threads not actually spawn 10. On Fri, Jun 14, 2013 at 6:18 PM, David Lang <[email protected]> wrote: Given that with one exception, all his rulesets are essentually singleactions, and he has multiple action queues, I think it is effectivly tied to the ruleset in this case. I origionally assumed that different rulesets used different main queues. It looks like that may not be the case, but it could also just be that this one horribly bad rule was eating so much of the system that top only showed this one. That's a question that I'll clarify with Rainer next week sometime. David Lang On Fri, 14 Jun 2013, Boylan, James wrote: Is the action queue thread tied to the ruleset?-- James ----- Reply message ----- From: "David Lang" <[email protected]> To: "rsyslog-users" <[email protected]> Subject: [rsyslog] client connectivity issues syslog-ng -> rsyslog 7.x Date: Fri, Jun 14, 2013 6:13 pm I've found that FixedArray queues are noticably faster than LinkedList. That probably won't completely solve the problem, but it may help. If I'm seeing this correctly, this is the message template $template appLogHadoopTemplate,"<%PRI%>%****TIMESTAMP% %FROMHOST% app=%programname:R,ERE,1,DFLT:****([A-Za-z0-9]+)-.*-.*_.*--**end%|** bucket=%programname:R,ERE,1,****DFLT:.*-.*-.*_([A-Za-z0-9]+)--**** end%%msg%\n" and this is the filename template $template appLogDynFile,"/log/app-logs/%** ****programname:R,ERE,0,DFLT:[**A-* *Za-z0-9]+--end%/%FROMHOST%/%$******YEAR%/%$MONTH%/%$DAY%/%** PROGRAMNAME%.log" Both of these are heavy users of regex parsing, which is a pretty expensive operation One thing that I think I'm seeing, it that you do the same regex multiple times. With an ugly regex like this, you would probably gain significantly by setting a variable and using that set $!shortname = "%programname:R,ERE,1,DFLT:([**** A-Za-z0-9]+)-.*-.*_.*--end%" $template appLogHadoopTemplate,"<%PRI%>%****TIMESTAMP% %FROMHOST% app=%$!shortname%|bucket=%$!****shortname%%msg%\n" the other thing is, can you simplify the regex? it looks like you are trying to pull any alphanumeric value ahead of a -, but if that's the case, why are you matching things after that? will the following give you what you need? set $!shortname = "%programname:R,ERE,1,DFLT:^([****A-Za-z0-9]+)-%" I anchor the regex to the beginning of the string, and then pull everything before the - David Lang On Fri, 14 Jun 2013, Timothy Ehlers wrote: Date: Fri, 14 Jun 2013 17:50:05 -0500From: Timothy Ehlers <[email protected]> Reply-To: rsyslog-users <[email protected]> To: rsyslog-users <[email protected]> Subject: Re: [rsyslog] client connectivity issues syslog-ng -> rsyslog 7.x Yeah it helped in staging but with the higher volume of prod ruleset 2 seems to be the problem... I still only see 1 thread and its pegged at 100% $Ruleset appLog $ActionQueueType LinkedList $ActionQueueWorkerThreads 64 $****ActionQueueWorkerThreadMinimum****Messages 10000 $ActionQueueSize 400000 *.* ?appLogDynFile;****appLogHadoopTemplate There must be something i do not understand about the Queue system. On Fri, Jun 14, 2013 at 4:45 PM, David Lang <[email protected]> wrote: Interesting, I did not expect that action queues would help this much,given that you have the rulesets bound to different interfaces, I would have expected that their output processing would be independant. But the fact that putting in action queues (I assume one queue per ruleset??) splits up the work so much says that I was wrong. Which output is the action 3 queue that's using so much more CPU than anything else? David Lang On Fri, 14 Jun 2013, Timothy Ehlers wrote: I put ActionQueus into the config and in staging it looks better now. 12773 root 20 0 1919m 231m 1848 R 34.8 0.5 0:19.06rs:action 3 que 12772 root 20 0 1919m 231m 1848 S 10.6 0.5 0:09.51 rs:action 2 que 12751 root 20 0 1919m 231m 1848 S 1.7 0.5 0:01.29 rs:main Q:Reg 12742 root 20 0 1919m 231m 1848 S 0.7 0.5 0:00.72 in:imtcp 12767 root 20 0 1919m 231m 1848 S 0.7 0.5 0:00.21 rs:action 5 que 12774 root 20 0 1919m 231m 1848 S 0.7 0.5 0:00.70 rs:action 4 que I will try this in production with the Higher volume after a peer review. On Fri, Jun 14, 2013 at 4:32 PM, Timothy Ehlers <[email protected]> wrote: 52mb/sec inbound traffic Hadoop stream is showing: 25k msg per second.. i do not know howaccurate this is. $OptimizeForUniprocessor off $MaxMessageSize 2048k # Rsyslog plugins $ModLoad immark # provides --MARK-- message capability $ModLoad imudp # provides UDP syslog reception $ModLoad imtcp # provides TCP syslog reception $ModLoad imuxsock # provides support for local system logging (e.g. via logger command) $ModLoad imklog # provides kernel logging support (previously done by rklogd) $ModLoad imrelp # Provides RELP syslog reception $ModLoad omrelp # Provides RELP syslog transmission # Rsyslog Stats $ModLoad impstats $PStatInterval 60 $PStatSeverity 7 # Queue configuration $ActionQueueSize 2000000 $MainMsgQueueSize 40000000 # File Creation Permissions $umask 0000 $DirCreateMode 0755 $FileCreateMode 0644 # Remote Log Processing Ruleset $PreserveFQDN on $template appLogDynFile,"/log/app-logs/%******programname:R,ERE,0,DFLT:[** A-**** Za-z0-9]+--end%/%FROMHOST%/%$******YEAR%/%$MONTH%/%$DAY%/%** PROGRAMNAME%.log" $template currLogStatsDynFile,"/log/app-******logs/logstats/%FROMHOST%/%** $**** YEAR%/%$MONTH%/%$DAY%/******logstats.log.%$HOUR%00" $template currAppLogDynFile,"/log/app-******logs/%msg:R,ERE,1,DFLT:^([A-**** Za-z0-9._-]+)\|([A-Za-z0-9._-]******+)\|([A-Za-z0-9._]+)[-_]*(** [**A-** Za-z0-9]*)([\^])--end%/%******FROMHOST%/%$YEAR%/%$MONTH%/%$****** DAY%/%msg:R,ERE,1,DFLT:^([A-******Za-z0-9._-]+)\|([A-Za-z0-9._** -]**** +)\|([A-Za-z0-9._]+)[-_]*([A-******Za-z0-9]*)([\^])--end%-%** msg:**R,** ERE,2,DFLT:^([A-Za-z0-9._-]+)\******|([A-Za-z0-9._-]+)\|([A-** Za-**z0-** 9._]+)[-_]*([A-Za-z0-9]*)([\^]******)--end%-%msg:R,ERE,3,DFLT:** ^(**[** A-Za-z0-9._-]+)\|([A-Za-z0-9._******-]+)\|([A-Za-z0-9._]+)[-_]** *(**[** A-Za-z0-9]*)([\^])--end%-%msg:******R,ERE,4,DFLT:^([A-Za-z0-9.** _-**]+** )\|([A-Za-z0-9._-]+)\|([A-Za-******z0-9._]+)[-_]*([A-Za-z0-9]*** )(**[** \^])--end%.log.%$HOUR%00" $template currAppLoggTemplate,"%msg:R,******ERE,1,DFLT:^[A-Za-z0-9._-]+\** |[**** A-Za-z0-9._-]+\|[A-Za-z0-9._]+******[-_]*[A-Za-z0-9]*[\^](.*)-** -*** *end%\n" $template currAppLoggTemplate2,"%msg%\n" $template currentappLogHadoopTemplate,"<** ****%PRI%>%TIMESTAMP:date-** rfc3164% %FROMHOST% %msg%\n" $template currentappLogNewHadoopTemplate******,"<%PRI%>%TIMESTAMP% %FROMHOST% app=%msg:R,ERE,1,DFLT:^([A-Za-******z0-9._-]+)\|([A-Za-z0-9._-** ]+**)\** |([A-Za-z0-9.]+)[-_]*([A-Za-******z0-9]*)--end%|bucket=%msg:R,**** ERE,4,DFLT:^([A-Za-z0-9._-]+)\******|([A-Za-z0-9._-]+)\|([A-** Za-**z0-** 9.]+)[-_]*([A-Za-z0-9]*)--end% %msg%\n" $template appLogHadoopTemplate,"<%PRI%>%******TIMESTAMP% %FROMHOST% app=%programname:R,ERE,1,DFLT:******([A-Za-z0-9]+)-.*-.*_.*--*** *end%|** bucket=%programname:R,ERE,1,******DFLT:.*-.*-.*_([A-Za-z0-9]+)** --**** end%%msg%\n" $template remoteMessagesDynFile,"/log/******system-logs/%FROMHOST%/%$** YEAR%**** /%$MONTH%/%$DAY%/messages" $template remoteSecureDynFile,"/log/******secure-system-logs/%FROMHOST%/** **** %$YEAR%/%$MONTH%/%$DAY%/******secure" $template remoteMaillogDynFile,"/log/******system-logs/%FROMHOST%/%$** YEAR%**** /%$MONTH%/%$DAY%/maillog" $template remoteEmergDynFile,"/log/******system-logs/%FROMHOST%/%$YEAR%****** /%$MONTH%/%$DAY%/emergency" $template remoteCronDynFile,"/log/******system-logs/%FROMHOST%/%$YEAR%****** /%$MONTH%/%$DAY%/cron" $template remoteSpoolerDynFile,"/log/******system-logs/%FROMHOST%/%$** YEAR%**** /%$MONTH%/%$DAY%/spooler" $template remoteBootDynFile,"/log/******system-logs/%FROMHOST%/%$YEAR%****** /%$MONTH%/%$DAY%/boot.log" $Ruleset appLog *.* ?appLogDynFile;******appLogHadoopTemplate # Forward to Hadoop #*.* @@ wmhdcollector01s.stag.****timste**sting.net:5003<http://** timstesting.net:5003 <http://timstesting.net:5003>> <http://**wmhdcollector01s.**stag.**timstesting.net:5003<ht** tp://wmhdcollector01s.stag.**timstesting.net:5003<http://wmhdcollector01s.stag.timstesting.net:5003>;$Ruleset currAppLog *.* ?currAppLogDynFile;******currAppLoggTemplate # Forward to Hadoop *.* @@hadoopcollectors.prod.******timstesting.net:5003;** currentappLogHadoopTemplate $Ruleset currLogStats *.* ?currLogStatsDynFile # Forward to Hadoop #*.* @@hadoopcollectors.prod.******timstesting.net:5003;** currentappLogHadoopTemplate # Remote System Log Processing Ruleset $Ruleset remoteSysLogs # Log all kernel messages to the console. # Logging much else clutters up the screen. #kern.* /dev/console # Log anything (except mail) of level info or higher. # Don't log private authentication messages! $DirCreateMode 0755 $FileCreateMode 0644 *.info;local1.none;local6.******none;mail.none;authpriv.none;***** *cron.none ?remoteMessagesDynFile # The authpriv file has restricted access. $DirCreateMode 0700 $FileCreateMode 0600 authpriv.* ?remoteSecureDynFile # Log all the mail messages in one place. $DirCreateMode 0755 $FileCreateMode 0644 mail.* ?remoteMaillogDynFile # Log cron stuff $DirCreateMode 0755 $FileCreateMode 0644 cron.* ?remoteCronDynFile # Everybody gets emergency messages $DirCreateMode 0755 $FileCreateMode 0644 *.emerg ?remoteEmergDynFile # Save news errors of level crit and higher in a special file. $DirCreateMode 0755 $FileCreateMode 0644 uucp,news.crit ?remoteSpoolerDynFile # Save boot messages also to boot.log $DirCreateMode 0755 $FileCreateMode 0644 local7.* ?remoteBootDynFile # Local Log Processing Ruleset $Ruleset local # Log all kernel messages to the console. # Logging much else clutters up the screen. #kern.* /dev/console # Log anything (except mail) of level info or higher. # Don't log private authentication messages! *.info;local1.none;local6.******none;mail.none;authpriv.none;***** *cron.none /var/log/messages syslog.=debug /log/rsyslog-stats # The authpriv file has restricted access. authpriv.* /var/log/secure # Log all the mail messages in one place. mail.* -/var/log/maillog # Log cron stuff cron.* /var/log/cron # Everybody gets emergency messages *.emerg :omusrmsg:* # Save news errors of level crit and higher in a special file. uucp,news.crit /var/log/spooler # Save boot messages also to boot.log local7.* /var/log/boot.log # Assign default Ruleset $DefaultRuleset local # New AppLog Process RELP Collector $InputRELPServerBindRuleset appLog $InputRELPServerRun 20514 # Current AppLog TCP Collector $InputTCPServerBindRuleset currAppLog $InputTCPServerRun 20516 # Current LogStats TCP Collector $InputTCPServerBindRuleset currLogStats $InputTCPServerRun 20518 # SystemLog TCP Collector $InputTCPServerBindRuleset remoteSysLogs $InputTCPServerRun 20515 # SystemLog UDP Collector $InputUDPServerBindRuleset remoteSysLogs $UDPServerRun 514______________________________******_________________ rsyslog mailing listhttp://lists.adiscon.net/******mailman/listinfo/rsyslog<http://lists.adiscon.net/****mailman/listinfo/rsyslog> <http:**//lists.adiscon.net/**mailman/**listinfo/rsyslog<http://lists.adiscon.net/**mailman/listinfo/rsyslog><http:**//lists.adiscon.net/**mailman/**listinfo/rsyslog<http://lists.adiscon.net/mailman/**listinfo/rsyslog> <htt**p://lists.adiscon.net/mailman/**listinfo/rsyslog<http://lists.adiscon.net/mailman/listinfo/rsyslog>http://www.rsyslog.com/******professional-services/<http://www.rsyslog.com/****professional-services/><http://**www.rsyslog.com/****professional-services/<http://www.rsyslog.com/**professional-services/><http://**www.rsyslog.com/**professional-**services/<http://www.rsyslog.com/professional-**services/> <http:**//www.rsyslog.com/**professional-services/<http://www.rsyslog.com/professional-services/>What's up with rsyslog? Follow https://twitter.com/rgerhardsNOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.______________________________****_________________rsyslog mailing list http://lists.adiscon.net/****mailman/listinfo/rsyslog<http://lists.adiscon.net/**mailman/listinfo/rsyslog> <http:**//lists.adiscon.net/mailman/**listinfo/rsyslog<http://lists.adiscon.net/mailman/listinfo/rsyslog>http://www.rsyslog.com/****professional-services/<http://www.rsyslog.com/**professional-services/> <http://**www.rsyslog.com/professional-**services/<http://www.rsyslog.com/professional-services/>What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT. ______________________________****_________________ rsyslog mailing list http://lists.adiscon.net/****mailman/listinfo/rsyslog<http://lists.adiscon.net/**mailman/listinfo/rsyslog> <http:**//lists.adiscon.net/mailman/**listinfo/rsyslog<http://lists.adiscon.net/mailman/listinfo/rsyslog>http://www.rsyslog.com/****professional-services/<http://www.rsyslog.com/**professional-services/> <http://**www.rsyslog.com/professional-**services/<http://www.rsyslog.com/professional-services/>What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT. ______________________________****_________________rsyslog mailing list http://lists.adiscon.net/****mailman/listinfo/rsyslog<http://lists.adiscon.net/**mailman/listinfo/rsyslog> <http:**//lists.adiscon.net/mailman/**listinfo/rsyslog<http://lists.adiscon.net/mailman/listinfo/rsyslog>http://www.rsyslog.com/****professional-services/<http://www.rsyslog.com/**professional-services/> <http://**www.rsyslog.com/professional-**services/<http://www.rsyslog.com/professional-services/>What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.______________________________**_________________rsyslog mailing list http://lists.adiscon.net/**mailman/listinfo/rsyslog<http://lists.adiscon.net/mailman/listinfo/rsyslog> http://www.rsyslog.com/**professional-services/<http://www.rsyslog.com/professional-services/> What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
_______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

