On Thu, May 9, 2013 at 7:58 PM, David Lang <[email protected]> wrote:

> On Thu, 9 May 2013, Fajun Chen wrote:
>
>> Saw this error message below several times, not every time though:
>> rsyslogd-2040: fatal error on disk queue 'action 22 queue[DA]', emergency
>> switch to direct mode [try http://www.rsyslog.com/e/2040 ]
>
> I think this is the problem.
>
> are there any other errors around this (something that may indicate an
> error reading/writing a file?)

No, no other errors around. However, this error doesn't show up most of the
time.

>> I also noticed that .qi file in the disk queue directory doesn't show up
>> sometimes. Is this file required for disk queue to function correctly? I
>> did notice that disk queue flushing was working when the file was absent
>> in several cases.
>
> I don't believe that the queue will work properly without the .qi file.
> and if it's switching to direct mode and failing, the messages are not
> really getting queued, they are getting lost. I think that what's happening
> is that rsyslog has the data that's in the .qi file in memory, so it can
> keep going without it, but once it's restarted, it doesn't know what's what
>
> what version are you using on the sending machine? (you may have said
> earlier, but I'm not remembering). I know I've seen some work done dealing
> with these sorts of problems since 7.0 was released, so if you are on an
> older version, the first thing to try is a current version.

7.3.14, which is the most recent rsyslog ubuntu package published on Adiscon.

> I think I remember Rainer posting a tool to recreate/fix the .qi file in
> cases where it's been corrupted.

Can you provide the tool or its link?

Thanks,
Fajun

>> On Thu, May 9, 2013 at 4:50 PM, David Lang <[email protected]> wrote:
>>
>>> On Thu, 9 May 2013, Fajun Chen wrote:
>>>
>>>> Resending original reply without debugging logs since it was blocked
>>>> waiting for approval (message size over 512k).
>>>>
>>>> Thanks,
>>>> Fajun
>>>>
>>>> On Thu, May 9, 2013 at 2:52 PM, Fajun Chen <[email protected]> wrote:
>>>>
>>>>> On Thu, May 9, 2013 at 11:34 AM, David Lang <[email protected]> wrote:
>>>>>
>>>>>> On Thu, 9 May 2013, Fajun Chen wrote:
>>>>>>
>>>>>>> On Wed, May 8, 2013 at 9:22 PM, David Lang <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Wed, 8 May 2013, Fajun Chen wrote:
>>>>>>>>
>>>>>>>>> iptables block setting didn't work for some reason.
>>>>>>>>
>>>>>>>> what do the iptables rules on that system look like?
>>>>>>>>
>>>>>>>> without seeing them, my guess is that there is a rule already there
>>>>>>>> that allows packets related to a known connection that are getting
>>>>>>>> applied (and therefore accepting the packets) before the deny rule
>>>>>>>> you are trying to put in place takes effect.
>>>>>>>
>>>>>>> The same problem exists on 5.8.6 with iptables blocking. One minor
>>>>>>> detail: the queued files reached the limit of 96M, it's reduced to
>>>>>>> 95M after the firewall was unblocked, but it stays at 95M on the
>>>>>>> client without flushing. I can use logger to send new log messages
>>>>>>> to the server, so network connection is not an issue.
>>>>>>>
>>>>>>> 7.3.14 seems to be working with iptables blocking.
>>>>>>
>>>>>> hmm, I don't understand how it could be different for different
>>>>>> versions of rsyslog. the iptables filtering should be happening by
>>>>>> the OS and wouldn't care what version of software is running.
>>>>>
>>>>> iptables filtering issue had been resolved by restarting rsyslog for
>>>>> the firewall changes to take effect.
>>>
>>> Ok, that is almost certainly the 'established connection' thing that I
>>> was speculating about.
>>>
>>>>> rsyslog version has nothing to do with iptables filtering. What I
>>>>> referred to was that rsyslog 5.8.6 doesn't flush queued files while
>>>>> 7.3.14 does when iptables filtering was changed from blocking to
>>>>> unblocking.
>>>
>>> ahh, Ok.
>>>
>>> At this point 5.8.6 is old enough that it's well past being supported, so
>>> let's work on the current version.
>>>
>>>>>>>>> As an alternative test, I stopped rsyslogd on the remote server and
>>>>>>>>> the logs were queued on the client as expected.
>>>>>>>>> I started rsyslog on the remote server once the disk queue on the
>>>>>>>>> client is filled up. I did see the queue files were flushed to the
>>>>>>>>> remote server once rsyslog is back to service. So this seems to be
>>>>>>>>> related to rsyslog configuration change.
>>>>>>>>
>>>>>>>> My guess (without knowing the code well) is that the queued
>>>>>>>> messages are somehow queued for the specific destination (IIRC you
>>>>>>>> had this queue setup as an action queue, not as the main queue, you
>>>>>>>> posted your config, but I have already deleted those messages). I'd
>>>>>>>> be curious to see if you have the same problem spilling the main
>>>>>>>> queue to disk.
>>>>>>>
>>>>>>> Just for your reference, here's my rsyslog configuration:
>>>>>>>
>>>>>>> # start forwarding rule 1 of 1
>>>>>>> $ActionQueueType LinkedList
>>>>>>> $ActionQueueFileName srvrfwd
>>>>>>> $ActionResumeRetryCount -1
>>>>>>> $ActionQueueSaveOnShutdown on
>>>>>>> $ActionQueueMaxDiskSpace 100000000
>>>>>>> $ActionQueueSize 200000 # Tried 100000 as well
>>>>>>> $ActionQueueHighWaterMark 600
>>>>>>> $ActionQueueLowWaterMark 200
>>>>>>> $ActionQueueTimeoutEnqueue 1
>>>>>>>
>>>>>>> #local5.* :omrelp:127.255.255.1:20514 # Invalid IP to trigger log buffering
>>>>>>> local5.* :omrelp:172.17.5.28:20514 # Real IP to trigger log forwarding
>>>>>>> # end forwarding rule 1 of 1
>>>>>>>
>>>>>>>>> On the other hand, as I noted in the first report, when I changed
>>>>>>>>> rsyslog configuration before disk space limit is reached, the
>>>>>>>>> queued files were flushed to the remote server without issues.
>>>>>>>>
>>>>>>>> very interesting, and probably a bug.
>>>>>>>
>>>>>>> Let me know if you need debugging logs to troubleshoot it.
>>>>>>
>>>>>> Rainer will probably need to get involved with this, but he's super
>>>>>> busy the next 2-3 weeks with a very high priority deadline (the "every
>>>>>> waking hour" type of project)
>>>>>>
>>>>>> It wouldn't hurt to take a look at the debug logs for the copy started
>>>>>> after the config change.
>>>>>
>>>>> Rsyslog debugging log is attached here. This was collected by running
>>>>> "rsyslogd -dn" when the remote server IP was set to valid. Please let
>>>>> me know if you want me to submit bug tracking item.
>>>
>>> you can e-mail it to me directly
>>>
>>>>>> by the way, are you sure you are doing a full restart after the config
>>>>>> change? a -HUP does not cause rsyslog to do a full restart and re-read
>>>>>> its config file, it just causes rsyslog to close and re-open its
>>>>>> outputs (a full restart takes a long time and can cause messages to be
>>>>>> lost)
>>>>>
>>>>> I did "service rsyslog restart" after the config change. "Kill timeout
>>>>> 5" is set in /etc/init/rsyslog.conf. I'm not sure if this timeout
>>>>> setting could make a difference.
>>>
>>> this should do it. but just to be sure, do a stop (make sure it's
>>> finished shutting down), then a start
>>>
>>>>>>>>> We need the initial startup logs to be queued before remote logging
>>>>>>>>> server is set. Switching from invalid IP to valid IP in rsyslog
>>>>>>>>> configuration was chosen to meet this requirement.
>>>>>>>>
>>>>>>>> Is there any chance of re-ordering the startup sequence to get the
>>>>>>>> config first, then start rsyslog, then start everything else? kernel
>>>>>>>> messages will get queued for quite a while, so they shouldn't be an
>>>>>>>> issue. The only issue would be any other applications that need to
>>>>>>>> write logs very early on.
>>>>>>>
>>>>>>> The problem is that we don't know remote logging server at startup,
>>>>>>> so we need the capability to buffer the logs until the remote server
>>>>>>> is set by user later. Understood that the logs could get lost after
>>>>>>> the disk space limit is reached. Is there any way to achieve this
>>>>>>> without rsyslog configuration change?
>>>>>>
>>>>>> one possibility would be to just write the logs to a file and then
>>>>>> use imfile to read this file later to send them upstream, but I'm not
>>>>>> sure if imfile has gained the capability to get all its data from the
>>>>>> file yet.
>>>>>>
>>>>>> Historically, imfile only read the message content from the file, it
>>>>>> generated the timestamp, hostname, priority, and severity information
>>>>>> itself. I know there was talk about having an option to have imfile
>>>>>> parse this from the file, but I don't know if it ever happened.
>>>>>>
>>>>>> If nothing else, you could write messages to a file with the
>>>>>> RSYSLOG_ForwardFormat and then use netcat or similar to read the file
>>>>>> and spit it out over the network later, but that wouldn't be able to
>>>>>> use RELP to send it. I guess you could use netcat to send it to a UDP
>>>>>> listener on localhost and then have the logs sent out via RELP from
>>>>>> there.
>>>>>>
>>>>>> There should be some way to feed the logs to /dev/log, but I'm not
>>>>>> sure exactly how to do that.
>>>>>
>>>>> Thanks for all your suggestions. Data completeness and integrity is
>>>>> very important in our use cases. I'm not sure how some of the logging
>>>>> information such as original timestamp would change when it's routed
>>>>> around. If this is confirmed to be a bug and can be fixed in 1-2
>>>>> months, I would much rather wait for the fix.
>>>
>>> Well, if you feed the data to syslog with the timestamp, it will
>>> preserve the existing timestamp by default.
>>>
>>> David Lang
>>>
>>>>> Thanks,
>>>>> Fajun
>>>>>
>>>>>>>>> On Wed, May 8, 2013 at 11:56 AM, David Lang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, 8 May 2013, Fajun Chen wrote:
>>>>>>>>>>
>>>>>>>>>>> I upgraded ubuntu rsyslog to 7.3.14 and still got the same issue.
>>>>>>>>>>>
>>>>>>>>>>> My test procedure:
>>>>>>>>>>> Clean log file. Set remote host IP to 127.255.255.1 (invalid IP)
>>>>>>>>>>> in rsyslog conf. service rsyslog restart followed by logger in a
>>>>>>>>>>> loop. The disk queue files are buffered but are limited to 96M
>>>>>>>>>>> overall. Set remote host IP to valid IP. service rsyslog restart.
>>>>>>>>>>> I expect the queued files to be flushed to the remote host but
>>>>>>>>>>> these files are still in the queuing directory.
>>>>>>>>>>
>>>>>>>>>> This may be a silly thought, but the fact that you are changing
>>>>>>>>>> the configuration between these two steps could be part of the
>>>>>>>>>> problem.
>>>>>>>>>>
>>>>>>>>>> I would suggest that instead of changing the config to
>>>>>>>>>> enable/disable sending the logs that you instead keep the rsyslog
>>>>>>>>>> config the same and set iptables rules to block and unblock the
>>>>>>>>>> communications.
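
The netcat idea raised earlier in this thread (write messages to a file with RSYSLOG_ForwardFormat, then send them to a UDP listener on localhost that forwards them on via RELP) could be sketched in Python roughly as follows. This is only a sketch under assumptions that are not in the thread: the function name replay_file and the default port 514 are hypothetical, and it assumes a UDP syslog listener is already configured on localhost. UDP gives no delivery guarantee, so on its own this would probably not satisfy a strict completeness requirement.

```python
import socket


def replay_file(path, host="127.0.0.1", port=514):
    """Send each non-empty line of `path` as one UDP datagram.

    Each line is assumed to already be a complete syslog message
    (for example written with RSYSLOG_ForwardFormat), so it is sent
    unchanged and the receiver can parse the original timestamp.
    Returns the number of datagrams sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(path, "rb") as spool:
        for line in spool:
            msg = line.rstrip(b"\r\n")
            if not msg:
                continue  # skip blank lines in the spool file
            sock.sendto(msg, (host, port))
            sent += 1
    return sent
```

Calling replay_file with the path of the spooled file once the remote server is known would play the buffered messages into the local listener; the path and port to use depend on the local setup.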

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.

