On Mon, Jan 27, 2020 at 5:05 AM David Lang <[email protected]> wrote:
>
> we'd need more details about exactly what is happening,
Here's another link to the rsyslog.conf we're using:
https://github.com/lf-edge/eve/blob/master/pkg/rsyslog/rsyslog.conf
and I can also make the content of the rsyslog workdir available
as a tarball if anyone is interested in debugging this.
But the question I was asking is slightly larger and it gets
back to your point of...
> but I'll point out that
> when the system crashes, all bets are off, anything that any application tries
> to do to save its state may fail because of the crash. Power failures are even
> worse (unless you have a UPS with enough notification to do a clean shutdown)
...this. I absolutely agree with the above, and yet I'd like to point
out that there are systems that are pretty crash resilient for
a failure scenario like this (which in our case is not even a
failure -- it's just normal operating procedure to do a hard
power cycle).
That's exactly why I asked this larger question of whether
rsyslog as a project is interested in tackling a use case like that.
Because I'd be the first one to admit that in a traditional
data center (where I'd expect 99% of rsyslog deployments to
happen) none of it matters.
The point being -- we can either agree that a use case like
that is in scope for rsyslog and focus on fixing whatever may
still be not quite working OR the answer is that it is largely
out of scope and then we'll have to find some other piece of
open source to fill the gap. Both are absolutely valid choices
for the project -- I just need to know where y'all stand on this.
> rsyslog keeps logs in ram unless you configure it to use disk-only queues (not
> disk assisted queues), and if you do use disk-only queues, you will slow your
> performance by a factor of 1000x or more, as well as doing a huge amount of
> disk I/O (with the performance impact that causes).
Understood (and as you can see we're basically running in what is supposed
to be a full resiliency mode for the main queue: checkpointinterval=1,
syncqueuefiles=on).
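For concreteness, here is a minimal sketch of the kind of main queue setup
I mean. The queue file name and work directory are taken from the error log
quoted further down; the disk-only queue type is an assumption made for
illustration, and the rsyslog.conf linked above remains the authoritative
version:

```
# Sketch only, not a copy of our actual rsyslog.conf.
global(workDirectory="/persist/rsyslog")       # directory seen in the error log

main_queue(
    queue.type="Disk"                          # disk-only queue (assumed here)
    queue.filename="main_edge_node_log_queue"  # file prefix seen in the error log
    queue.checkpointInterval="1"               # update queue state after every record
    queue.syncqueuefiles="on"                  # sync queue files to disk on write
)
```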
But this is a good point to re-iterate that, again, our use case
is very different from a datacenter: we don't have a LOT of
log activity happening on any given node (they are small),
so it appears that we can tolerate the slowdown.
We're also just starting to experiment with rsyslog -- hence
using the main queue configuration for now, but if we're sticking
with rsyslog we're very likely to apply this resiliency configuration
at the action queue level and then only to the actions corresponding
to really critical messages (which will reduce the traffic even further).
> you are probably better in
> practice to send your logs out over the network and let them get picked up by
> another system than thinking you will avoid losing logs in a crash/power
> outage
And yet another point where we're very different from your typical datacenter
scenario ;-) In our case -- the connectivity is very intermittent so we can't
expect messages to be safely sent away -- we need node-level resilience (at
least for the critical ones) and then burst log shipping during periods of time
when the network is actually up.
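As a rough sketch of where we'd like to end up (an assumption about a future
config, not something we run today): a disk queue attached only to the action
that ships critical messages, retrying until the network comes back. The
severity threshold, target host, port, queue name and disk limit are all
hypothetical placeholders:

```
# Sketch only: hypothetical target, queue name and limits.
if $syslogseverity <= 2 then {                 # emerg, alert, crit
    action(type="omfwd"
           target="logs.example.invalid"       # placeholder collector
           port="514"
           protocol="tcp"
           queue.type="Disk"                   # this action gets its own disk-only queue
           queue.filename="critical_fwd_queue" # hypothetical spool file prefix
           queue.checkpointInterval="1"
           queue.syncqueuefiles="on"
           queue.maxDiskSpace="64m"            # cap on spool size
           action.resumeRetryCount="-1")       # keep retrying while the link is down
}
```

Everything below that severity would stay on the default in-memory path, so
the sync cost is only paid for the messages we can't afford to lose.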
> So let's back up and have you explain the failure scenario you are looking at
> and what you expect rsyslog to do under those conditions.
Yup. That's indeed why I started the thread! Thanks for replying -- hopefully
I provided enough background info to explain our use case.
Thanks,
Roman.
> David Lang
>
>
>
> On Mon, 27 Jan 2020, Roman Shaposhnik via rsyslog wrote:
>
> > Date: Mon, 27 Jan 2020 00:10:59 -0800
> > From: Roman Shaposhnik via rsyslog <[email protected]>
> > To: [email protected]
> > Cc: Roman Shaposhnik <[email protected]>
> > Subject: [rsyslog] rsyslog 8.1904.0 (aka 2019.04) (Alpine build) fails to
> > recover
> >
> > Hi!
> >
> > over at Linux Foundation's Project EVE
> > https://github.com/lf-edge/eve/blob/master/docs/README.md
> > https://www.lfedge.org/projects/eve/
> > we've recently considered moving to rsyslog
> > as our nexus for all the logging needs. Our
> > use case is pretty close to what you would find
> > in an embedded linux (EVE is meant to run on
> > IoT/Edge like devices) and as such power crashes
> > and intermittent connectivity are the norm. Looking
> > at rsyslog's persistent queue architecture gave us
> > a nice feeling that rsyslog project may actually be
> > a great fit for our needs.
> >
> > However, recently, we've seen quite a few crashes
> > related to rsyslog trying to recover from a previous
> > power failure. You can see the log at the bottom
> > of this email.
> >
> > With that in mind, I'm wondering if a use case like
> > that would still be considered a priority for the project
> > and what would be the best way to make sure that
> > we iron out the kinks together.
> >
> > Btw, here's our rsyslog.conf:
> > https://github.com/lf-edge/eve/blob/master/pkg/rsyslog/rsyslog.conf
> >
> > Please let me know if you need any more details and
> > otherwise thanks for an amazing project so far!
> >
> > Thanks,
> > Roman.
> >
> > rsyslogd: file '/persist/rsyslog/main_edge_node_log_queue.00000980':
> > open error: No such file or directory [v8.1904.0 try
> > https://www.rsyslog.com/e/2040 ]
> > rsyslogd: main Q: qDeqDisk error happened at around offset 0
> > [v8.1904.0 try https://www.rsyslog.com/e/2040 ]
> > rsyslogd: main Q: error dequeueing element - ignoring, but strange
> > things may happen [v8.1904.0 try https://www.rsyslog.com/e/2040 ]
> > rsyslogd: file /persist/rsyslog/main_edge_node_log_queue.00000979: fd
> > 8 no longer valid, recovery by reopen; if you see this, consider
> > reporting at
> > https://github.com/rsyslog/rsyslog/issues/3404 so that we know when it
> > happens. Include output of uname -a. OS error reason: Bad file
> > descriptor [v8.1904.0 try https://www.rsyslog.com/e/2027 ]
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> > LIKE THAT.
> >