Cornelia Huck <coh...@redhat.com> writes: > On Wed, 30 May 2018 11:30:45 +0800 > Peter Xu <pet...@redhat.com> wrote: > >> On Tue, May 29, 2018 at 11:30:00AM +0200, Cornelia Huck wrote: >> > On Thu, 24 May 2018 12:44:53 +0800 >> > Peter Xu <pet...@redhat.com> wrote: >> > >> > > There are many error_report()s that can be used in frequently called >> > > functions, especially on IO paths. That can be unideal in that >> > > malicious guest can try to trigger the error tons of time which might >> > > use up the log space on the host (e.g., libvirt can capture the stderr >> > > of QEMU and put it persistently onto disk). In VT-d emulation code, we >> > > have trace_vtd_error() tracer. AFAIU all those places can be replaced >> > > by something like error_report() but trace points are mostly used to >> > > avoid the DDOS attack that mentioned above. However using trace points >> > > mean that errors are not dumped if trace not enabled. >> > > >> > > It's not a big deal in most modern server managements since we have >> > > things like logrotate to maintain the logs and make sure the quota is >> > > expected. However it'll still be nice that we just provide another way >> > > to restrict message generations. In most cases, this kind of >> > > error_report()s will only provide valid information on the first message >> > > sent, and all the rest of similar messages will be mostly talking about >> > > the same thing. This patch introduces *_report_once() helpers to allow >> > > a message to be dumped only once during one QEMU process's life cycle. >> > > It will make sure: (1) it's on by deffault, so we can even get something >> > > without turning the trace on and reproducing, and (2) it won't be >> > > affected by DDOS attack. >> > >> > This is good for something (sub-)system wide, where it is enough to >> > alert the user once; but we may want to print something e.g. once per >> > the device where it happens (see v3 of "vfio-ccw: add force unlimited >> > prefetch property" for an example). >> >> I'm glad that we start to have more users of it, no matter which >> implementation we'll choose. At least it means it makes some sense to >> have such a thing. >> >> For me this patch works nicely enough. Of course there can be >> per-device errors, but AFAICT mostly we don't have any error at >> all... and my debugging experience is that when multiple error happens >> on different devices simutaneously, they'll very possible that it's >> caused by the same problem (again, errors are rare after all), or, the >> rest of the problems are caused by the first error only (so the first >> error might cause a collapse of the rest). That's why I wanted to >> always debug with the first error I encounter, because that's mostly >> always the root cause. In that sense, current patch works nicely for >> me (note that we can have device ID embeded in the error message with >> this one). > > I think we have slightly different use cases here. In your case, > something (an error) happened and you don't really care about any > subsequent messages. In the vfio-ccw case, we want to log when a guest > tries to do things on a certain device, so we can possibly throw a > magic switch for that device. We still want messages after that so we > can catch further devices for which we may want to throw the magic > switch. (Similar for the other message, we want to give a hint why a > certain device may not work as expected.) > >> >> At the same time, I think this patch should be easier to use of course >> - no extra variables to define, and very self contained. So I would >> slightly prefer current patch. >> >> However I'm also fine with the approach proposed in the vfio-ccw patch >> too. Though if so I would possibly drop the 2nd patch too since if >> with the vfio-ccw patch I'll need to introduce one bool for every >> trace_vtd_err() place... then I'd not bother with it any more but >> instead live with that trace_*(). ;) Or I can define one per-IOMMU >> error boolean and pass it in for each of the error_report_once(), but >> it seems a bit awkward too. > > I think we can have both the fine-grained control and convenience > macros for those cases where we really just want to print a message > once.
Yes. Cornelia, feel free to post a followup patch that satisfies your needs. >> >> > >> > > >> > > To implement it, I stole the printk_once() macro from Linux. >> > >> > Would something akin to printk_ratelimited() also make sense to avoid >> > log flooding? >> >> Yes it will. IMHO we can have that too as follow up if we want, and >> it does not conflict with this print_once(). I'd say currently this >> error_report_once() is good enough for me, especially lightweighted. >> I suspect we'll need more lines to implement a ratelimited version. > > Yes, and I agree that wants to be a separate patch should we find a use > case for it. Yes.