On Wed, 13 Dec 2017 10:31:58 +0100 David Hildenbrand <da...@redhat.com> wrote:
> On 13.12.2017 10:16, Christian Borntraeger wrote: > > > > > > On 12/12/2017 04:28 PM, Cornelia Huck wrote: > >> On Tue, 12 Dec 2017 16:17:17 +0100 > >> David Hildenbrand <da...@redhat.com> wrote: > >> > >>> On 12.12.2017 15:29, Cornelia Huck wrote: > >>>> On Tue, 12 Dec 2017 15:13:46 +0100 > >>>> Christian Borntraeger <borntrae...@de.ibm.com> wrote: > >>>> > >>>>> On 12/12/2017 02:49 PM, Cornelia Huck wrote: > >>>> > >>>>>> One thing I noticed: You removed the caching of the flic (in the old > >>>>>> kvm inject routine), and you generally do more qom invocations (first, > >>>>>> to find the common flic; then, to translate to the qemu or kvm flic). > >>>>>> Not sure if this might be a problem (probably not). > >>>>> > >>>>> Is any of these calls on a potential fast path (e.g. guest without > >>>>> adapter > >>>>> interrupts)? If yes, then QOM is a no-go since it is really slow. > >>>> > >>>> At least the new airq interface was using QOM without caching before. > >>>> > >>>> It's basically about any interrupt; but otoh we are (for kvm) in > >>>> userspace already. Caching the flic and just keeping the casting to the > >>>> specialized flic might be ok (I'd guess that the lookup is the slowest > >>>> path.) > >>>> > >>> > >>> Please note that the lookup is already cached in s390_get_flic(); That > >>> should be sufficient, as it does the expensive lookup. One cache should > >>> be enough, no? > >> > >> Ah, missed that. So the old code actually did double caching... > >> > >>> > >>> The other conversions should be cheap (and already were in place in a > >>> couple of places before). > >> > >> Yes, object_resolve_path() is probably the most expensive one. > >> > >> Did anyone ever check if the (existing) conversions are actually > >> measurable? > > > > Did some quick measurement. > > the initial object resolve takes about 20000ns, the cached ones then is > > 2-5ns. > > (measuring s390_get_flic function). > > > > > > The other conversions (S390FLICStateClass *fsc = > > S390_FLIC_COMMON_GET_CLASS(fs);) > > takes about 15-25ns (up to 100 cycles). > > For irqfd users this does not matter, but things like plan9fs might benefit > > if we speedup this lookup as well? > > Did you also measure the state conversion like QEMU_S390_FLIC() or > KVM_S390_FLIC() ? > > As we call e.g. QEMU_S390_FLIC() on a regular basis in TCG, it might > then also make sense to speed that up. > > a) cache the (converted) state and class in every function. Somewhat > uglifies the code. > > b) introduce new functions that return the cached value > > Not sure what's better. I prefer to do this as a separate addon patch > and can prepare something. If you want to do it as addon, I vote for option b). > > > > > > > PS: We can improve the initial object_resolve_path by doing the resolve not > > for > > TYPE_KVM_S390_FLIC > > but > > "/machine/" TYPE_KVM_S390_FLIC > > (2500ns instead of 20000) > > but its still way too slow. > > > > Specifying the absolute path would be even faster I guess. > > /machine/s390-flic-qemu/ ... I don't think we really need to speed up the initial lookup.