On Sat, 19 Sep 2015, Laszlo Ersek wrote:
> Got some good news: with those two fixups in place (register block
> size corrected, and dma_enabled set via device property), I could
> test the AAVMF / ArmVirtPkg / <insert your favorite synonym here>
> patches.
>
> On my APM Mustang, downloading a decompressed kernel (14,475,776
> bytes), a decompressed initrd (18,177,264), and a cmdline (104 bytes :)),
> in total 32,653,144 bytes, takes approx. 24 seconds with the 8-byte wide
> MMIO data register. (Yeah, it's *really* slow.)
>
> Using the DMA interface, the same takes about 52 milliseconds, and
> that still includes one progress message per 1 MB downloaded :)
>
> It's a factor of approx. 450. Not bad. Not bad. :)

So I've been catching up (after a several-week-long day-job related
detour :) with the latest developments in fw_cfg -- and the DMA stuff
looks good, and makes for a very educational read!

I was re-reading the documentation for fw_cfg_add_file_callback(), and
noticed that non-DMA read operations check for the presence of a
callback (and call it if present) for *every* *single* *byte*, even on
64-bit MMIO reads. That's also what the documentation says (in
docs/specs/fw_cfg.txt, being moved into fw_cfg.h as per
http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05315.html).

During DMA reads, however, the callback is only checked once before each
chunk, effectively once per DMA read operation.

Now, typical callbacks I found throughout the QEMU source tend to return
immediately except for the first time they're invoked, but I wonder
whether skipping all those extra per-byte "do I have a callback, if so
call it, mostly so it can return without doing anything" operations
accounts for some significant part of the dramatically faster transfers?

Not sure how I'd test for that. Besides my not having anything
resembling a viable ARM setup, I'm not sure whether limiting the
callbacks to only be invoked if (s->cur_offset == 0) would make sense,
just as a test?

Either way, I'll send out a v2 of my fw_cfg function-call doc patch to
additionally say something like:

  * structure residing at key value FW_CFG_FILE_DIR, containing the item name,
  * data size, and assigned selector key value.
  * Additionally, set a callback function (and argument) to be called each
- * time a byte is read by the guest from this particular item.
+ * time a byte is read by the guest from this particular item, or once per
+ * each DMA guest read operation.
  * NOTE: In addition to the opaque argument set here, the callback function
  * takes the current data offset as an additional argument, allowing it the
  * option of only acting upon specific offset values (e.g., 0, before the

Let me know what you all think...

Thanks much,
--Gabriel
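
P.S. To make the callback-count question concrete, here is a rough toy
model of the two read paths. It is only a sketch and not the actual QEMU
code: cb_stats, toy_callback(), read_per_byte() and read_dma() are names
made up for illustration, and the chunk size passed to read_dma() is an
assumption (how the guest splits its DMA transfers is not something I've
verified). It counts how often a callback would be entered, and how often
it would do real work if it only acts at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not a QEMU structure. */
typedef struct {
    uint64_t calls;     /* times the callback was entered */
    uint64_t real_work; /* times it did more than return immediately */
} cb_stats;

/* Stand-in for a typical item callback: acts only at offset 0. */
void toy_callback(cb_stats *st, size_t offset)
{
    st->calls++;
    if (offset == 0) {
        st->real_work++;
    }
}

/* Non-DMA path as described above: callback checked before every byte. */
cb_stats read_per_byte(size_t item_size)
{
    cb_stats st = {0, 0};
    for (size_t off = 0; off < item_size; off++) {
        toy_callback(&st, off);
    }
    return st;
}

/* DMA path as described above: callback checked once before each chunk.
 * The chunk size is an assumption supplied by the caller. */
cb_stats read_dma(size_t item_size, size_t chunk)
{
    cb_stats st = {0, 0};
    assert(chunk > 0);
    for (size_t off = 0; off < item_size; off += chunk) {
        toy_callback(&st, off);
    }
    return st;
}
```

For the 104-byte cmdline, read_per_byte(104) enters the callback 104
times while read_dma(104, 104) enters it once; in both cases it does
real work only once. This says nothing about how expensive each entry is
compared to the MMIO trap itself, which is the part I can't measure here.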