Hi Alex, On Mon, 21 Aug 2023 at 16:40, Alexander Graf <ag...@csgraf.de> wrote: > > > On 22.08.23 00:10, Simon Glass wrote: > > Hi Alex, > > > > On Mon, 21 Aug 2023 at 14:20, Alexander Graf <ag...@csgraf.de> wrote: > >> > >> On 21.08.23 21:57, Simon Glass wrote: > >>> Hi Alex, > >>> > >>> On Mon, 21 Aug 2023 at 13:33, Alexander Graf <ag...@csgraf.de> wrote: > >>>> On 21.08.23 21:11, Simon Glass wrote: > >>>>> Hi Alper, > >>>>> > >>>>> On Mon, 21 Aug 2023 at 07:51, Alper Nebi Yasak > >>>>> <alpernebiya...@gmail.com> wrote: > >>>>>> This is a rebase of Alexander Graf's video damage tracking series, with > >>>>>> some tests and other changes. The original cover letter is as follows: > >>>>>> > >>>>>>> This patch set speeds up graphics output on ARM by a factor of 60x. > >>>>>>> > >>>>>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as > >>>>>>> cached, > >>>>>>> but need it accessible by the display controller which reads directly > >>>>>>> from a later point of consistency. Hence, we flush the frame buffer to > >>>>>>> DRAM on every change. The full frame buffer. > >>>>> It should not, see below. > >>>>> > >>>>>>> Unfortunately, with the advent of 4k displays, we are seeing frame > >>>>>>> buffers > >>>>>>> that can take a while to flush out. This was reported by Da Xue with > >>>>>>> grub, > >>>>>>> which happily print 1000s of spaces on the screen to draw a menu. > >>>>>>> Every > >>>>>>> printed space triggers a cache flush. > >>>>> That is a bug somewhere in EFI. > >>>> Unfortunately not :). You may call it a bug in grub: It literally prints > >>>> over space characters for every character in its menu that it wants > >>>> cleared. On every text screen draw. > >>>> > >>>> This wouldn't be a big issue if we only flush the reactangle that gets > >>>> modified. But without this patch set, we're flushing the full DRAM > >>>> buffer on every u-boot text console character write, which means for > >>>> every character (as that's the only API UEFI has). > >>>> > >>>> As a nice side effect, we speed up the normal U-Boot text console as > >>>> well with this patch set, because even "normal" text prints that write > >>>> for example a single line of text on the screen today flush the full > >>>> frame buffer to DRAM. > >>> No, I mean that it is a bug that U-Boot (apparently) flushes the cache > >>> after every character. It doesn't do that for normal character output > >>> and I don't think it makes sense to do it for EFI either. > >> > >> I see. Let's trace the calls: > >> > >> efi_cout_output_string() > >> -> fputs() > >> -> vidconsole_puts() > >> -> video_sync() > >> -> flush_dcache_range() > >> > >> Unfortunately grub abstracts character backends down to the "print > >> character" level, so it calls UEFI's sopisticated "output_string" > >> callback with single characters at a time, which means we do a full > >> dcache flush for every character that we print: > >> > >> https://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/term/efi/console.c#n165 > >> > >> > >>>>>>> This patch set implements the easiest mitigation against this problem: > >>>>>>> Damage tracking. We remember the lowest common denominator region > >>>>>>> that was > >>>>>>> touched since the last video_sync() call and only flush that. The most > >>>>>>> typical writer to the frame buffer is the video console, which always > >>>>>>> writes rectangles of characters on the screen and syncs afterwards. > >>>>>>> > >>>>>>> With this patch set applied, we reduce drawing a large grub menu (with > >>>>>>> serial console attached for size information) on an RK3399-ROC system > >>>>>>> at 1440p from 55 seconds to less than 1 second. > >>>>>>> > >>>>>>> Version 2 also implements VIDEO_COPY using this mechanism, reducing > >>>>>>> its > >>>>>>> overhead compared to before as well. So even x86 systems should be > >>>>>>> faster > >>>>>>> with this now :). > >>>>>>> > >>>>>>> > >>>>>>> Alternatives considered: > >>>>>>> > >>>>>>> 1) Lazy sync - Sandbox does this. It only calls video_sync(true) > >>>>>>> ever > >>>>>>> so often. We are missing timers to do this generically. > >>>>>>> > >>>>>>> 2) Double buffering - We could try to identify whether anything > >>>>>>> changed > >>>>>>> at all and only draw to the FB if it did. That would require > >>>>>>> maintaining a second buffer that we need to scan. > >>>>>>> > >>>>>>> 3) Text buffer - Maintain a buffer of all text printed on the > >>>>>>> screen with > >>>>>>> respective location. Don't write if the old and new character > >>>>>>> are > >>>>>>> identical. This would limit applicability to text only and is > >>>>>>> an > >>>>>>> optimization on top of this patch set. > >>>>>>> > >>>>>>> 4) Hash screen lines - Create a hash (sha256?) over every line > >>>>>>> when it > >>>>>>> changes. Only flush when it does. I'm not sure if this would > >>>>>>> waste > >>>>>>> more time, memory and cache than the current approach. It > >>>>>>> would make > >>>>>>> full screen updates much more expensive. > >>>>> 5) Fix the bug mentioned above? > >>>>> > >>>>>> Changes in v5: > >>>>>> - Add patch "video: test: Split copy frame buffer check into a > >>>>>> function" > >>>>>> - Add patch "video: test: Support checking copy frame buffer contents" > >>>>>> - Add patch "video: test: Test partial updates of hardware frame > >>>>>> buffer" > >>>>>> - Use xstart, ystart, xend, yend as names for damage region > >>>>>> - Document damage struct and fields in struct video_priv comment > >>>>>> - Return void from video_damage() > >>>>>> - Fix undeclared priv error in video_sync() > >>>>>> - Drop unused headers from video-uclass.c > >>>>>> - Use IS_ENABLED() instead of CONFIG_IS_ENABLED() > >>>>>> - Call video_damage() also in video_fill_part() > >>>>>> - Use met->baseline instead of priv->baseline > >>>>>> - Use fontdata->height/width instead of VIDEO_FONT_HEIGHT/WIDTH > >>>>>> - Update console_rotate.c video_damage() calls to pass video tests > >>>>>> - Remove mention about not having minimal damage for console_rotate.c > >>>>>> - Add patch "video: test: Test video damage tracking via vidconsole" > >>>>>> - Document new vdev field in struct efi_gop_obj comment > >>>>>> - Remove video_sync_copy() also from video_fill(), video_fill_part() > >>>>>> - Fix memmove() calls by removing the extra dev argument > >>>>>> - Call video_sync() before checking copy_fb in video tests > >>>>>> - Imply VIDEO_DAMAGE for video drivers instead of selecting it > >>>>>> - Imply VIDEO_DAMAGE also for VIDEO_TIDSS > >>>>>> > >>>>>> v4: https://lore.kernel.org/all/20230103215004.22646-1-ag...@csgraf.de/ > >>>>>> > >>>>>> Changes in v4: > >>>>>> - Move damage clear to patch "dm: video: Add damage tracking API" > >>>>>> - Simplify first damage logic > >>>>>> - Remove VIDEO_DAMAGE default for ARM > >>>>>> - Skip damage on EfiBltVideoToBltBuffer > >>>>>> - Add patch "video: Always compile cache flushing code" > >>>>>> - Add patch "video: Enable VIDEO_DAMAGE for drivers that need it" > >>>>>> > >>>>>> v3: https://lore.kernel.org/all/20221230195828.88134-1-ag...@csgraf.de/ > >>>>>> > >>>>>> Changes in v3: > >>>>>> - Adapt to always assume DM is used > >>>>>> - Adapt to always assume DM is used > >>>>>> - Make VIDEO_COPY always select VIDEO_DAMAGE > >>>>>> > >>>>>> v2: https://lore.kernel.org/all/20220609225921.62462-1-ag...@csgraf.de/ > >>>>>> > >>>>>> Changes in v2: > >>>>>> - Remove ifdefs > >>>>>> - Fix ranges in truetype target > >>>>>> - Limit rotate to necessary damage > >>>>>> - Remove ifdefs from gop > >>>>>> - Fix dcache range; we were flushing too much before > >>>>>> - Add patch "video: Use VIDEO_DAMAGE for VIDEO_COPY" > >>>>>> > >>>>>> v1: https://lore.kernel.org/all/20220606234336.5021-1-ag...@csgraf.de/ > >>>>>> > >>>>>> Alexander Graf (9): > >>>>>> dm: video: Add damage tracking API > >>>>>> dm: video: Add damage notification on display fills > >>>>>> vidconsole: Add damage notifications to all vidconsole drivers > >>>>>> video: Add damage notification on bmp display > >>>>>> efi_loader: GOP: Add damage notification on BLT > >>>>>> video: Only dcache flush damaged lines > >>>>>> video: Use VIDEO_DAMAGE for VIDEO_COPY > >>>>>> video: Always compile cache flushing code > >>>>>> video: Enable VIDEO_DAMAGE for drivers that need it > >>>>>> > >>>>>> Alper Nebi Yasak (4): > >>>>>> video: test: Split copy frame buffer check into a function > >>>>>> video: test: Support checking copy frame buffer contents > >>>>>> video: test: Test partial updates of hardware frame buffer > >>>>>> video: test: Test video damage tracking via vidconsole > >>>>>> > >>>>>> arch/arm/mach-omap2/omap3/Kconfig | 1 + > >>>>>> arch/arm/mach-sunxi/Kconfig | 1 + > >>>>>> drivers/video/Kconfig | 26 +++ > >>>>>> drivers/video/console_normal.c | 27 ++-- > >>>>>> drivers/video/console_rotate.c | 94 +++++++---- > >>>>>> drivers/video/console_truetype.c | 37 +++-- > >>>>>> drivers/video/exynos/Kconfig | 1 + > >>>>>> drivers/video/imx/Kconfig | 1 + > >>>>>> drivers/video/meson/Kconfig | 1 + > >>>>>> drivers/video/rockchip/Kconfig | 1 + > >>>>>> drivers/video/stm32/Kconfig | 1 + > >>>>>> drivers/video/tegra20/Kconfig | 1 + > >>>>>> drivers/video/tidss/Kconfig | 1 + > >>>>>> drivers/video/vidconsole-uclass.c | 16 -- > >>>>>> drivers/video/video-uclass.c | 190 ++++++++++++---------- > >>>>>> drivers/video/video_bmp.c | 7 +- > >>>>>> include/video.h | 59 +++---- > >>>>>> include/video_console.h | 52 ------ > >>>>>> lib/efi_loader/efi_gop.c | 7 + > >>>>>> test/dm/video.c | 256 > >>>>>> ++++++++++++++++++++++++------ > >>>>>> 20 files changed, 483 insertions(+), 297 deletions(-) > >>>>> It is good to see this tidied up into something that can be applied! > >>>>> > >>>>> I am unsure what is going on with the EFI performance, though. It > >>>>> should not flush the cache after every character, only after a new > >>>>> line. Is there something wrong in here? If so, we should fix that bug > >>>>> first and it should be patch 1 of this series. > >>>> Before I came up with this series, I was trying to identify the UEFI bug > >>>> in question as well, because intuition told me surely this is a bug in > >>>> UEFI :). Turns out it really isn't this time around. > >>> I don't mean a bug in UEFI, I mean a bug in U-Boot's EFI > >>> implementation. Where did you look for the bug? > >> > >> The "real" bug is in grub. But given that it's reasonably simple to work > >> around in U-Boot and even with it "fixed" in grub we would still see > >> performance benefits from flushing only parts of the screen, I think > >> it's worth living with the grub deficiency. > > OK thanks for digging into it. I suggest we add a param to > > vidconsole_puts() to tell it whether to sync or not, then the EFI code > > can indicate this and try to be a bit smarter about it. > > > It doesn't know when to sync either. From its point of view, any > "console output" could be the last one. There is no API in UEFI that > says "please flush console output now".
Yes, I understand. I was not suggesting we were missing an API. But some sort of heuristic would do, e.g. only flush on a newline, flush every 50 chars, etc. Regards, Simon