Am 9. Juni 2022 21:04:37 MESZ schrieb Alexander Graf <ag...@csgraf.de>:
>
>On 07.06.22 10:28, Heinrich Schuchardt wrote:
>> On 6/7/22 01:43, Alexander Graf wrote:
>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>>
>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>> but need it accessible by the display controller which reads directly
>>> from a later point of consistency. Hence, we flush the frame buffer to
>>> DRAM on every change. The full frame buffer.
>>
>> Isn't a similar problem already solved by CONFIG_VIDEO_COPY?
>>
>> Leaving the frame buffer uncached would convert the ARM problem into the
>> X86 case?
>
>
>It solves a similar problem, yes. However, it requires us to allocate the
>frame buffer size twice, and we would need to dynamically toggle the MMU
>mappings of the frame buffer to WC instead of cached. That's code we don't
>have today.
>
>VIDEO_COPY is also terribly inefficient in the most common case: Drawing one
>or multiple characters. It basically copies every line that contains the
>character, for every character printed. The damage code in this patch set only
>flushes the relevant rectangles after a string is fully printed.
>
>I think overall, damage tracking with cached memory is simple enough that it
>gives us the best of all worlds.
>
>
>>
>>>
>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>> which happily print 1000s of spaces on the screen to draw a menu. Every
>>> printed space triggers a cache flush.
>>>
>>> This patch set implements the easiest mitigation against this problem:
>>> Damage tracking. We remember the lowest common denominator region that was
>>> touched since the last video_sync() call and only flush that.
>>
>> If by "lowest common denominator region" you should mean a rectangle,
>> drawing a point in the upper left corner and another in the lower right
>> corner would require a full flush. So nothing gained in this case.
>
>
>Glad you asked! :)
>
>While theoretically possible, this is a case that just never happens in
>U-Boot's code flow. All code that draws to the screen is either blt based
>(like gop, character drawing or logo display) or moves large portions of the
>screen (scrolling). The largest granularity we have between syncs is when
>printing strings. So the worst case you'll have today is a wrap around where
>you'd end up flushing full lines.
>
>
>>
>>>
>>> With this patch set applied, we reduce drawing a large grub menu (with
>>> serial console attached for size information) on an RK3399-ROC system
>>> at 1440p from 55 seconds to less than 1 second.
>>>
>>>
>>> Alternatives considered:
>>>
>>> 1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>> so often. We are missing timers to do this generically.
>>>
>>> 2) Double buffering - We could try to identify whether anything changed
>>> at all and only draw to the FB if it did. That would require
>>> maintaining a second buffer that we need to scan.
>>>
>>> 3) Text buffer - Maintain a buffer of all text printed on the screen with
>>> respective location. Don't write if the old and new character are
>>> identical. This would limit applicability to text only and is an
>>> optimization on top of this patch set.
>>>
>>> 4) Hash screen lines - Create a hash (sha256?) over every line when it
>>> changes. Only flush when it does. I'm not sure if this would waste
>>> more time, memory and cache than the current approach. It would make
>>> full screen updates much more expensive.
>>>
>>> Alexander Graf (6):
>>> dm: video: Add damage tracking API
>>> dm: video: Add damage notification on display clear
>>> vidconsole: Add damage notifications to all vidconsole drivers
>>> video: Add damage notification on bmp display
>>> efi_loader: GOP: Add damage notification on BLT
>>> video: Only dcache flush damaged lines
>>
>> We need documentation describing the difference between
>> CONFIG_VIDEO_COPY and CONFIG_VIDEO_DAMAGE.
>
>
>Hm, maybe we should implement CONFIG_VIDEO_COPY as a flush mechanism behind
>CONFIG_VIDEO_DAMAGE? That way we only have a single code path for producers
>left and in addition also optimize drawing individual characters. It would
>also make the feature useful beyond ARM dcache flushing.
Please, consider that RISC-V has no instruction for flushing the data cache. So
CONFIG_VIDEO_DAMAGE is not applicable.
Best regards
Heinrich
>
>
>Alex
>
>