Am 9. Juni 2022 21:04:37 MESZ schrieb Alexander Graf <ag...@csgraf.de>:
>
>On 07.06.22 10:28, Heinrich Schuchardt wrote:
>> On 6/7/22 01:43, Alexander Graf wrote:
>>> This patch set speeds up graphics output on ARM by a factor of 60x.
>>> 
>>> On most ARM SBCs, we keep the frame buffer in DRAM and map it as cached,
>>> but need it accessible by the display controller which reads directly
>>> from a later point of consistency. Hence, we flush the frame buffer to
>>> DRAM on every change. The full frame buffer.
>> 
>> Isn't a similar problem already solved by CONFIG_VIDEO_COPY?
>> 
>> Leaving the frame buffer uncached would convert the ARM problem into the
>> X86 case?
>
>
>It solves a similar problem, yes. However, it requires us to allocate the 
>frame buffer size twice, and we would need to dynamically toggle the MMU 
>mappings of the frame buffer to WC instead of cached. That's code we don't 
>have today.
>
>VIDEO_COPY is also terribly inefficient in the most common case: Drawing one 
>or multiple characters. It basically copies every line that contains the 
>character, for every character printed. The damage code in this patch set only 
>flushes the relevant rectangles after a string is fully printed.
>
>I think overall, damage tracking with cached memory is simple enough that it 
>gives us the best of all worlds.
>
>
>> 
>>> 
>>> Unfortunately, with the advent of 4k displays, we are seeing frame buffers
>>> that can take a while to flush out. This was reported by Da Xue with grub,
>>> which happily print 1000s of spaces on the screen to draw a menu. Every
>>> printed space triggers a cache flush.
>>> 
>>> This patch set implements the easiest mitigation against this problem:
>>> Damage tracking. We remember the lowest common denominator region that was
>>> touched since the last video_sync() call and only flush that.
>> 
>> If by "lowest common denominator region" you should mean a rectangle,
>> drawing a point in the upper left corner and another in the lower right
>> corner would require a full flush. So nothing gained in this case.
>
>
>Glad you asked! :)
>
>While theoretically possible, this is a case that just never happens in 
>U-Boot's code flow. All code that draws to the screen is either blt based 
>(like gop, character drawing or logo display) or moves large portions of the 
>screen (scrolling). The largest granularity we have between syncs is when 
>printing strings. So the worst case you'll have today is a wrap around where 
>you'd end up flushing full lines.
>
>
>> 
>>> 
>>> With this patch set applied, we reduce drawing a large grub menu (with
>>> serial console attached for size information) on an RK3399-ROC system
>>> at 1440p from 55 seconds to less than 1 second.
>>> 
>>> 
>>> Alternatives considered:
>>> 
>>>    1) Lazy sync - Sandbox does this. It only calls video_sync(true) ever
>>>       so often. We are missing timers to do this generically.
>>> 
>>>    2) Double buffering - We could try to identify whether anything changed
>>>       at all and only draw to the FB if it did. That would require
>>>       maintaining a second buffer that we need to scan.
>>> 
>>>    3) Text buffer - Maintain a buffer of all text printed on the screen with
>>>       respective location. Don't write if the old and new character are
>>>       identical. This would limit applicability to text only and is an
>>>       optimization on top of this patch set.
>>> 
>>>    4) Hash screen lines - Create a hash (sha256?) over every line when it
>>>       changes. Only flush when it does. I'm not sure if this would waste
>>>       more time, memory and cache than the current approach. It would make
>>>       full screen updates much more expensive.
>>> 
>>> Alexander Graf (6):
>>>    dm: video: Add damage tracking API
>>>    dm: video: Add damage notification on display clear
>>>    vidconsole: Add damage notifications to all vidconsole drivers
>>>    video: Add damage notification on bmp display
>>>    efi_loader: GOP: Add damage notification on BLT
>>>    video: Only dcache flush damaged lines
>> 
>> We need documentation describing the difference between
>> CONFIG_VIDEO_COPY and CONFIG_VIDEO_DAMAGE.
>
>
>Hm, maybe we should implement CONFIG_VIDEO_COPY as a flush mechanism behind 
>CONFIG_VIDEO_DAMAGE? That way we only have a single code path for producers 
>left and in addition also optimize drawing individual characters. It would 
>also make the feature useful beyond ARM dcache flushing.


Please, consider that RISC-V has no instruction for flushing the data cache. So 
CONFIG_VIDEO_DAMAGE is not applicable.

Best regards

Heinrich


>
>
>Alex
>
>

Reply via email to