Re: Display update issue on M1 Macs

BALATON Zoltan Thu, 02 Feb 2023 02:54:25 -0800

On Tue, 31 Jan 2023, BALATON Zoltan wrote:

On Tue, 31 Jan 2023, Akihiko Odaki wrote:

[...]
To summarise previous discussion:

- There's a problem on Apple M1 Macs with sm501 and ati-vga 2D accel functions drawing from device model into the video memory of the emulated card which is not shown on screen when the display update callback is called from another thread. This works on x86_64 host so I suspect it may be related to missing memory synchronisation that ARM may need.

- This can be reproduced running AmigaOS4 on sam460ex or MorphOS (demo iso downloadable from their web site) on sam460ex, pegasos2 or mac99,via=pmu with -device ati-vga,romfile="" as described here: http://zero.eik.bme.hu/~balaton/qemu/amiga/

- I can't test it myself lacking hardware so I have to rely on reports from people who have this hardware so there may be some uncertainty in the info I get.

- We have confirmed it's not related to a known race condition as disabling dirty tracking and always doing full updates of whole screen did not fix it:

But there is an exception: memory_region_snapshot_and_clear_dirty() releases iothread lock, and that broke raspi3b display device:
https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/
It is unexpected that gfx_update() callback releases iothread lock so it may break things in peculiar ways.
Peter, is there any change in the situation regarding the race introduced by memory_region_snapshot_and_clear_dirty()?
For now, to work around the issue, I think you can create another mutex and make the entire sm501_2d_engine_write() and sm501_update_display() critical sections.
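The suggested workaround can be sketched roughly as follows. This is only an illustration and not QEMU code: FakeCard, fake_2d_fill(), fake_update_display() and fake_run_demo() are hypothetical stand-ins for the sm501 device state, sm501_2d_engine_write() and sm501_update_display(), and it uses plain POSIX pthreads rather than QEMU's own mutex wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define FB_PIXELS 4096

/* Hypothetical stand-in for the emulated card; not QEMU's SM501State. */
typedef struct {
    pthread_mutex_t lock;        /* the extra mutex suggested above */
    uint32_t vram[FB_PIXELS];    /* emulated video memory */
    uint32_t screen[FB_PIXELS];  /* what the display surface would show */
} FakeCard;

typedef struct {
    FakeCard *card;
    int rounds;
} DrawerArgs;

/* Plays the role of sm501_2d_engine_write(): the whole fill is one critical section. */
void fake_2d_fill(FakeCard *c, uint32_t color)
{
    pthread_mutex_lock(&c->lock);
    for (size_t i = 0; i < FB_PIXELS; i++) {
        c->vram[i] = color;
    }
    pthread_mutex_unlock(&c->lock);
}

/* Plays the role of sm501_update_display(): copy VRAM to the screen under the same lock. */
void fake_update_display(FakeCard *c)
{
    pthread_mutex_lock(&c->lock);
    memcpy(c->screen, c->vram, sizeof(c->screen));
    pthread_mutex_unlock(&c->lock);
}

/* Drawing thread: fills the whole frame with 1, 2, ..., rounds. */
static void *drawer_thread(void *opaque)
{
    DrawerArgs *a = opaque;
    for (int i = 1; i <= a->rounds; i++) {
        fake_2d_fill(a->card, (uint32_t)i);
    }
    return NULL;
}

/*
 * Runs the drawer and the display update concurrently. Returns 1 if no
 * snapshot was torn (every pixel of a frame had the same value) and the
 * final frame shows the last fill, otherwise 0.
 */
int fake_run_demo(int rounds)
{
    static FakeCard card;
    DrawerArgs args = { &card, rounds };
    pthread_t tid;
    int consistent = 1;
    int rc;

    memset(card.vram, 0, sizeof(card.vram));
    memset(card.screen, 0, sizeof(card.screen));
    pthread_mutex_init(&card.lock, NULL);

    rc = pthread_create(&tid, NULL, drawer_thread, &args);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < rounds; i++) {
        fake_update_display(&card);
        /* screen is written only by this thread, so it can be read unlocked */
        for (size_t p = 1; p < FB_PIXELS; p++) {
            if (card.screen[p] != card.screen[0]) {
                consistent = 0;
            }
        }
    }

    pthread_join(tid, NULL);
    fake_update_display(&card);  /* one last update after drawing finished */
    pthread_mutex_destroy(&card.lock);
    return consistent && card.screen[0] == (uint32_t)rounds;
}
```

Calling fake_run_demo(200) should return 1 on any host if the lock is doing its job. It only shows the shape of the workaround and says nothing about whether the same approach would help in the real sm501 model.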
Interesting thread but not sure it's the same problem so this workaround may not be enough to fix my issue. Here's a video posted by one of the people who reported it showing the problem on M1 Mac:
https://www.youtube.com/watch?v=FDqoNbp6PQs

and here's how it looks on other machines:

https://www.youtube.com/watch?v=ML7-F4HNFKQ
There are also videos showing it running on RPi 4 and G5 Mac without this issue so it seems to only happen on Apple Silicon M1 Macs. What's strange is that graphics elements are not just delayed which I think should happen with missing thread synchronisation where the update callback would miss some pixels rendered during its running but subsequent update callbacks would eventually draw those, wouldn't they? Also setting full_update to 1 in sm501_update_display() callback to disable dirty tracking does not fix the problem. So it looks as if sm501_2d_operation() running on one CPU core only writes data to the local cache of that core which sm501_update_display() running on other core can't see, so maybe some cache synchronisation is needed in memory_region_set_dirty() or if that's already there maybe I should call it for all changes not only those in the visible display area? I'm still not sure I understand the problem and don't know what could be a fix for it so anything to test to identify the issue better might also bring us closer to a solution.
If you set full_update to 1, you may also comment out memory_region_snapshot_and_clear_dirty() and memory_region_snapshot_get_dirty() to avoid the iothread mutex being unlocked. The iothread mutex should ensure cache coherency as well.
But as you say, it's weird that the rendered result is not just delayed but missed. That may imply other possibilities (e.g., the results are overwritten by someone else). If the problem persists after commenting out memory_region_snapshot_and_clear_dirty() and memory_region_snapshot_get_dirty(), I think you can assume the inter-thread coherency between sm501_2d_operation() and sm501_update_display() is not causing the problem.
I've asked people who reported and can reproduce it to test this but it did not change anything so confirmed it's not that race condition but looks more like some cache inconsistency maybe. Any other ideas?
I can come up with two important differences between x86 and Arm which can affect the execution of QEMU:
1. Memory model. Arm uses a memory model more relaxed than x86 so it is more sensitive to synchronization failures among threads.
2. Different instructions. TCG uses JIT so differences in instructions matter.
We should be able to exclude 1) as a potential cause of the problem. The iothread mutex should take care of race conditions and even cache coherency problems; a mutex includes memory barrier functionality.
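To illustrate what difference 1) means when no mutex is involved, here is a minimal sketch using C11 atomics and POSIX threads. It is not taken from QEMU; payload, ready, producer() and publish_and_read() are hypothetical names. A plain store is published with a release store and picked up with an acquire load, which is the ordering a mutex unlock/lock pair would otherwise provide.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static uint32_t payload;   /* plain, non-atomic data, e.g. one pixel */
static atomic_int ready;   /* publication flag */

/* Writer side: store the data first, then publish it with release ordering. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 0xC0FFEE;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/*
 * Reader side: wait for the flag with acquire ordering, then read the data.
 * The release/acquire pair guarantees the reader sees the payload store.
 * With relaxed ordering on both sides that guarantee would be gone, which
 * is the kind of bug a weaker memory model can expose.
 */
uint32_t publish_and_read(void)
{
    pthread_t tid;
    uint32_t seen;
    int rc;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);

    rc = pthread_create(&tid, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the producer has published */
    }
    seen = payload;

    pthread_join(tid, NULL);
    return seen;
}
```

Code that takes a mutex around both the write and the read does not need any of this, which is the reasoning behind excluding 1) above.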

[...]

For difference 2), you may try to use TCI. You can find details of TCI in tcg/tci/README.
This was tested, and with TCI we got the same results, just much slower.
Common sense says, however, that the memory model is usually the cause of the problem when you see behavioral differences between x86 and Arm, and TCG should work fine with both x86 and Arm as they should have been tested well.

[...]

Fortunately macOS provides Rosetta 2 for x86 emulation on Apple M1, which makes it possible to compare x86 and Arm without worrying about the difference of the microarchitecture.
We've tried that before and even running x86 QEMU on M1 with Rosetta 2 it was the same so it's probably not something about the code itself but how it's

As this was odd I've asked to re-test this and now I'm told at least the QEMU 5.1 x86_64 build from emaculation.com is working with Rosetta on M1 Mac so this suggests it may be a problem with memory sync but I still don't know where and what to try. We're now trying newer x86_64 builds to see if it broke somewhere along the way.

Anybody else with an M1 Mac wants to help testing? Can you reproduce the same with UTM with MorphOS and ati-vga? Here's what I've got showing the problem: https://www.youtube.com/watch?v=j5Ag5_Yq-Mk


Regards,
BALATON Zoltan
