On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
> 2010/5/10 Blue Swirl <blauwir...@gmail.com>:
>> On 5/10/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>> 2010/5/10 Blue Swirl <blauwir...@gmail.com>:
>>>
>>> > On 5/10/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>> >> 2010/5/9 Blue Swirl <blauwir...@gmail.com>:
>>> >> > On 5/9/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>> >> >> 2010/5/9 Blue Swirl <blauwir...@gmail.com>:
>>> >> >>
>>> >> >> > On 5/8/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>> >> >> >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>> >> >> >> Software shouldn't use aliased addresses, nor should it crash
>>> >> >> >> when it does (on the real hardware it wouldn't). Using empty_slot
>>> >> >> >> instead of aliasing can help with debugging such accesses.
>>> >> >> >
>>> >> >> > TurboSPARC Microprocessor User's Manual shows that there are
>>> >> >> > additional pages after the main IOMMU for AFX registers. So this is
>>> >> >> > not board specific, but depends on CPU/IOMMU versions.
>>> >> >>
>>> >> >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>> >> >> SS-20 doesn't have any aliasing.
>>> >> >
>>> >> > But are your machines equipped with TurboSPARC or some other CPU?
>>> >>
>>> >> Good point, I must confess, I missed the word "Turbo" in your first
>>> >> answer. LX and SS-20 don't.
>>> >> But SS-5 must have a TurboSPARC CPU:
>>> >>
>>> >> ok cd /FMI,MB86904
>>> >> ok .attributes
>>> >> context-table         00 00 00 00 03 ff f0 00 00 00 10 00
>>> >> psr-implementation    00000000
>>> >> psr-version           00000004
>>> >> implementation        00000000
>>> >> version               00000004
>>> >> cache-line-size       00000020
>>> >> cache-nlines          00000200
>>> >> page-size             00001000
>>> >> dcache-line-size      00000010
>>> >> dcache-nlines         00000200
>>> >> dcache-associativity  00000001
>>> >> icache-line-size      00000020
>>> >> icache-nlines         00000200
>>> >> icache-associativity  00000001
>>> >> ncaches               00000002
>>> >> mmu-nctx              00000100
>>> >> sparc-version         00000008
>>> >> mask_rev              00000026
>>> >> device_type           cpu
>>> >> name                  FMI,MB86904
>>> >>
>>> >> and still it behaves the same as the TI,TMS390S10 from the LX. This is done on SS-5:
>>> >>
>>> >> ok 10000000 20 spacel@ .
>>> >> 4000009
>>> >> ok 14000000 20 spacel@ .
>>> >> 4000009
>>> >> ok 14000004 20 spacel@ .
>>> >> 23000
>>> >> ok 1f000004 20 spacel@ .
>>> >> 23000
>>> >> ok 10000008 20 spacel@ .
>>> >> 4000009
>>> >> ok 14000028 20 spacel@ .
>>> >> 4000009
>>> >> ok 1000000c 20 spacel@ .
>>> >> 23000
>>> >> ok 10000010 20 spacel@ .
>>> >> 4000009
>>> >>
>>> >> The LX is the same except for the IOMMU version:
>>> >>
>>> >> ok 10000000 20 spacel@ .
>>> >> 4000005
>>> >> ok 14000000 20 spacel@ .
>>> >> 4000005
>>> >> ok 18000000 20 spacel@ .
>>> >> 4000005
>>> >> ok 1f000000 20 spacel@ .
>>> >> 4000005
>>> >> ok 1ff00000 20 spacel@ .
>>> >> 4000005
>>> >> ok 1fff0004 20 spacel@ .
>>> >> 1fe000
>>> >> ok 10000004 20 spacel@ .
>>> >> 1fe000
>>> >> ok 10000108 20 spacel@ .
>>> >> 41000005
>>> >> ok 10000040 20 spacel@ .
>>> >> 41000005
>>> >> ok 1fff0040 20 spacel@ .
>>> >> 41000005
>>> >> ok 1fff0044 20 spacel@ .
>>> >> 1fe000
>>> >> ok 1fff0024 20 spacel@ .
>>> >> 1fe000
>>> >>
>>> >> At what address are the additional AFX registers located?
>>> >> >
>>> >> > Here's the complete TurboSPARC IOMMU address map:
>>> >> > PA[30:0]   Register                    Access
>>> >> > 1000_0000  IOMMU Control               R/W
>>> >> > 1000_0004  IOMMU Base Address          R/W
>>> >> > 1000_0014  Flush All IOTLB Entries     W
>>> >> > 1000_0018  Address Flush               W
>>> >> > 1000_1000  Asynchronous Fault Status   R/W
>>> >> > 1000_1004  Asynchronous Fault Address  R/W
>>> >> > 1000_1010  SBus Slot Configuration 0   R/W
>>> >> > 1000_1014  SBus Slot Configuration 1   R/W
>>> >> > 1000_1018  SBus Slot Configuration 2   R/W
>>> >> > 1000_101C  SBus Slot Configuration 3   R/W
>>> >> > 1000_1020  SBus Slot Configuration 4   R/W
>>> >> > 1000_1050  Memory Fault Status         R/W
>>> >> > 1000_1054  Memory Fault Address        R/W
>>> >> > 1000_2000  Module Identification       R/W
>>> >> > 1000_3018  Mask Identification         R
>>> >> > 1000_4000  AFX Queue Level             W
>>> >> > 1000_6000  AFX Queue Level             R
>>> >> > 1000_7000  AFX Queue Status            R
>>> >>
>>> >> But if I read it correctly, 0x12fff294 (which makes SunOS crash with -m 32) is
>>> >> well above this limit.
>>> >
>>> > Oh, so I also misread something. You are not talking about the
>>> > adjacent pages, but about 16MB increments.
>>> >
>>> > Earlier I sent a patch for a generic address alias device, would it be
>>> > useful for this?
>>>
>>> Should do as well. But I thought empty_slot is less overhead and
>>> easier to debug.
>>>
> Also the aliasing patch would require one more parameter: the size of
> the area which has to be aliased. Unless we implement stubs for all
> missing devices and do aliasing of the connected port ranges. And
> then again, SS-20 doesn't have aliasing in this area at all.
>
> What do you think about this (empty_slot) solution (except that I
> missed the SoB line)? Meanwhile it's tested with SunOS 4.1.3U1 too.
I'm slightly against it; of course it would help for this case, but I
think we may be missing a bigger problem.

>>>> Maybe we have a general design problem, perhaps unassigned access
>>>> faults should only be triggered inside SBus slots and ignored
>>>> elsewhere. If this is true, the generic Sparc32 unassigned access
>>>> handler should just ignore the access, and special fault-generating
>>>> slots should be installed for empty SBus address ranges.
>
> Agreed that they should be special for SBus, because the SS-20 OBP is
> not happy with the fault we are currently generating. But otherwise I
> think qemu does it correctly. On SS-5:
>
> ok f7ff0000 2f spacel@ .
> Data Access Error
> ok sfar@ .
> f7ff0000
> ok 20000000 2f spacel@ .
> Data Access Error
> ok sfar@ .
> 20000000
> ok 40000000 20 spacel@ .
> Data Access Error
> ok sfar@ .
> 40000000
>
> Neither f7ff0000, nor 20000000, nor 40000000 is in the SBus range, right?

40000000 is in the SBus range on SS-5. So is the SBus Control Space in
0x10000000 to 0x1fffffff the only area besides DRAM where the accesses
won't trap?

>>> My impression was that SS-5 and SS-20 do unassigned accesses a bit
>>> differently.
>>> The current IOMMU implementation fits SS-20, which has no aliasing.
>>
>> It's probably rather the board design than just the IOMMU.
>
> Agreed. That's why I bound the patch to the machine hwdef and not to the iommu.
>
>>> >> >> > One approach would be that IOMMU_NREGS would be increased to cover
>>> >> >> > these registers (with a bump in the savevm version field) and
>>> >> >> > iommu_init1() should check the version field to see how much MMIO
>>> >> >> > to provide.
>>> >> >>
>>> >> >> The problem I see here is that we already have too many registers:
>>> >> >> we emulate the SS-20 IOMMU (I guess), while SS-5 and LX seem to have
>>> >> >> only 0x20 registers which are aliased all the way.
>>> >> >>
>>> >> >> > But in order to avoid the savevm version change, iommu_init1()
>>> >> >> > could just install dummy MMIO (in the TurboSPARC case), if OBP
>>> >> >> > does not care whether the read back data matches what has been
>>> >> >> > written earlier. Because from OBP's point of view this is
>>> >> >> > identical to what your patch results in, I'd suppose this
>>> >> >> > approach would also work.
>>> >> >>
>>> >> >> OBP doesn't seem to care about these addresses at all. It's only
>>> >> >> the "MUNIX" SunOS 4.1.4 kernel that does. The "MUNIX" kernel is the
>>> >> >> only kernel available during the installation, so it is currently
>>> >> >> not possible to install 4.1.4. Surprisingly, the "GENERIC" kernel
>>> >> >> which is on the disk after the installation doesn't try to access
>>> >> >> these address ranges either, so a disk image taken from a live
>>> >> >> system works.
>>> >> >>
>>> >> >> Actually, access to the non-connected/aliased addresses may also be
>>> >> >> a consequence of the phys_page_find bug I mentioned before. When I
>>> >> >> run the install with -m 64 and -m 256 it tries to access different
>>> >> >> non-connected addresses. May also be a SunOS bug of course. 256m
>>> >> >> used to be a lot back then.
>>> >> >
>>> >> > Perhaps with 256MB, memory probing advances blindly from memory to
>>> >> > the IOMMU registers. Proll (used before OpenBIOS) did that once, with
>>> >> > bad results :-). If this is true, 64M, 128M and 192M should show
>>> >> > identical results, and the accesses should happen only with memory
>>> >> > close or equal to 256M.
>>> >>
>>> >> 32m:  0x12fff294
>>> >> 64m:  0x14fff294
>>> >> 192m: 0x1cfff294
>>> >> 256m: 0x20fff294
>>> >>
>>> >> Memory probing? It would be strange for the OS to do it itself. The OS
>>> >> could just ask OBP how much it has.
Here is the listing where it happens:
>>> >>
>>> >> _swift_vac_rgnflush:       rd    %psr, %g2
>>> >> _swift_vac_rgnflush+4:     andn  %g2, 0x20, %g5
>>> >> _swift_vac_rgnflush+8:     mov   %g5, %psr
>>> >> _swift_vac_rgnflush+0xc:   nop
>>> >> _swift_vac_rgnflush+0x10:  nop
>>> >> _swift_vac_rgnflush+0x14:  mov   0x100, %g5
>>> >> _swift_vac_rgnflush+0x18:  lda   [%g5] 0x4, %g5
>>> >> _swift_vac_rgnflush+0x1c:  sll   %o2, 0x2, %g1
>>> >> _swift_vac_rgnflush+0x20:  sll   %g5, 0x4, %g5
>>> >> _swift_vac_rgnflush+0x24:  add   %g5, %g1, %g5
>>> >> _swift_vac_rgnflush+0x28:  lda   [%g5] 0x20, %g5
>>> >>
>>> >> _swift_vac_rgnflush+0x28: is the fatal one.
>>> >>
>>> >> kadb> $c
>>> >> _swift_vac_rgnflush(?)
>>> >> _vac_rgnflush() + 4
>>> >> _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>> >> _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>> >> _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>> >>
>>> >> Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>> >> cache-flush code, so I can't check what is in
>>> >> [%g5] (aka sfar) on the real machine when this happens.
>>> >
>>> > Linux code for Swift/TurboSPARC VAC flush should be similar.
>
> Do you have an idea why anyone would try reading a value referenced in sfar?
> Especially during flushing? I can't imagine a case where it wouldn't
> produce a fault.

No idea, the fault should be inevitable. An explanation of how VAC
(Virtually Addressed Cache?) works could help.

>>> >> But the bug in phys_page_find would explain these accesses: sfar gets
>>> >> the wrong address, and then the secondary access happens on this wrong
>>> >> address instead of the original one.
>>> >
>>> > I doubt phys_page_find can be buggy, it is so vital for all architectures.
>>>
>>> But you've seen the example of buggy behaviour I posted last Friday, right?
>>> If it's not phys_page_find, it's either cpu_physical_memory_rw (which
>>> is also pretty generic), or the way SS-20 registers devices. Can it be
>>> that all the pages must be registered in the proper order?
>>
>> How about the unassigned access handler, could it be suspected?
>
> Doesn't look like it: it gets a physical address as a parameter. How
> would it know the address is wrong?

It wouldn't, but IIRC Paul claimed earlier that the unassigned memory
handling in QEMU could have problems.

>>> I think it's a pretty rare use case where you have a memory fault (not
>>> a translation fault) on an unknown address. You may have such a fault
>>> during device probing, but in that case you know what address you are
>>> probing, so you don't care about the sync fault address register.
>>>
>>> Besides, do all architectures have a sync fault address register?
>>
>> No, I think system level checks like that and IOMMU-like controls on
>> most architectures are very poor compared to Sparc32. Server and
>> mainframe systems may be a bit better.
>
> And do we have any mainframe emulated well enough to have a user base
> and hence bug reports? The only IOMMU implemented so far is the Sparc32 one.

I don't know about the S390x architecture, but that should definitely be
mainframe class. AMD IOMMU may be in QEMU one day. About bugs, IIRC the
NetBSD 3.x crash could be related to the IOMMU.

>>> >> fwiw the routine is called only once on the real hardware. It sort of
>>> >> speaks for your hypothesis about the memory probing. Although it may
>>> >> not necessarily probe for memory...
>>> >>
>
> --
> Regards,
> Artyom Tarasenko
>
> solaris/sparc under qemu blog: http://tyom.blogspot.com/
>