2010/5/21 Blue Swirl <blauwir...@gmail.com>:
> On Fri, May 21, 2010 at 5:23 PM, Artyom Tarasenko
> <atar4q...@googlemail.com> wrote:
>> 2010/5/10 Blue Swirl <blauwir...@gmail.com>:
>>> On 5/10/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>>> 2010/5/10 Blue Swirl <blauwir...@gmail.com>:
>>>>
>>>> > On 5/10/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>>> >> 2010/5/9 Blue Swirl <blauwir...@gmail.com>:
>>>> >> > On 5/9/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>>> >> >> 2010/5/9 Blue Swirl <blauwir...@gmail.com>:
>>>> >> >>
>>>> >> >> > On 5/8/10, Artyom Tarasenko <atar4q...@googlemail.com> wrote:
>>>> >> >> >> On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
>>>> >> >> >> Software shouldn't use aliased addresses, neither should it crash
>>>> >> >> >> when it uses (on the real hardware it wouldn't). Using empty_slot
>>>> >> >> >> instead of aliasing can help with debugging such accesses.
>>>> >> >> >
>>>> >> >> > TurboSPARC Microprocessor User's Manual shows that there are
>>>> >> >> > additional pages after the main IOMMU for AFX registers. So this is
>>>> >> >> > not board specific, but depends on CPU/IOMMU versions.
>>>> >> >>
>>>> >> >> I checked it on the real hw: on LX and SS-5 these are aliased MMU addresses.
>>>> >> >> SS-20 doesn't have any aliasing.
>>>> >> >
>>>> >> > But are your machines equipped with TurboSPARC or some other CPU?
>>>> >>
>>>> >> Good point, I must confess, I missed the word "Turbo" in your first
>>>> >> answer. LX and SS-20 don't.
>>>> >> But SS-5 must have a TurboSPARC CPU:
>>>> >>
>>>> >> ok cd /FMI,MB86904
>>>> >> ok .attributes
>>>> >> context-table            00 00 00 00 03 ff f0 00 00 00 10 00
>>>> >> psr-implementation       00000000
>>>> >> psr-version              00000004
>>>> >> implementation           00000000
>>>> >> version                  00000004
>>>> >> cache-line-size          00000020
>>>> >> cache-nlines             00000200
>>>> >> page-size                00001000
>>>> >> dcache-line-size         00000010
>>>> >> dcache-nlines            00000200
>>>> >> dcache-associativity     00000001
>>>> >> icache-line-size         00000020
>>>> >> icache-nlines            00000200
>>>> >> icache-associativity     00000001
>>>> >> ncaches                  00000002
>>>> >> mmu-nctx                 00000100
>>>> >> sparc-version            00000008
>>>> >> mask_rev                 00000026
>>>> >> device_type              cpu
>>>> >> name                     FMI,MB86904
>>>> >>
>>>> >> and still it behaves the same as TI,TMS390S10 from the LX. This is done on SS-5:
>>>> >>
>>>> >> ok 10000000 20 spacel@ .
>>>> >> 4000009
>>>> >> ok 14000000 20 spacel@ .
>>>> >> 4000009
>>>> >> ok 14000004 20 spacel@ .
>>>> >> 23000
>>>> >> ok 1f000004 20 spacel@ .
>>>> >> 23000
>>>> >> ok 10000008 20 spacel@ .
>>>> >> 4000009
>>>> >> ok 14000028 20 spacel@ .
>>>> >> 4000009
>>>> >> ok 1000000c 20 spacel@ .
>>>> >> 23000
>>>> >> ok 10000010 20 spacel@ .
>>>> >> 4000009
>>>> >>
>>>> >> LX is the same except for the IOMMU-version:
>>>> >>
>>>> >> ok 10000000 20 spacel@ .
>>>> >> 4000005
>>>> >> ok 14000000 20 spacel@ .
>>>> >> 4000005
>>>> >> ok 18000000 20 spacel@ .
>>>> >> 4000005
>>>> >> ok 1f000000 20 spacel@ .
>>>> >> 4000005
>>>> >> ok 1ff00000 20 spacel@ .
>>>> >> 4000005
>>>> >> ok 1fff0004 20 spacel@ .
>>>> >> 1fe000
>>>> >> ok 10000004 20 spacel@ .
>>>> >> 1fe000
>>>> >> ok 10000108 20 spacel@ .
>>>> >> 41000005
>>>> >> ok 10000040 20 spacel@ .
>>>> >> 41000005
>>>> >> ok 1fff0040 20 spacel@ .
>>>> >> 41000005
>>>> >> ok 1fff0044 20 spacel@ .
>>>> >> 1fe000
>>>> >> ok 1fff0024 20 spacel@ .
>>>> >> 1fe000
>>>> >>
>>>> >> >> At what address are the additional AFX registers located?
>>>> >> >
>>>> >> > Here's the complete TurboSPARC IOMMU address map:
>>>> >> > PA[30:0]   Register                      Access
>>>> >> > 1000_0000  IOMMU Control                 R/W
>>>> >> > 1000_0004  IOMMU Base Address            R/W
>>>> >> > 1000_0014  Flush All IOTLB Entries       W
>>>> >> > 1000_0018  Address Flush                 W
>>>> >> > 1000_1000  Asynchronous Fault Status     R/W
>>>> >> > 1000_1004  Asynchronous Fault Address    R/W
>>>> >> > 1000_1010  SBus Slot Configuration 0     R/W
>>>> >> > 1000_1014  SBus Slot Configuration 1     R/W
>>>> >> > 1000_1018  SBus Slot Configuration 2     R/W
>>>> >> > 1000_101C  SBus Slot Configuration 3     R/W
>>>> >> > 1000_1020  SBus Slot Configuration 4     R/W
>>>> >> > 1000_1050  Memory Fault Status           R/W
>>>> >> > 1000_1054  Memory Fault Address          R/W
>>>> >> > 1000_2000  Module Identification         R/W
>>>> >> > 1000_3018  Mask Identification           R
>>>> >> > 1000_4000  AFX Queue Level               W
>>>> >> > 1000_6000  AFX Queue Level               R
>>>> >> > 1000_7000  AFX Queue Status              R
>>>> >>
>>>> >> But if I read it correctly, 0x12fff294 (which makes SunOS crash with -m 32) is
>>>> >> well above this limit.
>>>> >
>>>> > Oh, so I also misread something. You are not talking about the
>>>> > adjacent pages, but 16MB increments.
>>>> >
>>>> > Earlier I sent a patch for a generic address alias device, would it be
>>>> > useful for this?
>>>>
>>>> Should do as well. But I thought empty_slot is less overhead and
>>>> easier to debug.
>>
>> Also the aliasing patch would require one more parameter: the size of the
>> area which has to be aliased. Except we implement stubs for all
>> missing devices and do aliasing of the connected port ranges. And
>> then again, SS-20 doesn't have aliasing in this area at all.
>>
>> What do you think about this (empty_slot) solution (except that I
>> missed the SoB line)? Meanwhile it's tested with SunOS 4.1.3U1 too.
>
> I'm slightly against it, of course it would help for this but I think
> we may be missing a bigger problem.
>
>>>>> Maybe we have a general design problem, perhaps unassigned access
>>>>> faults should only be triggered inside SBus slots and ignored
>>>>> elsewhere. If this is true, the generic Sparc32 unassigned access handler
>>>>> should just ignore the access and special fault-generating slots
>>>>> should be installed for empty SBus address ranges.
>>
>> Agreed that they should be special for SBus, because SS-20 OBP is
>> not happy with the fault we are currently generating. But otherwise I think
>> qemu does it correctly. On SS-5:
>>
>> ok f7ff0000 2f spacel@ .
>> Data Access Error
>> ok sfar@ .
>> f7ff0000
>> ok 20000000 2f spacel@ .
>> Data Access Error
>> ok sfar@ .
>> 20000000
>> ok 40000000 20 spacel@ .
>> Data Access Error
>> ok sfar@ .
>> 40000000
>>
>> Neither f7ff0000 nor 20000000, nor 40000000 are in SBus range, right?
>
> 40000000 is on SS-5.
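(To make that proposal concrete: an untested sketch of the "fault only inside
empty SBus slots" idea. The slot ranges and the helper below are placeholders
for illustration, not the real SS-5 map and not the actual QEMU code:)

#include <stdint.h>
#include <stddef.h>

struct pa_range { uint64_t base, size; };

/* Placeholder ranges standing in for the empty SBus slots of the board. */
static const struct pa_range empty_sbus_slots[] = {
    { 0x30000000ULL, 0x10000000ULL },  /* placeholder slot */
    { 0x40000000ULL, 0x10000000ULL },  /* placeholder slot */
};

/* Unassigned accesses would fault only here; everywhere else they would
 * simply be ignored, which is what the real boards seem to do. */
static int unassigned_access_should_fault(uint64_t paddr)
{
    for (size_t i = 0;
         i < sizeof(empty_sbus_slots) / sizeof(empty_sbus_slots[0]); i++) {
        if (paddr >= empty_sbus_slots[i].base &&
            paddr - empty_sbus_slots[i].base < empty_sbus_slots[i].size) {
            return 1;
        }
    }
    return 0;
}

The unassigned access path would then raise the data access error (setting
sfar/sfsr, as the probes above show the real SS-5 does) only when this returns
1, and silently return zeroes otherwise; empty_slot could still be mapped over
the ignored ranges to log such accesses while debugging.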
Ah. I was only aware of the control space. What ranges does SBus take?

> So is the SBus Control Space in 0x10000000 to
> 0x1fffffff the only area besides DRAM where the accesses won't trap?

At least some area after ROM is aliased too. Also, on SS-10 with a
non-active frame buffer, writing to the SX registers has no visible
effect and reading from them produces not a fault but an NMI.

>>>> My impression was that SS-5 and SS-20 do unassigned accesses a bit
>>>> differently.
>>>> The current IOMMU implementation fits SS-20, which has no aliasing.
>>>
>>> It's probably rather the board design than just the IOMMU.
>>
>> Agreed. That's why I bound the patch to the machine hwdef and not to the iommu.
>>
>>>> >> >> > One approach would be that IOMMU_NREGS would be increased to cover
>>>> >> >> > these registers (with the bump in the savevm version field) and
>>>> >> >> > iommu_init1() should check the version field to see how much MMIO to
>>>> >> >> > provide.
>>>> >> >>
>>>> >> >> The problem I see here is that we already have too many registers: we
>>>> >> >> emulate the SS-20 IOMMU (I guess), while SS-5 and LX seem to have only
>>>> >> >> 0x20 registers which are aliased all the way.
>>>> >> >>
>>>> >> >> > But in order to avoid the savevm version change, iommu_init1() could
>>>> >> >> > just install dummy MMIO (in the TurboSPARC case), if OBP does not care
>>>> >> >> > whether the read back data matches what has been written earlier. Because
>>>> >> >> > from OBP point of view this is identical to what your patch results
>>>> >> >> > in, I'd suppose this approach would also work.
>>>> >> >>
>>>> >> >> OBP doesn't seem to care about these addresses at all. It's only the "MUNIX"
>>>> >> >> SunOS 4.1.4 kernel that does. The "MUNIX" kernel is the only kernel available
>>>> >> >> during the installation, so it is currently not possible to install 4.1.4.
>>>> >> >> Surprisingly, the "GENERIC" kernel which is on the disk after the
>>>> >> >> installation doesn't try to access these address ranges either, so a disk
>>>> >> >> image taken from a live system works.
>>>> >> >>
>>>> >> >> Actually, access to the non-connected/aliased addresses may also be a
>>>> >> >> consequence of the phys_page_find bug I mentioned before. When I run the
>>>> >> >> install with -m 64 and -m 256 it tries to access different
>>>> >> >> non-connected addresses. May also be a SunOS bug of course. 256m used
>>>> >> >> to be a lot back then.
>>>> >> >
>>>> >> > Perhaps with 256MB, memory probing advances blindly from memory to
>>>> >> > IOMMU registers. Proll (used before OpenBIOS) did that once, with bad
>>>> >> > results :-). If this is true, 64M, 128M and 192M should show identical
>>>> >> > results and only with close or equal to 256M the accesses happen.
>>>> >>
>>>> >> 32m:  0x12fff294
>>>> >> 64m:  0x14fff294
>>>> >> 192m: 0x1cfff294
>>>> >> 256m: 0x20fff294
>>>> >>
>>>> >> Memory probing? It would be strange for the OS to do it itself. The OS
>>>> >> could just ask OBP how much it has.
>>>> >> Here is the listing where it happens:
>>>> >>
>>>> >> _swift_vac_rgnflush:        rd %psr, %g2
>>>> >> _swift_vac_rgnflush+4:      andn %g2, 0x20, %g5
>>>> >> _swift_vac_rgnflush+8:      mov %g5, %psr
>>>> >> _swift_vac_rgnflush+0xc:    nop
>>>> >> _swift_vac_rgnflush+0x10:   nop
>>>> >> _swift_vac_rgnflush+0x14:   mov 0x100, %g5
>>>> >> _swift_vac_rgnflush+0x18:   lda [%g5] 0x4, %g5
>>>> >> _swift_vac_rgnflush+0x1c:   sll %o2, 0x2, %g1
>>>> >> _swift_vac_rgnflush+0x20:   sll %g5, 0x4, %g5
>>>> >> _swift_vac_rgnflush+0x24:   add %g5, %g1, %g5
>>>> >> _swift_vac_rgnflush+0x28:   lda [%g5] 0x20, %g5
>>>> >>
>>>> >> _swift_vac_rgnflush+0x28: is the fatal one.
>>>> >>
>>>> >> kadb> $c
>>>> >> _swift_vac_rgnflush(?)
>>>> >> _vac_rgnflush() + 4
>>>> >> _hat_setup_kas(0xc00,0xf0447000,0x43a000,0x400,0xf043a000,0x3c0) + 70
>>>> >> _startup(0xfe000000,0x10000000,0xfa000000,0xf00e2bfc,0x10,0xdbc00) + 1414
>>>> >> _main(0xf00e0fb4,0xf0007810,0x293ff49f,0xa805209c,0x200,0xf00d1d18) + 14
>>>> >>
>>>> >> Unfortunately (but not surprisingly) kadb doesn't allow debugging
>>>> >> cache-flush code, so I can't check what is in
>>>> >> [%g5] (aka sfar) on the real machine when this happens.
>>>> >
>>>> > Linux code for Swift/TurboSPARC VAC flush should be similar.
>>
>> Do you have an idea why anyone would try reading a value referenced in sfar?
>> Especially during flushing? I can't imagine a case where it wouldn't
>> produce a fault.
>
> No idea, the fault should be inevitable. An explanation of how VAC
> (Virtually Addressed Cache?) works could help. Is it available somewhere?

An explanation of how PAC works would be interesting too, because when
emulating SS-20, Solaris boot hangs where it normally says that PAC is
initialized.

>>>> >> But the bug in phys_page_find would explain these accesses: sfar gets
>>>> >> the wrong address, and then the secondary access happens on this wrong
>>>> >> address instead of the original one.
>>>> >
>>>> > I doubt phys_page_find can be buggy, it is so vital for all architectures.
>>>>
>>>> But you've seen the example of buggy behaviour I posted last Friday, right?
>>>> If it's not phys_page_find, it's either cpu_physical_memory_rw (which
>>>> is also pretty generic), or the way SS-20 registers devices. Can it be
>>>> that all the pages must be registered in the proper order?
>>>
>>> How about the unassigned access handler, could it be suspected?
>>
>> Doesn't look like it: it gets a physical address as a parameter. How
>> would it know the address is wrong?
>
> It wouldn't, but IIRC Paul claimed earlier that the unassigned memory
> handling in QEMU could have problems.

But I thought Paul also fixed the problems? There was a patch from him.

>>>> I think it's a pretty rare use case where you have a memory fault (not
>>>> a translation fault) on an unknown address. You may have such a fault
>>>> during device probing, but in that case you know what address you are
>>>> probing, so you don't care about the sync fault address register.
>>>>
>>>> Besides, do all architectures have a sync fault address register?
>>>
>>> No, I think system level checks like that and IOMMU-like controls on
>>> most architectures are very poor compared to Sparc32. Server and
>>> mainframe systems may be a bit better.
>>
>> And do we have any mainframe emulated well enough to have a user base
>> and hence bug reports?
>
> The only IOMMU implemented is the Sparc32 one so far. I don't know about
> the S390x architecture, that should definitely be mainframe class. AMD
> IOMMU may be in QEMU one day.
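(Coming back to the sfar puzzle above, this is roughly the behaviour I would
expect from the unassigned access path - purely an illustrative sketch, not
the actual QEMU code: the original physical address gets latched into the
fault registers before the data access trap is raised. If a secondary access
on a wrong address overwrote sfar, the guest's handler would see exactly the
kind of bogus value discussed above.)

#include <stdint.h>

struct srmmu_fault_regs {
    uint32_t sfsr;   /* synchronous fault status */
    uint32_t sfar;   /* synchronous fault address */
};

/* Record an unassigned access: sfar must hold the address the guest
 * actually used, not anything derived from a later/secondary access. */
static void record_unassigned_access(struct srmmu_fault_regs *regs,
                                     uint32_t paddr, int is_write)
{
    regs->sfar = paddr;
    regs->sfsr = is_write ? 0x8000 : 0x0000;  /* placeholder status bits */
    /* ...then the data access error trap would be raised... */
}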
>
> About bugs, IIRC the NetBSD 3.x crash could be related to the IOMMU.

What indicates that? It happens where the disk sizes are normally
reported, so it could be a scsi/dma/irq/fpu issue as well.

>>>> >> fwiw the routine is called only once on the real hardware. It sort of
>>>> >> speaks for your hypothesis about the memory probing. Although it may
>>>> >> not necessarily probe for memory...

--
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/