On Mon, Jul 13, 2020 at 10:39 AM Philippe Mathieu-Daudé <f4...@amsat.org> wrote: > > On 7/12/20 7:42 AM, Havard Skinnemoen wrote: > > On Thu, Jul 9, 2020 at 10:00 AM Philippe Mathieu-Daudé <f4...@amsat.org> > > wrote: > >> On 7/9/20 2:36 AM, Havard Skinnemoen wrote: > >>> This implements a device model for the NPCM7xx SPI flash controller. > >>> > >>> Direct reads and writes, and user-mode transactions have been tested in > >>> various modes. Protection features are not implemented yet. > >>> > >>> All the FIU instances are available in the SoC's address space, > >>> regardless of whether or not they're connected to actual flash chips. > >>> > >>> Reviewed-by: Tyrone Ting <kft...@nuvoton.com> > >>> Reviewed-by: Cédric Le Goater <c...@kaod.org> > >>> Signed-off-by: Havard Skinnemoen <hskinnem...@google.com> > >>> --- > >>> include/hw/arm/npcm7xx.h | 2 + > >>> include/hw/ssi/npcm7xx_fiu.h | 100 +++++++ > >>> hw/arm/npcm7xx.c | 53 ++++ > >>> hw/ssi/npcm7xx_fiu.c | 510 +++++++++++++++++++++++++++++++++++ > >>> hw/arm/Kconfig | 1 + > >>> hw/ssi/Makefile.objs | 1 + > >>> hw/ssi/trace-events | 9 + > >>> 7 files changed, 676 insertions(+) > >>> create mode 100644 include/hw/ssi/npcm7xx_fiu.h > >>> create mode 100644 hw/ssi/npcm7xx_fiu.c > [...] > > >>> diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c > >>> index 4d227bb74b..c9ff3dab25 100644 > >>> --- a/hw/arm/npcm7xx.c > >>> +++ b/hw/arm/npcm7xx.c > >>> @@ -98,6 +98,37 @@ static const hwaddr npcm7xx_uart_addr[] = { > >>> 0xf0004000, > >>> }; > >>> > >>> +static const hwaddr npcm7xx_fiu0_flash_addr[] = { > >> > >> So per > >> https://github.com/Nuvoton-Israel/bootblock/blob/master/SWC_HAL/Chips/npcm750/npcm750.h > >> this is SPI0 on AHB18, > >> > >>> + 0x80000000, > >>> + 0x88000000, > >> > >> CS0 & CS1, > >> > >> also listed: > >> > >> 0x90000000, // CS2 > >> 0x98000000, // CS3 > > > > Confirmed with Nuvoton off-list that these do not exist. SPI0 only > > supports two chip-selects for direct access. > > I suppose Novoton confirmed for the particular npcm750, but you aim > to model the npcm7xx family. I doubt 2 similar IP blocks are that > different ;) Anyway with a comment this is good.
OK. I'm mostly concerned about modeling the chips I know about for now. If a chip with four AXI endpoints and four chipselects on SPI0 appears, it will be a fairly small change to move these arrays into the SoC class struct, but I'd rather not do that without evidence that such a chip exist. The IP blocks may be identical for all I know, but they are not wired up identically, and that's really what matters here in the SoC-level model. > > > > I'll add comments. > > > >>> +}; > >>> + > >>> +static const hwaddr npcm7xx_fiu3_flash_addr[] = { > >> > >> Ditto SPI3 on AHB3, and CS0 to CS3. > >> > >>> + 0xa0000000, > >>> + 0xa8000000, > >>> + 0xb0000000, > >>> + 0xb8000000, > >>> +}; > >>> + > >>> +static const struct { > >>> + const char *name; > >>> + hwaddr regs_addr; > >>> + int cs_count; > >>> + const hwaddr *flash_addr; > >>> +} npcm7xx_fiu[] = { > >>> + { > >>> + .name = "fiu0", > >>> + .regs_addr = 0xfb000000, > >>> + .cs_count = ARRAY_SIZE(npcm7xx_fiu0_flash_addr), > >> > >> Hmm without the datasheet, can't tell, but I'd expect 4 CS > >> regardless. > > > > FIU0/SPI0 only has 2 chip selects. > > > >>> + .flash_addr = npcm7xx_fiu0_flash_addr, > >>> + }, { > >>> + .name = "fiu3", > >>> + .regs_addr = 0xc0000000, > >>> + .cs_count = ARRAY_SIZE(npcm7xx_fiu3_flash_addr), > >>> + .flash_addr = npcm7xx_fiu3_flash_addr, > >>> + }, > >>> +}; > [...] > > >>> + > >>> + /* Flash chip model expects one transfer per dummy bit, not byte */ > >>> + dummy_cycles = > >>> + (FIU_DRD_CFG_DBW(drd_cfg) * 8) >> FIU_DRD_CFG_ACCTYPE(drd_cfg); > >>> + for (i = 0; i < dummy_cycles; i++) { > >>> + ssi_transfer(fiu->spi, 0); > >> > >> Note to self, might worth a ssi_shift_dummy(count) generic method. > > > > I'm not a huge fan of this interface to be honest. It requires the > > flash controller to have intimate knowledge of the flash chip, and if > > it doesn't predict the dummy phase correctly, or the guest programs > > the wrong number of dummy cycles, the end result is very different > > from what you'll see on a real system. I'd love to see something like > > a number-of-bits parameter to ssi_transfer instead. > > Do you mean like these? > > - ssi_transfer_bit(bool value); > - ssi_shift_dummy_bits(size_t bits); > > Some interfaces allow bit shifting. SPI doesn't simply because > nobody had the use :) I mean I don't like how the semantics of ssi_transfer changes implicitly in the middle of the transfer. One call shifts a whole byte, the next 8 shift individual bits, then it's back to bytes again. If a mistake is made, the flash controller might end up thinking it's shifting 16 bits, but if the flash device only expects 8 dummy bits, it will see 8 bits + 8 bytes for a total of 9 bytes of dummies. This differs a lot from what would happen on a real device. For example, I wrote up a test for the aspeed flash controller against a Winbond flash chip in Dual I/O mode. The data ended up getting shifted by 15 bytes because of, I think, bugs in the dummy logic on both sides. I took a lot of head scratching to figure out what's going on. Here's where aspeed_smc figures it needs to send 2 dummy bytes (aka 16 dummy bits) for a DIO read: https://elixir.bootlin.com/qemu/latest/source/hw/ssi/aspeed_smc.c#L803 And here's where the Winbond model decides to expect one dummy _bit_: https://elixir.bootlin.com/qemu/latest/source/hw/block/m25p80.c#L862 The difference is 15 ssi_transfers, which is what I see in the test. IMO they're both wrong -- they should both expect one dummy byte, or four dummy cycles in x2 mode. But the real problem, also IMO, is that they can't even agree on whether one ssi_transfer means one bit or one byte. While ssi_shift_dummy_bits is strictly nicer than open-coded loops of ssi_transfer, it doesn't fix the underlying problem that the dummy cycles need to be treated specially by both the flash controller model and the device model, and if they're not 100% synchronized (which is particularly interesting when you have flash chips with configurable dummy cycles), the result is like nothing you'd ever see on a real system. > > > >>> + } > >>> + > >>> + for (i = 0; i < size; i++) { > >>> + value |= ssi_transfer(fiu->spi, 0) << (8 * i); > >>> + } > >>> + > [...] > > >>> +static const MemoryRegionOps npcm7xx_fiu_flash_ops = { > >>> + .read = npcm7xx_fiu_flash_read, > >>> + .write = npcm7xx_fiu_flash_write, > >>> + .endianness = DEVICE_LITTLE_ENDIAN, > >>> + .valid = { > >>> + .min_access_size = 1, > >>> + .max_access_size = 8, > >> > >> Are you sure? Maybe, I can' tell. > > > > Real hardware supports 16 bytes, but there's no way to do more than 8 > > in emulation, I think? > > That would mean you can plug this device on a 128-bit wide bus, > and you can transfer 128-bit in a single CPU operation. Hmm, I see what you mean. It can do 16-byte bursts. I'm not sure if that looks more like a single 16-byte access or four 4-byte accesses on the SPI bus. I was assuming it's the former, but I could be wrong. Havard