> This whole thread also ties in with my posts about mmiowb (which IMO
> should go away).
>
> readl/writel:             strongly ordered wrt one another and other stores
>                           to cacheable RAM, byteswapping
> __readl/__writel:         not ordered (needs mb/rmb/wmb to order with
>                           other readl/writel and cacheable operations, or
>                           io_*mb to order with one another)
> raw_readl/raw_writel:     strongly ordered, no byteswapping
> __raw_readl/__raw_writel: not ordered, no byteswapping
>
> then get rid of *relaxed* variants.
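For illustration only, the proposed families can be modelled in userspace C as a 2x2 matrix of ordering and byteswapping. This is a sketch, not the kernel implementation: every `model_*` name is hypothetical, the "register" is a plain byte buffer, a C11 seq_cst fence stands in for whatever barrier an arch would really use, and it assumes that `__readl` byteswaps like `readl` does (the list above only says "no byteswapping" for the raw variants).

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Little-endian decode: a byteswap on BE hosts, a no-op on LE hosts. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Native-endian load: never byteswaps. */
static uint32_t load_native(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Placeholder for the arch barrier (assumption: a full fence). */
static void full_fence(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Models readl: ordered, byteswapping. */
uint32_t model_readl(const uint8_t *p)
{
    full_fence();
    uint32_t v = load_le32(p);
    full_fence();
    return v;
}

/* Models __readl: not ordered, byteswapping (assumed). */
uint32_t model_readl_unordered(const uint8_t *p)
{
    return load_le32(p);
}

/* Models raw_readl: ordered, no byteswapping. */
uint32_t model_raw_readl(const uint8_t *p)
{
    full_fence();
    uint32_t v = load_native(p);
    full_fence();
    return v;
}

/* Models __raw_readl: not ordered, no byteswapping. */
uint32_t model_raw_readl_unordered(const uint8_t *p)
{
    return load_native(p);
}
```

With the bytes 0x78 0x56 0x34 0x12 in the buffer, the two byteswapping variants return 0x12345678 on any host, while the raw variants return whatever the host's native byte order gives.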
In addition, some archs like powerpc also provide readl_be/writel_be,
which are defined as big endian (ie. byteswap on LE archs, no byteswap
on BE archs).

As of today, powerpc lacks the raw_readl/raw_writel and __readl/__writel
variants (ie. we only provide the fully ordered + byteswap and the no
ordering + no byteswap variants). If we agree on the above semantics,
I'll do a patch providing the missing ones.

> Linus: on x86, memory operations to wc and wc+ memory are not ordered
> with one another, or operations to other memory types (ie. load/load
> and store/store reordering is allowed). Also, as you know, store/load
> reordering is explicitly allowed as well, which covers all memory
> types. So perhaps it is not quite true to say readl/writel is strongly
> ordered by default even on x86. You would have to put in some
> mfence instructions in them to make it so.
>
> So, what *exact* definition are you going to mandate for readl/writel?
> Anything less than strict ordering then we also need to ensure drivers
> use the correct barriers (to implement strict ordering, we could either
> put mfence instructions in, or explicitly disallow readl/writel to be
> used on wc/wc+ memory).

The ordering guarantees that I provide on powerpc for "ordered" variants
are:

 - cacheable store + writel stays ordered (ie. write to some DMA stuff
   and then a register to trigger the DMA).

 - readl + cacheable read stays ordered (ie. read some status register,
   for example after an interrupt, and then read the resulting data in
   memory).

 - any of these ordered vs. spin_lock and spin_unlock (with the exception
   that stores done before the spin_lock could potentially leak into the
   lock).

 - readl is synchronous (ie. makes the CPU think the data was actually
   used before executing subsequent instructions, thus waits for the data
   to come back, for example to ensure that a read used to push out post
   buffers followed by a delay will indeed happen with the right delay).
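The first two guarantees in the list above correspond to the usual driver pattern, sketched here as a userspace model. This is an assumption-laden illustration, not powerpc code: `sketch_writel`, `sketch_readl`, `struct fake_dev`, `struct dma_desc`, `start_dma` and `poll_dma` are all hypothetical names, the "registers" are ordinary volatile variables, and C11 fences merely stand in for the real arch barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for a device register window (not real MMIO). */
struct fake_dev {
    volatile uint32_t doorbell; /* write 1 to trigger the DMA */
    volatile uint32_t status;   /* bit 0 set when the DMA is done */
};

/* Descriptor living in ordinary cacheable memory. */
struct dma_desc {
    uint32_t addr;
    uint32_t len;
};

/* Ordered store: earlier cacheable stores must not pass the register write. */
static void sketch_writel(uint32_t val, volatile uint32_t *reg)
{
    atomic_thread_fence(memory_order_release); /* placeholder barrier */
    *reg = val;
}

/* Ordered load: later cacheable reads must not be hoisted above it. */
static uint32_t sketch_readl(const volatile uint32_t *reg)
{
    uint32_t val = *reg;
    atomic_thread_fence(memory_order_acquire); /* placeholder barrier */
    return val;
}

/* Guarantee 1: fill the descriptor in RAM, then ring the doorbell. */
void start_dma(struct fake_dev *dev, struct dma_desc *desc,
               uint32_t addr, uint32_t len)
{
    desc->addr = addr;
    desc->len = len;
    sketch_writel(1, &dev->doorbell); /* no explicit wmb() in the driver */
}

/* Guarantee 2: read the status register, then the data left in memory. */
int poll_dma(struct fake_dev *dev, const uint32_t *result, uint32_t *out)
{
    if (!(sketch_readl(&dev->status) & 1u))
        return 0;   /* not done yet */
    *out = *result; /* no explicit rmb() in the driver */
    return 1;
}
```

The point of the "ordered" variants is that the two driver functions contain no explicit barrier: the ordering lives entirely inside the accessors.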
We don't provide meaningless ones like writel + cacheable store, for
example (PCI posting would defeat it anyway).

> The other way we can go is just say that they have x86 semantics,
> although that would be a bit sad IMO: we should have strong ops, in
> which case driver writers never need to use a single barrier provided
> they have locking right, and weak ops, in which case they should match
> up with the weak Linux memory ordering model for system RAM.

Ben.