On 9/15/2017 5:49 PM, Catalin Marinas wrote:
> On Thu, Sep 14, 2017 at 07:07:50PM +0000, Roy Pledge wrote:
>> On 9/14/2017 10:00 AM, Catalin Marinas wrote:
>>> On Thu, Aug 24, 2017 at 04:37:51PM -0400, Roy Pledge wrote:
>>>> @@ -123,23 +122,34 @@ static int bman_portal_probe(struct platform_device
>>>> *pdev)
>>>> }
>>>> pcfg->irq = irq;
>>>>
>>>> - va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]), 0);
>>>> - if (!va) {
>>>> - dev_err(dev, "ioremap::CE failed\n");
>>>> + /*
>>>> + * TODO: Ultimately we would like to use a cacheable/non-shareable
>>>> + * (coherent) mapping for the portal on both architectures but that
>>>> + * isn't currently available in the kernel. Because of HW differences
>>>> + * PPC needs to be mapped cacheable while ARM SoCs will work with non
>>>> + * cacheable mappings
>>>> + */
>>>
>>> This comment mentions "cacheable/non-shareable (coherent)". Was this
>>> meant for ARM platforms? Because non-shareable is not coherent, nor is
>>> this combination guaranteed to work with different CPUs and
>>> interconnects.
>>
>> My wording is poor I should have been clearer that non-shareable ==
>> non-coherent. I will fix this.
>>
>> We do understand that cacheable/non shareable isn't supported on all
>> CPU/interconnect combinations but we have verified with ARM that for the
>> CPU/interconnects we have integrated QBMan on our use is OK. The note is
>> here to try to explain why the mapping is different right now. Once we
>> get the basic QBMan support integrated for ARM we do plan to try to have
>> patches integrated that enable the cacheable mapping as it gives a
>> significant performance boost.
>
> I will definitely not ack those patches (at least not in the form I've
> seen, assuming certain eviction order of the bytes in a cacheline). The
> reason is that it is incredibly fragile, highly dependent on the CPU
> microarchitecture and interconnects. Assuming that you ever only have a
> single SoC with this device, you may get away with #ifdefs in the
> driver. But if you support two or more SoCs with different behaviours,
> you'd have to make run-time decisions in the driver or run-time code
> patching. We are very keen on single kernel binary image/drivers and
> architecturally compliant code (the cacheable mapping hacks are well
> outside the architecture behaviour).
>
Let's put this particular point on hold for now; I would like to focus
on getting the basic functions merged in ASAP. I removed the comment in
question (it sort of happened naturally when I applied your other
comments) in the next revision of the patchset. I have submitted the
patches to our automated test system for sanity checking and I will send
a new patchset once I get the results.

Thanks again for your comments - they have been very useful and have
improved the quality of the code for sure.

>>>> diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h
>>>> b/drivers/soc/fsl/qbman/dpaa_sys.h
>>>> index 81a9a5e..0a1d573 100644
>>>> --- a/drivers/soc/fsl/qbman/dpaa_sys.h
>>>> +++ b/drivers/soc/fsl/qbman/dpaa_sys.h
>>>> @@ -51,12 +51,12 @@
>>>>
>>>> static inline void dpaa_flush(void *p)
>>>> {
>>>> + /*
>>>> + * Only PPC needs to flush the cache currently - on ARM the mapping
>>>> + * is non cacheable
>>>> + */
>>>> #ifdef CONFIG_PPC
>>>> flush_dcache_range((unsigned long)p, (unsigned long)p+64);
>>>> -#elif defined(CONFIG_ARM)
>>>> - __cpuc_flush_dcache_area(p, 64);
>>>> -#elif defined(CONFIG_ARM64)
>>>> - __flush_dcache_area(p, 64);
>>>> #endif
>>>> }
>>>
>>> Dropping the private API cache maintenance is fine and the memory is WC
>>> now for ARM (mapping to Normal NonCacheable). However, do you require
>>> any barriers here? Normal NC doesn't guarantee any ordering.
>>
>> The barrier is done in the code where the command is formed. We follow
>> this pattern
>> a) Zero the command cache line (the device never reacts to a 0 command
>> verb so a cast out of this will have no effect)
>> b) Fill in everything in the command except the command verb (byte 0)
>> c) Execute a memory barrier
>> d) Set the command verb (byte 0)
>> e) Flush the command
>> If a castout happens between d) and e) doesn't matter since it was about
>> to be flushed anyway . Any castout before d) will not cause HW to
>> process the command because verb is still 0. The barrier at c) prevents
>> reordering so the HW cannot see the verb set before the command is formed.
>
> I think that's fine, the dpaa_flush() can be a no-op with non-cacheable
> memory (I had forgotten the details).
>
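
For reference, the a) to e) sequence quoted above can be sketched roughly
as below. This is only an illustration, not the driver code: cmd_submit(),
cmd_flush() and CMD_SIZE are hypothetical names, the 64-byte size is taken
from the flush length in dpaa_flush(), and the C11 fence is a userspace
stand-in for whatever barrier the driver really uses (in the kernel that
would be a kernel barrier primitive, not this call).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed command size: one 64-byte cache line, as in dpaa_flush(). */
#define CMD_SIZE 64

/*
 * Hypothetical stand-in for dpaa_flush(). With a non-cacheable mapping
 * there is nothing to flush, so it is a no-op here.
 */
static inline void cmd_flush(volatile void *p)
{
	(void)p;
}

/*
 * Hypothetical helper showing the ordering only. The verb lives in
 * byte 0 and must be non-zero; the payload goes in bytes 1..len.
 */
static void cmd_submit(volatile uint8_t *cmd, uint8_t verb,
		       const uint8_t *payload, size_t len)
{
	size_t i;

	assert(verb != 0);
	assert(len <= CMD_SIZE - 1);

	/* a) zero the whole command; a verb of 0 is ignored by the device */
	for (i = 0; i < CMD_SIZE; i++)
		cmd[i] = 0;

	/* b) fill in everything except the verb (byte 0) */
	for (i = 0; i < len; i++)
		cmd[1 + i] = payload[i];

	/* c) barrier: keep the stores above ahead of the verb store below
	 *    (stand-in only, see the note before this block) */
	atomic_thread_fence(memory_order_release);

	/* d) set the verb last; this is what makes the command valid */
	cmd[0] = verb;

	/* e) flush the command (no-op for non-cacheable memory) */
	cmd_flush(cmd);
}
```

The point of the sketch is just the ordering: anything the device can
observe before step d) still has a zero verb, so it is ignored.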