Re: [PATCH 2/8] powerpc: remove CONFIG_PCI_QSPAN

2018-10-17 Thread Benjamin Herrenschmidt
On Wed, 2018-10-17 at 10:01 +0200, Christoph Hellwig wrote: > This option isn't actually used anywhere. Oh my, that's ancient. Probably didn't make the cut from arch/ppc to arch/powerpc > Signed-off-by: Christoph Hellwig Acked-by: Benjamin Herrenschmidt > --- >

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 16:24 -0600, Jason Gunthorpe wrote: > Basically, all this list processing is a huge overhead compared to > just putting a helper call in the existing sg iteration loop of the > actual op.  Particularly if the actual op is a no-op like no-mmu x86 > would use. Yes, I'm leaning

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 17:21 -0600, Jason Gunthorpe wrote: > Splitting the sgl is different from iommu batching. > > As an example, an O_DIRECT write of 1 MB with a single 4K P2P page in > the middle. > > The optimum behavior is to allocate a 1MB-4K iommu range and fill it > with the CPU memory. T
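The 1 MB O_DIRECT example quoted above can be worked through in a few lines. This is only a sketch: `struct seg` and `iommu_bytes_needed()` are hypothetical names, not the kernel's scatterlist API, and the only thing it shows is the arithmetic of summing the host-memory segments to size the IOMMU range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry; not struct scatterlist. */
struct seg {
    size_t len;     /* segment length in bytes */
    int    is_p2p;  /* non-zero if backed by peer device (BAR) memory */
};

/* Sum the bytes backed by host memory, i.e. the IOMMU range to allocate. */
size_t iommu_bytes_needed(const struct seg *segs, size_t n)
{
    size_t total = 0;

    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!segs[i].is_p2p)
            total += segs[i].len;
    return total;
}
```

With a 512K host segment, one 4K P2P page and a 508K host segment, the function returns 1MB-4K, matching the figure in the quoted message.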

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 15:22 -0600, Jason Gunthorpe wrote: > On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote: > > > I think this opens an even bigger can of worms.. > > > > No, I don't think it does. You'd only shim when the target page is > > backed by a device, not host memory, and y

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 15:03 -0600, Jason Gunthorpe wrote: > I don't follow, when does get_dma_ops() return a p2p aware provider? > It has no way to know if the DMA is going to involve p2p, get_dma_ops > is called with the device initiating the DMA. > > So you'd always return the P2P shim on a syst

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 14:48 -0600, Logan Gunthorpe wrote: > > ...and that dma_map goes through get_dma_ops(), so I don't see the conflict? > > The main conflict is in dma_map_sg which only does get_dma_ops once but > the sg may contain memory of different types. We can handle that in our "overrid

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 12:00 -0600, Jason Gunthorpe wrote: > - All platforms can succeed if the PCI devices are under the same >   'segment', but where segments begin is somewhat platform specific >   knowledge. (this is 'same switch' idea Logan has talked about) We also need to be careful whether

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt
On Tue, 2017-04-18 at 10:27 -0700, Dan Williams wrote: > > FWIW, RDMA probably wouldn't want to use a p2mem device either, we > > already have APIs that map BAR memory to user space, and would like to > > keep using them. A 'enable P2P for bar' helper function sounds better > > to me. > > ...and I

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 23:43 -0600, Logan Gunthorpe wrote: > > On 17/04/17 03:11 PM, Benjamin Herrenschmidt wrote: > > Is it ? Again, you create a "concept" the user may have no idea about, > > "p2pmem memory". So now any kind of memory buffer on a device

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 10:52 -0600, Logan Gunthorpe wrote: > > On 17/04/17 01:20 AM, Benjamin Herrenschmidt wrote: > > But is it ? For example take a GPU, does it, in your scheme, need an > > additional "p2pmem" child ? Why can't the GPU driver just use some > &

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-17 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 23:13 -0600, Logan Gunthorpe wrote: > > > > > I'm still not 100% why do you need a "p2mem device" mind you ... > > Well, you don't "need" it but it is a design choice that I think makes a > lot of sense for the following reasons: > > 1) p2pmem is in fact a device on the pc

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 10:34 -0600, Logan Gunthorpe wrote: > > On 16/04/17 09:53 AM, Dan Williams wrote: > > ZONE_DEVICE allows you to redirect via get_dev_pagemap() to retrieve > > context about the physical address in question. I'm thinking you can > > hang bus address translation data off of tha

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 10:47 -0600, Logan Gunthorpe wrote: > > I think you need to give other archs a chance to support this with a > > design that considers the offset case as a first class citizen rather > > than an afterthought. > > I'll consider this. Given the fact I can use your existing > ge

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 08:53 -0700, Dan Williams wrote: > > Just thinking out loud ... I don't have a firm idea or a design. But > > peer to peer is definitely a problem we need to tackle generically, the > > demand for it keeps coming up. > > ZONE_DEVICE allows you to redirect via get_dev_pagemap(

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-16 Thread Benjamin Herrenschmidt
On Sun, 2017-04-16 at 08:44 -0700, Dan Williams wrote: > The difference is that there was nothing fundamental in the core > design of pmem + DAX that prevented other archs from growing pmem > support. Indeed. In fact we have work in progress support for pmem on power using experimental HW. > THP

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Benjamin Herrenschmidt
On Sat, 2017-04-15 at 15:09 -0700, Dan Williams wrote: > I'm wondering, since this is limited to support behind a single > switch, if you could have a software-iommu hanging off that switch > device object that knows how to catch and translate the non-zero > offset bus address case. We have somethi

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-15 Thread Benjamin Herrenschmidt
On Sat, 2017-04-15 at 11:41 -0600, Logan Gunthorpe wrote: > Thanks, Benjamin, for the summary of some of the issues. > > On 14/04/17 04:07 PM, Benjamin Herrenschmidt wrote > > So I assume the p2p code provides a way to address that too via special > > dma_ops ? Or wrappers ?

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Fri, 2017-04-14 at 14:04 -0500, Bjorn Helgaas wrote: > I'm a little hesitant about excluding offset support, so I'd like to > hear more about this. > > Is the issue related to PCI BARs that are not completely addressable > by the CPU?  If so, that sounds like a first-class issue that should > b

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 22:40 -0600, Logan Gunthorpe wrote: > > On 13/04/17 10:16 PM, Jason Gunthorpe wrote: > > I'd suggest just detecting if there is any translation in bus > > addresses anywhere and just hard disabling P2P on such systems. > > That's a fantastic suggestion. It simplifies things

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Fri, 2017-04-14 at 21:37 +1000, Benjamin Herrenschmidt wrote: > On Thu, 2017-04-13 at 22:40 -0600, Logan Gunthorpe wrote: > > > > On 13/04/17 10:16 PM, Jason Gunthorpe wrote: > > > I'd suggest just detecting if there is any translation in bus > > > addr

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-14 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 22:16 -0600, Jason Gunthorpe wrote: > > Any caller of pci_add_resource_offset() uses CPU addresses different from > > the PCI bus addresses (unless the offset is zero, of course).  All ACPI > > platforms also support this translation (see "translation_offset"), though > > in m

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-13 Thread Benjamin Herrenschmidt
On Thu, 2017-04-13 at 15:22 -0600, Logan Gunthorpe wrote: > > On 12/04/17 03:55 PM, Benjamin Herrenschmidt wrote: > > Look at pcibios_resource_to_bus() and pcibios_bus_to_resource(). They > > will perform the conversion between the struct resource content (CPU > > physical
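The CPU-address versus bus-address conversion discussed above can be illustrated in plain C. This is a sketch under assumptions: `struct bridge_window`, `cpu_to_bus()` and `bus_to_cpu()` are hypothetical names, and the single fixed offset per window is a simplification of what a host bridge may do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-bridge window; field names are illustrative only. */
struct bridge_window {
    uint64_t cpu_start;  /* CPU physical address where the window starts */
    uint64_t size;       /* window length in bytes */
    uint64_t offset;     /* CPU physical address minus PCI bus address */
};

/* Translate a CPU physical address inside the window to a PCI bus address. */
uint64_t cpu_to_bus(const struct bridge_window *w, uint64_t cpu_addr)
{
    assert(cpu_addr >= w->cpu_start && cpu_addr - w->cpu_start < w->size);
    return cpu_addr - w->offset;
}

/* Translate a PCI bus address back to the CPU physical address. */
uint64_t bus_to_cpu(const struct bridge_window *w, uint64_t bus_addr)
{
    return bus_addr + w->offset;
}
```

When the offset is zero the two address spaces coincide, which is the case the P2P series assumed; a non-zero offset is the situation the thread is worried about.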

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-12 Thread Benjamin Herrenschmidt
On Wed, 2017-04-12 at 11:09 -0600, Logan Gunthorpe wrote: > > > Do you handle funky address translation too ? IE. the fact that the PCI > > addresses aren't the same as the CPU physical addresses for a BAR ? > > No, we use the CPU physical address of the BAR. If it's not mapped that > way we can'

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-11 Thread Benjamin Herrenschmidt
On Thu, 2017-03-30 at 16:12 -0600, Logan Gunthorpe wrote: > Hello, > > As discussed at LSF/MM we'd like to present our work to enable > copy offload support in NVMe fabrics RDMA targets. We'd appreciate > some review and feedback from the community on our direction. > This series is not intended t

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-03-06 Thread Benjamin Herrenschmidt
On Mon, 2017-03-06 at 22:46 -0500, Martin K. Petersen wrote: > > > > > > "Mauricio" == Mauricio Faria de Oliveira > > > > > et.ibm.com> writes: > > Mauricio> On 02/12/2017 07:49 PM, Anton Blanchard wrote: > > > We see lpfc devices regularly fail during kexec. Fix this by > > > adding a > > > shut

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-02-13 Thread Benjamin Herrenschmidt
On Tue, 2017-02-14 at 15:45 +1300, Eric W. Biederman wrote: > The only difference ever that should exist between shutdown and remove > is do you clean up kernel data structures.  The shutdown method is > allowed to skip the cleanup up kernel data structures that the remove > method needs to make. >

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-02-13 Thread Benjamin Herrenschmidt
On Mon, 2017-02-13 at 15:57 -0600, Brian King wrote: > If we do transition to use remove rather than shutdown, I think we > want > some way for a device driver to know whether we are doing kexec or > not. > A RAID adapter with a write cache is going to want to flush its write > cache on a PCI hotpl

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-02-12 Thread Benjamin Herrenschmidt
On Mon, 2017-02-13 at 13:21 +1300, Eric W. Biederman wrote: > > Good point, at the very least we should call remove if shutdown doesn't > > exist. Eric: could we make the changes Ben suggests? > > Definitely.  That was the original design of the kexec interface > but people were worried about call

Re: [PATCH] scsi: lpfc: Add shutdown method for kexec

2017-02-12 Thread Benjamin Herrenschmidt
On Mon, 2017-02-13 at 08:49 +1100, Anton Blanchard wrote: > From: Anton Blanchard > > We see lpfc devices regularly fail during kexec. Fix this by adding > a shutdown method which mirrors the remove method. Or instead finally do what I've been advocating for years (and even sent patches for) whi

Re: [PATCH] ibmvscsi: add write memory barrier to CRQ processing

2016-12-09 Thread Benjamin Herrenschmidt
On Wed, 2016-12-07 at 17:31 -0600, Tyrel Datwyler wrote: > The first byte of each CRQ entry is used to indicate whether an entry is > a valid response or free for the VIOS to use. After processing a > response the driver sets the valid byte to zero to indicate the entry is > now free to be reused.
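The valid-byte handshake described above can be sketched in userspace with C11 atomics. This is an analogue only, not the patch: `struct crq_entry` and `crq_consume()` are hypothetical, the 15-byte payload is an assumption, and a release store stands in for the kernel's barrier primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue entry; only the leading valid byte follows the text. */
struct crq_entry {
    _Atomic uint8_t valid;  /* non-zero: response ready; zero: free for reuse */
    uint8_t payload[15];    /* assumed payload size, for illustration */
};

/* Consume one entry. Returns 1 and copies the payload if valid, else 0. */
int crq_consume(struct crq_entry *e, uint8_t out[15])
{
    assert(e != NULL && out != NULL);
    if (atomic_load_explicit(&e->valid, memory_order_acquire) == 0)
        return 0;
    for (int i = 0; i < 15; i++)
        out[i] = e->payload[i];
    /* Release store: the payload reads above finish before the entry is freed. */
    atomic_store_explicit(&e->valid, 0, memory_order_release);
    return 1;
}
```

The point of the ordering is that the other side must not see the entry as free while this side is still working on it.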

[PATCH] scsi/ipr: Fix runaway IRQs when falling back from MSI to LSI

2016-11-23 Thread Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt --- drivers/scsi/ipr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index 5324741..5dd3194 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -10213,6 +10213,7 @@ static int ipr_probe_ioa(struc

Re: [PATCH v4 2/3] cxlflash: Superpipe support

2015-08-10 Thread Benjamin Herrenschmidt
On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote: > Add superpipe supporting infrastructure to device driver for the IBM CXL > Flash adapter. This patch allows userspace applications to take advantage > of the accelerated I/O features that this adapter provides and bypass the > traditional

Re: [PATCH v4 1/3] cxlflash: Base error recovery support

2015-08-10 Thread Benjamin Herrenschmidt
On Mon, 2015-08-10 at 12:09 -0500, Matthew R. Ochs wrote: > Introduce support for enhanced I/O error handling. > > Signed-off-by: Matthew R. Ochs > Signed-off-by: Manoj N. Kumar > --- So I'm not necessarily very qualified to review SCSI bits as I haven't done anything close to the Linux SCSI co

Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"

2015-04-01 Thread Benjamin Herrenschmidt
On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote: > Ben, this is legal by design. It was specifically designed for the > aic79xx SCSI card, but can be used for a variety of other reasons. The > aic79xx hardware problem was that the DMA engine could address the whole > of memory (it had t

Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"

2015-02-19 Thread Benjamin Herrenschmidt
On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote: > Ben, this is legal by design. It was specifically designed for the > aic79xx SCSI card, but can be used for a variety of other reasons. The > aic79xx hardware problem was that the DMA engine could address the whole > of memory (it had tw

Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"

2015-02-19 Thread Benjamin Herrenschmidt
On Fri, 2015-02-20 at 16:22 +1100, Benjamin Herrenschmidt wrote: > Looking a bit more closely, you basically do > > - set_dma_mask(64-bit) > - set_consistent_dma_mask(32-bit) > > Now, I don't know how x86 will react to the conflicting masks, but on > ppc64, I'm

Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"

2015-02-19 Thread Benjamin Herrenschmidt
On Fri, 2015-02-20 at 16:06 +1100, Benjamin Herrenschmidt wrote: > Note that even on powerpc platforms where it would work because we > maintain both 32-bit and 64-bit bypass windows in the device address > space simultaneously, you will leak iommu entries unless you also switch > ba

Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"

2015-02-19 Thread Benjamin Herrenschmidt
On Fri, 2015-02-20 at 16:01 +1100, Benjamin Herrenschmidt wrote: > Hi Sreekanth ! > > While looking at some (unrelated) issue where mpt2sas seems to be using > 32-bit DMA instead of 64-bit DMA on some POWER platforms, I noticed this > patch whic

Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"

2015-02-19 Thread Benjamin Herrenschmidt
Hi Sreekanth ! While looking at some (unrelated) issue where mpt2sas seems to be using 32-bit DMA instead of 64-bit DMA on some POWER platforms, I noticed this patch which was merged as 5fb1bf8aaa832e1e9ca3198de7bbecb8eff7db9c. Can you confirm my understanding that you are: - Setting the DMA ma

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-23 Thread Benjamin Herrenschmidt
On Sun, 2014-03-23 at 00:03 -0700, Christoph Hellwig wrote: > On Sun, Mar 23, 2014 at 02:04:46PM +1100, Benjamin Herrenschmidt wrote: > > > > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > > > index 1594945..8122294 100644 > > > --- a/arch/arm/

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-22 Thread Benjamin Herrenschmidt
just include asm-generic/scatterlist.h. > > Cc: Russell King > Cc: Tony Luck > Cc: Fenghua Yu For powerpc Acked-by: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: "James E.J. Bottomley" > Cc: Feng

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-22 Thread Benjamin Herrenschmidt
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index 1594945..8122294 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -82,6 +82,7 @@ config ARM > . > > config ARM_HAS_SG_CHAIN > + select ARCH_HAS_SG_CHAIN > bool > Heh, a sel

Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-08 Thread Benjamin Herrenschmidt
On Tue, 2013-10-08 at 20:55 -0700, H. Peter Anvin wrote: > Why not add a minimum number to pci_enable_msix(), i.e.: > > pci_enable_msix(pdev, msix_entries, nvec, minvec) > > ... which means "nvec" is the number of interrupts *requested*, and > "minvec" is the minimum acceptable number (otherwise
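The "requested versus minimum acceptable" semantics proposed above can be modelled without any PCI code. This is a sketch: `request_vectors()` and its `available` parameter are hypothetical and only illustrate the return-value contract, not any real kernel interface.

```c
#include <assert.h>

/*
 * Hypothetical allocator: 'available' models how many vectors the platform
 * can hand out right now. Returns the number granted (between minvec and
 * nvec), or -1 if fewer than minvec are available.
 */
int request_vectors(int available, int nvec, int minvec)
{
    assert(minvec >= 1 && minvec <= nvec);
    if (available < minvec)
        return -1;
    return available < nvec ? available : nvec;
}
```

A driver asking for 8 vectors with a minimum of 2 would get 8 when 16 are available, 4 when only 4 are, and an error when only 1 is, so it never needs to retry in a loop.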

Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-07 Thread Benjamin Herrenschmidt
On Mon, 2013-10-07 at 14:01 -0400, Tejun Heo wrote: > I don't think the same race condition would happen with the loop. The > problem case is where multiple msi(x) allocation fails completely > because the global limit went down before inquiry and allocation. In > the loop based interface, it'd r

Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-05 Thread Benjamin Herrenschmidt
On Sun, 2013-10-06 at 08:02 +0200, Alexander Gordeev wrote: > On Sun, Oct 06, 2013 at 08:46:26AM +1100, Benjamin Herrenschmidt wrote: > > On Sat, 2013-10-05 at 16:20 +0200, Alexander Gordeev wrote: > > > So my point is - drivers should first obtain a number of MSIs they *ca

Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-05 Thread Benjamin Herrenschmidt
On Sat, 2013-10-05 at 16:20 +0200, Alexander Gordeev wrote: > So my point is - drivers should first obtain a number of MSIs they *can* > get, then *derive* a number of MSIs the device is fine with and only then > request that number. Not terribly different from memory or any other type > of resourc

Re: [PATCH v2, part 1 3/9] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) instead

2013-05-15 Thread Benjamin Herrenschmidt
On Wed, 2013-05-15 at 22:46 +0800, Liu Jiang wrote: >I don't know any OF experts, could you please help to CC > some OF experts? I wrote that code I think. Sorry, I've missed the beginning of the thread, what is the problem ? Cheers, Ben.

Re: [PATCH] scsi/ibmvscsi: add module alias for ibmvscsic

2012-07-30 Thread Benjamin Herrenschmidt
On Mon, 2012-07-30 at 21:06 +0200, Olaf Hering wrote: > > So while this would work, I do wonder however whether we could > instead > > fix it by simplifying the whole thing as follow since iSeries is now > > gone and so we don't need split backends anymore: > > > > scsi/ibmvscsi: Remove backend ab

Re: [PATCH] scsi/ibmvscsi: /sys/class/scsi_host/hostX/config doesn't show any information

2012-07-29 Thread Benjamin Herrenschmidt
ssing it 0x1 (64K which is our standard PAGE_SIZE) doesn't work and result in an empty config from the server. Signed-off-by: Benjamin Herrenschmidt CC: --- diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index 3a6c474..337e8b3 100644 --- a/drivers/

Re: [PATCH] scsi/ibmvscsi: add module alias for ibmvscsic

2012-07-29 Thread Benjamin Herrenschmidt
on Now that the iSeries code is gone the backend abstraction in this driver is no longer necessary, which allows us to consolidate the driver in one file. The side effect is that the module name is now ibmvscsi.ko which matches the driver hotplug name and fixes auto-load issues. Signed-o

Re: [PATCH] scsi/ibmvscsi: /sys/class/scsi_host/hostX/config doesn't show any information

2012-07-29 Thread Benjamin Herrenschmidt
On Fri, 2012-07-27 at 07:56 +0100, James Bottomley wrote: > On Fri, 2012-07-27 at 15:19 +1000, Benjamin Herrenschmidt wrote: > > On Wed, 2012-07-18 at 18:49 +0200, o...@aepfle.de wrote: > > > From: Linda Xie > > > > James, can I assume you're picking up those

Re: [PATCH] scsi/ibmvscsi: /sys/class/scsi_host/hostX/config doesn't show any information

2012-07-26 Thread Benjamin Herrenschmidt
On Wed, 2012-07-18 at 18:49 +0200, o...@aepfle.de wrote: > From: Linda Xie James, can I assume you're picking up those two ? Cheers, Ben. > Expected result: > It should show something like this: > x1521p4:~ # cat /sys/class/scsi_host/host1/config > PARTITIONNAME='x1521p4' > NWSDNAME='X1521P4' >

Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd

2007-12-23 Thread Benjamin Herrenschmidt
> This has the potential of leaving a big fat ugly hole in the middle of > scsi_cmnd. I would suggest of *just* moving the sense_buffer array to be > the *first member* of struct scsi_cmnd. The command itself is already cache > aligned, allocated by the proper flags to it's slab. And put a fat co

Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd

2007-12-21 Thread Benjamin Herrenschmidt
On Fri, 2007-12-21 at 10:33 +0000, Alan Cox wrote: > On Fri, 21 Dec 2007 13:30:08 +1100 > Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote: > > > The sense buffer in scsi_cmnd can nowadays be DMA'ed into directly > > by some low level drivers (that typically h

Re: [PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd

2007-12-21 Thread Benjamin Herrenschmidt
On Fri, 2007-12-21 at 06:16 -0700, Matthew Wilcox wrote: > On Fri, Dec 21, 2007 at 10:33:26AM +0000, Alan Cox wrote: > > On Fri, 21 Dec 2007 13:30:08 +1100 > > Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote: > > > > > The sense buffer in scsi_cmnd can nowad

Re: [PATCH 1/2] DMA buffer alignment annotations

2007-12-21 Thread Benjamin Herrenschmidt
On Fri, 2007-12-21 at 09:39 +0000, Russell King wrote: > > +#ifndef ARCH_MIN_DMA_ALIGNMENT > > +#define __dma_aligned > > +#define __dma_buffer > > +#else > > +#define __dma_aligned > > __attribute__((aligned(ARCH_MIN_DMA_ALIGNMENT))) > > +#define __dma_buffer __dma_bu

[PATCH 1/2] DMA buffer alignment annotations

2007-12-20 Thread Benjamin Herrenschmidt
i command structure and can be DMA'ed to. On non-coherent platforms, this causes various corruptions as this cache line is shared with various other fields of the scsi_cmnd data structure. Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]> --- Documentation/DMA-mapping.txt | 32

[PATCH 2/2] scsi: Use new __dma_buffer to align sense buffer in scsi_cmnd

2007-12-20 Thread Benjamin Herrenschmidt
mbers, which leads to various forms of corruption. This uses the newly defined __dma_buffer annotation to enforce that on such platforms, the sense_buffer is contained within its own cache line. This has no effect on cache coherent architectures. Signed-off-by: Benjamin Herrenschmidt <[EMAIL P
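The idea of keeping the sense buffer in its own cache line can be checked with a small layout test. This is a sketch, not the patch: `DMA_ALIGN`, `struct dma_sense` and `struct cmd` are hypothetical, the 128-byte line size is an assumption, and the 96-byte buffer size is taken from the thread. It relies on the GCC/Clang `aligned` attribute, the same mechanism as in the quoted macro.

```c
#include <assert.h>
#include <stddef.h>

#define DMA_ALIGN 128  /* assumed cache-line size; architecture specific */

/* Wrapper type: its alignment also pads its size to a multiple of DMA_ALIGN. */
struct dma_sense {
    unsigned char data[96];
} __attribute__((aligned(DMA_ALIGN)));

/* Hypothetical command structure, not the kernel's struct scsi_cmnd. */
struct cmd {
    int state;
    struct dma_sense sense;  /* starts on its own cache line */
    int result;              /* pushed to the following cache line */
};

/* Returns non-zero if no other member of struct cmd shares a line with sense. */
int sense_is_isolated(void)
{
    size_t start = offsetof(struct cmd, sense);
    size_t next  = offsetof(struct cmd, result);

    assert(sizeof(struct dma_sense) % DMA_ALIGN == 0);
    return start % DMA_ALIGN == 0 && next % DMA_ALIGN == 0 && next > start;
}
```

The cost is padding: with these assumed sizes the structure grows from about 104 bytes to 384, which is the kind of hole in the middle of scsi_cmnd that the replies object to.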

Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread Benjamin Herrenschmidt
On Tue, 2007-11-20 at 15:10 -0600, James Bottomley wrote: > We're talking about trying to fix this for 2.4; which is already at > -rc3 ... Is an entire arch change for dma alignment really a merge > candidate at this stage? Well, as I said before... it's a matter of what seems to be the less like

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
FYI, Here's what I have for the SCSI change. I haven't updated drivers to care for the new return code though, help appreciated with that as I don't know much about these drivers. Index: linux-work/drivers/scsi/scsi_error.c === --- li

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 18:10 -0800, Roland Dreier wrote: > > I wrapped this ugliness up inside the macro back in what I posted in > 2002 (http://lkml.org/lkml/2002/6/12/234): > > #define __dma_buffer __dma_buffer_line(__LINE__) > #define __dma_buffer_line(line) __dma_buffer_expand_line(line) > #d

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 16:46 -0800, David Miller wrote: > > 1) Require that entire buffers are commited by call sites, >and thus "embedding" DMA'd within non-DMA stuff isn't allowed > > 2) Add the __dma_cacheline_aligned tag. > > But note that with #2 it could get quite ugly because the > al

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 14:31 -0800, David Miller wrote: > From: Benjamin Herrenschmidt <[EMAIL PROTECTED]> > Date: Tue, 20 Nov 2007 06:51:14 +1100 > > > On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: > > > From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 13:43 -0800, Roland Dreier wrote: > > I've been debugging various issues on the PowerPC 44x embedded > > architecture which happens to have non-coherent PCI DMA. > > > > One of the problems I'm hitting is that one really needs to enforce > > kmalloc alignment to cache lin

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
> I'd like to be rid of it inside the command for various reasons: every > command has one of these, and they're expensive in the allocation (at 96 > bytes). There's no reason we have to allocate and free that amount of > space with every command. In theory, the number of these is bounded at >

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 09:09 -0600, James Bottomley wrote: > > What other drivers do is DMA to their own allocation and then memcpy to > > the sense buffer. > > > > There is a movement to allocate the sense data as its own sg list, but > > I don't think that patch has even been posted yet. > > I'

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 05:32 -0700, Matthew Wilcox wrote: > On Mon, Nov 19, 2007 at 04:35:23PM +1100, Benjamin Herrenschmidt wrote: > > The other one I'm hitting now is that the SCSI layer nowadays embeds the > > 'nowadays'? It has always been so. Was

Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: > From: Benjamin Herrenschmidt <[EMAIL PROTECTED]> > Date: Mon, 19 Nov 2007 16:35:23 +1100 > > > I'm not sure what is the best way to fix that. Internally, I've done > > some test whacking some ca

SCSI breakage on non-cache coherent architectures

2007-11-18 Thread Benjamin Herrenschmidt
Hi James ! (Please CC me on replies as I'm not subscribed to linux-scsi) I've been debugging various issues on the PowerPC 44x embedded architecture which happens to have non-coherent PCI DMA. One of the problems I'm hitting is that one really needs to enforce kmalloc alignment to cache lines or

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-16 Thread Benjamin Herrenschmidt
On Mon, 2007-07-16 at 17:03 -0500, James Bottomley wrote: > On Tue, 2007-07-17 at 07:49 +1000, Benjamin Herrenschmidt wrote: > > > No ... that was the point of flush_kernel_dcache_page(). The page in > > > question is page cache backed and contains user mappings. However,

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-16 Thread Benjamin Herrenschmidt
> No ... that was the point of flush_kernel_dcache_page(). The page in > question is page cache backed and contains user mappings. However, the > block layer has already done a flush_dcache_page() in get_user_pages() > and the user shouldn't be touching memory under I/O (unless they want > self

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-16 Thread Benjamin Herrenschmidt
On Mon, 2007-07-16 at 08:47 -0500, James Bottomley wrote: > > No ... that was the point of flush_kernel_dcache_page(). The page in > question is page cache backed and contains user mappings. However, > the > block layer has already done a flush_dcache_page() in get_user_pages() > and the user sh

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-16 Thread Benjamin Herrenschmidt
> Upon closer look, while flush_kernel_dcache_page() is a no-op on ppc64, > flush_dcache_page() isn't. So I'd prefer to not call it if not really needed. > > And according to James, flush_kernel_dcache_page() should be sufficient... > > So I'm getting puzzled again... flush_dcache_page() handle

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-13 Thread Benjamin Herrenschmidt
On Fri, 2007-07-13 at 16:19 +0200, Arnd Bergmann wrote: > I'm pretty sure that no ppc64 machine needs alias resolution in the kernel, > although some are VIPT. Last time we discussed this, Segher explained it > to me, but I don't remember which way Cell does it. IIRC, it automatically > flushes cac

Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-13 Thread Benjamin Herrenschmidt
On Fri, 2007-07-13 at 09:02 -0400, James Bottomley wrote: > On Wed, 2007-07-04 at 15:22 +0200, Geert Uytterhoeven wrote: > > + kaddr = kmap_atomic(sgpnt->page, KM_USER0); > > + if (!kaddr) > > + return -1; > > +

Re: [patch 1/6] ps3: Preallocate bootmem memory for the PS3 FLASH ROM storage driver

2007-06-15 Thread Benjamin Herrenschmidt
On Fri, 2007-06-15 at 13:39 +0200, Geert Uytterhoeven wrote: > plain text document attachment (ps3-stable) > Preallocate 256 KiB of bootmem memory for the PS3 FLASH ROM storage driver. I still very much dislike the #ifdef xxx_MODULE in main kernel code. At the end of the day, is it realistic to e

Re: [patch 6/7] ps3: ROM Storage Driver

2007-05-30 Thread Benjamin Herrenschmidt
On Wed, 2007-05-30 at 12:13 +0200, Christoph Hellwig wrote: > > For any sane hypervisor or hardware the copy should be worth > than that. Then again a sane hardware or hypervisor would support > SG requests.. Agreed... Sony should fix that, it's a bit ridiculous. Ben.

Re: [patch 6/7] ps3: ROM Storage Driver

2007-05-29 Thread Benjamin Herrenschmidt
On Tue, 2007-05-29 at 13:11 +0200, Geert Uytterhoeven wrote: > > This looks very inefficient. Just set sg_tablesize of your driver > > to 1 to avoid getting multiple segments. > > The disadvantage of setting sg_tablesize = 1 is that the driver will > get small > requests (PAGE_SIZE) most of the ti

Re: qla_wxyz pci_set_mwi question

2007-04-12 Thread Benjamin Herrenschmidt
> Willy was referring to this from include/asm-powerpc/pci.h: > > #ifdef CONFIG_PPC64 > > /* > * We want to avoid touching the cacheline size or MWI bit. > * pSeries firmware sets the cacheline size (which is not the cpu cacheline > * size in all cases) and hardware treats MWI the same as mem

Re: qla_wxyz pci_set_mwi question

2007-04-12 Thread Benjamin Herrenschmidt
On Thu, 2007-04-12 at 14:04 -0600, Matthew Wilcox wrote: > On Thu, Apr 12, 2007 at 12:37:13PM -0700, Andrew Vasquez wrote: > > On Thu, 12 Apr 2007, Matthew Wilcox wrote: > > > Why should it fail? If there's a platform which can't support a > > > cacheline size that the qla2xyz card can handle, it

Re: [PATCH 35/59] sysctl: C99 convert ctl_tables in arch/powerpc/kernel/idle.c

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote: > From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted > > This was partially done already and there was no ABI breakage what > a relief. > > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> Ac

Re: [PATCH 36/59] sysctl: C99 convert ctl_tables entries in arch/ppc/kernel/ppc_htab.c

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote: > From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted > > And make the mode of the kernel directory 0555 no one is allowed > to write to sysctl directories. > > Signed-off-by: Eric W. Biederman <[EMAIL PROT

Re: [PATCH 18/59] sysctl: ipmi remove unnecessary insert_at_head flag

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote: > From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted > > With unique sysctl binary numbers setting insert_at_head is pointless. > > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> Acked-by: Ben

Re: [PATCH 2.6.15.4 rel.2 1/1] libata: add hotswap to sata_svw

2006-11-28 Thread Benjamin Herrenschmidt
On Tue, 2006-11-28 at 23:22 +0000, David Woodhouse wrote: > On Thu, 2006-02-16 at 16:09 +0100, Martin Devera wrote: > > From: Martin Devera <[EMAIL PROTECTED]> > > > > Add hotswap capability to Serverworks/BroadCom SATA controllers. The > > controller has SIM register and it selects which bits in SA

Re: iomapping a big endian area

2005-04-04 Thread Benjamin Herrenschmidt
On Mon, 2005-04-04 at 08:59 -0500, James Bottomley wrote: > On Mon, 2005-04-04 at 17:50 +1000, Benjamin Herrenschmidt wrote: > > I disagree. The driver will never "know" ... > > ? the driver has to know. Look at the 53c700 to see exactly how awful > it is. This beast

Re: iomapping a big endian area

2005-04-04 Thread Benjamin Herrenschmidt
> > Well ... it's like this. Native means "pass through without swapping" > > and has an easy implementation on both BE and LE platforms. Logically > > io{read,write}{16,32}be would have to do byte swaps on LE platforms. > > Being lazy, I'm opposed to doing the work if there's no actual use for >
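The difference between a "native" access and an explicit big-endian access can be shown on a plain byte buffer. This is a sketch: `read32_be()` and `write32_be()` are hypothetical helpers working on ordinary memory, not the ioread32be/iowrite32be accessors under discussion, which operate on I/O mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value from memory, independent of host byte order. */
uint32_t read32_be(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store a 32-bit value to memory in big-endian byte order. */
void write32_be(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Written this way the helper gives the same result on big-endian and little-endian hosts, so the caller only has to know the device's byte order, not the CPU's.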

Re: iomapping a big endian area

2005-04-04 Thread Benjamin Herrenschmidt
On Sat, 2005-04-02 at 22:27 -0600, James Bottomley wrote: > On Sat, 2005-04-02 at 20:08 -0800, David S. Miller wrote: > > > Did anyone have a preference for the API? I was thinking > > > ioread32_native, but ioread32be is fine too. > > > > I think doing foo{be,le}{8,16,32}() would be consistent w

Re: iomapping a big endian area

2005-04-04 Thread Benjamin Herrenschmidt
On Sat, 2005-04-02 at 21:40 -0600, James Bottomley wrote: > Actually, ioread8be is unnecessary, but I was planning to add > ioread16/ioread32 and iowritexx be on be variants (equivalent to > _raw_readw et al.) > > After all, the driver must know the card is BE, so the routines that > make use of