Here is my variant on Eduard - Gabriel Munteanu's patches to add a DMA/IOMMU layer, expanded so that it can support the PAPR TCE mechanism. At present, we implement PAPR TCEs directly in the PAPR virtual IO bus layer; the last patch of this series reworks that code to implement them through the generic DMA layer. That will make life easier when we come to implement PCI for the pseries machine.
Apart from that, I've significantly reworked how the IOMMU data is accessed from the qdev. The DMADevice structure is gone - I saw no point to it. Instead, the DeviceState contains a pointer directly to a DmaMmu structure. NULL here indicates no IOMMU, so DMAs go directly to guest physical addresses. All the DMA R/W helper functions take a DeviceState * and reach the DmaMmu from there.

The DmaMmu represents a single DMA context / address space. It could be separate for each device, or shared between several devices, depending on whether a particular IOMMU implements independent translation for each device, or a single shared DMA address space for (e.g.) a whole bus. From the DmaMmu structure, IOMMU-specific state information can be reached via upcasting.

For PCI IOMMUs, the PCI bus structure references a PCIBusIOMMU structure. That contains a single 'new_device' callback which obtains the appropriate DmaMmu context for a given PCI device. The callback could either return a pointer to a fixed existing DmaMmu, if the IOMMU implements a single shared address space (the AMD IOMMU uses this), or it could allocate a new DmaMmu context if the IOMMU provides a separate DMA address space for each device.
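To make the layout above concrete, here is a rough standalone sketch of how the pieces might fit together. It is not the code from the patches: only the names DeviceState, DmaMmu, PCIBusIOMMU and 'new_device' come from the description, while the field names, the 'translate' hook, the callback signature, the toy OffsetIOMMU and SharedBusIOMMU types, dma_read() and guest_ram are all made up for illustration. The container_of() macro stands in for the upcasting step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t DmaAddr;                 /* hypothetical address type */

/* Generic DMA context: one translation address space (layout is assumed). */
typedef struct DmaMmu DmaMmu;
struct DmaMmu {
    /* Map a bus address to a guest physical address; 0 on success. */
    int (*translate)(DmaMmu *mmu, DmaAddr addr, DmaAddr *paddr);
};

/* Stand-in for the qdev DeviceState: just the new pointer. */
typedef struct DeviceState {
    DmaMmu *dma_mmu;                      /* NULL means no IOMMU */
} DeviceState;

uint8_t guest_ram[256];                   /* toy guest physical memory */

/* Hypothetical R/W helper: takes the device, reaches the DmaMmu from it.
 * For brevity it translates only the start address of the transfer. */
int dma_read(DeviceState *dev, DmaAddr addr, void *buf, size_t len)
{
    DmaAddr paddr = addr;                 /* no IOMMU: address used as-is */

    assert(dev != NULL);
    if (dev->dma_mmu && dev->dma_mmu->translate(dev->dma_mmu, addr, &paddr)) {
        return -1;                        /* translation fault */
    }
    if (paddr > sizeof(guest_ram) || len > sizeof(guest_ram) - paddr) {
        return -1;                        /* outside toy guest memory */
    }
    memcpy(buf, guest_ram + paddr, len);
    return 0;
}

/* Recover the containing structure from a pointer to an embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy IOMMU-specific state embedding the generic DmaMmu. */
typedef struct OffsetIOMMU {
    DmaMmu mmu;
    DmaAddr offset;                       /* added to every bus address */
} OffsetIOMMU;

static int offset_translate(DmaMmu *mmu, DmaAddr addr, DmaAddr *paddr)
{
    OffsetIOMMU *s = container_of(mmu, OffsetIOMMU, mmu);

    *paddr = addr + s->offset;
    return 0;
}

/* Per-bus hook: hands out the DmaMmu for a device (signature is assumed). */
typedef struct PCIBusIOMMU PCIBusIOMMU;
struct PCIBusIOMMU {
    DmaMmu *(*new_device)(PCIBusIOMMU *iommu, int devfn);
};

/* Shared address space case: every device gets the same fixed DmaMmu. */
typedef struct SharedBusIOMMU {
    PCIBusIOMMU bus_iommu;
    OffsetIOMMU shared;
} SharedBusIOMMU;

static DmaMmu *shared_new_device(PCIBusIOMMU *iommu, int devfn)
{
    SharedBusIOMMU *s = container_of(iommu, SharedBusIOMMU, bus_iommu);

    (void)devfn;                          /* unused: one context for all */
    return &s->shared.mmu;
}

SharedBusIOMMU demo_iommu = {
    .bus_iommu = { .new_device = shared_new_device },
    .shared = { .mmu = { .translate = offset_translate }, .offset = 16 },
};
```

An IOMMU with independent per-device translation would instead allocate and return a fresh context from its 'new_device' implementation. The helper and the device models would not need to change, since they only ever see the DmaMmu pointer in the DeviceState.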