Re: [PATCH 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-02-12 Thread Qiang Yu via dri-devel
On Tue, Feb 12, 2019 at 11:47 PM Rob Herring  wrote:
>
> On Wed, Feb 6, 2019 at 7:16 AM Qiang Yu  wrote:
> >
> > From: Lima Project Developers 
>
> This should be a person (you).
>
> > Signed-off-by: Andreas Baierl 
> > Signed-off-by: Erico Nunes 
> > Signed-off-by: Heiko Stuebner 
> > Signed-off-by: Marek Vasut 
> > Signed-off-by: Neil Armstrong 
> > Signed-off-by: Qiang Yu 
>
> Being the submitter, your S-o-b should be last.
>
> > Signed-off-by: Simon Shields 
> > Signed-off-by: Vasily Khoruzhick 
> > ---
>
> > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> > index 4385f00e1d05..dfefcb393858 100644
> > --- a/drivers/gpu/drm/Kconfig
> > +++ b/drivers/gpu/drm/Kconfig
> > @@ -333,6 +333,8 @@ source "drivers/gpu/drm/tve200/Kconfig"
> >
> >  source "drivers/gpu/drm/xen/Kconfig"
> >
> > +source "drivers/gpu/drm/lima/Kconfig"
> > +
> >  # Keep legacy drivers last
> >
> >  menuconfig DRM_LEGACY
> > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> > index ce8d1d384319..8d024b729902 100644
> > --- a/drivers/gpu/drm/Makefile
> > +++ b/drivers/gpu/drm/Makefile
> > @@ -109,3 +109,4 @@ obj-$(CONFIG_DRM_TINYDRM) += tinydrm/
> >  obj-$(CONFIG_DRM_PL111) += pl111/
> >  obj-$(CONFIG_DRM_TVE200) += tve200/
> >  obj-$(CONFIG_DRM_XEN) += xen/
> > +obj-$(CONFIG_DRM_LIMA)  += lima/
>
> Not sure about this file, but normally these should be kept sorted.
>
> > diff --git a/drivers/gpu/drm/lima/lima_bcast.c b/drivers/gpu/drm/lima/lima_bcast.c
> > new file mode 100644
> > index ..63754f6465ea
> > --- /dev/null
> > +++ b/drivers/gpu/drm/lima/lima_bcast.c
> > @@ -0,0 +1,46 @@
> > +// SPDX-License-Identifier: GPL-2.0 OR MIT
> > +/* Copyright 2018 Qiang Yu  */
> > +
> > +#include 
> > +#include 
> > +
> > +#include "lima_device.h"
> > +#include "lima_bcast.h"
> > +#include "lima_regs.h"
> > +
> > +#define bcast_write(reg, data) writel(data, ip->iomem + LIMA_BCAST_##reg)
> > +#define bcast_read(reg) readl(ip->iomem + LIMA_BCAST_##reg)
>
> There are 2 things about this I would change. Just pass in 'ip' to the
> macro so it is clear in calling functions that ip is actually used.
> Second, don't do token pasting. It is generally avoided in the kernel.
> It makes grepping the source code harder and is a pointless
> indirection.
>
> If you do both of those, then these can be static inline functions
> instead which are preferred because you get type checking.
>
> Same comment applies to all the other register accessors.
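
For illustration, the static-inline form suggested above could look roughly
like the sketch below. The function names and the LIMA_BCAST_BROADCAST_MASK
register used at the call site are assumptions for the example, not code
taken from the patch.

static inline void lima_bcast_write(struct lima_ip *ip, u32 reg, u32 data)
{
        writel(data, ip->iomem + reg);
}

static inline u32 lima_bcast_read(struct lima_ip *ip, u32 reg)
{
        return readl(ip->iomem + reg);
}

/* Call sites then spell out the full register constant, e.g.
 *   lima_bcast_write(ip, LIMA_BCAST_BROADCAST_MASK, mask);
 * which keeps the register names greppable and type-checks 'ip'.
 */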
>
>
> > +struct lima_ip {
> > +   struct lima_device *dev;
> > +   enum lima_ip_id id;
> > +   bool present;
> > +
> > +   void __iomem *iomem;
> > +   int irq;
> > +
> > +   union {
> > +   /* gp/pp */
> > +   bool async_reset;
> > +   /* l2 cache */
> > +   spinlock_t lock;
>
> What happens when you need 2 elements for a sub-block? I'd make this a
> struct pointer for each IP sub-block.
Let's switch to a sub-block struct later if a second element comes up.
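
For reference, the struct-pointer layout suggested above could look roughly
like the sketch below; the struct and field names are hypothetical, not taken
from the patch.

/* Hypothetical per-sub-block private data, so each block can grow its own
 * state without widening the union embedded in struct lima_ip.
 */
struct lima_ip_l2_cache {
        spinlock_t lock;
        /* a second element can be added here later */
};

struct lima_ip_gp_pp {
        bool async_reset;
};

/* struct lima_ip would then carry pointers instead of inline members:
 *
 *        union {
 *                struct lima_ip_gp_pp *gp_pp;
 *                struct lima_ip_l2_cache *l2;
 *        } data;
 */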

>
> > +   } data;
> > +};
> > +
> > +enum lima_pipe_id {
> > +   lima_pipe_gp,
> > +   lima_pipe_pp,
> > +   lima_pipe_num,
> > +};
> > +
> > +struct lima_device {
> > +   struct device *dev;
> > +   struct drm_device *ddev;
> > +   struct platform_device *pdev;
> > +
> > +   enum lima_gpu_id id;
> > +   int num_pp;
> > +
> > +   void __iomem *iomem;
> > +   struct clk *clk_bus;
> > +   struct clk *clk_gpu;
> > +   struct reset_control *reset;
> > +   struct regulator *regulator;
> > +
> > +   struct lima_ip ip[lima_ip_num];
> > +   struct lima_sched_pipe pipe[lima_pipe_num];
> > +
> > +   struct lima_mman mman;
> > +
> > +   struct lima_vm *empty_vm;
> > +   uint64_t va_start;
> > +   uint64_t va_end;
> > +
> > +   u32 *dlbu_cpu;
> > +   dma_addr_t dlbu_dma;
> > +};
> > +
> > +static inline struct lima_device *
> > +to_lima_dev(struct drm_device *dev)
> > +{
> > +   return dev->dev_private;
> > +}
> > +
> > +static inline struct lima_device *
> > +ttm_to_lima_dev(struct ttm_bo_device *dev)
> > +{
> > +   return container_of(dev, struct lima_device, mman.bdev);
> > +}
> > +
> > +int lima_device_init(struct lima_device *ldev);
> > +void lima_device_fini(struct lima_device *ldev);
> > +
> > +const char *lima_ip_name(struct lima_ip *ip);
> > +
> > +#endif
> > diff --git a/drivers/gpu/drm/lima/lima_dlbu.c b/drivers/gpu/drm/lima/lima_dlbu.c
> > new file mode 100644
> > index ..6697d4ddd887
> > --- /dev/null
> > +++ b/drivers/gpu/drm/lima/lima_dlbu.c
> > @@ -0,0 +1,56 @@
> > +// SPDX-License-Identifier: GPL-2.0 OR MIT
> > +/* Copyright 2018 Qiang Yu  */
> > +
> > +#include 
> > +#include 
> > +
> > +#include "lima_device.h"
> > +#include "lima_dlbu.h"
> > +#include "lima_vm.h"
> > +#include "lima_regs.h"
> > +
> > +#define dlbu_write(reg, data) writel(data, ip->iomem + LIMA_DLBU_##reg)
> > +#define dlbu_read(reg) readl(ip->iomem + LIMA_DLBU_##reg)
> > +
> 

Re: [PATCH 2/2] drm/lima: driver for ARM Mali4xx GPUs

2019-02-12 Thread Qiang Yu via dri-devel
On Wed, Feb 13, 2019 at 4:05 AM Rob Herring  wrote:
>
> On Tue, Feb 12, 2019 at 10:24 AM Alex Deucher  wrote:
> >
> > On Tue, Feb 12, 2019 at 10:53 AM Rob Herring via dri-devel
> >  wrote:
> > >
> > > On Wed, Feb 6, 2019 at 7:16 AM Qiang Yu  wrote:
> > > >
> > > > From: Lima Project Developers 
>
> [...]
>
> > > > +static int lima_ioctl_gem_va(struct drm_device *dev, void *data, struct drm_file *file)
> > > > +{
> > > > +   struct drm_lima_gem_va *args = data;
> > > > +
> > > > +   switch (args->op) {
> > > > +   case LIMA_VA_OP_MAP:
> > > > +   return lima_gem_va_map(file, args->handle, args->flags, args->va);
> > > > +   case LIMA_VA_OP_UNMAP:
> > > > +   return lima_gem_va_unmap(file, args->handle, args->va);
> > >
> > > These are mapping to GPU VA. Why not do that on GEM object creation or
> > > import or when the objects are submitted with cmd queue as other
> > > drivers do?
> > >
> > > To put it another way, these ioctls look different from what other
> > > drivers do. Why do you need to do things differently? My understanding
> > > is best practice is to map and return the GPU offset when the GEM
> > > object is created. This is what v3d does. I think Intel is moving to
> > > that. And panfrost will do that.
> >
> > I think it would be a good idea to look at the amdgpu driver.  This
> > driver is heavily modeled after it.  Basically the GEM VA ioctl allows
> > userspace to manage per process (per fd really) virtual addresses.
>
> Why do you want userspace to manage assigning VAs versus the kernel to
> do so? Exposing that detail to userspace means the driver must support
> a per process address space. Letting the kernel assign addresses means
> it can either be a single address space or be a per process address
> space. It seems to me more flexible to allow the kernel driver to
> evolve without that ABI.
>
> With any new driver in the kernel, the question is always which
> existing one is the best model to follow. I don't think Intel, AMD or
> Nouveau are good examples to follow because they have a lot of history
> and legacy, are both GPU and DC, and have separate graphics memory
> (except Intel I guess). The GPUs in ARM land have none of those
> really. Looking thru freedreno, etnaviv, and v3d mostly, I see they
> all have similar user ABIs. But they are all different based on what
> driver they copied and how they've evolved. I know it's a big can of
> worms, but it would be nice to have some alignment of ABIs. I know the
> reasons why there isn't, but it's frustrating that 11 out of 60K IGT
> tests will run. I don't think a common ABI matters much for the big 3,
> but in the ARM zoo I think it does. At least if the interfaces are
> kept similar, then having common code shared among the embedded GPUs
> would be easier and writing some IGT shim for each driver would be
> easier.
I admit that userspace VA management is overkill for a GPU like the Mali4xx,
which does not have the complicated SW/HW of an AMD GPU. It takes extra
effort to make it work but brings no visible benefit to the lima driver.

So in v2 I now do the VA map at BO creation. You may also find that the v2
driver is closer to the other ARM-world GPU drivers, as it uses GEM+shmem
instead of TTM:
https://gitlab.freedesktop.org/lima/linux/commits/lima-5.0-rc6
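
For context, "VA map at BO creation" means the kernel assigns the GPU virtual
address itself when the object is created and returns it to userspace, rather
than exposing MAP/UNMAP ioctls. A rough sketch of that pattern, under assumed
names (not the actual v2 code), might be:

/* Hypothetical: kernel-assigned GPU VA at BO creation time, using a drm_mm
 * range allocator held in the per-file VM. Not the real lima v2 code.
 */
static int lima_gem_assign_va(struct lima_vm *vm, struct lima_bo *bo, u32 *va)
{
        int err;

        mutex_lock(&vm->lock);
        /* pick any free GPU VA range large enough for the object */
        err = drm_mm_insert_node(&vm->mm, &bo->vm_node, bo->size);
        mutex_unlock(&vm->lock);
        if (err)
                return err;

        *va = bo->vm_node.start;        /* reported back in the create ioctl */

        /* program the GPU MMU page tables for this range (assumed helper) */
        return lima_vm_map_bo(vm, bo, bo->vm_node.start);
}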

Because the change is huge, I'll send it here later this week for review and
more testing, and will address the rest of your review comments then.

Thanks,
Qiang

>
>
> Rob

Re: [PATCH 1/2] drm/fourcc: add ARM tiled format modifier

2019-02-16 Thread Qiang Yu via dri-devel
On Thu, Feb 14, 2019 at 11:27 PM Brian Starkey  wrote:
>
> Hi,
>
> On Wed, Feb 06, 2019 at 09:14:56PM +0800, Qiang Yu wrote:
> > Signed-off-by: Qiang Yu 
> > ---
> >  include/uapi/drm/drm_fourcc.h | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > index 0b44260a5ee9..953b59eb3fd2 100644
> > --- a/include/uapi/drm/drm_fourcc.h
> > +++ b/include/uapi/drm/drm_fourcc.h
> > @@ -644,6 +644,15 @@ extern "C" {
> >   */
> >  #define AFBC_FORMAT_MOD_SC  (1ULL <<  9)
> >
> > +/*
> > + * ARM tiled format
> > + *
> > + * This is used by ARM Mali Utgard/Midgard GPU. It divides buffer into
> > + * 16x16 pixel blocks. Blocks are stored linearly in order, but pixels
> > + * in the block are reordered.
> > + */
> > +#define DRM_FORMAT_MOD_ARM_TILED fourcc_mod_code(ARM, 1)
> > +
>
> This conflicts with the already-defined AFBC modifiers. It has the
> same value as:
>
> DRM_FORMAT_MOD_ARM_AFBC(AFBC_FORMAT_MOD_BLOCK_SIZE_16x16);
>
> I would suggest we use the top bit(s) to namespace between the
> different types of tiling formats Arm has (e.g. AFBC vs
> non-compressed).
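
The collision is easy to see from the existing drm_fourcc.h definitions,
since the vendor code occupies the top byte of the modifier:

/* DRM_FORMAT_MOD_VENDOR_ARM is 0x08, and fourcc_mod_code() places the
 * vendor in bits 63:56:
 *
 *   fourcc_mod_code(ARM, val) = (0x08ULL << 56) | (val & 0x00ffffffffffffffULL)
 *
 * AFBC_FORMAT_MOD_BLOCK_SIZE_16x16 is 1ULL, so:
 *
 *   DRM_FORMAT_MOD_ARM_AFBC(AFBC_FORMAT_MOD_BLOCK_SIZE_16x16)
 *       == fourcc_mod_code(ARM, 1) == 0x0800000000000001
 *   DRM_FORMAT_MOD_ARM_TILED (as proposed)
 *       == fourcc_mod_code(ARM, 1) == 0x0800000000000001
 *
 * i.e. the two modifiers would be indistinguishable.
 */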
Just for confirmation, is the non-compressed tile format + BLOCK_SIZE_16x16
super block format the same as the tile format used by the Mali GPU?

So what name would you suggest for this Mali GPU tile format?
Or maybe it would be better if you sent out a patch adding the tile format
(ideally the other ones too) and describing how it combines with the super
block format in the modifier, as I don't have the whole picture here.

>
> I think this would be an internal-only layout though. Do we need
> modifiers for internal-only formats? I thought they were mainly for
> sharing cross-device. I also didn't see this macro getting used
> anywhere in the driver; I suppose at a minimum you should validate the
> modifier value provided by userspace.
>
Yes, we need it. It's used by the userspace Mesa driver and by applications
that need to share Mali-driver-allocated buffers across processes, so that
one process can tell another which format to use when interpreting the
passed buffer.

Thanks,
Qiang