date:20200509

Re: [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state

2020-05-09 Thread Nicholas Piggin

Excerpts from Michael Ellerman's message of May 8, 2020 11:39 pm:
> Nicholas Piggin  writes:
> 
>> pseries fwnmi machine check code pops the soft-irq checks in rtas_call
>> (after the previous patch to remove rtas_token from this call path).
>  ^
>  I changed this to "next" which I think is what you meant?

Yes I rearranged them, so yes.

Thanks,
Nick

Re: [PATCH v4 11/16] powerpc/64s: machine check interrupt update NMI accounting

2020-05-09 Thread Nicholas Piggin

Excerpts from kbuild test robot's message of May 9, 2020 1:13 pm:
> Hi Nicholas,
> 
> I love your patch! Yet something to improve:

...

>   1419#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
>   1420pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
> regs->dsisr);
>   1421#else
>   1422pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, 
> regs->dsisr);
>   1423#endif
>   1424#ifdef CONFIG_PPC64
>> 1425 pr_cont("IRQMASK: %lx IN_NMI:%d IN_MCE:%d", regs->softe, 
>> (int)get_paca()->in_nmi, (int)get_paca()->in_mce);

Oh I meant to get rid of that hunk, it crept back in :(

mpe if you could please take it out if you're merging this.

It was quite useful for debugging this stuff, I might do a proper patch 
for this, but for now not necessary (it doesn't matter for "normal" 
crashes only crash crashes).

Thanks,
Nick

Re: [PATCH RFC 1/4] powerpc/radix: Fix compilation for radix with CONFIG_SMP=n

2020-05-09 Thread Nicholas Piggin

Excerpts from Paul Mackerras's message of May 9, 2020 3:02 pm:
> This fixes the compile errors we currently get with CONFIG_SMP=n and
> CONFIG_PPC_RADIX_MMU=y.

Did I already fix this, or does it keep getting broken?! :(

Anyway fine by me if it's required.

Thanks,
Nick

Re: [PATCH RFC 2/4] powerpc: Add Microwatt platform

2020-05-09 Thread Nicholas Piggin

Excerpts from Paul Mackerras's message of May 9, 2020 3:02 pm:
> Microwatt is a FPGA-based implementation of the Power ISA.  It
> currently only implements little-endian 64-bit mode, and does
> not (yet) support SMP.
> 
> This adds a new machine type to support FPGA-based SoCs with a
> Microwatt core.

Very cool!

Would there be any point sharing this with the "naked metal" platform 
Alistair has for booting POWER in L3 without OPAL? Or is it easy enough 
to have a several different simple 64s platforms?

I have an HPT conditional compile patch and a few other diet Kconfig
things I'll now be better justified to try get merged :)

Thanks,
Nick

Re: [PATCH RFC 4/4] powerpc/radix: Add support for microwatt's PRTBL SPR

2020-05-09 Thread Nicholas Piggin

Excerpts from Paul Mackerras's message of May 9, 2020 3:04 pm:
> Microwatt currently doesn't implement hypervisor mode and therefore
> doesn't implement the partition table.  It does implement the process
> table and radix page table walks.
> 
> This adds code to write the base address of the process table to the
> PRTBL SPR, which has been assigned SPR 720 for now, as that is in the
> range of SPR numbers assigned for experimental use.  PRTBL is only
> written when we have neither the FW_FEATURE_LPAR feature nor the
> CPU_FTR_HVMODE feature.

What do you think about adding a FW_FEATURE and/or CPU_FTR for
microwatt?

No big deal now, but I'm sure you'll be adding other things.

Thanks,
Nick

> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/include/asm/reg.h   |  1 +
>  arch/powerpc/mm/book3s64/radix_pgtable.c | 13 +
>  2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 1aa46dff0957..6ea3fc42740d 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -721,6 +721,7 @@
>  #endif
>  #define SPRN_TIR 0x1BE   /* Thread Identification Register */
>  #define SPRN_PTCR0x1D0   /* Partition table control Register */
> +#define SPRN_PRTBL   0x2D0   /* Process table pointer */
>  #define SPRN_PSPB0x09F   /* Problem State Priority Boost reg */
>  #define SPRN_PTEHI   0x3D5   /* 981 7450 PTE HI word (S/W TLB load) */
>  #define SPRN_PTELO   0x3D6   /* 982 7450 PTE LO word (S/W TLB load) */
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index dd1bea45325c..2e6a376c9d82 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -600,10 +600,15 @@ void __init radix__early_init_mmu(void)
>   radix_init_pgtable();
>  
>   if (!firmware_has_feature(FW_FEATURE_LPAR)) {
> - lpcr = mfspr(SPRN_LPCR);
> - mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
> - radix_init_partition_table();
> - radix_init_amor();
> + if (cpu_has_feature(CPU_FTR_HVMODE)) {
> + lpcr = mfspr(SPRN_LPCR);
> + mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
> + radix_init_partition_table();
> + radix_init_amor();
> + } else {
> + mtspr(SPRN_PRTBL, (__pa(process_tb) |
> +(PRTB_SIZE_SHIFT - 12)));
> + }
>   } else {
>   radix_init_pseries();
>   }
> -- 
> 2.25.3
> 
>

Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Oliver O'Halloran

On Sat, May 9, 2020 at 12:41 AM Qian Cai  wrote:
>
>  Booting POWER9 PowerNV has this message,
>
> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
> early_ioremap() instead”
>
> but use the patch below will result in leaks because it will never call 
> early_iounmap() anywhere. However, it looks me it was by design that 
> phb->regs mapping would be there forever where it would be used in 
> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() initcall 
> too strong?

The warning there is junk. The PHBs are setup at boot and never torn
down so we're not "leaking" the mapping. It's supposed to be there for
the lifetime of the kernel.

That said, we could probably move the PCI setup to a point later in
boot where the normal ioremap can be be used. We would have to check
for initcalls which depend on the PHBs being setup and delay those too
though.

Oliver

Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line

2020-05-09 Thread Christoph Hellwig

On Tue, May 05, 2020 at 02:18:37PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 17/04/2020 17:58, Christoph Hellwig wrote:
> > On Wed, Apr 15, 2020 at 09:21:37PM +1000, Alexey Kardashevskiy wrote:
> >> And the fact they were exported leaves possibility that there is a
> >> driver somewhere relying on these symbols or distro kernel won't build
> >> because the symbol disappeared from exports (I do not know what KABI
> >> guarantees or if mainline kernel cares).
> > 
> > We absolutely do not care.  In fact for abuses of APIs that drivers
> > should not use we almost care to make them private and break people
> > abusing them.
> 
> ok :)
> 
> >> I do not care in particular but
> >> some might, a line separated with empty lines in the commit log would do.
> > 
> > I'll add a blurb for the next version.
> 
> 
> Has it gone anywhere? Thanks,

I've been hoping for the sg_buf helpers to land first, as they need
backporting and would conflict.  Do you urgently need the series?

Re: remove a few uses of ->queuedata

2020-05-09 Thread Christoph Hellwig

On Fri, May 08, 2020 at 11:04:45AM -0700, Dan Williams wrote:
> On Fri, May 8, 2020 at 9:16 AM Christoph Hellwig  wrote:
> >
> > Hi all,
> >
> > various bio based drivers use queue->queuedata despite already having
> > set up disk->private_data, which can be used just as easily.  This
> > series cleans them up to only use a single private data pointer.
> 
> ...but isn't the queue pretty much guaranteed to be cache hot and the
> gendisk cache cold? I'm not immediately seeing what else needs the
> gendisk in the I/O path. Is there another motivation I'm missing?

->private_data is right next to the ->queue pointer, pat0 and part_tbl
which are all used in the I/O submission path (generic_make_request /
generic_make_request_checks).  This is mostly a prep cleanup patch to
also remove the pointless queue argument from ->make_request - then
->queue is an extra dereference and extra churn.

Re: remove a few uses of ->queuedata

2020-05-09 Thread Christoph Hellwig

On Sat, May 09, 2020 at 06:13:21AM +0800, Ming Lei wrote:
> On Fri, May 08, 2020 at 06:15:02PM +0200, Christoph Hellwig wrote:
> > Hi all,
> > 
> > various bio based drivers use queue->queuedata despite already having
> > set up disk->private_data, which can be used just as easily.  This
> > series cleans them up to only use a single private data pointer.
> > 
> > blk-mq based drivers that have code pathes that can't easily get at
> > the gendisk are unaffected by this series.
> 
> Yeah, before adding disk, there still may be requests queued to LLD
> for blk-mq based drivers.
> 
> So are there this similar situation for these bio based drivers?

bio submittsion is based on the gendisk, so we can't submit before
it is added.  The passthrough request based path obviously doesn't apply
here.

Re: [PATCH RFC 2/4] powerpc: Add Microwatt platform

2020-05-09 Thread Alistair Popple

On Saturday, 9 May 2020 5:58:57 PM AEST Nicholas Piggin wrote:
> Excerpts from Paul Mackerras's message of May 9, 2020 3:02 pm:
> > Microwatt is a FPGA-based implementation of the Power ISA.  It
> > currently only implements little-endian 64-bit mode, and does
> > not (yet) support SMP.
> > 
> > This adds a new machine type to support FPGA-based SoCs with a
> > Microwatt core.
> 
> Very cool!
> 
> Would there be any point sharing this with the "naked metal" platform
> Alistair has for booting POWER in L3 without OPAL? Or is it easy enough
> to have a several different simple 64s platforms?

It looks pretty similar at the moment, I've been meaning to clean those 
patches up and send them upstream but Paul has beaten me to it. The main 
difference so far is how the console is setup. For booting cache contained I 
was using a device tree pointing at a standard UART driver and enabling the 
standard OF platform device tree probing.

- Alistair

> I have an HPT conditional compile patch and a few other diet Kconfig
> things I'll now be better justified to try get merged :)
> 
> Thanks,
> Nick

Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Nicholas Piggin

Excerpts from Qian Cai's message of May 9, 2020 3:41 am:
> 
> 
>> On May 8, 2020, at 10:39 AM, Qian Cai  wrote:
>> 
>> Booting POWER9 PowerNV has this message,
>> 
>> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
>> early_ioremap() instead”
>> 
>> but use the patch below will result in leaks because it will never call 
>> early_iounmap() anywhere. However, it looks me it was by design that 
>> phb->regs mapping would be there forever where it would be used in 
>> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() 
>> initcall too strong?
>> 
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -36,6 +36,7 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>> 
>> #include 
>> 
>> @@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
>> device_node *np,
>>/* Get registers */
>>if (!of_address_to_resource(np, 0, &r)) {
>>phb->regs_phys = r.start;
>> -   phb->regs = ioremap(r.start, resource_size(&r));
>> +   phb->regs = early_ioremap(r.start, resource_size(&r));
>>if (phb->regs == NULL)
>>pr_err("  Failed to map registers !\n”);
> 
> This will also trigger a panic with debugfs reads, so isn’t that this commit 
> bogus at least for powerpc64?

Your patch to use early_ioremap is faulting? I wonder why?

Thanks,
Nick

> 
> d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()")
> 
> 11017.617022][T122068] Faulting instruction address: 0xc00db564
> [11017.617257][T122066] Faulting instruction address: 0xc00db564
> [11017.617950][T122073] Faulting instruction address: 0xc00db564
> [11017.61][T122064] BUG: Unable to handle kernel data access on read at 
> 0xffe20e10
> [11017.618935][T122064] Faulting instruction address: 0xc00db564
> [11017.737996][T122072] 
> [11017.738010][T122073] Oops: Kernel access of bad area, sig: 11 [#2]
> [11017.738024][T122073] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 
> DEBUG_PAGEALLOC NUMA PowerNV
> [11017.738051][T122073] Modules linked in: brd ext4 crc16 mbcache jbd2 loop 
> kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci libahci tg3 mdio libata 
> libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [11017.738110][T122073] CPU: 108 PID: 122073 Comm: read_all Tainted: G  D 
> W 5.7.0-rc4-next-20200508+ #4
> [11017.738138][T122073] NIP:  c00db564 LR: c056f660 CTR: 
> c00db550
> [11017.738173][T122073] REGS: c00374f6f980 TRAP: 0380   Tainted: G  D 
> W  (5.7.0-rc4-next-20200508+)
> [11017.738234][T122073] MSR:  90009033   CR: 
> 22002282  XER: 2004
> [11017.738278][T122073] CFAR: c056f65c IRQMASK: 0 
> [11017.738278][T122073] GPR00: c056f660 c00374f6fc10 
> c1689400 c000201ffc41aa00 
> [11017.738278][T122073] GPR04: c00374f6fc70  
>  0001 
> [11017.738278][T122073] GPR08:  ffe2 
>  c008ee380080 
> [11017.738278][T122073] GPR12: c00db550 c000201fff671280 
>   
> [11017.738278][T122073] GPR16: 0002 10040800 
> 1001ccd8 1001cc80 
> [11017.738278][T122073] GPR20: 1001cc98 1001ccc8 
> 1001cca8 1001cb48 
> [11017.738278][T122073] GPR24:   
> 03ff 7fffebb67390 
> [11017.738278][T122073] GPR28: c00374f6fd90 c000200c0c6a7550 
>  c000200c0c6a7500 
> [11017.738542][T122073] NIP [c00db564] 
> pnv_eeh_dbgfs_get_inbB+0x14/0x30
> [11017.738579][T122073] LR [c056f660] simple_attr_read+0xa0/0x180
> [11017.738613][T122073] Call Trace:
> [11017.738645][T122073] [c00374f6fc10] [c056f630] 
> simple_attr_read+0x70/0x180 (unreliable)
> [11017.738672][T122073] [c00374f6fcb0] [c064a2e0] 
> full_proxy_read+0x90/0xe0
> [11017.738686][T122073] [c00374f6fd00] [c051fe0c] 
> __vfs_read+0x3c/0x70
> [11017.738722][T122073] [c00374f6fd20] [c051feec] 
> vfs_read+0xac/0x170
> [11017.738757][T122073] [c00374f6fd70] [c052034c] 
> ksys_read+0x7c/0x140
> [11017.738818][T122073] [c00374f6fdc0] [c0038af4] 
> system_call_exception+0x114/0x1e0
> [11017.738867][T122073] [c00374f6fe20] [c000c8f0] 
> system_call_common+0xf0/0x278
> [11017.738916][T122073] Instruction dump:
> [11017.738948][T122073] 7c0004ac f9490d10 a14d0c78 3860 b14d0c7a 4e800020 
> 6000 7c0802a6 
> [11017.739001][T122073] 6000 e9230278 e9290028 7c0004ac  
> 0c09 4c00012c 3860

[PATCH 09/15, fіxed] lightnvm: stop using ->queuedata

2020-05-09 Thread Christoph Hellwig

Signed-off-by: Christoph Hellwig 
---

fixed the compilation in the print_ppa arguments

 drivers/lightnvm/core.c  | 1 -
 drivers/lightnvm/pblk-init.c | 2 +-
 drivers/lightnvm/pblk.h  | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index db38a68abb6c0..85c5490cdfd2e 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -400,7 +400,6 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct 
nvm_ioctl_create *create)
}
 
tdisk->private_data = targetdata;
-   tqueue->queuedata = targetdata;
 
mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
if (dev->geo.mdts) {
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 9a967a2e83dd7..bec904ec0f7c0 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -49,7 +49,7 @@ struct bio_set pblk_bio_set;
 
 static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 {
-   struct pblk *pblk = q->queuedata;
+   struct pblk *pblk = bio->bi_disk->private_data;
 
if (bio_op(bio) == REQ_OP_DISCARD) {
pblk_discard(pblk, bio);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 86ffa875bfe16..49718105bc0dc 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -1255,7 +1255,7 @@ static inline int pblk_boundary_ppa_checks(struct 
nvm_tgt_dev *tgt_dev,
continue;
}
 
-   print_ppa(tgt_dev->q->queuedata, ppa, "boundary", i);
+   print_ppa(tgt_dev->disk->q->private_data, ppa, "boundary", i);
 
return 1;
}
-- 
2.26.2

Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Nicholas Piggin

Excerpts from Oliver O'Halloran's message of May 9, 2020 6:11 pm:
> On Sat, May 9, 2020 at 12:41 AM Qian Cai  wrote:
>>
>>  Booting POWER9 PowerNV has this message,
>>
>> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
>> early_ioremap() instead”
>>
>> but use the patch below will result in leaks because it will never call 
>> early_iounmap() anywhere. However, it looks me it was by design that 
>> phb->regs mapping would be there forever where it would be used in 
>> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() 
>> initcall too strong?
> 
> The warning there is junk. The PHBs are setup at boot and never torn
> down so we're not "leaking" the mapping. It's supposed to be there for
> the lifetime of the kernel.
> 
> That said, we could probably move the PCI setup to a point later in
> boot where the normal ioremap can be be used. We would have to check
> for initcalls which depend on the PHBs being setup and delay those too
> though.

I think it helps to unify code a bit more and take special cases out of 
ioremap to have all these early calls use early_ioremap.

We actually do want to move these later if possible too, on radix they 
use memblock for page tables, and on hash they don't even set up proper 
kernel page tables but just bolt PTEs into the hash table.

Thanks,
Nick

Re: [PATCH RFC 2/4] powerpc: Add Microwatt platform

2020-05-09 Thread benh

We should do that here too. Time to replace our silly UART !On 9 May 2020 6:36 pm, Alistair Popple  wrote:On Saturday, 9 May 2020 5:58:57 PM AEST Nicholas Piggin wrote:
> Excerpts from Paul Mackerras's message of May 9, 2020 3:02 pm:
> > Microwatt is a FPGA-based implementation of the Power ISA.  It
> > currently only implements little-endian 64-bit mode, and does
> > not (yet) support SMP.
> > 
> > This adds a new machine type to support FPGA-based SoCs with a
> > Microwatt core.
> 
> Very cool!
> 
> Would there be any point sharing this with the "naked metal" platform
> Alistair has for booting POWER in L3 without OPAL? Or is it easy enough
> to have a several different simple 64s platforms?

It looks pretty similar at the moment, I've been meaning to clean those 
patches up and send them upstream but Paul has beaten me to it. The main 
difference so far is how the console is setup. For booting cache contained I 
was using a device tree pointing at a standard UART driver and enabling the 
standard OF platform device tree probing.

- Alistair

> I have an HPT conditional compile patch and a few other diet Kconfig
> things I'll now be better justified to try get merged :)
> 
> Thanks,
> Nick

Re: [PATCH RFC 2/4] powerpc: Add Microwatt platform

2020-05-09 Thread Nicholas Piggin

Excerpts from Alistair Popple's message of May 9, 2020 6:36 pm:
> On Saturday, 9 May 2020 5:58:57 PM AEST Nicholas Piggin wrote:
>> Excerpts from Paul Mackerras's message of May 9, 2020 3:02 pm:
>> > Microwatt is a FPGA-based implementation of the Power ISA.  It
>> > currently only implements little-endian 64-bit mode, and does
>> > not (yet) support SMP.
>> > 
>> > This adds a new machine type to support FPGA-based SoCs with a
>> > Microwatt core.
>> 
>> Very cool!
>> 
>> Would there be any point sharing this with the "naked metal" platform
>> Alistair has for booting POWER in L3 without OPAL? Or is it easy enough
>> to have a several different simple 64s platforms?
> 
> It looks pretty similar at the moment, I've been meaning to clean those 
> patches up and send them upstream but Paul has beaten me to it. The main 
> difference so far is how the console is setup. For booting cache contained I 
> was using a device tree pointing at a standard UART driver and enabling the 
> standard OF platform device tree probing.

Well I'd only merge them if you think it makes sense. If the platform is 
a perfectly good abstraction for the differences and merging them would 
just result in painful special cases it wouldn't be worthwhile. It's
clearly not a lot of code.

Thanks,
Nick

[PATCH fixes] powerpc/vdso32: Fallback on getres syscall when clock is unknown

2020-05-09 Thread Christophe Leroy

There are other clocks than the standard ones, for instance
per process clocks. Therefore, being above the last standard clock
doesn't mean it is a bad clock. So, fallback to syscall instead
of returning -EINVAL inconditionaly.

Fixes: e33ffc956b08 ("powerpc/vdso32: implement clock_getres entirely")
Cc: sta...@vger.kernel.org
Reported-by: Aurelien Jarno 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso32/gettimeofday.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index a3951567118a..e7f8f9f1b3f4 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -218,11 +218,11 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
blr
 
/*
-* invalid clock
+* syscall fallback
 */
 99:
-   li  r3, EINVAL
-   crset   so
+   li  r0,__NR_clock_getres
+   sc
blr
   .cfi_endproc
 V_FUNCTION_END(__kernel_clock_getres)
-- 
2.25.0

Re: [PATCH 01/15] nfblock: use gendisk private_data

2020-05-09 Thread Geert Uytterhoeven

Hi Christoph,

On Fri, May 8, 2020 at 6:16 PM Christoph Hellwig  wrote:
> Signed-off-by: Christoph Hellwig 

Thanks for your patch!

> --- a/arch/m68k/emu/nfblock.c
> +++ b/arch/m68k/emu/nfblock.c
> @@ -61,7 +61,7 @@ struct nfhd_device {
>
>  static blk_qc_t nfhd_make_request(struct request_queue *queue, struct bio 
> *bio)
>  {
> -   struct nfhd_device *dev = queue->queuedata;
> +   struct nfhd_device *dev = bio->bi_disk->private_data;
> struct bio_vec bvec;
> struct bvec_iter iter;
> int dir, len, shift;
> @@ -122,7 +122,6 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 
> bsize)
> if (dev->queue == NULL)
> goto free_dev;
>
> -   dev->queue->queuedata = dev;
> blk_queue_logical_block_size(dev->queue, bsize);
>
> dev->disk = alloc_disk(16);
> @@ -136,6 +135,7 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 
> bsize)
> sprintf(dev->disk->disk_name, "nfhd%u", dev_id);
> set_capacity(dev->disk, (sector_t)blocks * (bsize / 512));
> dev->disk->queue = dev->queue;
> +   dev->disk->private_data = dev;

This is already set above, just before the quoted sprintf() call.

>
> add_disk(dev->disk);

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH -next] soc: fsl: dpio: remove set but not used variable 'addr_cena'

2020-05-09 Thread Yuehaibing

Pls ignore this duplicate.

On 2020/5/8 22:10, YueHaibing wrote:
> drivers/soc/fsl/dpio//qbman-portal.c:650:11: warning: variable 'addr_cena' 
> set but not used [-Wunused-but-set-variable]
>   uint64_t addr_cena;
>^
> 
> It is never used, so remove it.
> 
> Signed-off-by: YueHaibing 
> ---
>  drivers/soc/fsl/dpio/qbman-portal.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/soc/fsl/dpio/qbman-portal.c 
> b/drivers/soc/fsl/dpio/qbman-portal.c
> index e2e9fbb58a72..0ce859b60888 100644
> --- a/drivers/soc/fsl/dpio/qbman-portal.c
> +++ b/drivers/soc/fsl/dpio/qbman-portal.c
> @@ -647,7 +647,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
>   const uint32_t *cl = (uint32_t *)d;
>   uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
>   int i, num_enqueued = 0;
> - uint64_t addr_cena;
>  
>   spin_lock(&s->access_spinlock);
>   half_mask = (s->eqcr.pi_ci_mask>>1);
> @@ -700,7 +699,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
>  
>   /* Flush all the cacheline without load/store in between */
>   eqcr_pi = s->eqcr.pi;
> - addr_cena = (size_t)s->addr_cena;
>   for (i = 0; i < num_enqueued; i++)
>   eqcr_pi++;
>   s->eqcr.pi = eqcr_pi & full_mask;
>

Re: [PATCH v3] powerpc/64s/pgtable: fix an undefined behaviour

2020-05-09 Thread Qian Cai




> On Mar 6, 2020, at 1:56 AM, Christophe Leroy  wrote:
> 
> 
> 
> Le 06/03/2020 à 05:48, Qian Cai a écrit :
>> Booting a power9 server with hash MMU could trigger an undefined
>> behaviour because pud_offset(p4d, 0) will do,
>> 0 >> (PAGE_SHIFT:16 + PTE_INDEX_SIZE:8 + H_PMD_INDEX_SIZE:10)
>> Fix it by converting pud_index() and friends to static inline
>> functions.
>> UBSAN: shift-out-of-bounds in arch/powerpc/mm/ptdump/ptdump.c:282:15
>> shift exponent 34 is too large for 32-bit type 'int'
>> CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200303+ #13
>> Call Trace:
>> dump_stack+0xf4/0x164 (unreliable)
>> ubsan_epilogue+0x18/0x78
>> __ubsan_handle_shift_out_of_bounds+0x160/0x21c
>> walk_pagetables+0x2cc/0x700
>> walk_pud at arch/powerpc/mm/ptdump/ptdump.c:282
>> (inlined by) walk_pagetables at arch/powerpc/mm/ptdump/ptdump.c:311
>> ptdump_check_wx+0x8c/0xf0
>> mark_rodata_ro+0x48/0x80
>> kernel_init+0x74/0x194
>> ret_from_kernel_thread+0x5c/0x74
>> Suggested-by: Christophe Leroy 
>> Signed-off-by: Qian Cai 
> 
> Reviewed-by: Christophe Leroy 

Michael, can you take a look at this patch when you have a chance? It looks 
falling through the cracks.

> 
>> ---
>> v3: convert pud_index() etc to static inline functions.
>> v2: convert pud_offset() etc to static inline functions.
>>  arch/powerpc/include/asm/book3s/64/pgtable.h | 23 
>>  1 file changed, 19 insertions(+), 4 deletions(-)
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 201a69e6a355..bd432c6706b9 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -998,10 +998,25 @@ extern struct page *pgd_page(pgd_t pgd);
>>  #define pud_page_vaddr(pud) __va(pud_val(pud) & ~PUD_MASKED_BITS)
>>  #define pgd_page_vaddr(pgd) __va(pgd_val(pgd) & ~PGD_MASKED_BITS)
>>  -#define pgd_index(address) (((address) >> (PGDIR_SHIFT)) & (PTRS_PER_PGD - 
>> 1))
>> -#define pud_index(address) (((address) >> (PUD_SHIFT)) & (PTRS_PER_PUD - 1))
>> -#define pmd_index(address) (((address) >> (PMD_SHIFT)) & (PTRS_PER_PMD - 1))
>> -#define pte_index(address) (((address) >> (PAGE_SHIFT)) & (PTRS_PER_PTE - 
>> 1))
>> +static inline unsigned long pgd_index(unsigned long address)
>> +{
>> +return (address >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
>> +}
>> +
>> +static inline unsigned long pud_index(unsigned long address)
>> +{
>> +return (address >> PUD_SHIFT) & (PTRS_PER_PUD - 1);
>> +}
>> +
>> +static inline unsigned long pmd_index(unsigned long address)
>> +{
>> +return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
>> +}
>> +
>> +static inline unsigned long pte_index(unsigned long address)
>> +{
>> +return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
>> +}
>>/*
>>   * Find an entry in a page-table-directory.  We combine the address region

Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line

2020-05-09 Thread Alexey Kardashevskiy




On 09/05/2020 18:19, Christoph Hellwig wrote:
> On Tue, May 05, 2020 at 02:18:37PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 17/04/2020 17:58, Christoph Hellwig wrote:
>>> On Wed, Apr 15, 2020 at 09:21:37PM +1000, Alexey Kardashevskiy wrote:
 And the fact they were exported leaves possibility that there is a
 driver somewhere relying on these symbols or distro kernel won't build
 because the symbol disappeared from exports (I do not know what KABI
 guarantees or if mainline kernel cares).
>>>
>>> We absolutely do not care.  In fact for abuses of APIs that drivers
>>> should not use we almost care to make them private and break people
>>> abusing them.
>>
>> ok :)
>>
 I do not care in particular but
 some might, a line separated with empty lines in the commit log would do.
>>>
>>> I'll add a blurb for the next version.
>>
>>
>> Has it gone anywhere? Thanks,
> 
> I've been hoping for the sg_buf helpers to land first, as they need
> backporting and would conflict.  Do you urgently need the series?

Nah, not that urgent. Thanks,


-- 
Alexey

Re: remove a few uses of ->queuedata

2020-05-09 Thread Dan Williams

On Sat, May 9, 2020 at 1:24 AM Christoph Hellwig  wrote:
>
> On Fri, May 08, 2020 at 11:04:45AM -0700, Dan Williams wrote:
> > On Fri, May 8, 2020 at 9:16 AM Christoph Hellwig  wrote:
> > >
> > > Hi all,
> > >
> > > various bio based drivers use queue->queuedata despite already having
> > > set up disk->private_data, which can be used just as easily.  This
> > > series cleans them up to only use a single private data pointer.
> >
> > ...but isn't the queue pretty much guaranteed to be cache hot and the
> > gendisk cache cold? I'm not immediately seeing what else needs the
> > gendisk in the I/O path. Is there another motivation I'm missing?
>
> ->private_data is right next to the ->queue pointer, pat0 and part_tbl
> which are all used in the I/O submission path (generic_make_request /
> generic_make_request_checks).  This is mostly a prep cleanup patch to
> also remove the pointless queue argument from ->make_request - then
> ->queue is an extra dereference and extra churn.

Ah ok. If the changelogs had been filled in with something like "In
preparation for removing @q from make_request_fn, stop using
->queuedata", I probably wouldn't have looked twice.

For the nvdimm/ driver updates you can add:

Reviewed-by: Dan Williams 

...or just let me know if you want me to pick those up through the nvdimm tree.

Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Qian Cai

> On May 9, 2020, at 4:38 AM, Nicholas Piggin  wrote:
> 
> Your patch to use early_ioremap is faulting? I wonder why?

Yes, I don’t know the reasons either. I suppose not many places in other parts 
of the kernel which keep using those addresses from early_ioremap() after 
system booted. Otherwise, we would see those leak warnings elsewhere.

Thus, we probably have to audit the code, and if still necessary, call 
early_ioremap() and then early_iounmap() followed by a ioremap() once slab 
allocator is available?

Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Christophe Leroy





Le 08/05/2020 à 19:41, Qian Cai a écrit :




On May 8, 2020, at 10:39 AM, Qian Cai  wrote:

Booting POWER9 PowerNV has this message,

"ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
early_ioremap() instead”

but use the patch below will result in leaks because it will never call 
early_iounmap() anywhere. However, it looks me it was by design that phb->regs 
mapping would be there forever where it would be used in pnv_ioda_get_inval_reg(), 
so is just that check_early_ioremap_leak() initcall too strong?

--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -36,6 +36,7 @@
#include 
#include 
#include 
+#include 

#include 

@@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
/* Get registers */
if (!of_address_to_resource(np, 0, &r)) {
phb->regs_phys = r.start;
-   phb->regs = ioremap(r.start, resource_size(&r));
+   phb->regs = early_ioremap(r.start, resource_size(&r));
if (phb->regs == NULL)
pr_err("  Failed to map registers !\n”);


This will also trigger a panic with debugfs reads, so isn’t that this commit 
bogus at least for powerpc64?

d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()")


No d538aadc2718 is not bogus. That's the point, we want to remove all 
early usages of ioremap() in order to remove the hack with the 
ioremap_bot stuff and all, and stick to the generic ioremap logic.


In order to do so, all early use of ioremap() has to be converted to 
early_ioremap() or to fixmap or anything else that allows to do ioremaps 
before the slab is ready.


early_ioremap() is for temporary mappings necessary at boottime. For 
long lasting mappings, another method is to be used.


Now, the point is that other architectures like for instance x86 don't 
seem to have to use early_ioremap() much. Powerpc is for instance doing 
early mappings for PCI. Seems like x86 initialises PCI once slab is 
ready. Can't powerpc do the same ?


Christophe

Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-05-09 Thread Christophe Leroy





Le 28/04/2020 à 18:05, Arnd Bergmann a écrit :

On Tue, Apr 28, 2020 at 3:16 PM Christophe Leroy
 wrote:


Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.

Signed-off-by: Christophe Leroy 


Looks good to me

Reviewed-by: Arnd Bergmann 

There was a bug on ARM for the corresponding function, so far it is unclear
if this was a problem related to particular hardware, the 32-bit kernel code,
or the common implementation of clock_gettime64 in the vdso library,
see https://github.com/richfelker/musl-cross-make/issues/96

Just to be sure that powerpc is not affected by the same issue, can you
confirm that repeatedly calling clock_gettime64 on powerpc32, alternating
between vdso and syscall, results in monotically increasing times?



I think that's one of the things vdsotest checks, so yes that's ok I think.

Christophe

[PATCH vdsotest] Add support for clock_gettime64() on powerpc32

2020-05-09 Thread Christophe Leroy

libc test is commented out because at the time being
very few libc if any supports clock_gettime64()

syscall number is hardcoded when it doesn't exists in unistd.h

Signed-off-by: Christophe Leroy 
---
Based on master from https://github.com/nathanlynch/vdsotest.git

 src/clock-boottime.c   |   4 +
 src/clock-monotonic-coarse.c   |   4 +
 src/clock-monotonic-raw.c  |   4 +
 src/clock-monotonic.c  |   4 +
 src/clock-realtime-coarse.c|   4 +
 src/clock-realtime.c   |   4 +
 src/clock-tai.c|   4 +
 src/clock_gettime64_template.c | 401 +
 8 files changed, 429 insertions(+)
 create mode 100644 src/clock_gettime64_template.c

diff --git a/src/clock-boottime.c b/src/clock-boottime.c
index 9fb1ac48501d..07ef31d08614 100644
--- a/src/clock-boottime.c
+++ b/src/clock-boottime.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-monotonic-coarse.c b/src/clock-monotonic-coarse.c
index ca1df58691ca..da064188756f 100644
--- a/src/clock-monotonic-coarse.c
+++ b/src/clock-monotonic-coarse.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-monotonic-raw.c b/src/clock-monotonic-raw.c
index 5dbb1842e698..55373b94ecfd 100644
--- a/src/clock-monotonic-raw.c
+++ b/src/clock-monotonic-raw.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-monotonic.c b/src/clock-monotonic.c
index 44318ae1e1c2..a900d24598a1 100644
--- a/src/clock-monotonic.c
+++ b/src/clock-monotonic.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-realtime-coarse.c b/src/clock-realtime-coarse.c
index 8f33f9a2d30b..8f2e6242bf0d 100644
--- a/src/clock-realtime-coarse.c
+++ b/src/clock-realtime-coarse.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-realtime.c b/src/clock-realtime.c
index 079fd801e654..ab76329a6676 100644
--- a/src/clock-realtime.c
+++ b/src/clock-realtime.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-tai.c b/src/clock-tai.c
index ad0448adeba5..cb2511039ddc 100644
--- a/src/clock-tai.c
+++ b/src/clock-tai.c
@@ -7,3 +7,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock_gettime64_template.c b/src/clock_gettime64_template.c
new file mode 100644
index ..4752845148d6
--- /dev/null
+++ b/src/clock_gettime64_template.c
@@ -0,0 +1,401 @@
+/*
+ * Copyright 2014 Mentor Graphics Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2 of the
+ * License.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "compiler.h"
+#include "vdsotest.h"
+
+struct timespec64 {
+   long long tv_sec;
+   long tv_nsec;
+};
+
+#ifndef SYS_clock_gettime64
+#define SYS_clock_gettime64 403
+#endif
+
+static int (*clock_gettime64_vdso)(clockid_t id, struct timespec64 *ts);
+
+static bool vdso_has_clock_gettime64(void)
+{
+   return clock_gettime64_vdso != NULL;
+}
+
+static int clock_gettime64_syscall_wrapper(clockid_t id, struct timespec64 *ts)
+{
+   return syscall(SYS_clock_gettime64, id, ts);
+}
+
+static void clock_gettime64_syscall_nofail(clockid_t id, struct timespec64 *ts)
+{
+   int err;
+
+   err = clock_gettime64_syscall_wrapper(id, ts);
+   if (err)
+   error(EXIT_FAILURE, errno, "SYS_clock_gettime64");
+}
+
+stat

[PATCH vdsotest] Add support for clock_gettime64() on powerpc32

2020-05-09 Thread Christophe Leroy

libc test is commented out because at the time being
very few libc if any supports clock_gettime64()

syscall number is hardcoded when it doesn't exists in unistd.h

Signed-off-by: Christophe Leroy 
---
Based on master from https://github.com/nathanlynch/vdsotest.git

 src/clock-boottime.c   |   4 +
 src/clock-monotonic-coarse.c   |   4 +
 src/clock-monotonic-raw.c  |   4 +
 src/clock-monotonic.c  |   4 +
 src/clock-realtime-coarse.c|   4 +
 src/clock-realtime.c   |   4 +
 src/clock-tai.c|   4 +
 src/clock_gettime64_template.c | 401 +
 8 files changed, 429 insertions(+)
 create mode 100644 src/clock_gettime64_template.c

diff --git a/src/clock-boottime.c b/src/clock-boottime.c
index 9fb1ac48501d..07ef31d08614 100644
--- a/src/clock-boottime.c
+++ b/src/clock-boottime.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-monotonic-coarse.c b/src/clock-monotonic-coarse.c
index ca1df58691ca..da064188756f 100644
--- a/src/clock-monotonic-coarse.c
+++ b/src/clock-monotonic-coarse.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-monotonic-raw.c b/src/clock-monotonic-raw.c
index 5dbb1842e698..55373b94ecfd 100644
--- a/src/clock-monotonic-raw.c
+++ b/src/clock-monotonic-raw.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-monotonic.c b/src/clock-monotonic.c
index 44318ae1e1c2..a900d24598a1 100644
--- a/src/clock-monotonic.c
+++ b/src/clock-monotonic.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-realtime-coarse.c b/src/clock-realtime-coarse.c
index 8f33f9a2d30b..8f2e6242bf0d 100644
--- a/src/clock-realtime-coarse.c
+++ b/src/clock-realtime-coarse.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-realtime.c b/src/clock-realtime.c
index 079fd801e654..ab76329a6676 100644
--- a/src/clock-realtime.c
+++ b/src/clock-realtime.c
@@ -3,3 +3,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock-tai.c b/src/clock-tai.c
index ad0448adeba5..cb2511039ddc 100644
--- a/src/clock-tai.c
+++ b/src/clock-tai.c
@@ -7,3 +7,7 @@
 
 #include "clock_gettime_template.c"
 #include "clock_getres_template.c"
+
+#if defined(__powerpc__) && !defined(__powerpc64__)
+#include "clock_gettime64_template.c"
+#endif
diff --git a/src/clock_gettime64_template.c b/src/clock_gettime64_template.c
new file mode 100644
index ..4752845148d6
--- /dev/null
+++ b/src/clock_gettime64_template.c
@@ -0,0 +1,401 @@
+/*
+ * Copyright 2014 Mentor Graphics Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2 of the
+ * License.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "compiler.h"
+#include "vdsotest.h"
+
+struct timespec64 {
+   long long tv_sec;
+   long tv_nsec;
+};
+
+#ifndef SYS_clock_gettime64
+#define SYS_clock_gettime64 403
+#endif
+
+static int (*clock_gettime64_vdso)(clockid_t id, struct timespec64 *ts);
+
+static bool vdso_has_clock_gettime64(void)
+{
+   return clock_gettime64_vdso != NULL;
+}
+
+static int clock_gettime64_syscall_wrapper(clockid_t id, struct timespec64 *ts)
+{
+   return syscall(SYS_clock_gettime64, id, ts);
+}
+
+static void clock_gettime64_syscall_nofail(clockid_t id, struct timespec64 *ts)
+{
+   int err;
+
+   err = clock_gettime64_syscall_wrapper(id, ts);
+   if (err)
+   error(EXIT_FAILURE, errno, "SYS_clock_gettime64");
+}
+
+stat

[PATCH fixes] powerpc/40x: Make more space for system call exception

2020-05-09 Thread Christophe Leroy

When CONFIG_VIRT_CPU_ACCOUNTING is selected, system call exception
handler doesn't fit below 0xd00 and build fails.

As exception 0xd00 doesn't exist and is never generated by 40x,
comment it out in order to get more space for system call exception.

Reported-by: kbuild test robot 
Fixes: 9e27086292aa ("powerpc/32: Warn and return ENOSYS on syscalls from 
kernel")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_40x.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 9bb663977e84..2cec543c38f0 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -344,8 +344,9 @@ _ENTRY(saved_ksp_limit)
 /* 0x0C00 - System Call Exception */
START_EXCEPTION(0x0C00, SystemCall)
SYSCALL_ENTRY   0xc00
+/* Trap_0D is commented out to get more space for system call exception */
 
-   EXCEPTION(0x0D00, Trap_0D, unknown_exception, EXC_XFER_STD)
+/* EXCEPTION(0x0D00, Trap_0D, unknown_exception, EXC_XFER_STD) */
EXCEPTION(0x0E00, Trap_0E, unknown_exception, EXC_XFER_STD)
EXCEPTION(0x0F00, Trap_0F, unknown_exception, EXC_XFER_STD)
 
-- 
2.25.0

Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-05-09 Thread Christophe Leroy





Le 09/05/2020 à 17:54, Christophe Leroy a écrit :



Le 28/04/2020 à 18:05, Arnd Bergmann a écrit :

On Tue, Apr 28, 2020 at 3:16 PM Christophe Leroy
 wrote:


Provides __kernel_clock_gettime64() on vdso32. This is the
64 bits version of __kernel_clock_gettime() which is
y2038 compliant.

Signed-off-by: Christophe Leroy 


Looks good to me

Reviewed-by: Arnd Bergmann 

There was a bug on ARM for the corresponding function, so far it is 
unclear
if this was a problem related to particular hardware, the 32-bit 
kernel code,

or the common implementation of clock_gettime64 in the vdso library,
see https://github.com/richfelker/musl-cross-make/issues/96

Just to be sure that powerpc is not affected by the same issue, can you
confirm that repeatedly calling clock_gettime64 on powerpc32, alternating
between vdso and syscall, results in monotically increasing times?



I think that's one of the things vdsotest checks, so yes that's ok I think.



Here is the full result with vdsotest:

gettimeofday: syscall: 3715 nsec/call
gettimeofday:libc: 794 nsec/call
gettimeofday:vdso: 947 nsec/call
getcpu: syscall: 1614 nsec/call
getcpu:libc: 484 nsec/call
getcpu:vdso: 184 nsec/call
clock-gettime64-realtime-coarse: syscall: 3152 nsec/call
clock-gettime64-realtime-coarse:libc: not tested
clock-gettime64-realtime-coarse:vdso: 653 nsec/call
clock-getres-realtime-coarse: syscall: 2963 nsec/call
clock-getres-realtime-coarse:libc: 745 nsec/call
clock-getres-realtime-coarse:vdso: 553 nsec/call
clock-gettime-realtime-coarse: syscall: 5120 nsec/call
clock-gettime-realtime-coarse:libc: 731 nsec/call
clock-gettime-realtime-coarse:vdso: 577 nsec/call
clock-gettime64-realtime: syscall: 3719 nsec/call
clock-gettime64-realtime:libc: not tested
clock-gettime64-realtime:vdso: 976 nsec/call
clock-getres-realtime: syscall: 2496 nsec/call
clock-getres-realtime:libc: 745 nsec/call
clock-getres-realtime:vdso: 546 nsec/call
clock-gettime-realtime: syscall: 4800 nsec/call
clock-gettime-realtime:libc: 1080 nsec/call
clock-gettime-realtime:vdso: 1798 nsec/call
clock-gettime64-boottime: syscall: 4132 nsec/call
clock-gettime64-boottime:libc: not tested
clock-gettime64-boottime:vdso: 975 nsec/call
clock-getres-boottime: syscall: 2497 nsec/call
clock-getres-boottime:libc: 730 nsec/call
clock-getres-boottime:vdso: 546 nsec/call
clock-gettime-boottime: syscall: 3728 nsec/call
clock-gettime-boottime:libc: 1079 nsec/call
clock-gettime-boottime:vdso: 941 nsec/call
clock-gettime64-tai: syscall: 4148 nsec/call
clock-gettime64-tai:libc: not tested
clock-gettime64-tai:vdso: 955 nsec/call
clock-getres-tai: syscall: 2494 nsec/call
clock-getres-tai:libc: 730 nsec/call
clock-getres-tai:vdso: 545 nsec/call
clock-gettime-tai: syscall: 3729 nsec/call
clock-gettime-tai:libc: 1079 nsec/call
clock-gettime-tai:vdso: 927 nsec/call
clock-gettime64-monotonic-raw: syscall: 3677 nsec/call
clock-gettime64-monotonic-raw:libc: not tested
clock-gettime64-monotonic-raw:vdso: 1032 nsec/call
clock-getres-monotonic-raw: syscall: 2494 nsec/call
clock-getres-monotonic-raw:libc: 745 nsec/call
clock-getres-monotonic-raw:vdso: 545 nsec/call
clock-gettime-monotonic-raw: syscall: 3276 nsec/call
clock-gettime-monotonic-raw:libc: 1140 nsec/call
clock-gettime-monotonic-raw:vdso: 1002 nsec/call
clock-gettime64-monotonic-coarse: syscall: 4099 nsec/call
clock-gettime64-monotonic-coarse:libc: not tested
clock-gettime64-monotonic-coarse:vdso: 653 nsec/call
clock-getres-monotonic-coarse: syscall: 2962 nsec/call
clock-getres-monotonic-coarse:libc: 745 nsec/call
clock-getres-monotonic-coarse:vdso: 545 nsec/call
clock-gettime-monotonic-coarse: syscall: 4297 nsec/call
clock-gettime-monotonic-coarse:libc: 730 nsec/call
clock-gettime-monotonic-coarse:vdso: 592 nsec/call
clock-gettime64-monotonic: syscall: 3863 nsec/call
clock-gettime64-monotonic:libc: not tested
clock-gettime64-monotonic:vdso: 975 nsec/call
clock-getres-monotonic: syscall: 2494 nsec/call
clock-getres-monotonic:libc: 745 nsec/call
clock-getres-monotonic:vdso: 545 nsec/call
clock-gettime-monotonic: syscall: 3465 nsec/call
clock-gettime-monotonic:libc: 1079 nsec/call
clock-gettime-monotonic:vdso: 927 nsec/call

Christophe

[PATCH v2 4/9] drivers/ps3: Remove duplicate error messages

2020-05-09 Thread Geoff Levand

From: Markus Elfring 

Remove duplicate memory allocation failure error messages.

Signed-off-by: Markus Elfring 
Signed-off-by: Geoff Levand 
---
 drivers/ps3/ps3-lpm.c   | 2 --
 drivers/ps3/ps3-vuart.c | 1 -
 2 files changed, 3 deletions(-)

diff --git a/drivers/ps3/ps3-lpm.c b/drivers/ps3/ps3-lpm.c
index 83c45659bc9d..fcc346296e3a 100644
--- a/drivers/ps3/ps3-lpm.c
+++ b/drivers/ps3/ps3-lpm.c
@@ -,8 +,6 @@ int ps3_lpm_open(enum ps3_lpm_tb_type tb_type, void 
*tb_cache,
lpm_priv->tb_cache_internal = kzalloc(
lpm_priv->tb_cache_size + 127, GFP_KERNEL);
if (!lpm_priv->tb_cache_internal) {
-   dev_err(sbd_core(), "%s:%u: alloc internal tb_cache "
-   "failed\n", __func__, __LINE__);
result = -ENOMEM;
goto fail_malloc;
}
diff --git a/drivers/ps3/ps3-vuart.c b/drivers/ps3/ps3-vuart.c
index ddaa5ea5801a..3b5878edc6c2 100644
--- a/drivers/ps3/ps3-vuart.c
+++ b/drivers/ps3/ps3-vuart.c
@@ -917,7 +917,6 @@ static int ps3_vuart_bus_interrupt_get(void)
vuart_bus_priv.bmp = kzalloc(sizeof(struct ports_bmp), GFP_KERNEL);
 
if (!vuart_bus_priv.bmp) {
-   pr_debug("%s:%d: kzalloc failed.\n", __func__, __LINE__);
result = -ENOMEM;
goto fail_bmp_malloc;
}
-- 
2.20.1

[PATCH v2 5/9] net/ps3_gelic_net: Remove duplicate error message

2020-05-09 Thread Geoff Levand

From: Markus Elfring 

Remove an extra message for a memory allocation failure in
function gelic_descr_prepare_rx().

Signed-off-by: Markus Elfring 
Signed-off-by: Geoff Levand 
---
 drivers/net/ethernet/toshiba/ps3_gelic_net.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c 
b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index 070dd6fa9401..8522f3898e0d 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -382,8 +382,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card,
descr->skb = dev_alloc_skb(bufsize + GELIC_NET_RXBUF_ALIGN - 1);
if (!descr->skb) {
descr->buf_addr = 0; /* tell DMAC don't touch memory */
-   dev_info(ctodev(card),
-"%s:allocate skb failed !!\n", __func__);
return -ENOMEM;
}
descr->buf_size = cpu_to_be32(bufsize);
-- 
2.20.1

[PATCH v2 6/9] ps3disk: use the default segment boundary

2020-05-09 Thread Geoff Levand

From: Emmanuel Nicolet 

Hi,
since commit dcebd755926b ("block: use bio_for_each_bvec() to compute
multi-page bvec count"), the kernel will bug_on on the PS3 because
bio_split() is called with sectors == 0:

kernel BUG at block/bio.c:1853!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=8 NUMA PS3
Modules linked in: firewire_sbp2 rtc_ps3(+) soundcore ps3_gelic(+) \
ps3rom(+) firewire_core ps3vram(+) usb_common crc_itu_t
CPU: 0 PID: 97 Comm: blkid Not tainted 5.3.0-rc4 #1
NIP:  c027d0d0 LR: c027d0b0 CTR: 
REGS: c135ae90 TRAP: 0700   Not tainted  (5.3.0-rc4)
MSR:  80028032   CR: 44008240  XER: 2000
IRQMASK: 0
GPR00: c0289368 c135b120 c084a500 c4ff8300
GPR04: 0c00 c4c905e0 c4c905e0 
GPR08:  0001  
GPR12:  c08ef000 003e 00080001
GPR16: 0100   0004
GPR20: c062fd7e 0001  0080
GPR24: c0781788 c135b350 0080 c4c905e0
GPR28: c135b348 c4ff8300  c4c9
NIP [c027d0d0] .bio_split+0x28/0xac
LR [c027d0b0] .bio_split+0x8/0xac
Call Trace:
[c135b120] [c027d130] .bio_split+0x88/0xac (unreliable)
[c135b1b0] [c0289368] .__blk_queue_split+0x11c/0x53c
[c135b2d0] [c028f614] .blk_mq_make_request+0x80/0x7d4
[c135b3d0] [c0283a8c] .generic_make_request+0x118/0x294
[c135b4b0] [c0283d34] .submit_bio+0x12c/0x174
[c135b580] [c0205a44] .mpage_bio_submit+0x3c/0x4c
[c135b600] [c0206184] .mpage_readpages+0xa4/0x184
[c135b750] [c01ff8fc] .blkdev_readpages+0x24/0x38
[c135b7c0] [c01589f0] .read_pages+0x6c/0x1a8
[c135b8b0] [c0158c74] .__do_page_cache_readahead+0x118/0x184
[c135b9b0] [c01591a8] .force_page_cache_readahead+0xe4/0xe8
[c135ba50] [c014fc24] .generic_file_read_iter+0x1d8/0x830
[c135bb50] [c01ffadc] .blkdev_read_iter+0x40/0x5c
[c135bbc0] [c01b9e00] .new_sync_read+0x144/0x1a0
[c135bcd0] [c01bc454] .vfs_read+0xa0/0x124
[c135bd70] [c01bc7a4] .ksys_read+0x70/0xd8
[c135be20] [c000a524] system_call+0x5c/0x70
Instruction dump:
7fe3fb78 482e30dc 7c0802a6 482e3085 7c9e2378 f821ff71 7ca42b78 7d3e00d0
7c7d1b78 79290fe0 7cc53378 69290001 <0b09> 81230028 7bca0020 7929ba62
[ end trace 313fec760f30aa1f ]---

The problem originates from setting the segment boundary of the request
queue to -1UL. This makes get_max_segment_size() return zero when offset
is zero, whatever the max segment size. The test with BLK_SEG_BOUNDARY_MASK
fails and 'mask - (mask & offset) + 1' overflows to zero in the return
statement.

Not setting the segment boundary and using the default value
(BLK_SEG_BOUNDARY_MASK) fixes the problem.
Maybe BLK_SEG_BOUNDARY_MASK should be set to -1UL? It's currently set to
only 0xUL. I don't know if that would break anything.

Signed-off-by: Emmanuel Nicolet 
Signed-off-by: Geoff Levand 
---
 drivers/block/ps3disk.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index c5c6487a19d5..7b55811c2a81 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -454,7 +454,6 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
queue->queuedata = dev;
 
blk_queue_max_hw_sectors(queue, dev->bounce_size >> 9);
-   blk_queue_segment_boundary(queue, -1UL);
blk_queue_dma_alignment(queue, dev->blk_size-1);
blk_queue_logical_block_size(queue, dev->blk_size);
 
-- 
2.20.1

[PATCH v2 3/9] powerpc/head_check: Avoid broken pipe

2020-05-09 Thread Geoff Levand

Remove the '-m4' option to grep to allow grep to process all of nm's
output.  This avoids the nm warning:

  nm terminated with signal 13 [Broken pipe]

Signed-off-by: Geoff Levand 
---
 arch/powerpc/tools/head_check.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/tools/head_check.sh b/arch/powerpc/tools/head_check.sh
index 37061fb9b58e..e32d3162e5ed 100644
--- a/arch/powerpc/tools/head_check.sh
+++ b/arch/powerpc/tools/head_check.sh
@@ -46,7 +46,7 @@ nm="$1"
 vmlinux="$2"
 
 # gcc-4.6-era toolchain make _stext an A (absolute) symbol rather than T
-$nm "$vmlinux" | grep -e " [TA] _stext$" -e " t start_first_256B$" -e " a 
text_start$" -e " t start_text$" -m4 > .tmp_symbols.txt
+$nm "$vmlinux" | grep -e " [TA] _stext$" -e " t start_first_256B$" -e " a 
text_start$" -e " t start_text$" > .tmp_symbols.txt
 
 
 vma=$(cat .tmp_symbols.txt | grep -e " [TA] _stext$" | cut -d' ' -f1)
-- 
2.20.1

[PATCH v2 2/9] powerpc/wrapper: Output linker map file

2020-05-09 Thread Geoff Levand

To aid debugging wrapper troubles, output a linker map file
'wrapper.map' when the build is verbose.

Signed-off-by: Geoff Levand 
---
 arch/powerpc/boot/wrapper | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index ed6266367bc0..35ace40d9fc2 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -29,6 +29,7 @@ set -e
 # Allow for verbose output
 if [ "$V" = 1 ]; then
 set -x
+map="-Map wrapper.map"
 fi
 
 # defaults
@@ -500,7 +501,7 @@ if [ "$platform" != "miboot" ]; then
 text_start="-Ttext $link_address"
 fi
 #link everything
-${CROSS}ld -m $format -T $lds $text_start $pie $nodl -o "$ofile" \
+${CROSS}ld -m $format -T $lds $text_start $pie $nodl -o "$ofile" $map \
$platformo $tmp $object/wrapper.a
 rm $tmp
 fi
-- 
2.20.1

[PATCH v2 8/9] powerpc/ps3: Fix kexec shutdown hang

2020-05-09 Thread Geoff Levand

The ps3_mm_region_destroy() and ps3_mm_vas_destroy() routines
are called very late in the shutdown via kexec's mmu_cleanup_all
routine.  By the time mmu_cleanup_all runs it is too late to use
udbg_printf, and calling it will cause PS3 systems to hang.

Remove all debugging statements from ps3_mm_region_destroy() and
ps3_mm_vas_destroy() and replace any error reporting with calls
to lv1_panic.

With this change builds with 'DEBUG' defined will not cause kexec
reboots to hang, and builds with 'DEBUG' defined or not will end
in lv1_panic if an error is encountered.

Signed-off-by: Geoff Levand 
---
 arch/powerpc/platforms/ps3/mm.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/mm.c b/arch/powerpc/platforms/ps3/mm.c
index 423be34f0f5f..f42fe4e86ce5 100644
--- a/arch/powerpc/platforms/ps3/mm.c
+++ b/arch/powerpc/platforms/ps3/mm.c
@@ -200,13 +200,14 @@ void ps3_mm_vas_destroy(void)
 {
int result;
 
-   DBG("%s:%d: map.vas_id= %llu\n", __func__, __LINE__, map.vas_id);
-
if (map.vas_id) {
result = lv1_select_virtual_address_space(0);
-   BUG_ON(result);
-   result = lv1_destruct_virtual_address_space(map.vas_id);
-   BUG_ON(result);
+   result += lv1_destruct_virtual_address_space(map.vas_id);
+
+   if (result) {
+   lv1_panic(0);
+   }
+
map.vas_id = 0;
}
 }
@@ -304,19 +305,20 @@ static void ps3_mm_region_destroy(struct mem_region *r)
int result;
 
if (!r->destroy) {
-   pr_info("%s:%d: Not destroying high region: %llxh %llxh\n",
-   __func__, __LINE__, r->base, r->size);
return;
}
 
-   DBG("%s:%d: r->base = %llxh\n", __func__, __LINE__, r->base);
-
if (r->base) {
result = lv1_release_memory(r->base);
-   BUG_ON(result);
+
+   if (result) {
+   lv1_panic(0);
+   }
+
r->size = r->base = r->offset = 0;
map.total = map.rm.size;
}
+
ps3_mm_set_repository_highmem(NULL);
 }
 
-- 
2.20.1

[PATCH v2 9/9] hvc_console: Allow backends to set I/O buffer size

2020-05-09 Thread Geoff Levand

To allow HVC backends to set the I/O buffer sizes to values
that are most efficient for the backend, change the macro
definitions where the buffer sizes are set to be conditional
on whether or not the macros are already defined.  Also,
rename the macros from N_OUTBUF to HVC_N_OUBUF and from
N_INBUF to HVC_N_INBUF.

Typical usage in the backend source file would be:

Signed-off-by: Geoff Levand 
---
 drivers/tty/hvc/hvc_console.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 436cc51c92c3..2928bad057fc 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -42,12 +42,15 @@
 #define HVC_CLOSE_WAIT (HZ/100) /* 1/10 of a second */
 
 /*
- * These sizes are most efficient for vio, because they are the
- * native transfer size. We could make them selectable in the
- * future to better deal with backends that want other buffer sizes.
+ * These default sizes are most efficient for vio, because they are
+ * the native transfer size.
  */
-#define N_OUTBUF   16
-#define N_INBUF16
+#if !defined(HVC_N_OUTBUF)
+# define HVC_N_OUTBUF  16
+#endif
+#if !defined(HVC_N_INBUF)
+# define HVC_N_INBUF   16
+#endif
 
 #define __ALIGNED__ __attribute__((__aligned__(sizeof(long
 
@@ -151,7 +154,7 @@ static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] =
 static void hvc_console_print(struct console *co, const char *b,
  unsigned count)
 {
-   char c[N_OUTBUF] __ALIGNED__;
+   char c[HVC_N_OUTBUF] __ALIGNED__;
unsigned i = 0, n = 0;
int r, donecr = 0, index = co->index;
 
@@ -640,7 +643,7 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep)
 {
struct tty_struct *tty;
int i, n, count, poll_mask = 0;
-   char buf[N_INBUF] __ALIGNED__;
+   char buf[HVC_N_INBUF] __ALIGNED__;
unsigned long flags;
int read_total = 0;
int written_total = 0;
@@ -681,7 +684,7 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep)
 
  read_again:
/* Read data if any */
-   count = tty_buffer_request_room(&hp->port, N_INBUF);
+   count = tty_buffer_request_room(&hp->port, HVC_N_INBUF);
 
/* If flip is full, just reschedule a later read */
if (count == 0) {
-- 
2.20.1

[PATCH v2 1/9] powerpc/head_check: Automatic verbosity

2020-05-09 Thread Geoff Levand

To aid debugging build problems turn on shell tracing for the
head_check script when the build is verbose.

Signed-off-by: Geoff Levand 
---
 arch/powerpc/tools/head_check.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/tools/head_check.sh b/arch/powerpc/tools/head_check.sh
index ad9e57209aa4..37061fb9b58e 100644
--- a/arch/powerpc/tools/head_check.sh
+++ b/arch/powerpc/tools/head_check.sh
@@ -31,8 +31,10 @@
 # level entry code (boot, interrupt vectors, etc) until r2 is set up. This
 # could cause the kernel to die in early boot.
 
-# Turn this on if you want more debug output:
-# set -x
+# Allow for verbose output
+if [ "$V" = "1" ]; then
+   set -x
+fi
 
 if [ $# -lt 2 ]; then
echo "$0 [path to nm] [path to vmlinux]" 1>&2
-- 
2.20.1

[PATCH v2 0/9] powerpc + ps3 patches

2020-05-09 Thread Geoff Levand

Hi Michael,

This is a combined V2 of the two patch sets I sent out on March 27th,
'PS3 patches for v5.7' and 'powerpc: Minor updates to improve build debugging'.

I've dropped these two patches that were in my 'PS3 patches for v5.7' set:

  powerpc/ps3: Add lv1_panic
  powerpc/ps3: Add udbg_panic

and replaced them with a new patch:

  powerpc/ps3: Fix kexec shutdown hang

There is new patch I've added in this set:

  hvc_console: Allow backends to set I/O buffer size

which I've been using in my debugging.  There is nothing using this feature in
the upstream kernel, and I don't plan to submit anything that would use it, so
use your discretion as to merge it.

-Geoff

The following changes since commit 0e698dfa282211e414076f9dc7e83c1c288314fd:

  Linux 5.7-rc4 (2020-05-03 14:56:04 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/geoff/ps3-linux.git 
for-merge-powerpc-v2

for you to fetch changes up to 6f6294df663a53f47bb28abcbb1ef756c6a59922:

  hvc_console: Allow backends to set I/O buffer size (2020-05-09 11:24:42 -0700)


Emmanuel Nicolet (1):
  ps3disk: use the default segment boundary

Geoff Levand (6):
  powerpc/head_check: Automatic verbosity
  powerpc/wrapper: Output linker map file
  powerpc/head_check: Avoid broken pipe
  powerpc/ps3: Add check for otheros image size
  powerpc/ps3: Fix kexec shutdown hang
  hvc_console: Allow backends to set I/O buffer size

Markus Elfring (2):
  drivers/ps3: Remove duplicate error messages
  net/ps3_gelic_net: Remove duplicate error message

 arch/powerpc/boot/wrapper| 20 +---
 arch/powerpc/platforms/ps3/mm.c  | 22 --
 arch/powerpc/tools/head_check.sh |  8 +---
 drivers/block/ps3disk.c  |  1 -
 drivers/net/ethernet/toshiba/ps3_gelic_net.c |  2 --
 drivers/ps3/ps3-lpm.c|  2 --
 drivers/ps3/ps3-vuart.c  |  1 -
 drivers/tty/hvc/hvc_console.c| 19 +++
 8 files changed, 45 insertions(+), 30 deletions(-)

-- 
2.20.1

[PATCH v2 7/9] powerpc/ps3: Add check for otheros image size

2020-05-09 Thread Geoff Levand

The ps3's otheros flash loader has a size limit of 16 MiB for the
uncompressed image.  If that limit will be reached output the
flash image file as 'otheros-too-big.bld'.

Signed-off-by: Geoff Levand 
---
 arch/powerpc/boot/wrapper | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 35ace40d9fc2..ab1e3ddc79f3 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -571,7 +571,20 @@ ps3)
 count=$overlay_size bs=1
 
 odir="$(dirname "$ofile.bin")"
-rm -f "$odir/otheros.bld"
-gzip -n --force -9 --stdout "$ofile.bin" > "$odir/otheros.bld"
+
+# The ps3's flash loader has a size limit of 16 MiB for the uncompressed
+# image.  If a compressed image that exceeded this limit is written to
+# flash the loader will decompress that image until the 16 MiB limit is
+# reached, then enter the system reset vector of the partially decompressed
+# image.  No warning is issued.
+rm -f "$odir"/{otheros,otheros-too-big}.bld
+size=$(${CROSS}nm --no-sort --radix=d "$ofile" | egrep ' _end$' | cut -d' 
' -f1)
+bld="otheros.bld"
+if [ $size -gt $((0x100)) ]; then
+bld="otheros-too-big.bld"
+echo "  INFO: Uncompressed kernel is too large to program into PS3 
flash memory;" \
+"size=0x$(printf "%x\n" $size), limit=0x100."
+fi
+gzip -n --force -9 --stdout "$ofile.bin" > "$odir/$bld"
 ;;
 esac
-- 
2.20.1

Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Oliver O'Halloran

On Sun, May 10, 2020 at 1:51 AM Christophe Leroy
 wrote:
>
>
>
> Le 08/05/2020 à 19:41, Qian Cai a écrit :
> >
> >
> >> On May 8, 2020, at 10:39 AM, Qian Cai  wrote:
> >>
> >> Booting POWER9 PowerNV has this message,
> >>
> >> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
> >> early_ioremap() instead”
> >>
> >> but use the patch below will result in leaks because it will never call 
> >> early_iounmap() anywhere. However, it looks me it was by design that 
> >> phb->regs mapping would be there forever where it would be used in 
> >> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() 
> >> initcall too strong?
> >>
> >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >> @@ -36,6 +36,7 @@
> >> #include 
> >> #include 
> >> #include 
> >> +#include 
> >>
> >> #include 
> >>
> >> @@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
> >> device_node *np,
> >> /* Get registers */
> >> if (!of_address_to_resource(np, 0, &r)) {
> >> phb->regs_phys = r.start;
> >> -   phb->regs = ioremap(r.start, resource_size(&r));
> >> +   phb->regs = early_ioremap(r.start, resource_size(&r));
> >> if (phb->regs == NULL)
> >> pr_err("  Failed to map registers !\n”);
> >
> > This will also trigger a panic with debugfs reads, so isn’t that this 
> > commit bogus at least for powerpc64?
> >
> > d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()")
>
> No d538aadc2718 is not bogus. That's the point, we want to remove all
> early usages of ioremap() in order to remove the hack with the
> ioremap_bot stuff and all, and stick to the generic ioremap logic.
>
> In order to do so, all early use of ioremap() has to be converted to
> early_ioremap() or to fixmap or anything else that allows to do ioremaps
> before the slab is ready.
>
> early_ioremap() is for temporary mappings necessary at boottime. For
> long lasting mappings, another method is to be used.
>
> Now, the point is that other architectures like for instance x86 don't
> seem to have to use early_ioremap() much. Powerpc is for instance doing
> early mappings for PCI. Seems like x86 initialises PCI once slab is
> ready. Can't powerpc do the same ?

Patches welcome.

Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open

2020-05-09 Thread rananta


On 2020-05-06 02:48, Greg KH wrote:

On Mon, Apr 27, 2020 at 08:26:01PM -0700, Raghavendra Rao Ananta wrote:

Potentially, hvc_open() can be called in parallel when two tasks calls
open() on /dev/hvcX. In such a scenario, if the 
hp->ops->notifier_add()

callback in the function fails, where it sets the tty->driver_data to
NULL, the parallel hvc_open() can see this NULL and cause a memory 
abort.
Hence, serialize hvc_open and check if tty->private_data is NULL 
before

proceeding ahead.

The issue can be easily reproduced by launching two tasks 
simultaneously

that does nothing but open() and close() on /dev/hvcX.
For example:
$ ./simple_open_close /dev/hvc0 & ./simple_open_close /dev/hvc0 &

Signed-off-by: Raghavendra Rao Ananta 
---
 drivers/tty/hvc/hvc_console.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c 
b/drivers/tty/hvc/hvc_console.c

index 436cc51c92c3..ebe26fe5ac09 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -75,6 +75,8 @@ static LIST_HEAD(hvc_structs);
  */
 static DEFINE_MUTEX(hvc_structs_mutex);

+/* Mutex to serialize hvc_open */
+static DEFINE_MUTEX(hvc_open_mutex);
 /*
  * This value is used to assign a tty->index value to a hvc_struct 
based
  * upon order of exposure via hvc_probe(), when we can not match it 
to
@@ -346,16 +348,24 @@ static int hvc_install(struct tty_driver 
*driver, struct tty_struct *tty)

  */
 static int hvc_open(struct tty_struct *tty, struct file * filp)
 {
-   struct hvc_struct *hp = tty->driver_data;
+   struct hvc_struct *hp;
unsigned long flags;
int rc = 0;

+   mutex_lock(&hvc_open_mutex);
+
+   hp = tty->driver_data;
+   if (!hp) {
+   rc = -EIO;
+   goto out;
+   }
+
spin_lock_irqsave(&hp->port.lock, flags);
/* Check and then increment for fast path open. */
if (hp->port.count++ > 0) {
spin_unlock_irqrestore(&hp->port.lock, flags);
hvc_kick();
-   return 0;
+   goto out;
} /* else count == 0 */
spin_unlock_irqrestore(&hp->port.lock, flags);


Wait, why isn't this driver just calling tty_port_open() instead of
trying to open-code all of this?

Keeping a single mutext for open will not protect it from close, it 
will
just slow things down a bit.  There should already be a tty lock held 
by

the tty core for open() to keep it from racing things, right?
The tty lock should have been held, but not likely across ->install() 
and
->open() callbacks, thus resulting in a race between hvc_install() and 
hvc_open(),
where hvc_install() sets a data and the hvc_open() clears it. hvc_open() 
doesn't

check if the data was set to NULL and proceeds.


Try just removing all of this logic and replacing it with a call to
tty_port_open() and see if that fixes this issue.

As "proof" of this, I don't see other serial drivers needing a single
mutex for their open calls, do you?

thanks,

greg k-h


Thank you.
Raghavendra

[Bug 207359] MegaRAID SAS 9361 controller hang/reset

2020-05-09 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=207359

--- Comment #3 from Cameron (c...@neo-zeon.de) ---
Created attachment 289041
  --> https://bugzilla.kernel.org/attachment.cgi?id=289041&action=edit
5.6.11 megaraid POWER hang

Still happens with 5.6.11. There seems to be potentially a bit more output this
time, and I've included output from shutting down too in case it's useful.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH] powerpc/powernv/pci: fix a RCU-list lock

2020-05-09 Thread Qian Cai

It is unsafe to traverse tbl->it_group_list without the RCU read lock.

 WARNING: suspicious RCU usage
 5.7.0-rc4-next-20200508 #1 Not tainted
 -
 arch/powerpc/platforms/powernv/pci-ioda-tce.c:355 RCU-list traversed in 
non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 3 locks held by qemu-kvm/4305:
  #0: c00bc3fe6988 (&container->group_lock){}-{3:3}, at: 
vfio_fops_unl_ioctl+0x108/0x410 [vfio]
  #1: c0080fcc7400 (&vfio.iommu_drivers_lock){+.+.}-{3:3}, at: 
vfio_fops_unl_ioctl+0x148/0x410 [vfio]
  #2: c00bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: 
tce_iommu_attach_group+0x3c/0x4f0 [vfio_iommu_spapr_tce]

 stack backtrace:
 CPU: 4 PID: 4305 Comm: qemu-kvm Not tainted 5.7.0-rc4-next-20200508 #1
 Call Trace:
 [c010f29afa60] [c07154c8] dump_stack+0xfc/0x174 (unreliable)
 [c010f29afab0] [c01d8ff0] lockdep_rcu_suspicious+0x140/0x164
 [c010f29afb30] [c00dae2c] 
pnv_pci_unlink_table_and_group+0x11c/0x200
 [c010f29afb70] [c00d4a34] pnv_pci_ioda2_unset_window+0xc4/0x190
 [c010f29afbf0] [c00d4b4c] pnv_ioda2_take_ownership+0x4c/0xd0
 [c010f29afc30] [c0080fd60ee0] tce_iommu_attach_group+0x2c8/0x4f0 
[vfio_iommu_spapr_tce]
 [c010f29afcd0] [c0080fcc11a0] vfio_fops_unl_ioctl+0x238/0x410 [vfio]
 [c010f29afd50] [c05430a8] ksys_ioctl+0xd8/0x130
 [c010f29afda0] [c0543128] sys_ioctl+0x28/0x40
 [c010f29afdc0] [c0038af4] system_call_exception+0x114/0x1e0
 [c010f29afe20] [c000c8f0] system_call_common+0xf0/0x278

Signed-off-by: Qian Cai 
---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c 
b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 5dc6847d5f4c..6be9cf292b4e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -352,6 +352,8 @@ void pnv_pci_unlink_table_and_group(struct iommu_table *tbl,
 
/* Remove link to a group from table's list of attached groups */
found = false;
+
+   rcu_read_lock();
list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
if (tgl->table_group == table_group) {
list_del_rcu(&tgl->next);
@@ -360,6 +362,8 @@ void pnv_pci_unlink_table_and_group(struct iommu_table *tbl,
break;
}
}
+   rcu_read_unlock();
+
if (WARN_ON(!found))
return;
 
-- 
2.21.0 (Apple Git-122.2)

[PATCH] powerpc/mm/book3s64/iommu: fix some RCU-list locks

2020-05-09 Thread Qian Cai

It is safe to traverse mm->context.iommu_group_mem_list with either
mem_list_mutex or the RCU read lock held. Silence a few RCU-list false
positive warnings and fix a few missing RCU read locks.

 arch/powerpc/mm/book3s64/iommu_api.c:330 RCU-list traversed in non-reader 
section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4305:
  #0: c00bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: 
tce_iommu_ioctl.part.9+0xc7c/0x1870 [vfio_iommu_spapr_tce]
  #1: c1501910 (mem_list_mutex){+.+.}-{3:3}, at: mm_iommu_get+0x50/0x190

 
 arch/powerpc/mm/book3s64/iommu_api.c:132 RCU-list traversed in non-reader 
section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4305:
  #0: c00bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: 
tce_iommu_ioctl.part.9+0xc7c/0x1870 [vfio_iommu_spapr_tce]
  #1: c1501910 (mem_list_mutex){+.+.}-{3:3}, at: 
mm_iommu_do_alloc+0x120/0x5f0

 
 arch/powerpc/mm/book3s64/iommu_api.c:292 RCU-list traversed in non-reader 
section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4312:
  #0: c00ecafe23c8 (&vcpu->mutex){+.+.}-{3:3}, at: 
kvm_vcpu_ioctl+0xdc/0x950 [kvm]
  #1: c00045e6c468 (&kvm->srcu){}-{0:0}, at: 
kvmppc_h_put_tce+0x88/0x340 [kvm]

 
 arch/powerpc/mm/book3s64/iommu_api.c:424 RCU-list traversed in non-reader 
section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4312:
  #0: c00ecafe23c8 (&vcpu->mutex){+.+.}-{3:3}, at: 
kvm_vcpu_ioctl+0xdc/0x950 [kvm]
  #1: c00045e6c468 (&kvm->srcu){}-{0:0}, at: 
kvmppc_h_put_tce+0x88/0x340 [kvm]

Signed-off-by: Qian Cai 
---
 arch/powerpc/mm/book3s64/iommu_api.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/iommu_api.c 
b/arch/powerpc/mm/book3s64/iommu_api.c
index fa05bbd1f682..bf0108b6f445 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -129,7 +129,8 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, 
unsigned long ua,
 
mutex_lock(&mem_list_mutex);
 
-   list_for_each_entry_rcu(mem2, &mm->context.iommu_group_mem_list, next) {
+   list_for_each_entry_rcu(mem2, &mm->context.iommu_group_mem_list, next,
+   lockdep_is_held(&mem_list_mutex)) {
/* Overlap? */
if ((mem2->ua < (ua + (entries << PAGE_SHIFT))) &&
(ua < (mem2->ua +
@@ -289,6 +290,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct 
mm_struct *mm,
 {
struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
 
+   rcu_read_lock();
list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list, next) {
if ((mem->ua <= ua) &&
(ua + size <= mem->ua +
@@ -297,6 +299,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct 
mm_struct *mm,
break;
}
}
+   rcu_read_unlock();
 
return ret;
 }
@@ -327,7 +330,8 @@ struct mm_iommu_table_group_mem_t *mm_iommu_get(struct 
mm_struct *mm,
 
mutex_lock(&mem_list_mutex);
 
-   list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list, next) {
+   list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list, next,
+   lockdep_is_held(&mem_list_mutex)) {
if ((mem->ua == ua) && (mem->entries == entries)) {
ret = mem;
++mem->used;
@@ -421,6 +425,7 @@ bool mm_iommu_is_devmem(struct mm_struct *mm, unsigned long 
hpa,
struct mm_iommu_table_group_mem_t *mem;
unsigned long end;
 
+   rcu_read_lock();
list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list, next) {
if (mem->dev_hpa == MM_IOMMU_TABLE_INVALID_HPA)
continue;
@@ -437,6 +442,7 @@ bool mm_iommu_is_devmem(struct mm_struct *mm, unsigned long 
hpa,
return true;
}
}
+   rcu_read_unlock();
 
return false;
 }
-- 
2.21.0 (Apple Git-122.2)

[PATCH] powerpc/kvm/book3s64/vio: fix some RCU-list locks

2020-05-09 Thread Qian Cai

It is unsafe to traverse kvm->arch.spapr_tce_tables and
stt->iommu_tables without the RCU read lock held. Also, add
cond_resched_rcu() in places with the RCU read lock held that could take
a while to finish.

 arch/powerpc/kvm/book3s_64_vio.c:76 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 no locks held by qemu-kvm/4265.

 stack backtrace:
 CPU: 96 PID: 4265 Comm: qemu-kvm Not tainted 5.7.0-rc4-next-20200508+ #2
 Call Trace:
 [c000201a8690f720] [c0715948] dump_stack+0xfc/0x174 (unreliable)
 [c000201a8690f770] [c01d9470] lockdep_rcu_suspicious+0x140/0x164
 [c000201a8690f7f0] [c00810b9fb48] 
kvm_spapr_tce_release_iommu_group+0x1f0/0x220 [kvm]
 [c000201a8690f870] [c00810b8462c] 
kvm_spapr_tce_release_vfio_group+0x54/0xb0 [kvm]
 [c000201a8690f8a0] [c00810b84710] kvm_vfio_destroy+0x88/0x140 [kvm]
 [c000201a8690f8f0] [c00810b7d488] kvm_put_kvm+0x370/0x600 [kvm]
 [c000201a8690f990] [c00810b7e3c0] kvm_vm_release+0x38/0x60 [kvm]
 [c000201a8690f9c0] [c05223f4] __fput+0x124/0x330
 [c000201a8690fa20] [c0151cd8] task_work_run+0xb8/0x130
 [c000201a8690fa70] [c01197e8] do_exit+0x4e8/0xfa0
 [c000201a8690fb70] [c011a374] do_group_exit+0x64/0xd0
 [c000201a8690fbb0] [c0132c90] get_signal+0x1f0/0x1200
 [c000201a8690fcc0] [c0020690] do_notify_resume+0x130/0x3c0
 [c000201a8690fda0] [c0038d64] syscall_exit_prepare+0x1a4/0x280
 [c000201a8690fe20] [c000c8f8] system_call_common+0xf8/0x278

 
 arch/powerpc/kvm/book3s_64_vio.c:368 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 2 locks held by qemu-kvm/4264:
  #0: c000201ae2d000d8 (&vcpu->mutex){+.+.}-{3:3}, at: 
kvm_vcpu_ioctl+0xdc/0x950 [kvm]
  #1: c000200c9ed0c468 (&kvm->srcu){}-{0:0}, at: 
kvmppc_h_put_tce+0x88/0x340 [kvm]

 
 arch/powerpc/kvm/book3s_64_vio.c:108 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by qemu-kvm/4257:
  #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: 
kvm_vfio_set_attr+0x598/0x6c0 [kvm]

 
 arch/powerpc/kvm/book3s_64_vio.c:146 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by qemu-kvm/4257:
  #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: 
kvm_vfio_set_attr+0x598/0x6c0 [kvm]

Signed-off-by: Qian Cai 
---
 arch/powerpc/kvm/book3s_64_vio.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 50555ad1db93..4f5016bab723 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -73,6 +73,7 @@ extern void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
struct kvmppc_spapr_tce_iommu_table *stit, *tmp;
struct iommu_table_group *table_group = NULL;
 
+   rcu_read_lock();
list_for_each_entry_rcu(stt, &kvm->arch.spapr_tce_tables, list) {
 
table_group = iommu_group_get_iommudata(grp);
@@ -87,7 +88,9 @@ extern void kvm_spapr_tce_release_iommu_group(struct kvm *kvm,
kref_put(&stit->kref, kvm_spapr_tce_liobn_put);
}
}
+   cond_resched_rcu();
}
+   rcu_read_unlock();
 }
 
 extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
@@ -105,12 +108,14 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm 
*kvm, int tablefd,
if (!f.file)
return -EBADF;
 
+   rcu_read_lock();
list_for_each_entry_rcu(stt, &kvm->arch.spapr_tce_tables, list) {
if (stt == f.file->private_data) {
found = true;
break;
}
}
+   rcu_read_unlock();
 
fdput(f);
 
@@ -143,6 +148,7 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm 
*kvm, int tablefd,
if (!tbl)
return -EINVAL;
 
+   rcu_read_lock();
list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
if (tbl != stit->tbl)
continue;
@@ -150,14 +156,17 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm 
*kvm, int tablefd,
if (!kref_get_unless_zero(&stit->kref)) {
/* stit is being destroyed */
iommu_tce_table_put(tbl);
+   rcu_read_unlock();
return -ENOTTY;
}
/*
 * The table is already known to this KVM, we just increased
 * its KVM reference counter and can return.
 */
+   rcu_read_unlock();
return 0;
}
+   rcu_read_unlock();

Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open

2020-05-09 Thread Greg KH

On Sat, May 09, 2020 at 06:30:56PM -0700, rana...@codeaurora.org wrote:
> On 2020-05-06 02:48, Greg KH wrote:
> > On Mon, Apr 27, 2020 at 08:26:01PM -0700, Raghavendra Rao Ananta wrote:
> > > Potentially, hvc_open() can be called in parallel when two tasks calls
> > > open() on /dev/hvcX. In such a scenario, if the
> > > hp->ops->notifier_add()
> > > callback in the function fails, where it sets the tty->driver_data to
> > > NULL, the parallel hvc_open() can see this NULL and cause a memory
> > > abort.
> > > Hence, serialize hvc_open and check if tty->private_data is NULL
> > > before
> > > proceeding ahead.
> > > 
> > > The issue can be easily reproduced by launching two tasks
> > > simultaneously
> > > that does nothing but open() and close() on /dev/hvcX.
> > > For example:
> > > $ ./simple_open_close /dev/hvc0 & ./simple_open_close /dev/hvc0 &
> > > 
> > > Signed-off-by: Raghavendra Rao Ananta 
> > > ---
> > >  drivers/tty/hvc/hvc_console.c | 16 ++--
> > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/tty/hvc/hvc_console.c
> > > b/drivers/tty/hvc/hvc_console.c
> > > index 436cc51c92c3..ebe26fe5ac09 100644
> > > --- a/drivers/tty/hvc/hvc_console.c
> > > +++ b/drivers/tty/hvc/hvc_console.c
> > > @@ -75,6 +75,8 @@ static LIST_HEAD(hvc_structs);
> > >   */
> > >  static DEFINE_MUTEX(hvc_structs_mutex);
> > > 
> > > +/* Mutex to serialize hvc_open */
> > > +static DEFINE_MUTEX(hvc_open_mutex);
> > >  /*
> > >   * This value is used to assign a tty->index value to a hvc_struct
> > > based
> > >   * upon order of exposure via hvc_probe(), when we can not match it
> > > to
> > > @@ -346,16 +348,24 @@ static int hvc_install(struct tty_driver
> > > *driver, struct tty_struct *tty)
> > >   */
> > >  static int hvc_open(struct tty_struct *tty, struct file * filp)
> > >  {
> > > - struct hvc_struct *hp = tty->driver_data;
> > > + struct hvc_struct *hp;
> > >   unsigned long flags;
> > >   int rc = 0;
> > > 
> > > + mutex_lock(&hvc_open_mutex);
> > > +
> > > + hp = tty->driver_data;
> > > + if (!hp) {
> > > + rc = -EIO;
> > > + goto out;
> > > + }
> > > +
> > >   spin_lock_irqsave(&hp->port.lock, flags);
> > >   /* Check and then increment for fast path open. */
> > >   if (hp->port.count++ > 0) {
> > >   spin_unlock_irqrestore(&hp->port.lock, flags);
> > >   hvc_kick();
> > > - return 0;
> > > + goto out;
> > >   } /* else count == 0 */
> > >   spin_unlock_irqrestore(&hp->port.lock, flags);
> > 
> > Wait, why isn't this driver just calling tty_port_open() instead of
> > trying to open-code all of this?
> > 
> > Keeping a single mutext for open will not protect it from close, it will
> > just slow things down a bit.  There should already be a tty lock held by
> > the tty core for open() to keep it from racing things, right?
> The tty lock should have been held, but not likely across ->install() and
> ->open() callbacks, thus resulting in a race between hvc_install() and
> hvc_open(),

How?  The tty lock is held in install, and should not conflict with
open(), otherwise we would be seeing this happen in all tty drivers,
right?

> where hvc_install() sets a data and the hvc_open() clears it. hvc_open()
> doesn't
> check if the data was set to NULL and proceeds.

What data is being set that hvc_open is checking?

And you are not grabbing a lock in your install callback, you are only
serializing your open call here, I don't see how this is fixing anything
other than perhaps slowing down your codepaths.

As an arument why this isn't correct, can you answer why this same type
of change wouldn't be required for all tty drivers in the tree?

thanks,

greg k-h

45 matches

Mail list logo