Hi, On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote: > From: Peng Fan <peng....@nxp.com> > > There is case as below could trigger kernel dump: > Use U-Boot to start remote processor(rproc) with resource table > published to a fixed address by rproc. After Kernel boots up, > stop the rproc, load a new firmware which doesn't have resource table > ,and start rproc. >
If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table() will return an error [1], rproc_fw_boot() will exit prematurely [2] and the remote processor won't be started. What am I missing? [1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338 [2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 > When starting rproc with a firmware not have resource table, > `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will > trigger dump, because rproc->cache_table is set to NULL during the last > stop operation, but rproc->table_sz is still valid. > > This issue is found on i.MX8MP and i.MX9. > > Dump as below: > Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000000 > Mem abort info: > ESR = 0x0000000096000004 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > FSC = 0x04: level 0 translation fault > Data abort info: > ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000 > [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 > Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP > Modules linked in: > CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty > #38 > Hardware name: NXP i.MX8MPlus EVK board (DT) > pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : __pi_memcpy_generic+0x110/0x22c > lr : rproc_start+0x88/0x1e0 > Call trace: > __pi_memcpy_generic+0x110/0x22c (P) > rproc_boot+0x198/0x57c > state_store+0x40/0x104 > dev_attr_store+0x18/0x2c > sysfs_kf_write+0x7c/0x94 > kernfs_fop_write_iter+0x120/0x1cc > vfs_write+0x240/0x378 > ksys_write+0x70/0x108 > __arm64_sys_write+0x1c/0x28 > invoke_syscall+0x48/0x10c > el0_svc_common.constprop.0+0xc0/0xe0 > do_el0_svc+0x1c/0x28 > el0_svc+0x30/0xcc > el0t_64_sync_handler+0x10c/0x138 > el0t_64_sync+0x198/0x19c > > Clear rproc->table_sz to address the issue. > > While at here, also clear rproc->table_sz when rproc_fw_boot and > rproc_detach. > > Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when > detaching") > Signed-off-by: Peng Fan <peng....@nxp.com> > --- > > V2: > Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud > > drivers/remoteproc/remoteproc_core.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/remoteproc/remoteproc_core.c > b/drivers/remoteproc/remoteproc_core.c > index c2cf0d277729..1efa53d4e0c3 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const > struct firmware *fw) > kfree(rproc->cached_table); > rproc->cached_table = NULL; > rproc->table_ptr = NULL; > + rproc->table_sz = 0; > unprepare_rproc: > /* release HW resources if needed */ > rproc_unprepare_device(rproc); > @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc) > kfree(rproc->cached_table); > rproc->cached_table = NULL; > rproc->table_ptr = NULL; > + rproc->table_sz = 0; > out: > mutex_unlock(&rproc->lock); > return ret; > @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc) > kfree(rproc->cached_table); > rproc->cached_table = NULL; > rproc->table_ptr = NULL; > + rproc->table_sz = 0; > out: > mutex_unlock(&rproc->lock); > return ret; > -- > 2.37.1 >