Re: [PATCH] tools/virtio: Use the __GFP_ZERO flag of kmalloc to complete the memory initialization.
On Wed, Jun 05, 2024 at 09:52:45PM +0800, cuitao wrote:
> Use the __GFP_ZERO flag of kmalloc to initialize memory while allocating it,
> without the need for an additional memset call.
>
> Signed-off-by: cuitao
> ---
>  tools/virtio/linux/kernel.h | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
> index 6702008f7f5c..9e401fb7c215 100644
> --- a/tools/virtio/linux/kernel.h
> +++ b/tools/virtio/linux/kernel.h
> @@ -66,10 +66,7 @@ static inline void *kmalloc_array(unsigned n, size_t s, gfp_t gfp)
>
>  static inline void *kzalloc(size_t s, gfp_t gfp)
>  {
> -	void *p = kmalloc(s, gfp);
> -
> -	memset(p, 0, s);
> -	return p;
> +	return kmalloc(s, gfp | __GFP_ZERO);
>  }

Why do we care? It's just here to make things compile.
The simpler the better.

>  static inline void *alloc_pages_exact(size_t s, gfp_t gfp)
> --
> 2.25.1
Re: [PATCH] tools/virtio: Use the __GFP_ZERO flag of kmalloc to complete the memory initialization.
Sorry, maybe I'm wrong. I wonder if the gfp parameter in
static inline void *kmalloc(size_t s, gfp_t gfp) can be deleted if it is
not used. Or would it be better to move the memset into kmalloc? Like this:

#define __GFP_ZERO 0x1

static inline void *kmalloc(size_t s, gfp_t gfp)
{
	void *p;

	if (__kmalloc_fake)
		return __kmalloc_fake;

	p = malloc(s);
	if (!p)
		return p;

	if (gfp & __GFP_ZERO)
		memset(p, 0, s);

	return p;
}

On 2024/6/6 15:18, Michael S. Tsirkin wrote:
> On Wed, Jun 05, 2024 at 09:52:45PM +0800, cuitao wrote:
> > Use the __GFP_ZERO flag of kmalloc to initialize memory while allocating it,
> > without the need for an additional memset call.
> >
> > Signed-off-by: cuitao
> [ ... snip ... ]
>
> Why do we care? It's just here to make things compile.
> The simpler the better.
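For reference, the behaviour under discussion — a userspace kmalloc() shim that honours __GFP_ZERO so kzalloc() can simply forward the flag — can be sketched and exercised outside the kernel tree as follows. This is a minimal sketch: the names mirror the patch, but the __kmalloc_fake test hook is omitted and the gfp_t typedef is a stand-in.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define __GFP_ZERO 0x1
typedef unsigned gfp_t;

/* Minimal stand-in for the tools/virtio kmalloc() shim: honour
 * __GFP_ZERO at allocation time instead of memset-ing in kzalloc(). */
static inline void *kmalloc(size_t s, gfp_t gfp)
{
	void *p = malloc(s);

	if (!p)
		return p;
	if (gfp & __GFP_ZERO)
		memset(p, 0, s);
	return p;
}

/* kzalloc() then reduces to forwarding the flag, as in the patch. */
static inline void *kzalloc(size_t s, gfp_t gfp)
{
	return kmalloc(s, gfp | __GFP_ZERO);
}
```

Either way the observable result is the same zero-filled buffer; the difference is only where the zeroing lives.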
Re: [PATCH v3 1/5] dt-bindings: remoteproc: qcom,sa8775p-pas: Document the SA8775p ADSP, CDSP and GPDSP
On 05/06/2024 18:56, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski
>
> Document the components used to boot the ADSP, CDSP0, CDSP1, GPDSP0 and
> GPDSP1 on the SA8775p SoC.
>
> Signed-off-by: Bartosz Golaszewski
> ---
>  .../bindings/remoteproc/qcom,sa8775p-pas.yaml | 160 +
>  1 file changed, 160 insertions(+)

Reviewed-by: Krzysztof Kozlowski

Best regards,
Krzysztof
[PATCH] remoteproc: imx_rproc: Adjust phandle parsing issue while remapping optional addresses in imx_rproc_addr_init()
In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just
counts the number of phandles. But phandles may be empty. So
of_parse_phandle() in the parsing loop (0 < a < nph) may return NULL
which is later dereferenced. Adjust this issue by adding a NULL-return
check.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc driver")
Signed-off-by: Aleksandr Mishin
---
 drivers/remoteproc/imx_rproc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
index 5a3fb902acc9..39eacd90af14 100644
--- a/drivers/remoteproc/imx_rproc.c
+++ b/drivers/remoteproc/imx_rproc.c
@@ -726,6 +726,8 @@ static int imx_rproc_addr_init(struct imx_rproc *priv,
 		struct resource res;
 
 		node = of_parse_phandle(np, "memory-region", a);
+		if (!node)
+			continue;
 		/* Not map vdevbuffer, vdevring region */
 		if (!strncmp(node->name, "vdev", strlen("vdev"))) {
 			of_node_put(node);
-- 
2.30.2
RE: [PATCH] remoteproc: imx_rproc: Adjust phandle parsing issue while remapping optional addresses in imx_rproc_addr_init()
Hi Aleksandr,

> Subject: [PATCH] remoteproc: imx_rproc: Adjust phandle parsing issue while
> remapping optional addresses in imx_rproc_addr_init()
>
> In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just counts
> number of phandles. But phandles may be empty. So of_parse_phandle() in
> the parsing loop (0 < a < nph) may return NULL which is later dereferenced.
> Adjust this issue by adding NULL-return check.

It is good to add a check here. But I am not sure whether this will
really happen.

> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc
> driver")
> Signed-off-by: Aleksandr Mishin

Anyway LGTM:

Reviewed-by: Peng Fan

Thanks,
Peng.
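The defensive pattern the patch applies — count entries up front, but tolerate holes when resolving each one — can be illustrated in isolation. In this sketch the stub table and parse_phandle() merely stand in for the devicetree and of_parse_phandle(); the names are illustrative, not the driver's real API.

```c
#include <stddef.h>
#include <assert.h>

/* Stand-in for a phandle list where some entries may be empty. */
static const char *fake_phandles[] = { "vdev0buffer", NULL, "rsc-table" };

/* Mimics of_parse_phandle(): may return NULL for an empty entry. */
static const char *parse_phandle(size_t idx, size_t nph)
{
	return idx < nph ? fake_phandles[idx] : NULL;
}

/* Mirrors the fixed loop in imx_rproc_addr_init(): skip NULL entries
 * instead of dereferencing them. */
static int count_valid_regions(size_t nph)
{
	int mapped = 0;

	for (size_t a = 0; a < nph; a++) {
		const char *node = parse_phandle(a, nph);

		if (!node)
			continue;	/* the added NULL-return check */
		mapped++;
	}
	return mapped;
}
```

Without the `continue`, the first empty entry would be dereferenced, which is exactly the crash the patch guards against.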
[PATCH] remoteproc: mediatek: Increase MT8188 SCP core0 DRAM size
From: Jason Chen

Increase MT8188 SCP core0 DRAM size for HEVC driver.

Signed-off-by: Jason Chen
---
 drivers/remoteproc/mtk_scp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
index b885a9a041e4..2119fc62c3f2 100644
--- a/drivers/remoteproc/mtk_scp.c
+++ b/drivers/remoteproc/mtk_scp.c
@@ -1390,7 +1390,7 @@ static const struct mtk_scp_sizes_data default_scp_sizes = {
 };
 
 static const struct mtk_scp_sizes_data mt8188_scp_sizes = {
-	.max_dram_size = 0x50,
+	.max_dram_size = 0x80,
 	.ipi_share_buffer_size = 600,
 };
-- 
2.34.1
[PATCH] arm64: dts: qcom: qcm6490-fairphone-fp5: Use .mbn firmware for IPA
Specify the file name for the squashed/non-split firmware with the .mbn
extension instead of the split .mdt. The kernel can load both but the
squashed version is preferred in dts nowadays.

Signed-off-by: Luca Weiss
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index 8cd2fe80dbb2..6b4ffc585dcf 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -551,7 +551,7 @@ &i2c9 {
 &ipa {
 	qcom,gsi-loader = "self";
 	memory-region = <&ipa_fw_mem>;
-	firmware-name = "qcom/qcm6490/fairphone5/ipa_fws.mdt";
+	firmware-name = "qcom/qcm6490/fairphone5/ipa_fws.mbn";
 	status = "okay";
 };

---
base-commit: ee78a17615ad0cfdbbc27182b1047cd36c9d4d5f
change-id: 20240606-fp5-ipa-mbn-d0466de6b5bf

Best regards,
-- 
Luca Weiss
Re: [PATCH v8 1/5] soc: qcom: pdr: protect locator_addr with the main mutex
On Thu, 6 Jun 2024 at 01:48, Chris Lew wrote:
>
> Hi Dmitry,
>
> On 5/11/2024 2:56 PM, Dmitry Baryshkov wrote:
> ...
> > @@ -76,12 +76,12 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
> >  					     locator_hdl);
> >  	struct pdr_service *pds;
> >
> > +	mutex_lock(&pdr->lock);
> >  	/* Create a local client port for QMI communication */
> >  	pdr->locator_addr.sq_family = AF_QIPCRTR;
> >  	pdr->locator_addr.sq_node = svc->node;
> >  	pdr->locator_addr.sq_port = svc->port;
> >
> > -	mutex_lock(&pdr->lock);
> >  	pdr->locator_init_complete = true;
> >  	mutex_unlock(&pdr->lock);
> >
> > @@ -104,10 +104,10 @@ static void pdr_locator_del_server(struct qmi_handle *qmi,
> >
> >  	mutex_lock(&pdr->lock);
> >  	pdr->locator_init_complete = false;
> > -	mutex_unlock(&pdr->lock);
> >
> >  	pdr->locator_addr.sq_node = 0;
> >  	pdr->locator_addr.sq_port = 0;
> > +	mutex_unlock(&pdr->lock);
> >  }
> >
> >  static const struct qmi_ops pdr_locator_ops = {
> > @@ -365,6 +365,7 @@ static int pdr_get_domain_list(struct servreg_get_domain_list_req *req,
> >  	if (ret < 0)
> >  		return ret;
> >
> > +	mutex_lock(&pdr->lock);
> >  	ret = qmi_send_request(&pdr->locator_hdl,
> >  			       &pdr->locator_addr,
> >  			       &txn, SERVREG_GET_DOMAIN_LIST_REQ,
> > @@ -373,15 +374,16 @@ static int pdr_get_domain_list(struct servreg_get_domain_list_req *req,
> >  			       req);
> >  	if (ret < 0) {
> >  		qmi_txn_cancel(&txn);
> > -		return ret;
> > +		goto err_unlock;
> >  	}
> >
> >  	ret = qmi_txn_wait(&txn, 5 * HZ);
> >  	if (ret < 0) {
> >  		pr_err("PDR: %s get domain list txn wait failed: %d\n",
> >  		       req->service_name, ret);
> > -		return ret;
> > +		goto err_unlock;
> >  	}
> > +	mutex_unlock(&pdr->lock);
>
> I'm not sure it is necessary to hold the mutex during the
> qmi_txn_wait() since the only variable we are trying to protect is
> locator_addr.
>
> Wouldn't this delay other work like new/del server notifications if this
> qmi service is delayed or non-responsive?

I've verified, the addr is stored inside the message data by the
enqueueing functions, so the locator_addr isn't referenced after the
function returns. I'll reduce the locking scope.

-- 
With best wishes
Dmitry
Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
On 11/05/2024 23:56, Dmitry Baryshkov wrote:
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware
> or for properly starting up the UCSI / altmode / battery manager
> support.
>
> The existing userspace implementation has several issues. It doesn't
> play well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files
> if the firmware location is changed (or if the firmware was not
> available at the time pd-mapper was started but the corresponding
> directory is mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide an in-kernel service implementing static
> per-platform data.
>
> To: Bjorn Andersson
> To: Konrad Dybcio
> To: Sibi Sankar
> To: Mathieu Poirier
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold
> Cc: Xilin Wu
> Cc: "Bryan O'Donoghue"
> Cc: Steev Klimaszewski
> Cc: Alexey Minnekhanov
>
> --
>
> Changes in v8:
> - Reworked pd-mapper to register as an rproc_subdev / auxdev
> - Dropped Tested-by from Steev and Alexey from the last patch since the
>   implementation was changed significantly.
> - Add sensors, cdsp and mpss_root domains to 660 config (Alexey
>   Minnekhanov)
> - Added platform entry for sm4250 (used for qrb4210 / RB2)
> - Added locking to the pdr_get_domain_list() (Chris Lew)
> - Remove the call to qmi_del_server() and corresponding API (Chris Lew)
> - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
> - Link to v7: https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6: https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5: https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held
>   (Chris Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4: https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3: https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2: https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (5):
>   soc: qcom: pdr: protect locator_addr with the main mutex
>   soc: qcom: pdr: fix parsing of domains lists
>   soc: qcom: pdr: extract PDR message marshalling data
>   soc: qcom: add pd-mapper implementation
>   remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/qcom_common.c    |  87 +
>  drivers/remoteproc/qcom_common.h    |  10 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
>  drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
>  drivers/soc/qcom/Kconfig            |  15 +
>  drivers/soc/qcom/Makefile           |   2 +
>  drivers/soc/qcom/pdr_interface.c    |  17 +-
>  drivers/soc/qcom/pdr_internal.h     | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 676 
>  drivers/soc/qcom/qcom_pdr_msg.c     | 353 +++
>  12 files changed, 1190 insertions(+), 300 deletions(-)
>
> base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,

Tested-by: Neil Armstrong # on SM8550-QRD
Tested-by: Neil Armstrong # on SM8550-HDK
Tested-by: Neil Armstrong # on SM8650-QRD

Thanks,
Neil
Re: [PATCH] arm64: dts: qcom: qcm6490-fairphone-fp5: Use .mbn firmware for IPA
On 6.06.2024 11:09 AM, Luca Weiss wrote:
> Specify the file name for the squashed/non-split firmware with the .mbn
> extension instead of the split .mdt. The kernel can load both but the
> squashed version is preferred in dts nowadays.
>
> Signed-off-by: Luca Weiss
> ---

Reviewed-by: Konrad Dybcio

Konrad
Re: [PATCH v2 2/6] livepatch: Add klp-convert tool
Hello Joe,

> > +#define KLP_RELOC_SYMBOL_POS(LP_OBJ_NAME, SYM_OBJ_NAME, SYM_NAME, SYM_POS) \
> > +	asm("\".klp.sym.rela." #LP_OBJ_NAME "." #SYM_OBJ_NAME "." #SYM_NAME "," #SYM_POS "\"")
>                                                                          ^^^
> I think I found a potential bug, or at least compatibility problem with
> including a comma "," character in this symbol format and older versions
> of the GNU assembler. The good news is that other delimiter characters
> like "." or "#" seem to work out fine.

Thank you for spotting this. I was using binutils 2.38, so I did not even
notice this problem.

Unfortunately, I was not able to make it work with "#" as a delimiter;
only "." worked. Additionally, any type of parenthesis apparently has
some special purpose even in labels, so they are also not an option.

> If you want to reproduce, you'll need a version of `as` like binutils
> 2.36.1 and try building the samples/livepatch/livepatch-extern-symbol.ko
> and you should get an error like:
>
>   Assembler messages:
>   Warning: missing closing '"'
>   Warning: missing closing '"'
>   Error: too many memory references for `movq'
>
> If you want to retrace my adventure, here are my steps:
>
> 1) Clone klp-convert-tree repo branch containing this patchset +
>    Petr's review comments + a few helpful things for klp-convert
>    development:
>
>   $ git clone \
>       --single-branch --branch=klp-convert-minimal-v1-review --depth=9 \
>       https://github.com/joe-lawrence/klp-convert-tree.git
>   [ ... snip ... ]
>   $ cd klp-convert-tree
>
> 2) Override .cross-dev defaults:
>
>   $ export BUILD_ARCHES=x86_64
>   $ export COMPILER=gcc-11.1.0
>   $ export URL=https://cdn.kernel.org/pub/tools/crosstool/files/bin/x86_64/
>   $ export OUTDIR_PREFIX=$(pwd)/build
>   $ export COMPILER_INSTALL_PATH=$(pwd)/cross-compiler
>
> 3) Setup x86_64 default .config (this will download and install the
>    gcc-11.1.0 compiler from cdn.kernel.org):
>
>   $ ./cross-dev make defconfig
>
>   x86_64 : make defconfig ...
>   Compiler will be installed in /root/klp-convert-tree/cross-compiler
>   [ ... snip ... ]
>
> 4) Add kernel livepatching configuration options:
>
>   $ ./cross-dev klp-config
>
>   Configuring x86_64 ...
>   [ ... snip ... ]
>
>   $ grep LIVEPATCH "$OUTDIR_PREFIX"-x86_64/.config
>   CONFIG_HAVE_LIVEPATCH=y
>   CONFIG_LIVEPATCH=y
>   CONFIG_SAMPLE_LIVEPATCH=m
>
> 5) Run the cross-compiler build until it hits a build error on
>    livepatch-extern-symbol.ko:
>
>   $ ./cross-dev make -j$(nproc)
>   [ ... snip ... ]
>   make: Target '__all' not remade because of errors.
>   [ x86_64 : make -j48 = FAIL ]
>
> 6) With pre-requisites already built, retry the external symbol sample
>    and add -save-temps to the KCFLAGS to keep the generated assembly file:
>
>   $ KCFLAGS="-save-temps=obj" ./cross-dev make samples/livepatch/livepatch-extern-symbol.ko
>   [ ... snip ... ]
>   samples/livepatch/livepatch-extern-symbol.s: Assembler messages:
>   samples/livepatch/livepatch-extern-symbol.s:103: Warning: missing closing '"'
>   samples/livepatch/livepatch-extern-symbol.s:103: Warning: missing closing '"'
>   samples/livepatch/livepatch-extern-symbol.s:103: Error: too many memory references for `movq'
>   [ ... snip ... ]
>
> 7) Which line is that?
>
>   $ awk 'NR==103' "$OUTDIR_PREFIX"-x86_64/samples/livepatch/livepatch-extern-symbol.s
>   	movq	".klp.sym.rela.vmlinux.vmlinux.saved_command_line,0"(%rip), %rdx
>
> You could alternatively poke at it through the compiler explorer service
> and toggle the source and binutils versions:
>
>   (error)   binutils 2.36.1 : https://godbolt.org/z/cGGs6rfWe
>   (success) binutils 2.38   : https://godbolt.org/z/ffzza3vYd

Thank you for those detailed step-by-step instructions to reproduce it!
It helped me a lot to understand the problem.

Lukas
Re: [PATCH] remoteproc: mediatek: Don't print error when optional scp handle is missing
On 05/06/24 21:35, Nícolas F. R. A. Prado wrote:
> The scp_get() helper has two users: the mtk-vcodec and the mtk-mdp3
> drivers. mdp3 considers the mediatek,scp phandle optional, and when it's
> missing mdp3 will directly look for the scp node based on compatible.
> For that reason printing an error in the helper when the handle is
> missing is wrong, as it's not always an actual error. Besides, the other
> user, vcodec, already prints an error message on its own when the helper
> fails so this message doesn't add that much information.
>
> Remove the error message from the helper. This gets rid of the deceptive
> error on MT8195-Tomato:
>
>   mtk-mdp3 14001000.dma-controller: can't get SCP node

I'm way more for giving it a SCP handle instead of silencing this print...
the SCP handle way *is* the correct one and I prefer it, as that'd also
be the only way to support Dual-Core SCP in MDP3.

I really want to see hardcoded stuff disappear and I really really
*really* want to see more consistency across DT nodes in MediaTek
platforms.

So, still a one-line change, but do it on the devicetree instead of here
please.

Cheers,
Angelo

> Signed-off-by: Nícolas F. R. A. Prado
> ---
>  drivers/remoteproc/mtk_scp.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
> index b885a9a041e4..f813117b6312 100644
> --- a/drivers/remoteproc/mtk_scp.c
> +++ b/drivers/remoteproc/mtk_scp.c
> @@ -37,10 +37,8 @@ struct mtk_scp *scp_get(struct platform_device *pdev)
>  	struct platform_device *scp_pdev;
>
>  	scp_node = of_parse_phandle(dev->of_node, "mediatek,scp", 0);
> -	if (!scp_node) {
> -		dev_err(dev, "can't get SCP node\n");
> +	if (!scp_node)
>  		return NULL;
> -	}
>
>  	scp_pdev = of_find_device_by_node(scp_node);
>  	of_node_put(scp_node);
>
> ---
> base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
> change-id: 20240605-mt8195-dma-scp-node-err-6a8dea26d4f5
>
> Best regards,
Re: [PATCH] remoteproc: mediatek: Increase MT8188 SCP core0 DRAM size
On 06/06/24 11:06, jason-ch chen wrote:
> From: Jason Chen
>
> Increase MT8188 SCP core0 DRAM size for HEVC driver.

So the second core only gets a maximum of 0x20 of SRAM? I'm not sure how
useful the secondary SCP core is, at this point, with so little available
SRAM but... okay, as you wish.

Reviewed-by: AngeloGioacchino Del Regno

> Signed-off-by: Jason Chen
> ---
>  drivers/remoteproc/mtk_scp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
> index b885a9a041e4..2119fc62c3f2 100644
> --- a/drivers/remoteproc/mtk_scp.c
> +++ b/drivers/remoteproc/mtk_scp.c
> @@ -1390,7 +1390,7 @@ static const struct mtk_scp_sizes_data default_scp_sizes = {
>  };
>
>  static const struct mtk_scp_sizes_data mt8188_scp_sizes = {
> -	.max_dram_size = 0x50,
> +	.max_dram_size = 0x80,
>  	.ipi_share_buffer_size = 600,
>  };
[PATCH] function_graph: Rename BYTE_NUMBER to CHAR_NUMBER in selftests
From: "Steven Rostedt (Google)"

The function_graph selftests check various size variables to pass from
the entry of the function to the exit. It tests 1, 2, 4 and 8 byte words.
The 1 byte macro was called BYTE_NUMBER but that is used in the sh
architecture:

  arch/sh/include/asm/bitops-op32.h

Just rename the macro to CHAR_NUMBER.

Fixes: 47c3c70aa3697 ("function_graph: Add selftest for passing local variables")
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202406061744.rzdxfrrg-...@intel.com/
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace_selftest.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 369efc569238..adf0f436d84b 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -758,7 +758,7 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
-#define BYTE_NUMBER 123
+#define CHAR_NUMBER 123
 #define SHORT_NUMBER 12345
 #define WORD_NUMBER 1234567890
 #define LONG_NUMBER 1234567890123456789LL
@@ -789,7 +789,7 @@ static __init int store_entry(struct ftrace_graph_ent *trace,
 
 	switch (size) {
 	case 1:
-		*(char *)p = BYTE_NUMBER;
+		*(char *)p = CHAR_NUMBER;
 		break;
 	case 2:
 		*(short *)p = SHORT_NUMBER;
@@ -830,7 +830,7 @@ static __init void store_return(struct ftrace_graph_ret *trace,
 
 	switch (fixture->store_size) {
 	case 1:
-		expect = BYTE_NUMBER;
+		expect = CHAR_NUMBER;
 		found = *(char *)p;
 		break;
 	case 2:
-- 
2.43.0
Re: [PATCH v2 1/5] ftrace: Rename dup_hash() and comment it
On Wed, Jun 05, 2024 at 02:03:35PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The name "dup_hash()" is a misnomer as it does not duplicate the hash that
> is passed in, but instead moves its entities from that hash to a newly
> allocated one. Rename it to "__move_hash()" (using starting underscores as
> it is an internal function), and add some comments about what it does.
>
> Signed-off-by: Steven Rostedt (Google)

Acked-by: Mark Rutland

Mark.

> ---
>  kernel/trace/ftrace.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index da7e6abf48b4..9dcdefe9d1aa 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1391,7 +1391,11 @@ ftrace_hash_rec_enable_modify(struct ftrace_ops *ops, int filter_hash);
>  static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
>  				       struct ftrace_hash *new_hash);
>
> -static struct ftrace_hash *dup_hash(struct ftrace_hash *src, int size)
> +/*
> + * Allocate a new hash and remove entries from @src and move them to the new hash.
> + * On success, the @src hash will be empty and should be freed.
> + */
> +static struct ftrace_hash *__move_hash(struct ftrace_hash *src, int size)
>  {
>  	struct ftrace_func_entry *entry;
>  	struct ftrace_hash *new_hash;
> @@ -1438,7 +1442,7 @@ __ftrace_hash_move(struct ftrace_hash *src)
>  	if (ftrace_hash_empty(src))
>  		return EMPTY_HASH;
>
> -	return dup_hash(src, size);
> +	return __move_hash(src, size);
>  }
>
>  static int
> -- 
> 2.43.0
Re: [PATCH v14 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Thu, 06 Jun 2024 00:32:55 -0500, Jarkko Sakkinen wrote:
> On Wed Jun 5, 2024 at 6:33 PM EEST, Haitao Huang wrote:
> > sgx_cgroup_try_charge() expects sgx_cg_wq, so it would break unless we
> > check and return 0, which was initially implemented in v12. But then
> > Kai had some concern that we expose all the interface files to allow
> > the user to set limits but we don't enforce. To keep it simple we
> > settled down
>
> Sure: "Keep it simple and corpse"
>
> > back to BUG_ON(). This would only happen rarely and the user can add a
> > command-line option to disable SGX if s/he really wants to start the
> > kernel in this case, just can't do SGX.
>
> Even disabling all of SGX would be a less catastrophic measure.
>
> > Yes I had a comment but Kai thought it was too obvious and I can't
> > think of a better one that's not obvious, so I removed it:
>
> Not great advice given. Please just document it. In the patch with that
> BUG_ON() I don't want to see my R-by, until I've reviewed an updated
> version.
>
> BR, Jarkko

Sure, that makes sense. So far I have the following changes. I did not do
the renaming for the functions as I am not sure it is still needed. Let
me know otherwise and if I missed any more.

--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -329,7 +329,7 @@ const struct misc_res_ops sgx_cgroup_ops = {
 	.free = sgx_cgroup_free,
 };
 
-void sgx_cgroup_init(void)
+int __init sgx_cgroup_init(void)
 {
 	/*
 	 * misc root always exists even if misc is disabled from command line.
@@ -345,7 +345,8 @@ void sgx_cgroup_init(void)
 	if (cgroup_subsys_enabled(misc_cgrp_subsys)) {
 		sgx_cg_wq = alloc_workqueue("sgx_cg_wq",
					    WQ_UNBOUND | WQ_FREEZABLE,
					    WQ_UNBOUND_MAX_ACTIVE);
-		BUG_ON(!sgx_cg_wq);
+		if (!sgx_cg_wq)
+			return -ENOMEM;
 	}
+
+	return 0;
 }

--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -99,7 +99,7 @@ bool sgx_cgroup_should_reclaim(struct sgx_cgroup *sgx_cg);
 struct misc_cg *sgx_cgroup_reclaim_pages(struct misc_cg *root,
					 struct misc_cg *next_cg,
					 struct mm_struct *charge_mm);
-void sgx_cgroup_init(void);
+int __init sgx_cgroup_init(void);
 
 #endif /* CONFIG_CGROUP_MISC */

--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1045,7 +1045,7 @@ static int __init sgx_init(void)
 	if (!sgx_page_cache_init())
 		return -ENOMEM;
 
-	if (!sgx_page_reclaimer_init()) {
+	if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
 		ret = -ENOMEM;
 		goto err_page_cache;
 	}
@@ -1067,9 +1067,6 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
-	/* Setup cgroup if either the native or vepc driver is active */
-	sgx_cgroup_init();
-
 	return 0;
 
 err_provision:

--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -119,6 +119,7 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
  * cgroup.
  */
 struct sgx_epc_lru_list {
+	/* Use this lock to protect access from multiple reclamation worker threads */
 	spinlock_t lock;
 	struct list_head reclaimable;
 };

--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -446,7 +446,8 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
 	int ret;
 
 	if (!parent_css) {
-		parent_cg = cg = &root_cg;
+		parent_cg = &root_cg;
+		cg = &root_cg;
 	} else {
 		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
 		if (!cg)

Thanks
Haitao
[PATCH v2] ftrace: Add missing kerneldoc parameters to unregister_ftrace_direct()
Add the description to the parameters addr and free_filters of the
function unregister_ftrace_direct().

Signed-off-by: Marilene A Garcia
---
Hello,
Thank you for the feedback. The requested changes were applied in this
new version.

 kernel/trace/ftrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index da7e6abf48b4..1f79cf3c598e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5960,6 +5960,8 @@ EXPORT_SYMBOL_GPL(register_ftrace_direct);
  * unregister_ftrace_direct - Remove calls to custom trampoline
  * previously registered by register_ftrace_direct for @ops object.
  * @ops: The address of the struct ftrace_ops object
+ * @addr: The address of the direct function that is called by the @ops functions
+ * @free_filters: Set to true to remove all filters for the ftrace_ops, false otherwise
  *
  * This is used to remove a direct calls to @addr from the nop locations
  * of the functions registered in @ops (with by ftrace_set_filter_ip
-- 
2.34.1
[PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module
If a module is being loaded, and the .gnu.linkonce.this_module section
in the module's ELF file does not have the WRITE flag, the kernel will
map the finished module struct of that module as read-only. This causes
a kernel panic when the struct is written to the first time after it has
been marked read-only. Currently this happens in complete_formation in
kernel/module/main.c:2765, when the module's state is set to
MODULE_STATE_COMING, just after setting up the memory protections.

Down the line, this seems to lead to unpredictable freezes when trying
to load other modules - I guess this is due to some structures not being
cleaned up properly, but I didn't investigate this further.

A check already exists which verifies that .gnu.linkonce.this_module is
ALLOC. This patch simply adds an analogous check for WRITE.

Signed-off-by: Daniel Kirschten
---
 kernel/module/main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index d18a94b973e1..abba097551a2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1886,6 +1886,12 @@ static int elf_validity_cache_copy(struct load_info *info, int flags)
 		goto no_exec;
 	}
 
+	if (!(shdr->sh_flags & SHF_WRITE)) {
+		pr_err("module %s: .gnu.linkonce.this_module must be writable\n",
+		       info->name ?: "(missing .modinfo section or name field)");
+		goto no_exec;
+	}
+
 	if (shdr->sh_size != sizeof(struct module)) {
 		pr_err("module %s: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time\n",
 		       info->name ?: "(missing .modinfo section or name field)");
-- 
2.34.1
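The added validation is just a flags test on the section header. In isolation — using the standard <elf.h> definitions of a Linux/glibc toolchain rather than the kernel's load_info plumbing — it amounts to the sketch below; the helper name is illustrative, not the kernel's.

```c
#include <elf.h>
#include <stdbool.h>
#include <assert.h>

/* Sketch of the check the patch adds for .gnu.linkonce.this_module:
 * the section must be ALLOC (already verified by the existing check)
 * and WRITE, or the final struct module would be mapped read-only and
 * the later write to mod->state in complete_formation() would fault. */
static bool this_module_section_ok(const Elf64_Shdr *shdr)
{
	if (!(shdr->sh_flags & SHF_ALLOC))
		return false;
	return (shdr->sh_flags & SHF_WRITE) != 0;
}
```

Rejecting the module at ELF-validation time turns a late kernel panic into an early, well-reported load error.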
[PATCH] net: missing check
Two missing checks in virtio_net_hdr_to_skb() allowed syzbot to crash kernels again

1. After the skb_segment function the buffer may become non-linear
(nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere
the __skb_linearize function will not be executed, then the buffer will
remain non-linear. Then the condition (offset >= skb_headlen(skb))
becomes true, which causes WARN_ON_ONCE in skb_checksum_help.

2. The struct sk_buff and struct virtio_net_hdr members must be
mathematically related.
(gso_size) must be greater than (needed) otherwise WARN_ON_ONCE.
(remainder) must be greater than (needed) otherwise WARN_ON_ONCE.
(remainder) may be 0 if division is without remainder.

offset+2 (4191) > skb_headlen() (1116)
WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
Modules linked in:
CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef
RSP: 0018:c90003a9f338 EFLAGS: 00010286
RAX: RBX: 888025125780 RCX: 814db209
RDX: 888015393b80 RSI: 814db216 RDI: 0001
RBP: 8880251257f4 R08: 0001 R09:
R10: R11: 0001 R12: 045c
R13: 105f R14: 8880251257f0 R15: 105d
FS: 55c24380() GS:8880b990() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 2000f000 CR3: 23151000 CR4: 003506f0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
Call Trace:
 ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777
 ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584
 ip_finish_output_gso net/ipv4/ip_output.c:286 [inline]
 __ip_finish_output net/ipv4/ip_output.c:308 [inline]
 __ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295
 ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
 NF_HOOK_COND include/linux/netfilter.h:303 [inline]
 ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433
 dst_output include/net/dst.h:451 [inline]
 ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129
 iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
 ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline]
 sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076
 __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
 netdev_start_xmit include/linux/netdevice.h:4954 [inline]
 xmit_one net/core/dev.c:3545 [inline]
 dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561
 __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346
 dev_queue_xmit include/linux/netdevice.h:3134 [inline]
 packet_xmit+0x257/0x380 net/packet/af_packet.c:276
 packet_snd net/packet/af_packet.c:3087 [inline]
 packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0xd5/0x180 net/socket.c:745
 __sys_sendto+0x255/0x340 net/socket.c:2190
 __do_sys_sendto net/socket.c:2202 [inline]
 __se_sys_sendto net/socket.c:2198 [inline]
 __x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Signed-off-by: Denis Arefev
---
 include/linux/virtio_net.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 4dfa9b69ca8d..77ebe908d746 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -56,6 +56,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 	unsigned int thlen = 0;
 	unsigned int p_off = 0;
 	unsigned int ip_proto;
+	u64 ret, remainder;
 
 	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
 		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -98,6 +99,15 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 		u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
 		u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
 
+		if (hdr->gso_size) {
+			ret = div64_u64_rem(skb->len, hdr->gso_size, &remainder);
+			if (!(ret && (hdr->gso_size > needed) &&
+			      ((remainder > needed) || (remainder == 0)))) {
+				return -EINVAL;
+			}
+			skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG;
+		}
+
 		if (!pskb_may_pull(skb, needed))
 			return -EINVAL;
--
2.25.
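For readers following the logic of the added check, here is a small userspace sketch of the validation rule in plain C. The function name and test values are illustrative only; `div64_u64_rem()` is modelled with ordinary 64-bit division, and the non-zero `gso_size` guard from the kernel code is assumed to hold:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the patch's gso_size check: skb->len must contain at
 * least one full gso_size segment, gso_size must exceed the checksummed
 * header length ("needed"), and any trailing remainder must either be zero
 * or also large enough to hold the headers.  gso_size must be non-zero,
 * mirroring the `if (hdr->gso_size)` guard in the kernel code. */
static int gso_len_valid(uint64_t len, uint64_t gso_size, uint64_t needed)
{
	uint64_t segs = len / gso_size;       /* div64_u64_rem() quotient */
	uint64_t remainder = len % gso_size;  /* div64_u64_rem() remainder */

	if (!(segs && gso_size > needed &&
	      (remainder > needed || remainder == 0)))
		return 0;	/* the kernel returns -EINVAL here */
	return 1;
}
```

With, say, needed = 100: a 3000-byte buffer split into 1000-byte segments passes (remainder 0), while a 1050-byte buffer fails because the 50-byte tail could not hold the checksummed headers.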
Re: [PATCH] net: missing check
On Thu, Jun 6, 2024 at 4:14 PM Denis Arefev wrote: > > Two missing check in virtio_net_hdr_to_skb() allowed syzbot > to crash kernels again > > 1. After the skb_segment function the buffer may become non-linear > (nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere > the __skb_linearize function will not be executed, then the buffer will > remain non-linear. Then the condition (offset >= skb_headlen(skb)) > becomes true, which causes WARN_ON_ONCE in skb_checksum_help. > > 2. The struct sk_buff and struct virtio_net_hdr members must be > mathematically related. > (gso_size) must be greater than (needed) otherwise WARN_ON_ONCE. > (remainder) must be greater than (needed) otherwise WARN_ON_ONCE. > (remainder) may be 0 if division is without remainder. > > offset+2 (4191) > skb_headlen() (1116) > WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 > skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303 > Modules linked in: > CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted > 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0 > Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google > 11/10/2023 > RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303 > Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b > 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe > ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef > RSP: 0018:c90003a9f338 EFLAGS: 00010286 > RAX: RBX: 888025125780 RCX: 814db209 > RDX: 888015393b80 RSI: 814db216 RDI: 0001 > RBP: 8880251257f4 R08: 0001 R09: > R10: R11: 0001 R12: 045c > R13: 105f R14: 8880251257f0 R15: 105d > FS: 55c24380() GS:8880b990() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 2000f000 CR3: 23151000 CR4: 003506f0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > Call Trace: > > ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777 > ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584 > ip_finish_output_gso net/ipv4/ip_output.c:286 [inline] > 
__ip_finish_output net/ipv4/ip_output.c:308 [inline] > __ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295 > ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323 > NF_HOOK_COND include/linux/netfilter.h:303 [inline] > ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433 > dst_output include/net/dst.h:451 [inline] > ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129 > iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82 > ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline] > sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076 > __netdev_start_xmit include/linux/netdevice.h:4940 [inline] > netdev_start_xmit include/linux/netdevice.h:4954 [inline] > xmit_one net/core/dev.c:3545 [inline] > dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561 > __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346 > dev_queue_xmit include/linux/netdevice.h:3134 [inline] > packet_xmit+0x257/0x380 net/packet/af_packet.c:276 > packet_snd net/packet/af_packet.c:3087 [inline] > packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119 > sock_sendmsg_nosec net/socket.c:730 [inline] > __sock_sendmsg+0xd5/0x180 net/socket.c:745 > __sys_sendto+0x255/0x340 net/socket.c:2190 > __do_sys_sendto net/socket.c:2202 [inline] > __se_sys_sendto net/socket.c:2198 [inline] > __x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198 > do_syscall_x64 arch/x86/entry/common.c:51 [inline] > do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 > entry_SYSCALL_64_after_hwframe+0x63/0x6b > > Signed-off-by: Denis Arefev > --- > include/linux/virtio_net.h | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h > index 4dfa9b69ca8d..77ebe908d746 100644 > --- a/include/linux/virtio_net.h > +++ b/include/linux/virtio_net.h > @@ -56,6 +56,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, > unsigned int thlen = 0; > unsigned int p_off = 0; > unsigned int ip_proto; > + u64 ret, remainder; > > if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { > switch (hdr->gso_type & 
~VIRTIO_NET_HDR_GSO_ECN) { > @@ -98,6 +99,15 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff > *skb, > u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset); > u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16)); > > + if (hdr->gso_size) { > + ret = div64_u64_rem(skb->len, hdr->gso_size, > &remainder); > + if (!(ret && (hdr->gso_size > needed) && > + ((remainder > needed) || > (remainder == 0 { > +
[PATCH 0/5] Add MPSS remoteproc support for SDX75
Add modem support to SDX75 using the PAS remoteproc driver.
Also, add qlink_logging memory region and split MPSS DSM region into
2 separate regions.

These patches were co-authored by Rohit Agarwal while at Qualcomm.

Naina Mehta (5):
  dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
  remoteproc: qcom: pas: Add SDX75 remoteproc support
  arm64: dts: qcom: sdx75: add missing qlink_logging reserved memory for mpss
  arm64: dts: qcom: sdx75: Add remoteproc node
  arm64: dts: qcom: sdx75-idp: enable MPSS remoteproc node

 .../bindings/remoteproc/qcom,sm8550-pas.yaml |  1 +
 arch/arm64/boot/dts/qcom/sdx75-idp.dts       |  6 ++
 arch/arm64/boot/dts/qcom/sdx75.dtsi          | 64 +--
 drivers/remoteproc/qcom_q6v5_pas.c           |  1 +
 4 files changed, 68 insertions(+), 4 deletions(-)

--
2.17.1
[PATCH 1/5] dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
Document the MPSS Peripheral Authentication Service on SDX75 platform. Signed-off-by: Naina Mehta --- .../devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml index 73fda7565cd1..02e15b1f78ab 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml +++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml @@ -16,6 +16,7 @@ description: properties: compatible: enum: + - qcom,sdx75-mpss-pas - qcom,sm8550-adsp-pas - qcom,sm8550-cdsp-pas - qcom,sm8550-mpss-pas -- 2.17.1
[PATCH 2/5] remoteproc: qcom: pas: Add SDX75 remoteproc support
Add MPSS Peripheral Authentication Service support for SDX75 platform. Signed-off-by: Naina Mehta --- drivers/remoteproc/qcom_q6v5_pas.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c index 8458bcfe9e19..833e2f9c2c5e 100644 --- a/drivers/remoteproc/qcom_q6v5_pas.c +++ b/drivers/remoteproc/qcom_q6v5_pas.c @@ -1343,6 +1343,7 @@ static const struct of_device_id adsp_of_match[] = { { .compatible = "qcom,sdm845-cdsp-pas", .data = &sdm845_cdsp_resource_init}, { .compatible = "qcom,sdm845-slpi-pas", .data = &sdm845_slpi_resource_init}, { .compatible = "qcom,sdx55-mpss-pas", .data = &sdx55_mpss_resource}, + { .compatible = "qcom,sdx75-mpss-pas", .data = &sm8650_mpss_resource}, { .compatible = "qcom,sm6115-adsp-pas", .data = &adsp_resource_init}, { .compatible = "qcom,sm6115-cdsp-pas", .data = &cdsp_resource_init}, { .compatible = "qcom,sm6115-mpss-pas", .data = &sc8180x_mpss_resource}, -- 2.17.1
[PATCH 3/5] arm64: dts: qcom: sdx75: add missing qlink_logging reserved memory for mpss
The qlink_logging memory region is also used by the modem firmware, add it to reserved memory regions. Also split MPSS DSM region into 2 separate regions. Signed-off-by: Naina Mehta --- arch/arm64/boot/dts/qcom/sdx75.dtsi | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi b/arch/arm64/boot/dts/qcom/sdx75.dtsi index 9b93f6501d55..9349b1c4e196 100644 --- a/arch/arm64/boot/dts/qcom/sdx75.dtsi +++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi @@ -366,7 +366,12 @@ no-map; }; - qdss_mem: qdss@8880 { + qdss_mem: qdss@8850 { + reg = <0x0 0x8850 0x0 0x30>; + no-map; + }; + + qlink_logging_mem: qlink_logging@8880 { reg = <0x0 0x8880 0x0 0x30>; no-map; }; @@ -377,18 +382,22 @@ no-map; }; - mpss_dsmharq_mem: mpss-dsmharq@88f0 { - reg = <0x0 0x88f0 0x0 0x508>; + mpss_dsm_mem_2: mpss-dsmharq-2@88f0 { + reg = <0x0 0x88f0 0x0 0x250>; no-map; }; + mpss_dsm_mem: mpss-dsmharq@8b40 { + reg = <0x0 0x8b40 0x0 0x2b8>; + }; + q6_mpss_dtb_mem: q6-mpss-dtb@8df8 { reg = <0x0 0x8df8 0x0 0x8>; no-map; }; mpssadsp_mem: mpssadsp@8e00 { - reg = <0x0 0x8e00 0x0 0xf40>; + reg = <0x0 0x8e00 0x0 0xf10>; no-map; }; -- 2.17.1
[PATCH 5/5] arm64: dts: qcom: sdx75-idp: enable MPSS remoteproc node
Enable MPSS remoteproc node on sdx75-idp platform. Signed-off-by: Naina Mehta --- arch/arm64/boot/dts/qcom/sdx75-idp.dts | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/sdx75-idp.dts b/arch/arm64/boot/dts/qcom/sdx75-idp.dts index fde16308c7e2..f1bbe7ab01ab 100644 --- a/arch/arm64/boot/dts/qcom/sdx75-idp.dts +++ b/arch/arm64/boot/dts/qcom/sdx75-idp.dts @@ -282,6 +282,12 @@ status = "okay"; }; +&remoteproc_mpss { + firmware-name = "qcom/sdx75/modem.mbn", + "qcom/sdx75/modem_dtb.mbn"; + status = "okay"; +}; + &sdhc { cd-gpios = <&tlmm 103 GPIO_ACTIVE_LOW>; vmmc-supply = <®_2v95_vdd>; -- 2.17.1
[PATCH 4/5] arm64: dts: qcom: sdx75: Add remoteproc node
Add MPSS remoteproc node for SDX75 SoC. Signed-off-by: Naina Mehta --- arch/arm64/boot/dts/qcom/sdx75.dtsi | 47 + 1 file changed, 47 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi b/arch/arm64/boot/dts/qcom/sdx75.dtsi index 9349b1c4e196..25d3f110abe1 100644 --- a/arch/arm64/boot/dts/qcom/sdx75.dtsi +++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi @@ -890,6 +890,53 @@ reg = <0x0 0x01fc 0x0 0x3>; }; + remoteproc_mpss: remoteproc@408 { + compatible = "qcom,sdx75-mpss-pas"; + reg = <0 0x0408 0 0x4040>; + + interrupts-extended = <&intc GIC_SPI 250 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 0 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 1 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 2 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 3 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 7 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "wdog", + "fatal", + "ready", + "handover", + "stop-ack", + "shutdown-ack"; + + clocks = <&rpmhcc RPMH_CXO_CLK>; + clock-names = "xo"; + + power-domains = <&rpmhpd RPMHPD_CX>, + <&rpmhpd RPMHPD_MSS>; + power-domain-names = "cx", +"mss"; + + memory-region = <&mpssadsp_mem>, <&q6_mpss_dtb_mem>, + <&mpss_dsm_mem>, <&mpss_dsm_mem_2>, + <&qlink_logging_mem>; + + qcom,qmp = <&aoss_qmp>; + + qcom,smem-states = <&smp2p_modem_out 0>; + qcom,smem-state-names = "stop"; + + status = "disabled"; + + glink-edge { + interrupts-extended = <&ipcc IPCC_CLIENT_MPSS + IPCC_MPROC_SIGNAL_PING + IRQ_TYPE_EDGE_RISING>; + mboxes = <&ipcc IPCC_CLIENT_MPSS + IPCC_MPROC_SIGNAL_PING>; + label = "mpss"; + qcom,remote-pid = <1>; + }; + }; + sdhc: mmc@8804000 { compatible = "qcom,sdx75-sdhci", "qcom,sdhci-msm-v5"; reg = <0x0 0x08804000 0x0 0x1000>; -- 2.17.1
Re: [PATCH 1/5] dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
On 06/06/2024 16:38, Naina Mehta wrote: > Document the MPSS Peripheral Authentication Service on SDX75 platform. > > Signed-off-by: Naina Mehta > --- > .../devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml | 1 + > 1 file changed, 1 insertion(+) > > diff --git > a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml > b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml > index 73fda7565cd1..02e15b1f78ab 100644 > --- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml > +++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml > @@ -16,6 +16,7 @@ description: > properties: >compatible: > enum: > + - qcom,sdx75-mpss-pas >- qcom,sm8550-adsp-pas >- qcom,sm8550-cdsp-pas >- qcom,sm8550-mpss-pas Missing updates to allOf constraints. Are you sure this is the binding for SDX75? Best regards, Krzysztof
Re: [PATCH v14 02/14] cgroup/misc: Add per resource callbacks for CSS events
On Thu, 06 Jun 2024 08:37:31 -0500, chenridong wrote:

> If _misc_cg_res_alloc fails, maybe some types do not call ->alloc(), but
> all types' ->free() callback will be called, is that ok?

Not sure I understand. Are you suggesting we ignore failures from the
->alloc() callback in _misc_cg_res_alloc() as it is per-resource, and have
the ->free() callback and resource provider of the failing type handle the
failure internally?

IIUC, this failure only happens when a specific subcgroup is created
(memory running out for allocation), so failing that subcgroup as a whole
seems fine to me.

Note the root node is static and no per-resource callbacks are invoked by
misc for it. The resource provider handles its own allocations for root.
In the SGX case we too declare a static object for the corresponding root
sgx_cgroup struct.

Note also misc cgroup (except for setting capacity[res] = 0 at root) is
all or nothing, so there is no mechanism to tell the user "this resource
does not work but others are fine in this particular cgroup."

Thanks
Haitao
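The unwind question above is about a common kernel pattern: when one per-resource ->alloc() fails midway, only the resources whose ->alloc() actually succeeded should see a ->free(). A minimal userspace sketch of that pattern (all names hypothetical — this is not the misc cgroup code, just the rollback idiom being discussed):

```c
#include <assert.h>
#include <stdbool.h>

#define NRES 3

static bool allocated[NRES];

/* Hypothetical per-resource ->alloc(): fails for index `fail_at`. */
static int res_alloc(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	allocated[i] = true;
	return 0;
}

/* Hypothetical per-resource ->free(). */
static void res_free(int i)
{
	allocated[i] = false;
}

/* On failure, unwind only the resources whose ->alloc() succeeded, so
 * ->free() is never invoked for a type that was never allocated. */
static int cg_alloc_all(int fail_at)
{
	for (int i = 0; i < NRES; i++) {
		if (res_alloc(i, fail_at) != 0) {
			while (i-- > 0)
				res_free(i);
			return -1;
		}
	}
	return 0;
}
```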
Re: [PATCH 3/5] arm64: dts: qcom: sdx75: add missing qlink_logging reserved memory for mpss
On 06/06/2024 16:38, Naina Mehta wrote: > The qlink_logging memory region is also used by the modem firmware, > add it to reserved memory regions. > Also split MPSS DSM region into 2 separate regions. > > Signed-off-by: Naina Mehta > --- > arch/arm64/boot/dts/qcom/sdx75.dtsi | 17 + > 1 file changed, 13 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi > b/arch/arm64/boot/dts/qcom/sdx75.dtsi > index 9b93f6501d55..9349b1c4e196 100644 > --- a/arch/arm64/boot/dts/qcom/sdx75.dtsi > +++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi > @@ -366,7 +366,12 @@ > no-map; > }; > > - qdss_mem: qdss@8880 { > + qdss_mem: qdss@8850 { > + reg = <0x0 0x8850 0x0 0x30>; > + no-map; > + }; > + > + qlink_logging_mem: qlink_logging@8880 { Sorry, no downstream code. Please follow DTS coding style - no underscores in node names. This applies to all work sent upstream. Best regards, Krzysztof
[PATCH v2 1/1] dt-bindings: remoteproc: imx_rproc: add minItems for power-domain
"fsl,imx8qxp-cm4" and "fsl,imx8qm-cm4" need minimum 2 power domains. Keep the same restriction for other compatible string. Signed-off-by: Frank Li --- Notes: Change from v1 to v2 - set minitem to 2 at top - Add imx8qm compatible string also - use not logic to handle difference compatible string restriction - update commit message. pass dt_binding_check. make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 dt_binding_check DT_SCHEMA_FILES=fsl,imx-rproc.yaml SCHEMA Documentation/devicetree/bindings/processed-schema.json CHKDT Documentation/devicetree/bindings LINTDocumentation/devicetree/bindings DTEX Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.example.dts DTC_CHK Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.example.dtb .../bindings/remoteproc/fsl,imx-rproc.yaml | 14 ++ 1 file changed, 14 insertions(+) diff --git a/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml b/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml index df36e29d974ca..da108a39df435 100644 --- a/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml +++ b/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml @@ -59,6 +59,7 @@ properties: maxItems: 32 power-domains: +minItems: 2 maxItems: 8 fsl,auto-boot: @@ -99,6 +100,19 @@ allOf: properties: fsl,iomuxc-gpr: false + - if: + properties: +compatible: + not: +contains: + enum: +- fsl,imx8qxp-cm4 +- fsl,imx8qm-cm4 +then: + properties: +power-domains: + minItems: 8 + additionalProperties: false examples: -- 2.34.1
Re: [PATCH] livepatch: introduce klp_func called interface
Hi Wardenjohn, To follow up, Red Hat kpatch QE pointed me toward this test: https://gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/tree/main/general/kpatch/kpatch-trace?ref_type=heads which reports a few interesting things via systemd service and ftrace: - Patched functions - Traced functions - Code that was modified, but didn't run - Other code that ftrace saw, but is not listed in the sysfs directory The code looks to be built on top of the same basic ftrace commands that I sent the other day. You may be able to reuse/copy/etc bits from the scripts if they are handy. -- Joe
[PATCH 0/3] Tracepoints and static branch/call in Rust
An important part of a production ready Linux kernel driver is
tracepoints. So to write production ready Linux kernel drivers in Rust,
we must be able to call tracepoints from Rust code. This patch series
adds support for calling tracepoints declared in C from Rust.

To use the tracepoint support, you must:

 1. Declare the tracepoint in a C header file as usual.
 2. Make sure that the header file is visible to bindgen so that Rust
    bindings are generated for the symbols that the tracepoint macro
    emits.
 3. Use the declare_trace! macro in your Rust code to generate Rust
    functions that call into the tracepoint.

For example, the kernel has a tracepoint called `sched_kthread_stop`.
It is declared like this:

  TRACE_EVENT(sched_kthread_stop,
	TP_PROTO(struct task_struct *t),
	TP_ARGS(t),
	TP_STRUCT__entry(
		__array(char, comm, TASK_COMM_LEN)
		__field(pid_t, pid)
	),
	TP_fast_assign(
		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
		__entry->pid = t->pid;
	),
	TP_printk("comm=%s pid=%d", __entry->comm, __entry->pid)
  );

To call the above tracepoint from Rust code, you would add the relevant
header file to rust/bindings/bindings_helper.h and add the following
invocation somewhere in your Rust code:

  declare_trace! {
      fn sched_kthread_stop(task: *mut task_struct);
  }

This will define a Rust function of the given name that you can call
like any other Rust function. Since these tracepoints often take raw
pointers as arguments, it may be convenient to wrap it in a safe
wrapper:

  mod raw {
      declare_trace! {
          fn sched_kthread_stop(task: *mut task_struct);
      }
  }

  #[inline]
  pub fn trace_sched_kthread_stop(task: &Task) {
      // SAFETY: The pointer to `task` is valid.
      unsafe { raw::sched_kthread_stop(task.as_raw()) }
  }

A future expansion of the tracepoint support could generate these safe
versions automatically, but that is left as future work for now.

This is intended for use in the Rust Binder driver, which was
originally sent as an RFC [1].
The RFC did not include tracepoint support, but you can see how it will
be used in Rust Binder at [2]. The author has verified that the
tracepoint support works on Android devices.

This implementation implements support for static keys in Rust so that
the actual static branch will end up in the Rust object file. However,
it would also be possible to just wrap the trace_##name generated by
__DECLARE_TRACE in an extern C function and then call that from Rust.
This will simplify the Rust code by removing the need for static
branches and calls, but it places the static branch behind an external
call, which has performance implications.

A possible middle ground would be to place just the __DO_TRACE body in
an extern C function and to implement the Rust wrapper by doing the
static branch in Rust, and then calling into C the code that contains
__DO_TRACE when the tracepoint is active. However, this would need some
changes to include/linux/tracepoint.h to generate and export a function
containing the body of __DO_TRACE when the tracepoint should be
callable from Rust.

So in general, there is a tradeoff between placing parts of the
tracepoint (which is perf sensitive) behind an external call, and
having code duplicated in both C and Rust (which must be kept in sync
when changes are made). This is an important point that I would like
feedback on from the C maintainers.
Link: https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-0-08ba9197f...@google.com/ [1] Link: https://r.android.com/3110088 [2] Signed-off-by: Alice Ryhl --- Alice Ryhl (3): rust: add static_call support rust: add static_key_false rust: add tracepoint support rust/bindings/bindings_helper.h | 1 + rust/bindings/lib.rs| 15 +++ rust/helpers.c | 24 +++ rust/kernel/lib.rs | 3 ++ rust/kernel/static_call.rs | 92 + rust/kernel/static_key.rs | 87 ++ rust/kernel/tracepoint.rs | 92 + scripts/Makefile.build | 2 +- 8 files changed, 315 insertions(+), 1 deletion(-) --- base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0 change-id: 20240606-tracepoint-31e15b90e471 Best regards, -- Alice Ryhl
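To make the tradeoff in the cover letter concrete: __DO_TRACE essentially checks an "enabled" condition and then iterates over the registered probe functions. A rough userspace model of that dispatch is sketched below (all names hypothetical; the real macro additionally handles the static branch, the RCU dereference of the probe array, and the preemption/cpu-online checks the series adds helpers for):

```c
#include <assert.h>

#define MAX_PROBES 4

typedef void (*probe_fn)(int arg);

static probe_fn probes[MAX_PROBES];
static int nprobes;

static int probe_total;	/* state touched by the sample probe below */

static void counting_probe(int arg)
{
	probe_total += arg;
}

/* Model of tracepoint_probe_register(): append a callback. */
static int register_probe(probe_fn fn)
{
	if (nprobes >= MAX_PROBES)
		return -1;
	probes[nprobes++] = fn;
	return 0;
}

/* Model of the __DO_TRACE body: call every registered probe.  The design
 * question in the cover letter is whether the *enabled* check (the static
 * branch guarding this) lives inline in Rust or behind an extern call. */
static void do_trace_model(int arg)
{
	for (int i = 0; i < nprobes; i++)
		probes[i](arg);
}
```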
[PATCH 3/3] rust: add tracepoint support
Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Signed-off-by: Alice Ryhl --- rust/bindings/bindings_helper.h | 1 + rust/bindings/lib.rs| 15 +++ rust/helpers.c | 24 +++ rust/kernel/lib.rs | 1 + rust/kernel/tracepoint.rs | 92 + 5 files changed, 133 insertions(+) diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h index ddb5644d4fd9..d442f9ccfc2c 100644 --- a/rust/bindings/bindings_helper.h +++ b/rust/bindings/bindings_helper.h @@ -15,6 +15,7 @@ #include #include #include +#include #include #include diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs index 40ddaee50d8b..48856761d682 100644 --- a/rust/bindings/lib.rs +++ b/rust/bindings/lib.rs @@ -48,3 +48,18 @@ mod bindings_helper { } pub use bindings_raw::*; + +/// Rust version of the C macro `rcu_dereference_raw`. +/// +/// The rust helper only works with void pointers, but this wrapper method makes it work with any +/// pointer type using pointer casts. +/// +/// # Safety +/// +/// This method has the same safety requirements as the C macro of the same name. +#[inline(always)] +pub unsafe fn rcu_dereference_raw(p: *const *mut T) -> *mut T { +// SAFETY: This helper calls into the C macro, so the caller promises to uphold the safety +// requirements. 
+unsafe { __rcu_dereference_raw(p as *mut *mut _) as *mut T } +} diff --git a/rust/helpers.c b/rust/helpers.c index 2c37a0f5d7a8..0560cc2a512a 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags) } EXPORT_SYMBOL_GPL(rust_helper_krealloc); +void rust_helper_preempt_enable_notrace(void) +{ + preempt_enable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); + +void rust_helper_preempt_disable_notrace(void) +{ + preempt_disable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); + +bool rust_helper_current_cpu_online(void) +{ + return cpu_online(raw_smp_processor_id()); +} +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); + +void *rust_helper___rcu_dereference_raw(void **p) +{ + return rcu_dereference_raw(p); +} +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); + /* * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can * use it in contexts where Rust expects a `usize` like slice (array) indices. diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index 22e1fedd0774..3f3b280bb437 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -46,6 +46,7 @@ pub mod sync; pub mod task; pub mod time; +pub mod tracepoint; pub mod types; pub mod workqueue; diff --git a/rust/kernel/tracepoint.rs b/rust/kernel/tracepoint.rs new file mode 100644 index ..d628ae71fc58 --- /dev/null +++ b/rust/kernel/tracepoint.rs @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2024 Google LLC. + +//! Logic for tracepoints. + +/// Declare the Rust entry point for a tracepoint. +#[macro_export] +macro_rules! 
declare_trace { +($($(#[$attr:meta])* $pub:vis fn $name:ident($($argname:ident : $argtyp:ty),* $(,)?);)*) => {$( +$( #[$attr] )* +#[inline(always)] +$pub unsafe fn $name($($argname : $argtyp),*) { +#[cfg(CONFIG_TRACEPOINTS)] +{ +use $crate::bindings::*; + +// SAFETY: This macro only compiles if $name is a real tracepoint, and if it is a +// real tracepoint, then it is okay to query the static key. +let should_trace = unsafe { +$crate::macros::paste! { +$crate::static_key::static_key_false!( +[< __tracepoint_ $name >], +$crate::bindings::tracepoint, +key +) +} +}; + +if should_trace { +$crate::tracepoint::do_trace!($name($($argname : $argtyp),*), cond); +} +} + +#[cfg(not(CONFIG_TRACEPOINTS))] +{ +// If tracepoints are disabled, insert a trivial use of each argument +// to avoid unused argument warnings. +$( let _unused = $argname; )* +} +} +)*} +} + +#[doc(hidden)] +#[macro_export] +macro_rules! do_trace { +($name:ident($($argname:ident : $argtyp:ty),* $(,)?), $cond:expr) => {{ +if !$crate::bindings::current_cpu_online() { +return; +} + +// SAFETY: This call is balanced with the call below. +unsafe { $crate::bindings::preempt_disable_notrace() }; + +// SAFETY: This calls the trace
Re: [PATCH 3/3] rust: add tracepoint support
On 2024-06-06 11:05, Alice Ryhl wrote: Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Signed-off-by: Alice Ryhl [...] diff --git a/rust/helpers.c b/rust/helpers.c index 2c37a0f5d7a8..0560cc2a512a 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags) } EXPORT_SYMBOL_GPL(rust_helper_krealloc); +void rust_helper_preempt_enable_notrace(void) +{ + preempt_enable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); + +void rust_helper_preempt_disable_notrace(void) +{ + preempt_disable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); + +bool rust_helper_current_cpu_online(void) +{ + return cpu_online(raw_smp_processor_id()); +} +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); + +void *rust_helper___rcu_dereference_raw(void **p) +{ + return rcu_dereference_raw(p); +} +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); Ouch. Doing a function call for each of those small operations will have a rather large performance impact when tracing is active. If it is not possible to inline those in Rust, then implementing __DO_TRACE in a C function would at least allow Rust to only do a single call to C rather than go back and forth between Rust and C. What prevents inlining those helpers in Rust ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
[PATCH 1/3] rust: add static_call support
Add static_call support by mirroring how C does. When the platform does not support static calls (right now, that means that it is not x86), then the function pointer is loaded from a global and called. Otherwise, we generate a call to a trampoline function, and objtool is used to make these calls patchable at runtime. Signed-off-by: Alice Ryhl --- rust/kernel/lib.rs | 1 + rust/kernel/static_call.rs | 92 ++ 2 files changed, 93 insertions(+) diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index fbd91a48ff8b..d534b1178955 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -38,6 +38,7 @@ pub mod prelude; pub mod print; mod static_assert; +pub mod static_call; #[doc(hidden)] pub mod std_vendor; pub mod str; diff --git a/rust/kernel/static_call.rs b/rust/kernel/static_call.rs new file mode 100644 index ..f7b8ba7bf1fb --- /dev/null +++ b/rust/kernel/static_call.rs @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2024 Google LLC. + +//! Logic for static calls. + +#[macro_export] +#[doc(hidden)] +macro_rules! ty_underscore_for { +($arg:expr) => { +_ +}; +} + +#[doc(hidden)] +#[repr(transparent)] +pub struct AddressableStaticCallKey { +_ptr: *const bindings::static_call_key, +} +unsafe impl Sync for AddressableStaticCallKey {} +impl AddressableStaticCallKey { +pub const fn new(ptr: *const bindings::static_call_key) -> Self { +Self { _ptr: ptr } +} +} + +#[cfg(CONFIG_HAVE_STATIC_CALL)] +#[doc(hidden)] +#[macro_export] +macro_rules! _static_call { +($name:ident($($args:expr),* $(,)?)) => {{ +// Symbol mangling will give this symbol a unique name. +#[cfg(CONFIG_HAVE_STATIC_CALL_INLINE)] +#[link_section = ".discard.addressable"] +#[used] +static __ADDRESSABLE: $crate::static_call::AddressableStaticCallKey = unsafe { + $crate::static_call::AddressableStaticCallKey::new(::core::ptr::addr_of!( +$crate::macros::paste! 
{ $crate::bindings:: [<__SCK__ $name >]; } +)) +}; + +let fn_ptr: unsafe extern "C" fn($($crate::static_call::ty_underscore_for!($args)),*) -> _ = +$crate::macros::paste! { $crate::bindings:: [<__SCT__ $name >]; }; +(fn_ptr)($($args),*) +}}; +} + +#[cfg(not(CONFIG_HAVE_STATIC_CALL))] +#[doc(hidden)] +#[macro_export] +macro_rules! _static_call { +($name:ident($($args:expr),* $(,)?)) => {{ +let void_ptr_fn: *mut ::core::ffi::c_void = +$crate::macros::paste! { $crate::bindings:: [<__SCK__ $name >]; }.func; + +let fn_ptr: unsafe extern "C" fn($($crate::static_call::ty_underscore_for!($args)),*) -> _ = +if true { +::core::mem::transmute(void_ptr_fn) +} else { +// This is dead code, but it influences type inference on `fn_ptr` so that we +// transmute the function pointer to the right type. +$crate::macros::paste! { $crate::bindings:: [<__SCT__ $name >]; } +}; + +(fn_ptr)($($args),*) +}}; +} + +/// Statically call a global function. +/// +/// # Safety +/// +/// This macro will call the provided function. It is up to the caller to uphold the safety +/// guarantees of the function. +/// +/// # Examples +/// +/// ```ignore +/// fn call_static() { +/// unsafe { +/// static_call! { your_static_call() }; +/// } +/// } +/// ``` +#[macro_export] +macro_rules! static_call { +// Forward to the real implementation. Separated like this so that we don't have to duplicate +// the documentation. +($($args:tt)*) => { $crate::static_call::_static_call! { $($args)* } }; +} + +pub use {_static_call, static_call, ty_underscore_for}; -- 2.45.2.505.gda0bf45e8d-goog
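The `!CONFIG_HAVE_STATIC_CALL` path above boils down to "load a function pointer from the key and call it indirectly". A userspace model of that fallback semantics (names hypothetical; when static calls are available, the real implementation calls a trampoline whose text is patched instead of reading a pointer):

```c
#include <assert.h>

/* Fallback semantics of a static call: the target is an ordinary function
 * pointer held in a key and updated at runtime (static_call_update() in
 * the kernel).  No text patching happens in this model. */
struct static_call_key_model {
	int (*func)(int);
};

static int add_one(int x) { return x + 1; }
static int add_two(int x) { return x + 2; }

static struct static_call_key_model my_key = { .func = add_one };

/* Model of invoking the static call: indirect call through the key. */
static int my_static_call(int x)
{
	return my_key.func(x);
}

/* Model of static_call_update(): retarget the call at runtime. */
static void my_static_call_update(int (*fn)(int))
{
	my_key.func = fn;
}
```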
[PATCH 2/3] rust: add static_key_false
Add just enough support for static key so that we can use it from tracepoints. Tracepoints rely on `static_key_false` even though it is deprecated, so we add the same functionality to Rust. Signed-off-by: Alice Ryhl --- rust/kernel/lib.rs| 1 + rust/kernel/static_key.rs | 87 +++ scripts/Makefile.build| 2 +- 3 files changed, 89 insertions(+), 1 deletion(-) diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index d534b1178955..22e1fedd0774 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -39,6 +39,7 @@ pub mod print; mod static_assert; pub mod static_call; +pub mod static_key; #[doc(hidden)] pub mod std_vendor; pub mod str; diff --git a/rust/kernel/static_key.rs b/rust/kernel/static_key.rs new file mode 100644 index ..6c3dbe14c98a --- /dev/null +++ b/rust/kernel/static_key.rs @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2024 Google LLC. + +//! Logic for static keys. + +use crate::bindings::*; + +#[doc(hidden)] +#[macro_export] +#[cfg(target_arch = "x86_64")] +macro_rules! _static_key_false { +($key:path, $keytyp:ty, $field:ident) => {'my_label: { +core::arch::asm!( +r#" +1: .byte 0x0f,0x1f,0x44,0x00,0x00 + +.pushsection __jump_table, "aw" +.balign 8 +.long 1b - . +.long {0} - . +.quad {1} + {2} - . +.popsection +"#, +label { +break 'my_label true; +}, +sym $key, +const ::core::mem::offset_of!($keytyp, $field), +); + +break 'my_label false; +}}; +} + +#[doc(hidden)] +#[macro_export] +#[cfg(target_arch = "aarch64")] +macro_rules! _static_key_false { +($key:path, $keytyp:ty, $field:ident) => {'my_label: { +core::arch::asm!( +r#" +1: nop + +.pushsection __jump_table, "aw" +.align 3 +.long 1b - ., {0} - . +.quad {1} + {2} - . +.popsection +"#, +label { +break 'my_label true; +}, +sym $key, +const ::core::mem::offset_of!($keytyp, $field), +); + +break 'my_label false; +}}; +} + +/// Branch based on a static key. +/// +/// Takes three arguments: +/// +/// * `key` - the path to the static variable containing the `static_key`. 
+/// * `keytyp` - the type of `key`. +/// * `field` - the name of the field of `key` that contains the `static_key`. +#[macro_export] +macro_rules! static_key_false { +// Forward to the real implementation. Separated like this so that we don't have to duplicate +// the documentation. +($key:path, $keytyp:ty, $field:ident) => {{ +// Assert that `$key` has type `$keytyp` and that `$key.$field` has type `static_key`. +// +// SAFETY: We know that `$key` is a static because otherwise the inline assembly will not +// compile. The raw pointers created in this block are in-bounds of `$key`. +static _TY_ASSERT: () = unsafe { +let key: *const $keytyp = ::core::ptr::addr_of!($key); +let _: *const $crate::bindings::static_key = ::core::ptr::addr_of!((*key).$field); +}; + +$crate::static_key::_static_key_false! { $key, $keytyp, $field } +}}; +} + +pub use {_static_key_false, static_key_false}; diff --git a/scripts/Makefile.build b/scripts/Makefile.build index efacca63c897..60197c1c063f 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -263,7 +263,7 @@ $(obj)/%.lst: $(obj)/%.c FORCE # Compile Rust sources (.rs) # --- -rust_allowed_features := new_uninit +rust_allowed_features := asm_const,asm_goto,new_uninit # `--out-dir` is required to avoid temporaries being created by `rustc` in the # current working directory, which may be not accessible in the out-of-tree -- 2.45.2.505.gda0bf45e8d-goog
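Functionally, the inline assembly above emits a jump-table entry so the branch can be patched at runtime; the observable behaviour is "a branch that is false until the key is enabled". A userspace model of just that behaviour (note the real fast path never loads `key->enabled` — the nop/jump instruction itself is rewritten; this sketch only captures the semantics):

```c
#include <assert.h>
#include <stdbool.h>

struct static_key_model {
	int enabled;
};

static struct static_key_model key;	/* starts disabled */

static int trace_hits;

/* Model of static_key_false(): true only once the key is enabled. */
static bool static_key_false_model(struct static_key_model *k)
{
	return k->enabled > 0;
}

/* Model of static_key_enable(): the kernel patches the jump site here. */
static void static_branch_enable_model(struct static_key_model *k)
{
	k->enabled = 1;
}

static void maybe_trace(void)
{
	if (static_key_false_model(&key))
		trace_hits++;	/* the unlikely, traced path */
}
```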
Re: [PATCH 2/3] rust: add static_key_false
On 2024-06-06 11:05, Alice Ryhl wrote: Add just enough support for static key so that we can use it from tracepoints. Tracepoints rely on `static_key_false` even though it is deprecated, so we add the same functionality to Rust. Signed-off-by: Alice Ryhl --- rust/kernel/lib.rs| 1 + rust/kernel/static_key.rs | 87 +++ scripts/Makefile.build| 2 +- 3 files changed, 89 insertions(+), 1 deletion(-) diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index d534b1178955..22e1fedd0774 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -39,6 +39,7 @@ pub mod print; mod static_assert; pub mod static_call; +pub mod static_key; #[doc(hidden)] pub mod std_vendor; pub mod str; diff --git a/rust/kernel/static_key.rs b/rust/kernel/static_key.rs new file mode 100644 index ..6c3dbe14c98a --- /dev/null +++ b/rust/kernel/static_key.rs @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 This static key code is something that can be generally useful both in kernel and user-space. Is there anything that would prevent licensing this under MIT right away instead so it could eventually be re-used more broadly in userspace as well ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [RFC v3 net-next 1/7] net: add rx_sk to trace_kfree_skb
On Wed, Jun 5, 2024 at 6:57 PM Steven Rostedt wrote: > > On Tue, 4 Jun 2024 14:47:38 -0700 > Yan Zhai wrote: > > > skb does not include enough information to find out receiving > > sockets/services and netns/containers on packet drops. In theory > > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP > > stack for OOO packet lookup. Similarly, skb->sk often identifies a local > > sender, and tells nothing about a receiver. > > > > Allow passing an extra receiving socket to the tracepoint to improve > > the visibility on receiving drops. > > > > Signed-off-by: Yan Zhai > > --- > > v2->v3: fixed drop_monitor function prototype > > --- > > include/trace/events/skb.h | 11 +++ > > net/core/dev.c | 2 +- > > net/core/drop_monitor.c| 9 ++--- > > net/core/skbuff.c | 2 +- > > 4 files changed, 15 insertions(+), 9 deletions(-) > > > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > > index 07e0715628ec..aa6b46b6172c 100644 > > --- a/include/trace/events/skb.h > > +++ b/include/trace/events/skb.h > > @@ -24,15 +24,16 @@ DEFINE_DROP_REASON(FN, FN) > > TRACE_EVENT(kfree_skb, > > > > TP_PROTO(struct sk_buff *skb, void *location, > > - enum skb_drop_reason reason), > > + enum skb_drop_reason reason, struct sock *rx_sk), > > > > - TP_ARGS(skb, location, reason), > > + TP_ARGS(skb, location, reason, rx_sk), > > > > TP_STRUCT__entry( > > __field(void *, skbaddr) > > __field(void *, location) > > __field(unsigned short, protocol) > > __field(enum skb_drop_reason, reason) > > + __field(void *, rx_skaddr) > > Please add the pointer after the other pointers: > > __field(void *, skbaddr) > __field(void *, location) > + __field(void *, rx_skaddr) > __field(unsigned short, protocol) > __field(enum skb_drop_reason, reason) > > otherwise you are adding holes in the ring buffer event. > > The TP_STRUCT__entry() is a structure that is saved in the ring buffer. We > want to avoid alignment holes. 
> I also question having a short before the
> enum, if the enum is 4 bytes. The short should be at the end.
>
> In fact, looking at the format file, there is a 2 byte hole:
>
> # cat /sys/kernel/tracing/events/skb/kfree_skb/format
> name: kfree_skb
> ID: 1799
> format:
>     field:unsigned short common_type;         offset:0;  size:2; signed:0;
>     field:unsigned char common_flags;         offset:2;  size:1; signed:0;
>     field:unsigned char common_preempt_count; offset:3;  size:1; signed:0;
>     field:int common_pid;                     offset:4;  size:4; signed:1;
>
>     field:void * skbaddr;                     offset:8;  size:8; signed:0;
>     field:void * location;                    offset:16; size:8; signed:0;
>     field:unsigned short protocol;            offset:24; size:2; signed:0;
>     field:enum skb_drop_reason reason;        offset:28; size:4; signed:0;
>
> Notice that "protocol" is 2 bytes in size at offset 24, but "reason" starts
> at offset 28. This means at offset 26, there's a 2 byte hole.

The reason I added the pointer as the last argument was to minimize surprise for existing TP users: for common ABIs it is fine to omit later arguments when defining a function, but it needs a change and recompilation if the order of arguments changes.
Looking at the actual format after the change, it does not add a new hole since protocol and reason are already packed into the same 8-byte block, so rx_skaddr starts at an 8-byte aligned offset:

# cat /sys/kernel/debug/tracing/events/skb/kfree_skb/format
name: kfree_skb
ID: 2260
format:
    field:unsigned short common_type;         offset:0;  size:2; signed:0;
    field:unsigned char common_flags;         offset:2;  size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3;  size:1; signed:0;
    field:int common_pid;                     offset:4;  size:4; signed:1;

    field:void * skbaddr;                     offset:8;  size:8; signed:0;
    field:void * location;                    offset:16; size:8; signed:0;
    field:unsigned short protocol;            offset:24; size:2; signed:0;
    field:enum skb_drop_reason reason;        offset:28; size:4; signed:0;
    field:void * rx_skaddr;                   offset:32; size:8; signed:0;

Do you think we still need to change the order?

Yan

> -- Steve
>
> > >   ),
> > >
> > >   TP_fast_assign(
> > > @@ -40,12 +41,14 @@ TRACE_EVENT(kfree_skb,
> > >           __entry->location = location;
> > >           __entry->protocol = ntohs(skb->protocol);
> > >           __entry->reason = reason;
> > > +         __entry->rx_skaddr = rx_sk;
> > >   ),
> > >
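The hole Steven describes, and the layout in the format dump above, can be reproduced in userspace with `offsetof`. A sketch mirroring the event record — this assumes an LP64 target where pointers are 8 bytes and the enum is a 4-byte int, as on x86-64:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace mirror of the kfree_skb ring-buffer record from the
 * format file discussed above (LP64 assumed). The 4-byte `reason`
 * must be 4-aligned, so it cannot start at offset 26 right after
 * the 2-byte `protocol`; the compiler inserts a 2-byte hole.
 */
struct kfree_skb_record {
	unsigned short common_type;          /* offset 0,  size 2 */
	unsigned char  common_flags;         /* offset 2,  size 1 */
	unsigned char  common_preempt_count; /* offset 3,  size 1 */
	int            common_pid;           /* offset 4,  size 4 */
	void *skbaddr;                       /* offset 8,  size 8 */
	void *location;                      /* offset 16, size 8 */
	unsigned short protocol;             /* offset 24, size 2 */
	int reason;                          /* offset 28: hole at 26-27 */
	void *rx_skaddr;                     /* offset 32: already 8-aligned */
};
```

This also shows why appending `rx_skaddr` adds no new hole: `protocol` + padding + `reason` already fill the 24-31 block, so the pointer lands at the next 8-byte boundary either way.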
Re: [PATCH 0/3] Tracepoints and static branch/call in Rust
lication of binary code (instructions).

Thanks,

Mathieu

Link: https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-0-08ba9197f...@google.com/ [1]
Link: https://r.android.com/3110088 [2]
Signed-off-by: Alice Ryhl
---
Alice Ryhl (3):
      rust: add static_call support
      rust: add static_key_false
      rust: add tracepoint support

 rust/bindings/bindings_helper.h |  1 +
 rust/bindings/lib.rs            | 15 +++
 rust/helpers.c                  | 24 +++
 rust/kernel/lib.rs              |  3 ++
 rust/kernel/static_call.rs      | 92 +
 rust/kernel/static_key.rs       | 87 ++
 rust/kernel/tracepoint.rs       | 92 +
 scripts/Makefile.build          |  2 +-
 8 files changed, 315 insertions(+), 1 deletion(-)
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20240606-tracepoint-31e15b90e471

Best regards,
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [PATCH 0/3] Tracepoints and static branch/call in Rust
On Thu, Jun 6, 2024 at 5:25 PM Mathieu Desnoyers wrote: > > On 2024-06-06 11:05, Alice Ryhl wrote: > > This implementation implements support for static keys in Rust so that > > the actual static branch will end up in the Rust object file. However, > > it would also be possible to just wrap the trace_##name generated by > > __DECLARE_TRACE in an extern C function and then call that from Rust. > > This will simplify the Rust code by removing the need for static > > branches and calls, but it places the static branch behind an external > > call, which has performance implications. > > The tracepoints try very hard to minimize overhead of dormant tracepoints > so it is not frowned-upon to have them built into production binaries. > This is needed to make sure distribution vendors keep those tracepoints > in the kernel binaries that reach end-users. > > Adding a function call before evaluation of the static branch goes against > this major goal. > > > > > A possible middle ground would be to place just the __DO_TRACE body in > > an extern C function and to implement the Rust wrapper by doing the > > static branch in Rust, and then calling into C the code that contains > > __DO_TRACE when the tracepoint is active. However, this would need some > > changes to include/linux/tracepoint.h to generate and export a function > > containing the body of __DO_TRACE when the tracepoint should be callable > > from Rust. > > This tradeoff is more acceptable than having a function call before > evaluation of the static branch, but I wonder what is the upside of > this tradeoff compared to inlining the whole __DO_TRACE in Rust ? > > > So in general, there is a tradeoff between placing parts of the > > tracepoint (which is perf sensitive) behind an external call, and having > > code duplicated in both C and Rust (which must be kept in sync when > > changes are made). This is an important point that I would like feedback > > on from the C maintainers. 
> > I don't see how the duplication happens there: __DO_TRACE is meant to be > inlined into each C tracepoint caller site, so the code is already meant > to be duplicated. Having an explicit function wrapping the tracepoint > for Rust would just create an extra instance of __DO_TRACE if it happens > to be also inlined into C code. > > Or do you meant you would like to prevent having to duplicate the > implementation of __DO_TRACE in both C and Rust ? > > I'm not sure if you mean to prevent source code duplication between > C and Rust or duplication of binary code (instructions). It's a question of maintenance burden. If you change how __DO_TRACE is implemented, then those changes must also be reflected in the Rust version. There's no issue in the binary code. Alice
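The performance concern in this exchange can be made concrete with a userspace analogue: keeping the cheap enabled-check at the call site means a dormant tracepoint costs one predictable branch and no function call, whereas hiding the check behind an extern function pays a call even when tracing is off. A minimal sketch, with a plain `bool` standing in for the static key and made-up names — the kernel's real machinery is `static_key_false` plus `__DO_TRACE`:

```c
#include <stdbool.h>

/*
 * Userspace analogue of the tradeoff discussed above. The guard is
 * evaluated inline at the call site (as the static branch would be),
 * and only the slow path is an out-of-line call. Illustrative only.
 */
static bool trace_foo_enabled;        /* stands in for the static key */
static unsigned long foo_hits;

static void __do_trace_foo(int arg)   /* stands in for __DO_TRACE */
{
	foo_hits += arg;
}

static inline void trace_foo(int arg)
{
	if (trace_foo_enabled)        /* branch first, call only if active */
		__do_trace_foo(arg);
}
```

Moving the `if` inside an extern `trace_foo` would preserve behavior but turn every dormant call site into a real function call, which is exactly what the tracepoint design tries to avoid.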
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 11:30:03AM -0400, Mathieu Desnoyers wrote: > On 2024-06-06 11:05, Alice Ryhl wrote: > > Make it possible to have Rust code call into tracepoints defined by C > > code. It is still required that the tracepoint is declared in a C > > header, and that this header is included in the input to bindgen. > > > > Signed-off-by: Alice Ryhl > > [...] > > > diff --git a/rust/helpers.c b/rust/helpers.c > > index 2c37a0f5d7a8..0560cc2a512a 100644 > > --- a/rust/helpers.c > > +++ b/rust/helpers.c > > @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t > > new_size, gfp_t flags) > > } > > EXPORT_SYMBOL_GPL(rust_helper_krealloc); > > +void rust_helper_preempt_enable_notrace(void) > > +{ > > + preempt_enable_notrace(); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); > > + > > +void rust_helper_preempt_disable_notrace(void) > > +{ > > + preempt_disable_notrace(); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); > > + > > +bool rust_helper_current_cpu_online(void) > > +{ > > + return cpu_online(raw_smp_processor_id()); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); > > + > > +void *rust_helper___rcu_dereference_raw(void **p) > > +{ > > + return rcu_dereference_raw(p); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); > > Ouch. Doing a function call for each of those small operations will > have a rather large performance impact when tracing is active. If it is Long-term plan is to 1) compile the C helpers in some IR and 2) inline the helpers with Rust in IR-level, as what Gary has: https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ and I use it for the upcoming atomic API support: https://github.com/fbq/linux/tree/dev/rust/atomic-rfc and it works very well. 
Regards, Boqun > not possible to inline those in Rust, then implementing __DO_TRACE in > a C function would at least allow Rust to only do a single call to C > rather than go back and forth between Rust and C. > > What prevents inlining those helpers in Rust ? > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >
Re: [PATCH] ACPI: NFIT: add missing MODULE_DESCRIPTION() macro
On 6/3/24 6:30 AM, Jeff Johnson wrote: > make allmodconfig && make W=1 C=1 reports: > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/acpi/nfit/nfit.o > > Add the missing invocation of the MODULE_DESCRIPTION() macro. > > Signed-off-by: Jeff Johnson Reviewed-by: Dave Jiang > --- > drivers/acpi/nfit/core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c > index d4595d1985b1..e8520fb8af4f 100644 > --- a/drivers/acpi/nfit/core.c > +++ b/drivers/acpi/nfit/core.c > @@ -3531,5 +3531,6 @@ static __exit void nfit_exit(void) > > module_init(nfit_init); > module_exit(nfit_exit); > +MODULE_DESCRIPTION("ACPI NVDIMM Firmware Interface Table (NFIT) module"); > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Intel Corporation"); > > --- > base-commit: a693b9c95abd4947c2d06e05733de5d470ab6586 > change-id: 20240603-md-drivers-acpi-nfit-e032bad0b189 >
Re: [PATCH RESEND] nvdimm: add missing MODULE_DESCRIPTION() macros
On 5/26/24 10:07 AM, Jeff Johnson wrote: > Fix the 'make W=1' warnings: > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/libnvdimm.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_pmem.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_btt.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_e820.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/of_pmem.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_virtio.o > > Signed-off-by: Jeff Johnson Reviewed-by: Dave Jiang > --- > drivers/nvdimm/btt.c | 1 + > drivers/nvdimm/core.c | 1 + > drivers/nvdimm/e820.c | 1 + > drivers/nvdimm/nd_virtio.c | 1 + > drivers/nvdimm/of_pmem.c | 1 + > drivers/nvdimm/pmem.c | 1 + > 6 files changed, 6 insertions(+) > > diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c > index 1e5aedaf8c7b..a47acc5d05df 100644 > --- a/drivers/nvdimm/btt.c > +++ b/drivers/nvdimm/btt.c > @@ -1721,6 +1721,7 @@ static void __exit nd_btt_exit(void) > > MODULE_ALIAS_ND_DEVICE(ND_DEVICE_BTT); > MODULE_AUTHOR("Vishal Verma "); > +MODULE_DESCRIPTION("NVDIMM Block Translation Table"); > MODULE_LICENSE("GPL v2"); > module_init(nd_btt_init); > module_exit(nd_btt_exit); > diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c > index 2023a661bbb0..f4b6fb4b9828 100644 > --- a/drivers/nvdimm/core.c > +++ b/drivers/nvdimm/core.c > @@ -540,6 +540,7 @@ static __exit void libnvdimm_exit(void) > nvdimm_devs_exit(); > } > > +MODULE_DESCRIPTION("NVDIMM (Non-Volatile Memory Device) core module"); > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Intel Corporation"); > subsys_initcall(libnvdimm_init); > diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c > index 4cd18be9d0e9..008b9aae74ff 100644 > --- a/drivers/nvdimm/e820.c > +++ b/drivers/nvdimm/e820.c > @@ -69,5 +69,6 @@ static struct platform_driver e820_pmem_driver = { > module_platform_driver(e820_pmem_driver); > > 
MODULE_ALIAS("platform:e820_pmem*"); > +MODULE_DESCRIPTION("NVDIMM support for e820 type-12 memory"); > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Intel Corporation"); > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c > index 1f8c667c6f1e..35c8fbbba10e 100644 > --- a/drivers/nvdimm/nd_virtio.c > +++ b/drivers/nvdimm/nd_virtio.c > @@ -123,4 +123,5 @@ int async_pmem_flush(struct nd_region *nd_region, struct > bio *bio) > return 0; > }; > EXPORT_SYMBOL_GPL(async_pmem_flush); > +MODULE_DESCRIPTION("Virtio Persistent Memory Driver"); > MODULE_LICENSE("GPL"); > diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c > index d3fca0ab6290..5134a8d08bf9 100644 > --- a/drivers/nvdimm/of_pmem.c > +++ b/drivers/nvdimm/of_pmem.c > @@ -111,5 +111,6 @@ static struct platform_driver of_pmem_region_driver = { > > module_platform_driver(of_pmem_region_driver); > MODULE_DEVICE_TABLE(of, of_pmem_region_match); > +MODULE_DESCRIPTION("NVDIMM Device Tree support"); > MODULE_LICENSE("GPL"); > MODULE_AUTHOR("IBM Corporation"); > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c > index 598fe2e89bda..57cb30f8a3b8 100644 > --- a/drivers/nvdimm/pmem.c > +++ b/drivers/nvdimm/pmem.c > @@ -768,4 +768,5 @@ static struct nd_device_driver nd_pmem_driver = { > module_nd_driver(nd_pmem_driver); > > MODULE_AUTHOR("Ross Zwisler "); > +MODULE_DESCRIPTION("NVDIMM Persistent Memory Driver"); > MODULE_LICENSE("GPL v2"); > > --- > base-commit: 416ff45264d50a983c3c0b99f0da6ee59f9acd68 > change-id: 20240526-md-drivers-nvdimm-121215a4b93f
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 6, 2024 at 5:29 PM Mathieu Desnoyers wrote: > > On 2024-06-06 11:05, Alice Ryhl wrote: > > Make it possible to have Rust code call into tracepoints defined by C > > code. It is still required that the tracepoint is declared in a C > > header, and that this header is included in the input to bindgen. > > > > Signed-off-by: Alice Ryhl > > [...] > > > diff --git a/rust/helpers.c b/rust/helpers.c > > index 2c37a0f5d7a8..0560cc2a512a 100644 > > --- a/rust/helpers.c > > +++ b/rust/helpers.c > > @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t > > new_size, gfp_t flags) > > } > > EXPORT_SYMBOL_GPL(rust_helper_krealloc); > > > > +void rust_helper_preempt_enable_notrace(void) > > +{ > > + preempt_enable_notrace(); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); > > + > > +void rust_helper_preempt_disable_notrace(void) > > +{ > > + preempt_disable_notrace(); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); > > + > > +bool rust_helper_current_cpu_online(void) > > +{ > > + return cpu_online(raw_smp_processor_id()); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); > > + > > +void *rust_helper___rcu_dereference_raw(void **p) > > +{ > > + return rcu_dereference_raw(p); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); > > Ouch. Doing a function call for each of those small operations will > have a rather large performance impact when tracing is active. If it is > not possible to inline those in Rust, then implementing __DO_TRACE in > a C function would at least allow Rust to only do a single call to C > rather than go back and forth between Rust and C. > > What prevents inlining those helpers in Rust ? There's nothing fundamental that prevents it. But they contain a large amount of architecture specific code, which makes them a significant amount of work to reimplement in Rust. For example, rcu_dereference_raw calls into READ_ONCE. 
READ_ONCE is usually a volatile load, but under arm64+LTO, you get a bunch of inline assembly that relies on ALTERNATIVE for feature detection: https://elixir.bootlin.com/linux/v6.9/source/arch/arm64/include/asm/rwonce.h#L36 And preempt_enable_notrace has a similar story. The solution that Boqun mentions is nice, but it relies on rustc and clang using the same version of LLVM. You are unlikely to have compilers with matching LLVMs unless you intentionally take steps to make that happen. But yes, perhaps these helpers are an argument to have a single call for __DO_TRACE instead. Alice
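For context, the generic `READ_ONCE` that most architectures use is just a marked volatile access, which would be straightforward to mirror in Rust; it is the arm64+LTO inline-assembly variant linked above that makes a port hard. A simplified sketch of the generic form — the kernel's version in `include/asm-generic/rwonce.h` additionally has a compile-time size check to rule out tearing on oversized types:

```c
/*
 * Sketch of the generic READ_ONCE: force a single volatile load so
 * the compiler can neither cache, duplicate, nor tear the access.
 * Simplified from the kernel's asm-generic definition; __typeof__
 * is the GCC/Clang extension the kernel relies on.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))
```

The arm64+LTO build replaces this with `asm volatile` loads (via ALTERNATIVE-patched sequences) so that LTO cannot reorder them past dependency barriers, and that architecture-specific assembly is what a Rust reimplementation would have to duplicate.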
Re: [PATCH 3/3] rust: add tracepoint support
On 2024-06-06 11:49, Boqun Feng wrote: On Thu, Jun 06, 2024 at 11:30:03AM -0400, Mathieu Desnoyers wrote: On 2024-06-06 11:05, Alice Ryhl wrote: Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Signed-off-by: Alice Ryhl [...] diff --git a/rust/helpers.c b/rust/helpers.c index 2c37a0f5d7a8..0560cc2a512a 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags) } EXPORT_SYMBOL_GPL(rust_helper_krealloc); +void rust_helper_preempt_enable_notrace(void) +{ + preempt_enable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); + +void rust_helper_preempt_disable_notrace(void) +{ + preempt_disable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); + +bool rust_helper_current_cpu_online(void) +{ + return cpu_online(raw_smp_processor_id()); +} +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); + +void *rust_helper___rcu_dereference_raw(void **p) +{ + return rcu_dereference_raw(p); +} +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); Ouch. Doing a function call for each of those small operations will have a rather large performance impact when tracing is active. If it is Long-term plan is to 1) compile the C helpers in some IR and 2) inline the helpers with Rust in IR-level, as what Gary has: https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ and I use it for the upcoming atomic API support: https://github.com/fbq/linux/tree/dev/rust/atomic-rfc and it works very well. Thanks for the pointers, it makes sense. Thanks, Mathieu Regards, Boqun not possible to inline those in Rust, then implementing __DO_TRACE in a C function would at least allow Rust to only do a single call to C rather than go back and forth between Rust and C. What prevents inlining those helpers in Rust ? 
Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [PATCH 3/3] rust: add tracepoint support
On 2024-06-06 12:16, Alice Ryhl wrote: On Thu, Jun 6, 2024 at 5:29 PM Mathieu Desnoyers wrote: On 2024-06-06 11:05, Alice Ryhl wrote: Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Signed-off-by: Alice Ryhl [...] diff --git a/rust/helpers.c b/rust/helpers.c index 2c37a0f5d7a8..0560cc2a512a 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags) } EXPORT_SYMBOL_GPL(rust_helper_krealloc); +void rust_helper_preempt_enable_notrace(void) +{ + preempt_enable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); + +void rust_helper_preempt_disable_notrace(void) +{ + preempt_disable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); + +bool rust_helper_current_cpu_online(void) +{ + return cpu_online(raw_smp_processor_id()); +} +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); + +void *rust_helper___rcu_dereference_raw(void **p) +{ + return rcu_dereference_raw(p); +} +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); Ouch. Doing a function call for each of those small operations will have a rather large performance impact when tracing is active. If it is not possible to inline those in Rust, then implementing __DO_TRACE in a C function would at least allow Rust to only do a single call to C rather than go back and forth between Rust and C. What prevents inlining those helpers in Rust ? There's nothing fundamental that prevents it. But they contain a large amount of architecture specific code, which makes them a significant amount of work to reimplement in Rust. For example, rcu_dereference_raw calls into READ_ONCE. 
READ_ONCE is usually a volatile load, but under arm64+LTO, you get a bunch of inline assembly that relies on ALTERNATIVE for feature detection: https://elixir.bootlin.com/linux/v6.9/source/arch/arm64/include/asm/rwonce.h#L36 And preempt_enable_notrace has a similar story. The solution that Boqun mentions is nice, but it relies on rustc and clang using the same version of LLVM. You are unlikely to have compilers with matching LLVMs unless you intentionally take steps to make that happen. But yes, perhaps these helpers are an argument to have a single call for __DO_TRACE instead. If those helpers end up being inlined into Rust with the solution pointed to by Boqun, then it makes sense to implement __DO_TRACE in Rust. Otherwise doing a single call to C would be more efficient than calling each of the helpers individually. Thanks, Mathieu Alice -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [PATCH 2/3] rust: add static_key_false
On Thu, Jun 6, 2024 at 5:37 PM Mathieu Desnoyers wrote: > > On 2024-06-06 11:05, Alice Ryhl wrote: > > Add just enough support for static key so that we can use it from > > tracepoints. Tracepoints rely on `static_key_false` even though it is > > deprecated, so we add the same functionality to Rust. > > > > Signed-off-by: Alice Ryhl > > --- > > rust/kernel/lib.rs| 1 + > > rust/kernel/static_key.rs | 87 > > +++ > > scripts/Makefile.build| 2 +- > > 3 files changed, 89 insertions(+), 1 deletion(-) > > > > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs > > index d534b1178955..22e1fedd0774 100644 > > --- a/rust/kernel/lib.rs > > +++ b/rust/kernel/lib.rs > > @@ -39,6 +39,7 @@ > > pub mod print; > > mod static_assert; > > pub mod static_call; > > +pub mod static_key; > > #[doc(hidden)] > > pub mod std_vendor; > > pub mod str; > > diff --git a/rust/kernel/static_key.rs b/rust/kernel/static_key.rs > > new file mode 100644 > > index ..6c3dbe14c98a > > --- /dev/null > > +++ b/rust/kernel/static_key.rs > > @@ -0,0 +1,87 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > This static key code is something that can be generally useful > both in kernel and user-space. Is there anything that would prevent > licensing this under MIT right away instead so it could eventually be > re-used more broadly in userspace as well ? I would not mind. Alice
Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()
On 6/3/24 8:16 PM, Li Zhijian wrote:
> Don't allocate devs again when it is a valid pointer that has pointed to
> the memory allocated above with size (count + 2 * sizeof(dev)).
>
> A kmemleak reports:
> unreferenced object 0x88800dda1980 (size 16):
>   comm "kworker/u10:5", pid 69, jiffies 4294671781
>   hex dump (first 16 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   backtrace (crc 0):
>     [] __kmalloc+0x32c/0x470
>     [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 [libnvdimm]
>     [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm]
>     [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm]
>     [ ] really_probe+0xc6/0x390
>     [<129e2a69>] __driver_probe_device+0x78/0x150
>     [<2dfed28b>] driver_probe_device+0x1e/0x90
>     [ ] __device_attach_driver+0x85/0x110
>     [<32dca295>] bus_for_each_drv+0x85/0xe0
>     [<391c5a7d>] __device_attach+0xbe/0x1e0
>     [<26dabec0>] bus_probe_device+0x94/0xb0
>     [ ] device_add+0x656/0x870
>     [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm]
>     [<3f4c52a4>] async_run_entry_fn+0x2e/0x110
>     [ ] process_one_work+0x1ee/0x600
>     [<6d90d5a9>] worker_thread+0x183/0x350
>
> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation")
> Signed-off-by: Li Zhijian
> ---
>  drivers/nvdimm/namespace_devs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index d6d558f94d6b..56b016dbe307 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region *nd_region)
>  	/* Publish a zero-sized namespace for userspace to configure. */
>  	nd_mapping_free_labels(nd_mapping);
>
> -	devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
> +	/* devs probably has been allocated */
> +	if (!devs)
> +		devs = kcalloc(2, sizeof(dev), GFP_KERNEL);

This changes the behavior of this code and possibly corrupts the previously allocated memory at times when 'devs' is valid.
Was the 'devs' leaked out from the previous loop and should be freed instead? > if (!devs) > goto err; >
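The patch under discussion only falls back to allocation when `devs` is still NULL, so a buffer allocated earlier in scan_labels() is kept rather than leaked by an unconditional `kcalloc`. The pattern in isolation, as a userspace sketch with `calloc` standing in for `kcalloc` and an illustrative function name:

```c
#include <stdlib.h>

/*
 * Userspace sketch of the scan_labels() fix: reuse the array if one
 * was already allocated above, and only allocate the zero-sized
 * placeholder when nothing exists yet. Allocating unconditionally
 * here would drop (and leak) the earlier buffer - the kmemleak
 * report in the patch. Illustrative names, not the kernel's code.
 */
static void **publish_zero_namespace(void **devs)
{
	if (!devs)                        /* devs may already be valid */
		devs = calloc(2, sizeof(*devs));
	return devs;
}
```

Dave's question remains valid either way: one must decide whether the earlier `devs` should be reused here or freed on the error path instead, since the guard changes which pointer the caller ends up publishing.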
Re: [PATCH 0/3] Tracepoints and static branch/call in Rust
On 2024-06-06 11:46, Alice Ryhl wrote: On Thu, Jun 6, 2024 at 5:25 PM Mathieu Desnoyers wrote: On 2024-06-06 11:05, Alice Ryhl wrote: This implementation implements support for static keys in Rust so that the actual static branch will end up in the Rust object file. However, it would also be possible to just wrap the trace_##name generated by __DECLARE_TRACE in an extern C function and then call that from Rust. This will simplify the Rust code by removing the need for static branches and calls, but it places the static branch behind an external call, which has performance implications. The tracepoints try very hard to minimize overhead of dormant tracepoints so it is not frowned-upon to have them built into production binaries. This is needed to make sure distribution vendors keep those tracepoints in the kernel binaries that reach end-users. Adding a function call before evaluation of the static branch goes against this major goal. A possible middle ground would be to place just the __DO_TRACE body in an extern C function and to implement the Rust wrapper by doing the static branch in Rust, and then calling into C the code that contains __DO_TRACE when the tracepoint is active. However, this would need some changes to include/linux/tracepoint.h to generate and export a function containing the body of __DO_TRACE when the tracepoint should be callable from Rust. This tradeoff is more acceptable than having a function call before evaluation of the static branch, but I wonder what is the upside of this tradeoff compared to inlining the whole __DO_TRACE in Rust ? So in general, there is a tradeoff between placing parts of the tracepoint (which is perf sensitive) behind an external call, and having code duplicated in both C and Rust (which must be kept in sync when changes are made). This is an important point that I would like feedback on from the C maintainers. 
I don't see how the duplication happens there: __DO_TRACE is meant to be inlined into each C tracepoint caller site, so the code is already meant to be duplicated. Having an explicit function wrapping the tracepoint for Rust would just create an extra instance of __DO_TRACE if it happens to be also inlined into C code. Or do you meant you would like to prevent having to duplicate the implementation of __DO_TRACE in both C and Rust ? I'm not sure if you mean to prevent source code duplication between C and Rust or duplication of binary code (instructions). It's a question of maintenance burden. If you change how __DO_TRACE is implemented, then those changes must also be reflected in the Rust version. There's no issue in the binary code. As long as it is only __DO_TRACE that is duplicated between C and Rust, I don't see it as a large burden. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [RFC bpf-next 01/10] uprobe: Add session callbacks to uprobe_consumer
On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote: > On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote: > > On 06/05, Andrii Nakryiko wrote: > > > > > > so any such > > > limitations will cause problems, issue reports, investigation, etc. > > > > Agreed... > > > > > As one possible solution, what if we do > > > > > > struct return_instance { > > > ... > > > u64 session_cookies[]; > > > }; > > > > > > and allocate sizeof(struct return_instance) + 8 * > > > and then at runtime pass > > > &session_cookies[i] as data pointer to session-aware callbacks? > > > > I too thought about this, but I guess it is not that simple. > > > > Just for example. Suppose we have 2 session-consumers C1 and C2. > > What if uprobe_unregister(C1) comes before the probed function > > returns? > > > > We need something like map_cookie_to_consumer(). > > I guess we could have hash table in return_instance that gets 'consumer -> > cookie' ? ok, hash table is probably too big for this.. I guess some solution that would iterate consumers and cookies made sure it matches would be fine jirka > > return instance is freed after the consumers' return handlers are executed, > so there's no leak if some consumer gets unregistered before that > > > > > > > + /* The handler_session callback return value controls execution > > > > of > > > > +* the return uprobe and ret_handler_session callback. > > > > +* 0 on success > > > > +* 1 on failure, DO NOT install/execute the return uprobe > > > > +*console warning for anything else > > > > +*/ > > > > + int (*handler_session)(struct uprobe_consumer *self, struct > > > > pt_regs *regs, > > > > + unsigned long *data); > > > > + int (*ret_handler_session)(struct uprobe_consumer *self, > > > > unsigned long func, > > > > + struct pt_regs *regs, unsigned long > > > > *data); > > > > + > > > > > > We should try to avoid an alternative set of callbacks, IMO. Let's > > > extend existing ones with `unsigned long *data`, > > > > Oh yes, agreed. 
> > > > And the comment about the return value looks confusing too. I mean, the > > logic doesn't differ from the ret-code from ->handler(). > > > > "DO NOT install/execute the return uprobe" is not true if another > > non-session-consumer returns 0. > > well they are meant to be exclusive, so there'd be no other > non-session-consumer > > jirka
Re: [RFC bpf-next 01/10] uprobe: Add session callbacks to uprobe_consumer
On Thu, Jun 6, 2024 at 9:46 AM Jiri Olsa wrote: > > On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote: > > On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote: > > > On 06/05, Andrii Nakryiko wrote: > > > > > > > > so any such > > > > limitations will cause problems, issue reports, investigation, etc. > > > > > > Agreed... > > > > > > > As one possible solution, what if we do > > > > > > > > struct return_instance { > > > > ... > > > > u64 session_cookies[]; > > > > }; > > > > > > > > and allocate sizeof(struct return_instance) + 8 * > > > > and then at runtime pass > > > > &session_cookies[i] as data pointer to session-aware callbacks? > > > > > > I too thought about this, but I guess it is not that simple. > > > > > > Just for example. Suppose we have 2 session-consumers C1 and C2. > > > What if uprobe_unregister(C1) comes before the probed function > > > returns? > > > > > > We need something like map_cookie_to_consumer(). > > > > I guess we could have hash table in return_instance that gets 'consumer -> > > cookie' ? > > ok, hash table is probably too big for this.. I guess some solution that > would iterate consumers and cookies made sure it matches would be fine > Yes, I was hoping to avoid hash tables for this, and in the common case have no added overhead. > jirka > > > > > return instance is freed after the consumers' return handlers are executed, > > so there's no leak if some consumer gets unregistered before that > > > > > > > > > > + /* The handler_session callback return value controls > > > > > execution of > > > > > +* the return uprobe and ret_handler_session callback. 
> > > > > +* 0 on success > > > > > +* 1 on failure, DO NOT install/execute the return uprobe > > > > > +*console warning for anything else > > > > > +*/ > > > > > + int (*handler_session)(struct uprobe_consumer *self, struct > > > > > pt_regs *regs, > > > > > + unsigned long *data); > > > > > + int (*ret_handler_session)(struct uprobe_consumer *self, > > > > > unsigned long func, > > > > > + struct pt_regs *regs, unsigned > > > > > long *data); > > > > > + > > > > > > > > We should try to avoid an alternative set of callbacks, IMO. Let's > > > > extend existing ones with `unsigned long *data`, > > > > > > Oh yes, agreed. > > > > > > And the comment about the return value looks confusing too. I mean, the > > > logic doesn't differ from the ret-code from ->handler(). > > > > > > "DO NOT install/execute the return uprobe" is not true if another > > > non-session-consumer returns 0. > > > > well they are meant to be exclusive, so there'd be no other > > non-session-consumer > > > > jirka
Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()
Li Zhijian wrote: > Don't allocate devs again when it's valid pointer which has pionted to > the memory allocated above with size (count + 2 * sizeof(dev)). > > A kmemleak reports: > unreferenced object 0x88800dda1980 (size 16): > comm "kworker/u10:5", pid 69, jiffies 4294671781 > hex dump (first 16 bytes): > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > backtrace (crc 0): > [] __kmalloc+0x32c/0x470 > [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 > [libnvdimm] > [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm] > [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm] > [ ] really_probe+0xc6/0x390 > [<129e2a69>] __driver_probe_device+0x78/0x150 > [<2dfed28b>] driver_probe_device+0x1e/0x90 > [ ] __device_attach_driver+0x85/0x110 > [<32dca295>] bus_for_each_drv+0x85/0xe0 > [<391c5a7d>] __device_attach+0xbe/0x1e0 > [<26dabec0>] bus_probe_device+0x94/0xb0 > [ ] device_add+0x656/0x870 > [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm] > [<3f4c52a4>] async_run_entry_fn+0x2e/0x110 > [ ] process_one_work+0x1ee/0x600 > [<6d90d5a9>] worker_thread+0x183/0x350 > > Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation") > Signed-off-by: Li Zhijian > --- > drivers/nvdimm/namespace_devs.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c > index d6d558f94d6b..56b016dbe307 100644 > --- a/drivers/nvdimm/namespace_devs.c > +++ b/drivers/nvdimm/namespace_devs.c > @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region > *nd_region) > /* Publish a zero-sized namespace for userspace to configure. */ > nd_mapping_free_labels(nd_mapping); > > - devs = kcalloc(2, sizeof(dev), GFP_KERNEL); > + /* devs probably has been allocated */ I don't think this is where the bug is. The loop above is processing the known labels and should exit with a count > 0 if devs is not NULL. 
From what I can tell create_namespace_pmem() must be returning EAGAIN which leaves devs allocated but fails to increment count. Thus there are no valid labels but devs was not free'ed. Can you trace the error you are seeing a bit more to see if this is the case? Thanks, Ira > + if (!devs) > + devs = kcalloc(2, sizeof(dev), GFP_KERNEL); > if (!devs) > goto err; > > -- > 2.29.2 >
Re: [PATCH 1/3] rust: add static_call support
On Thu, Jun 06, 2024 at 03:05:24PM +, Alice Ryhl wrote: > Add static_call support by mirroring how C does. When the platform does > not support static calls (right now, that means that it is not x86), > then the function pointer is loaded from a global and called. Otherwise, > we generate a call to a trampoline function, and objtool is used to make > these calls patchable at runtime. This is absolutely unreadable gibberish -- how am I supposed to keep this in sync with the rest of the static_call infrastructure? > Signed-off-by: Alice Ryhl > --- > rust/kernel/lib.rs | 1 + > rust/kernel/static_call.rs | 92 > ++ > 2 files changed, 93 insertions(+) > > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs > index fbd91a48ff8b..d534b1178955 100644 > --- a/rust/kernel/lib.rs > +++ b/rust/kernel/lib.rs > @@ -38,6 +38,7 @@ > pub mod prelude; > pub mod print; > mod static_assert; > +pub mod static_call; > #[doc(hidden)] > pub mod std_vendor; > pub mod str; > diff --git a/rust/kernel/static_call.rs b/rust/kernel/static_call.rs > new file mode 100644 > index ..f7b8ba7bf1fb > --- /dev/null > +++ b/rust/kernel/static_call.rs > @@ -0,0 +1,92 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +// Copyright (C) 2024 Google LLC. > + > +//! Logic for static calls. > + > +#[macro_export] > +#[doc(hidden)] > +macro_rules! ty_underscore_for { > +($arg:expr) => { > +_ > +}; > +} > + > +#[doc(hidden)] > +#[repr(transparent)] > +pub struct AddressableStaticCallKey { > +_ptr: *const bindings::static_call_key, > +} > +unsafe impl Sync for AddressableStaticCallKey {} > +impl AddressableStaticCallKey { > +pub const fn new(ptr: *const bindings::static_call_key) -> Self { > +Self { _ptr: ptr } > +} > +} > + > +#[cfg(CONFIG_HAVE_STATIC_CALL)] > +#[doc(hidden)] > +#[macro_export] > +macro_rules! _static_call { > +($name:ident($($args:expr),* $(,)?)) => {{ > +// Symbol mangling will give this symbol a unique name. 
> +#[cfg(CONFIG_HAVE_STATIC_CALL_INLINE)] > +#[link_section = ".discard.addressable"] > +#[used] > +static __ADDRESSABLE: $crate::static_call::AddressableStaticCallKey > = unsafe { > + > $crate::static_call::AddressableStaticCallKey::new(::core::ptr::addr_of!( > +$crate::macros::paste! { $crate::bindings:: [<__SCK__ $name > >]; } > +)) > +}; > + > +let fn_ptr: unsafe extern "C" > fn($($crate::static_call::ty_underscore_for!($args)),*) -> _ = > +$crate::macros::paste! { $crate::bindings:: [<__SCT__ $name >]; > }; > +(fn_ptr)($($args),*) > +}}; > +} > + > +#[cfg(not(CONFIG_HAVE_STATIC_CALL))] > +#[doc(hidden)] > +#[macro_export] > +macro_rules! _static_call { > +($name:ident($($args:expr),* $(,)?)) => {{ > +let void_ptr_fn: *mut ::core::ffi::c_void = > +$crate::macros::paste! { $crate::bindings:: [<__SCK__ $name >]; > }.func; > + > +let fn_ptr: unsafe extern "C" > fn($($crate::static_call::ty_underscore_for!($args)),*) -> _ = > +if true { > +::core::mem::transmute(void_ptr_fn) > +} else { > +// This is dead code, but it influences type inference on > `fn_ptr` so that we > +// transmute the function pointer to the right type. > +$crate::macros::paste! { $crate::bindings:: [<__SCT__ $name > >]; } > +}; > + > +(fn_ptr)($($args),*) > +}}; > +} > + > +/// Statically call a global function. > +/// > +/// # Safety > +/// > +/// This macro will call the provided function. It is up to the caller to > uphold the safety > +/// guarantees of the function. > +/// > +/// # Examples > +/// > +/// ```ignore > +/// fn call_static() { > +/// unsafe { > +/// static_call! { your_static_call() }; > +/// } > +/// } > +/// ``` > +#[macro_export] > +macro_rules! static_call { > +// Forward to the real implementation. Separated like this so that we > don't have to duplicate > +// the documentation. > +($($args:tt)*) => { $crate::static_call::_static_call! { $($args)* } }; > +} > + > +pub use {_static_call, static_call, ty_underscore_for}; > > -- > 2.45.2.505.gda0bf45e8d-goog >
Re: [PATCH 2/3] rust: add static_key_false
On Thu, Jun 06, 2024 at 03:05:25PM +, Alice Ryhl wrote: > Add just enough support for static key so that we can use it from > tracepoints. Tracepoints rely on `static_key_false` even though it is > deprecated, so we add the same functionality to Rust. Urgh, more unreadable gibberish :-( Can we please get bindgen to translate asm/jump_table.h instead of randomly duplicating random bits. I really think that whoever created rust was an esoteric language freak. Hideous crap :-(.
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 03:05:26PM +, Alice Ryhl wrote: > Make it possible to have Rust code call into tracepoints defined by C > code. It is still required that the tracepoint is declared in a C > header, and that this header is included in the input to bindgen. > > Signed-off-by: Alice Ryhl > --- > rust/bindings/bindings_helper.h | 1 + > rust/bindings/lib.rs| 15 +++ > rust/helpers.c | 24 +++ > rust/kernel/lib.rs | 1 + > rust/kernel/tracepoint.rs | 92 > + > 5 files changed, 133 insertions(+) > > diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h > index ddb5644d4fd9..d442f9ccfc2c 100644 > --- a/rust/bindings/bindings_helper.h > +++ b/rust/bindings/bindings_helper.h > @@ -15,6 +15,7 @@ > #include > #include > #include > +#include > #include > #include > > diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs > index 40ddaee50d8b..48856761d682 100644 > --- a/rust/bindings/lib.rs > +++ b/rust/bindings/lib.rs > @@ -48,3 +48,18 @@ mod bindings_helper { > } > > pub use bindings_raw::*; > + > +/// Rust version of the C macro `rcu_dereference_raw`. > +/// > +/// The rust helper only works with void pointers, but this wrapper method > makes it work with any > +/// pointer type using pointer casts. > +/// > +/// # Safety > +/// > +/// This method has the same safety requirements as the C macro of the same > name. > +#[inline(always)] > +pub unsafe fn rcu_dereference_raw(p: *const *mut T) -> *mut T { > +// SAFETY: This helper calls into the C macro, so the caller promises to > uphold the safety > +// requirements. 
> +unsafe { __rcu_dereference_raw(p as *mut *mut _) as *mut T } > +} > diff --git a/rust/helpers.c b/rust/helpers.c > index 2c37a0f5d7a8..0560cc2a512a 100644 > --- a/rust/helpers.c > +++ b/rust/helpers.c > @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, > gfp_t flags) > } > EXPORT_SYMBOL_GPL(rust_helper_krealloc); > > +void rust_helper_preempt_enable_notrace(void) > +{ > + preempt_enable_notrace(); > +} > +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); > + > +void rust_helper_preempt_disable_notrace(void) > +{ > + preempt_disable_notrace(); > +} > +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); A notrace wrapper that is tracable, lol. > +bool rust_helper_current_cpu_online(void) > +{ > + return cpu_online(raw_smp_processor_id()); > +} > +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); > + > +void *rust_helper___rcu_dereference_raw(void **p) > +{ > + return rcu_dereference_raw(p); > +} > +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); I'm going to keep yelling and objecting to these wrappers. Fix bindgen already. Or whatever is needed to get it to interoperate with C inline / asm. NAK NAK NAK
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 08:49:06AM -0700, Boqun Feng wrote: > Long-term plan is to 1) compile the C helpers in some IR and 2) inline > the helpers with Rust in IR-level, as what Gary has: > > > https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ Urgh, that still needs us to maintain that silly list of helpers :-/ Can't we pretty please have clang parse the actual header files into IR and munge that into rust? So that we don't get to manually duplicate everything+dog.
Re: [PATCH v2 2/5] ftrace: Remove "ftrace_hash" parameter from __ftrace_hash_rec_update()
On Wed, Jun 05, 2024 at 02:03:36PM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > While adding comments to the function __ftrace_hash_rec_update() and > trying to describe in detail what the parameter for "ftrace_hash" does, I > realized that it basically does exactly the same thing (but differently) > if it is set or not! Typo: "ftrace_hash" should be "filter_hash", and likewise in the commit title. > If it is set, the idea was the ops->filter_hash was being updated, and the > code should focus on the functions that are in the ops->filter_hash and > add them. But it still had to pay attention to the functions in the > ops->notrace_hash, to ignore them. > > If it was cleared, it focused on the ops->notrace_hash, and would add > functions that were not in the ops->notrace_hash but would still keep > functions in the "ops->filter_hash". Basically doing the same thing. > > In reality, the __ftrace_hash_rec_update() only needs to either remove the > functions associated to the give ops (if "inc" is set) or remove them (if > "inc" is cleared). It has to pay attention to both the filter_hash and > notrace_hash regardless. AFAICT, we actually remove a latent bug here, but that bug wasn't reachable because we never call __ftrace_hash_rec_update() with "filter_hash" clear. Before this patch, if we did call __ftrace_hash_rec_update() with "filter_hash" clear, we'd setup: all = false; ... if (filter_hash) { ... } else { hash = ops->func_hash->notrace_hash; other_hash = ops->func_hash->filter_hash; } ... and then at the tail of the loop where we do: count++; [...] /* Shortcut, if we handled all records, we are done. */ if (!all && count == hash->count) { pr_info("HARK: stopping after %d recs\n", count); return update; } ... we'd be checking whether we've updated notrace_hash->count entries, when that could be smaller than the number of entries we actually need to update (e.g. if the notrace hash contains a single entry, but the filter_hash contained more). 
As above, we never actually hit that because we never call with "filter_hash" clear, so it might be good to explicitly say that before this patch we never actually call __ftrace_hash_rec_update() with "filter_hash" clear, and this is removing dead (and potentially broken) code. > Remove the "filter_hash" parameter from __filter_hash_rec_update() and > comment the function for what it really is doing. > > Signed-off-by: Steven Rostedt (Google) FWIW, this looks good to me, and works in testing, so: Reviewed-by: Mark Rutland Tested-by: Mark Rutland I have one comment below, but the above stands regardless. [...] > +/* > + * This is the main engine to the ftrace updates to the dyn_ftrace records. > + * > + * It will iterate through all the available ftrace functions > + * (the ones that ftrace can have callbacks to) and set the flags > + * in the associated dyn_ftrace records. > + * > + * @inc: If true, the functions associated to @ops are added to > + * the dyn_ftrace records, otherwise they are removed. > + */ > static bool __ftrace_hash_rec_update(struct ftrace_ops *ops, > - int filter_hash, >bool inc) > { > struct ftrace_hash *hash; > - struct ftrace_hash *other_hash; > + struct ftrace_hash *notrace_hash; > struct ftrace_page *pg; > struct dyn_ftrace *rec; > bool update = false; > @@ -1718,35 +1725,16 @@ static bool __ftrace_hash_rec_update(struct > ftrace_ops *ops, > return false; > > /* > - * In the filter_hash case: >* If the count is zero, we update all records. >* Otherwise we just update the items in the hash. > - * > - * In the notrace_hash case: > - * We enable the update in the hash. > - * As disabling notrace means enabling the tracing, > - * and enabling notrace means disabling, the inc variable > - * gets inversed. 
>*/ > - if (filter_hash) { > - hash = ops->func_hash->filter_hash; > - other_hash = ops->func_hash->notrace_hash; > - if (ftrace_hash_empty(hash)) > - all = true; > - } else { > - inc = !inc; > - hash = ops->func_hash->notrace_hash; > - other_hash = ops->func_hash->filter_hash; > - /* > - * If the notrace hash has no items, > - * then there's nothing to do. > - */ > - if (ftrace_hash_empty(hash)) > - return false; > - } > + hash = ops->func_hash->filter_hash; > + notrace_hash = ops->func_hash->notrace_hash; > + if (ftrace_hash_empty(hash)) > + all = true; > > do_for_each_ftrace_rec(pg, rec) { > - int in_other_hash =
Re: [PATCH v2 4/5] ftrace: Convert "inc" parameter to bool in ftrace_hash_rec_update_modify()
On Wed, Jun 05, 2024 at 02:03:38PM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > The parameter "inc" in the function ftrace_hash_rec_update_modify() is > boolean. Change it to be such. > > Also add documentation to what the function does. > > Signed-off-by: Steven Rostedt (Google) Acked-by: Mark Rutland Mark. > --- > kernel/trace/ftrace.c | 23 --- > 1 file changed, 20 insertions(+), 3 deletions(-) > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index 1a2444199740..83f23f8fc26d 100644 > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -1882,7 +1882,24 @@ static bool ftrace_hash_rec_enable(struct ftrace_ops > *ops) > return __ftrace_hash_rec_update(ops, true); > } > > -static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops, int inc) > +/* > + * This function will update what functions @ops traces when its filter > + * changes. > + * > + * The @inc states if the @ops callbacks are going to be added or removed. > + * When one of the @ops hashes are updated to a "new_hash" the dyn_ftrace > + * records are update via: > + * > + * ftrace_hash_rec_disable_modify(ops); > + * ops->hash = new_hash > + * ftrace_hash_rec_enable_modify(ops); > + * > + * Where the @ops is removed from all the records it is tracing using > + * its old hash. The @ops hash is updated to the new hash, and then > + * the @ops is added back to the records so that it is tracing all > + * the new functions. 
> + */ > +static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops, bool inc) > { > struct ftrace_ops *op; > > @@ -1906,12 +1923,12 @@ static void ftrace_hash_rec_update_modify(struct > ftrace_ops *ops, int inc) > > static void ftrace_hash_rec_disable_modify(struct ftrace_ops *ops) > { > - ftrace_hash_rec_update_modify(ops, 0); > + ftrace_hash_rec_update_modify(ops, false); > } > > static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops) > { > - ftrace_hash_rec_update_modify(ops, 1); > + ftrace_hash_rec_update_modify(ops, true); > } > > /* > -- > 2.43.0 > >
Re: [PATCH v2 5/5] ftrace: Add comments to ftrace_hash_move() and friends
On Wed, Jun 05, 2024 at 02:03:39PM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > Describe what ftrace_hash_move() does and add some more comments to some > other functions to make it easier to understand. > > Signed-off-by: Steven Rostedt (Google) Acked-by: Mark Rutland Mark. > --- > kernel/trace/ftrace.c | 24 +++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index 83f23f8fc26d..ae1603e771c5 100644 > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -169,6 +169,7 @@ static inline void ftrace_ops_init(struct ftrace_ops *ops) > #endif > } > > +/* Call this function for when a callback filters on set_ftrace_pid */ > static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip, > struct ftrace_ops *op, struct ftrace_regs *fregs) > { > @@ -1317,7 +1318,7 @@ static struct ftrace_hash *alloc_ftrace_hash(int > size_bits) > return hash; > } > > - > +/* Used to save filters on functions for modules not loaded yet */ > static int ftrace_add_mod(struct trace_array *tr, > const char *func, const char *module, > int enable) > @@ -1429,6 +1430,7 @@ static struct ftrace_hash *__move_hash(struct > ftrace_hash *src, int size) > return new_hash; > } > > +/* Move the @src entries to a newly allocated hash */ > static struct ftrace_hash * > __ftrace_hash_move(struct ftrace_hash *src) > { > @@ -1443,6 +1445,26 @@ __ftrace_hash_move(struct ftrace_hash *src) > return __move_hash(src, size); > } > > +/** > + * ftrace_hash_move - move a new hash to a filter and do updates > + * @ops: The ops with the hash that @dst points to > + * @enable: True if for the filter hash, false for the notrace hash > + * @dst: Points to the @ops hash that should be updated > + * @src: The hash to update @dst with > + * > + * This is called when an ftrace_ops hash is being updated and the > + * the kernel needs to reflect this. 
Note, this only updates the kernel > + * function callbacks if the @ops is enabled (not to be confused with > + * @enable above). If the @ops is enabled, its hash determines what > + * callbacks get called. This function gets called when the @ops hash > + * is updated and it requires new callbacks. > + * > + * On success the elements of @src is moved to @dst, and @dst is updated > + * properly, as well as the functions determined by the @ops hashes > + * are now calling the @ops callback function. > + * > + * Regardless of return type, @src should be freed with free_ftrace_hash(). > + */ > static int > ftrace_hash_move(struct ftrace_ops *ops, int enable, >struct ftrace_hash **dst, struct ftrace_hash *src) > -- > 2.43.0 > >
Re: [PATCH v2 2/5] ftrace: Remove "ftrace_hash" parameter from __ftrace_hash_rec_update()
On Thu, 6 Jun 2024 18:53:12 +0100 Mark Rutland wrote: > On Wed, Jun 05, 2024 at 02:03:36PM -0400, Steven Rostedt wrote: > > From: "Steven Rostedt (Google)" > > > > While adding comments to the function __ftrace_hash_rec_update() and > > trying to describe in detail what the parameter for "ftrace_hash" does, I > > realized that it basically does exactly the same thing (but differently) > > if it is set or not! > > Typo: "ftrace_hash" should be "filter_hash", and likewise in the commit > title. Let me go fix up linux-next :-p > > > If it is set, the idea was the ops->filter_hash was being updated, and the > > code should focus on the functions that are in the ops->filter_hash and > > add them. But it still had to pay attention to the functions in the > > ops->notrace_hash, to ignore them. > > > > If it was cleared, it focused on the ops->notrace_hash, and would add > > functions that were not in the ops->notrace_hash but would still keep > > functions in the "ops->filter_hash". Basically doing the same thing. > > > > In reality, the __ftrace_hash_rec_update() only needs to either remove the > > functions associated to the give ops (if "inc" is set) or remove them (if > > "inc" is cleared). It has to pay attention to both the filter_hash and > > notrace_hash regardless. > > AFAICT, we actually remove a latent bug here, but that bug wasn't > reachable because we never call __ftrace_hash_rec_update() with > "filter_hash" clear. > > Before this patch, if we did call __ftrace_hash_rec_update() with > "filter_hash" clear, we'd setup: > > all = false; > ... > if (filter_hash) { > ... > } else { > hash = ops->func_hash->notrace_hash; > other_hash = ops->func_hash->filter_hash; > } > > ... and then at the tail of the loop where we do: > > count++; > > [...] > > /* Shortcut, if we handled all records, we are done. */ > if (!all && count == hash->count) { > pr_info("HARK: stopping after %d recs\n", count); > return update; > } > > ... 
we'd be checking whether we've updated notrace_hash->count entries, > when that could be smaller than the number of entries we actually need > to update (e.g. if the notrace hash contains a single entry, but the > filter_hash contained more). I noticed this too as well as: if (filter_hash) { hash = ops->func_hash->filter_hash; other_hash = ops->func_hash->notrace_hash; if (ftrace_hash_empty(hash)) all = true; } else { inc = !inc; hash = ops->func_hash->notrace_hash; other_hash = ops->func_hash->filter_hash; /* * If the notrace hash has no items, * then there's nothing to do. */ if (ftrace_hash_empty(hash)) return false; } That "return false" is actually a mistake as well. But I tried to hit it, but found that I could not. I think I updated the code due to bugs where I prevent that from happening, but the real fix would have been this patch. :-p > > As above, we never actually hit that because we never call with > "filter_hash" clear, so it might be good to explicitly say that before > this patch we never actually call __ftrace_hash_rec_update() with > "filter_hash" clear, and this is removing dead (and potentially broken) > code. Right. > > > Remove the "filter_hash" parameter from __filter_hash_rec_update() and > > comment the function for what it really is doing. > > > > Signed-off-by: Steven Rostedt (Google) > > FWIW, this looks good to me, and works in testing, so: > > Reviewed-by: Mark Rutland > Tested-by: Mark Rutland > > I have one comment below, but the above stands regardless. > > [...] > > > +/* > > + * This is the main engine to the ftrace updates to the dyn_ftrace records. > > + * > > + * It will iterate through all the available ftrace functions > > + * (the ones that ftrace can have callbacks to) and set the flags > > + * in the associated dyn_ftrace records. > > + * > > + * @inc: If true, the functions associated to @ops are added to > > + * the dyn_ftrace records, otherwise they are removed. 
> > + */ > > static bool __ftrace_hash_rec_update(struct ftrace_ops *ops, > > -int filter_hash, > > bool inc) > > { > > struct ftrace_hash *hash; > > - struct ftrace_hash *other_hash; > > + struct ftrace_hash *notrace_hash; > > struct ftrace_page *pg; > > struct dyn_ftrace *rec; > > bool update = false; > > @@ -1718,35 +1725,16 @@ static bool __ftrace_hash_rec_update(struct > > ftrace_ops *ops, > > return false; > > > > /* > > -* In the filter_hash case: > > * If the count is zero, we update all records. > > * Otherwise we just update the item
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 07:35:44PM +0200, Peter Zijlstra wrote: > On Thu, Jun 06, 2024 at 08:49:06AM -0700, Boqun Feng wrote: > > > Long-term plan is to 1) compile the C helpers in some IR and 2) inline > > the helpers with Rust in IR-level, as what Gary has: > > > > > > https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ > > Urgh, that still needs us to maintain that silly list of helpers :-/ > But it's an improvement from the current stage, right? ;-) > Can't we pretty please have clang parse the actual header files into IR > and munge that into rust? So that we don't get to manually duplicate > everything+dog. That won't always work, because some of our kernel APIs are defined as macros, and I don't think it's a trivial job to generate a macro definition to a function definition so that it can be translated to something in IR. We will have to do the macro -> function mapping ourselves somewhere, if we want to inline the API across languages. Regards, Boqun
[PATCH v2] rpmsg: qcom_smd: Improve error handling for qcom_smd_parse_edge
When the mailbox driver has not probed yet, the error message "failed to parse smd edge" is just going to confuse users, so improve the error prints a bit. Cover the last remaining exits from qcom_smd_parse_edge with proper error prints, especially the one for the mbox_chan deserved dev_err_probe to handle EPROBE_DEFER nicely. And add one for ipc_regmap also to be complete. With this done, we can remove the outer print completely. Signed-off-by: Luca Weiss --- Changes in v2: - Rebase on qcom for-next, drop dts patches which have been applied - Improve error printing situation (Bjorn) - Link to v1: https://lore.kernel.org/r/20240424-apcs-mboxes-v1-0-6556c47cb...@z3ntu.xyz --- drivers/rpmsg/qcom_smd.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/rpmsg/qcom_smd.c b/drivers/rpmsg/qcom_smd.c index 43f601c84b4f..06e6ba653ea1 100644 --- a/drivers/rpmsg/qcom_smd.c +++ b/drivers/rpmsg/qcom_smd.c @@ -1369,7 +1369,8 @@ static int qcom_smd_parse_edge(struct device *dev, edge->mbox_chan = mbox_request_channel(&edge->mbox_client, 0); if (IS_ERR(edge->mbox_chan)) { if (PTR_ERR(edge->mbox_chan) != -ENODEV) { - ret = PTR_ERR(edge->mbox_chan); + ret = dev_err_probe(dev, PTR_ERR(edge->mbox_chan), + "failed to acquire IPC mailbox\n"); goto put_node; } @@ -1386,6 +1387,7 @@ static int qcom_smd_parse_edge(struct device *dev, of_node_put(syscon_np); if (IS_ERR(edge->ipc_regmap)) { ret = PTR_ERR(edge->ipc_regmap); + dev_err(dev, "failed to get regmap from syscon: %d\n", ret); goto put_node; } @@ -1501,10 +1503,8 @@ struct qcom_smd_edge *qcom_smd_register_edge(struct device *parent, } ret = qcom_smd_parse_edge(&edge->dev, node, edge); - if (ret) { - dev_err(&edge->dev, "failed to parse smd edge\n"); + if (ret) goto unregister_dev; - } ret = qcom_smd_create_chrdev(edge); if (ret) { --- base-commit: 2c79712cc83b172ce26c3086ced1c1fae087d8fb change-id: 20240423-apcs-mboxes-12ee6c01a5b3 Best regards, -- Luca Weiss
Re: [PATCH v3 4/6] module: script to generate offset ranges for builtin modules
On Tue, May 21, 2024 at 01:53:27AM +0900, Masahiro Yamada wrote: > On Fri, May 17, 2024 at 1:31???PM Kris Van Hees > wrote: > > > > The offset range data for builtin modules is generated using: > > - modules.builtin.modinfo: associates object files with module names > > - vmlinux.map: provides load order of sections and offset of first member > > per section > > - vmlinux.o.map: provides offset of object file content per section > > - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME > > > > The generated data will look like: > > > > .text - = _text > > .text baf0-cb10 amd_uncore > > .text 0009bd10-0009c8e0 iosf_mbi > > ... > > .text 008e6660-008e9630 snd_soc_wcd_mbhc > > .text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x > > .text 008ea610-008ea780 snd_soc_wcd9335 > > ... > > .data - = _sdata > > .data f020-f680 amd_uncore > > > > For each ELF section, it lists the offset of the first symbol. This can > > be used to determine the base address of the section at runtime. > > > > Next, it lists (in strict ascending order) offset ranges in that section > > that cover the symbols of one or more builtin modules. Multiple ranges > > can apply to a single module, and ranges can be shared between modules. 
> > > > Signed-off-by: Kris Van Hees > > Reviewed-by: Nick Alcock > > --- > > Changes since v2: > > - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo > > - Switched from using modules.builtin.objs to parsing .*.cmd files > > - Parse data from .*.cmd in generate_builtin_ranges.awk > > --- > > scripts/generate_builtin_ranges.awk | 232 > > 1 file changed, 232 insertions(+) > > create mode 100755 scripts/generate_builtin_ranges.awk > > > > diff --git a/scripts/generate_builtin_ranges.awk > > b/scripts/generate_builtin_ranges.awk > > new file mode 100755 > > index 0..6975a9c7266d9 > > --- /dev/null > > +++ b/scripts/generate_builtin_ranges.awk > > @@ -0,0 +1,232 @@ > > +#!/usr/bin/gawk -f > > +# SPDX-License-Identifier: GPL-2.0 > > +# generate_builtin_ranges.awk: Generate address range data for builtin > > modules > > +# Written by Kris Van Hees > > +# > > +# Usage: generate_builtin_ranges.awk modules.builtin.modinfo vmlinux.map \ > > +# vmlinux.o.map > modules.builtin.ranges > > +# > > + > > +BEGIN { > > + # modules.builtin.modinfo uses \0 as record separator > > + # All other files use \n. > > + RS = "[\n\0]"; > > +} > > > Why do you want to parse modules.builtin.modinfo > instead of modules.builtin? > > modules.builtin uses \n separator. Oh my, I completely overlooked modules.builtin. Thank you! That is indeed much easier. > > + > > +# Return the module name(s) (if any) associated with the given object. > > +# > > +# If we have seen this object before, return information from the cache. > > +# Otherwise, retrieve it from the corresponding .cmd file. > > +# > > +function get_module_info(fn, mod, obj, mfn, s) { > > > There are 5 arguments, while the caller passes only 1 argument > ( get_module_info($4) ) That is the way to be able to have local variables in an AWK function - every variable mentioned in the function declaration is local to the function. It is an oddity in AWK. 
> > + if (fn in omod) > > + return omod[fn]; > > + > > + if (match(fn, /\/[^/]+$/) == 0) > > + return ""; > > + > > + obj = fn; > > + mod = ""; > > + mfn = ""; > > + fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd"; > > + if (getline s > + if (match(s, /DKBUILD_MODNAME=[^ ]+/) > 0) { > > + mod = substr(s, RSTART + 17, RLENGTH - 17); > > + gsub(/['"]/, "", mod); > > + gsub(/:/, " ", mod); > > + } > > + > > + if (match(s, /DKBUILD_MODFILE=[^ ]+/) > 0) { > > + mfn = substr(s, RSTART + 17, RLENGTH - 17); > > + gsub(/['"]/, "", mfn); > > + gsub(/:/, " ", mfn); > > + } > > + } > > + close(fn); > > + > > +# tmp = $0; > > +# $0 = s; > > +# print mod " " mfn " " obj " " $NF; > > +# $0 = tmp; > > + > > + # A single module (common case) also reflects objects that are not > > part > > + # of a module. Some of those objects have names that are also a > > module > > + # name (e.g. core). We check the associated module file name, and > > if > > + # they do not match, the object is not part of a module. > > > You do not need to use KBUILD_MODNAME. > > Just use KBUILD_MODFILE only. > If the same path is found in modules.builtin, > it is a built-in module. > > Its basename is modname. Yes, that is true. I can do all this based on KBUILD_MODFILE. Thank you. Adjusting the patch that way. > One more question in a corner case. > > How does your c
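[Editor's note: for readers unfamiliar with the AWK idiom Kris describes above — `get_module_info(fn, mod, obj, mfn, s)` declaring five parameters but being called with one — here is a minimal, hypothetical sketch. The function and variable names are illustrative, not taken from the script under review. AWK has no `local` keyword, so by convention any extra formal parameters the caller never supplies act as function-local variables, initialized to the empty string / zero.]

```shell
# Extra, never-passed parameters are AWK's conventional way to get local
# variables (often set off by extra whitespace in the parameter list,
# exactly as in get_module_info(fn, mod, obj, mfn, s) above).
awk '
function add_two(a,    b) {   # "b" is a local scratch variable, not a real argument
	b = 2;
	return a + b;
}
BEGIN {
	print add_two(40);    # caller passes only the real argument; prints 42
	print (b == "");      # global namespace untouched by the local: prints 1
}'
```

Without this trick, `b` would be a global and would leak out of the function, which is why the convention matters in a larger script.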
Re: [PATCH 1/3] rust: add static_call support
On Thu, Jun 6, 2024 at 7:19 PM Peter Zijlstra wrote: > > This is absolutely unreadable gibberish -- how am I supposed to keep > this in sync with the rest of the static_call infrastructure? Yeah, they are macros, which look different from "normal" Rust code. Is there something we could do to help here? I think Alice and others would be happy to explain how it works and/or help maintain it in the future if you prefer. Cheers, Miguel
Re: [PATCH] remoteproc: mediatek: Don't print error when optional scp handle is missing
On Thu, Jun 06, 2024 at 12:50:56PM +0200, AngeloGioacchino Del Regno wrote: > Il 05/06/24 21:35, Nícolas F. R. A. Prado ha scritto: > > The scp_get() helper has two users: the mtk-vcodec and the mtk-mdp3 > > drivers. mdp3 considers the mediatek,scp phandle optional, and when it's > > missing mdp3 will directly look for the scp node based on compatible. > > > > For that reason printing an error in the helper when the handle is > > missing is wrong, as it's not always an actual error. Besides, the other > > user, vcodec, already prints an error message on its own when the helper > > fails so this message doesn't add that much information. > > > > Remove the error message from the helper. This gets rid of the deceptive > > error on MT8195-Tomato: > > > >mtk-mdp3 14001000.dma-controller: can't get SCP node > > I'm way more for giving it a SCP handle instead of silencing this print... the > SCP handle way *is* the correct one and I prefer it, as that'd also be the > only > way to support Dual-Core SCP in MDP3. > > I really want to see hardcoded stuff disappear and I really really *really* > want > to see more consistency across DT nodes in MediaTek platforms. > > So, still a one-line change, but do it on the devicetree instead of here > please. Sure. So the missing scp phandle case is maintained exclusively for backwards-compatibility. And we're ok that the old DTs will keep getting the error. I'll send a v2 adding the phandle in the DT instead. Thanks, Nícolas
[PATCH v2 1/2] dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc
The qcom,ipc-N properties are essentially providing a reference to a mailbox, so allow using the mboxes property to do the same in a more structured way. Since multiple SMSM hosts are supported, we need to be able to provide the correct mailbox for each host. The old qcom,ipc-N properties map to the mboxes property by index, starting at 0 since that's a valid SMSM host also. Mark the older qcom,ipc-N as deprecated and update the example with mboxes. Signed-off-by: Luca Weiss --- .../devicetree/bindings/soc/qcom/qcom,smsm.yaml| 30 +++--- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml index db67cf043256..4900215f26af 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml @@ -33,6 +33,14 @@ properties: specifier of the column in the subscription matrix representing the local processor. + mboxes: +minItems: 1 +maxItems: 5 +description: + Reference to the mailbox representing the outgoing doorbell in APCS for + this client. Each entry represents the N:th remote processor by index + (0-indexed). + '#size-cells': const: 0 @@ -47,6 +55,7 @@ patternProperties: description: Three entries specifying the outgoing ipc bit used for signaling the N:th remote processor. +deprecated: true "@[0-9a-f]$": type: object @@ -98,15 +107,18 @@ required: - '#address-cells' - '#size-cells' -anyOf: +oneOf: - required: - - qcom,ipc-1 - - required: - - qcom,ipc-2 - - required: - - qcom,ipc-3 - - required: - - qcom,ipc-4 + - mboxes + - anyOf: + - required: + - qcom,ipc-1 + - required: + - qcom,ipc-2 + - required: + - qcom,ipc-3 + - required: + - qcom,ipc-4 additionalProperties: false @@ -122,7 +134,7 @@ examples: compatible = "qcom,smsm"; #address-cells = <1>; #size-cells = <0>; -qcom,ipc-3 = <&apcs 8 19>; +mboxes = <0>, <0>, <0>, <&apcs 19>; apps_smsm: apps@0 { reg = <0>; -- 2.45.2
[PATCH v2 2/2] soc: qcom: smsm: Support using mailbox interface
Add support for using the mbox interface instead of manually writing to the syscon. With this change the driver will attempt to get the mailbox first, and if that fails it will fall back to the existing way of using qcom,ipc-* properties and converting to syscon. Reviewed-by: Konrad Dybcio Signed-off-by: Luca Weiss --- drivers/soc/qcom/smsm.c | 51 - 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/drivers/soc/qcom/smsm.c b/drivers/soc/qcom/smsm.c index e7c7e9a640a6..ffe78ae34386 100644 --- a/drivers/soc/qcom/smsm.c +++ b/drivers/soc/qcom/smsm.c @@ -5,6 +5,7 @@ */ #include +#include #include #include #include @@ -71,6 +72,7 @@ struct smsm_host; * @lock: spinlock for read-modify-write of the outgoing state * @entries: context for each of the entries * @hosts: context for each of the hosts + * @mbox_client: mailbox client handle */ struct qcom_smsm { struct device *dev; @@ -88,6 +90,8 @@ struct qcom_smsm { struct smsm_entry *entries; struct smsm_host *hosts; + + struct mbox_client mbox_client; }; /** @@ -120,11 +124,14 @@ struct smsm_entry { * @ipc_regmap:regmap for outgoing interrupt * @ipc_offset:offset in @ipc_regmap for outgoing interrupt * @ipc_bit: bit in @ipc_regmap + @ipc_offset for outgoing interrupt + * @mbox_chan: apcs ipc mailbox channel handle */ struct smsm_host { struct regmap *ipc_regmap; int ipc_offset; int ipc_bit; + + struct mbox_chan *mbox_chan; }; /** @@ -172,7 +179,13 @@ static int smsm_update_bits(void *data, u32 mask, u32 value) hostp = &smsm->hosts[host]; val = readl(smsm->subscription + host); - if (val & changes && hostp->ipc_regmap) { + if (!(val & changes)) + continue; + + if (hostp->mbox_chan) { + mbox_send_message(hostp->mbox_chan, NULL); + mbox_client_txdone(hostp->mbox_chan, 0); + } else if (hostp->ipc_regmap) { regmap_write(hostp->ipc_regmap, hostp->ipc_offset, BIT(hostp->ipc_bit)); @@ -352,6 +365,28 @@ static const struct irq_domain_ops smsm_irq_ops = { .xlate = irq_domain_xlate_twocell, }; +/** + * smsm_parse_mbox() - 
requests an mbox channel + * @smsm: smsm driver context + * @host_id: index of the remote host to be resolved + * + * Requests the desired channel using the mbox interface which is needed for + * sending the outgoing interrupts to a remote host - identified by @host_id. + */ +static int smsm_parse_mbox(struct qcom_smsm *smsm, unsigned int host_id) +{ + struct smsm_host *host = &smsm->hosts[host_id]; + int ret = 0; + + host->mbox_chan = mbox_request_channel(&smsm->mbox_client, host_id); + if (IS_ERR(host->mbox_chan)) { + ret = PTR_ERR(host->mbox_chan); + host->mbox_chan = NULL; + } + + return ret; +} + /** * smsm_parse_ipc() - parses a qcom,ipc-%d device tree property * @smsm: smsm driver context @@ -521,8 +556,16 @@ static int qcom_smsm_probe(struct platform_device *pdev) "qcom,local-host", &smsm->local_host); + smsm->mbox_client.dev = &pdev->dev; + smsm->mbox_client.knows_txdone = true; + /* Parse the host properties */ for (id = 0; id < smsm->num_hosts; id++) { + /* Try using mbox interface first, otherwise fall back to syscon */ + ret = smsm_parse_mbox(smsm, id); + if (!ret) + continue; + ret = smsm_parse_ipc(smsm, id); if (ret < 0) goto out_put; @@ -609,6 +652,9 @@ static int qcom_smsm_probe(struct platform_device *pdev) qcom_smem_state_unregister(smsm->state); out_put: + for (id = 0; id < smsm->num_hosts; id++) + mbox_free_channel(smsm->hosts[id].mbox_chan); + of_node_put(local_node); return ret; } @@ -622,6 +668,9 @@ static void qcom_smsm_remove(struct platform_device *pdev) if (smsm->entries[id].domain) irq_domain_remove(smsm->entries[id].domain); + for (id = 0; id < smsm->num_hosts; id++) + mbox_free_channel(smsm->hosts[id].mbox_chan); + qcom_smem_state_unregister(smsm->state); } -- 2.45.2
[PATCH v2 0/2] Support mailbox interface in qcom,smsm driver
Take a shot at converting the last driver that requires direct "qcom,ipc*" syscon references in devicetree by allowing the smsm driver to use the mailbox interface to achieve the same effect. Still not sure if the devicetree bindings are the prettiest but they're functional. One alternative I'm thinking of is to use mbox-names to not have <0> elements in dt, and reference the items by name from the driver? e.g. this change for msm8226 could be represented differently. - qcom,ipc-1 = <&apcs 8 13>; - qcom,ipc-2 = <&apcs 8 9>; - qcom,ipc-3 = <&apcs 8 19>; + mboxes = <0>, <&apcs 13>, <&apcs 9>, <&apcs 19>; vs. for example: - qcom,ipc-1 = <&apcs 8 13>; - qcom,ipc-2 = <&apcs 8 9>; - qcom,ipc-3 = <&apcs 8 19>; + mboxes = <&apcs 13>, <&apcs 9>, <&apcs 19>; + mbox-names = "ipc-1", "ipc-2", "ipc-3"; But also here the name with 'ipc-N' is probably not particularly fitting? Please let me know your thoughts and any suggestions. Signed-off-by: Luca Weiss --- Changes in v2: - Mark qcom,ipc-N as deprecated - Update & expand description for mboxes property - Don't duplicate example, just update existing one since qcom,ipc-N is deprecated now anyways - Pick up tags - Link to v1: https://lore.kernel.org/r/20240424-smsm-mbox-v1-0-555f3f442...@z3ntu.xyz --- Luca Weiss (2): dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc soc: qcom: smsm: Support using mailbox interface .../devicetree/bindings/soc/qcom/qcom,smsm.yaml| 30 + drivers/soc/qcom/smsm.c| 51 +- 2 files changed, 71 insertions(+), 10 deletions(-) --- base-commit: ee78a17615ad0cfdbbc27182b1047cd36c9d4d5f change-id: 20240424-smsm-mbox-0666f35eae44 Best regards, -- Luca Weiss
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 12:00:36PM -0700, Boqun Feng wrote: > On Thu, Jun 06, 2024 at 07:35:44PM +0200, Peter Zijlstra wrote: > > On Thu, Jun 06, 2024 at 08:49:06AM -0700, Boqun Feng wrote: > > > > > Long-term plan is to 1) compile the C helpers in some IR and 2) inline > > > the helpers with Rust in IR-level, as what Gary has: > > > > > > > > > https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ > > > > Urgh, that still needs us to maintain that silly list of helpers :-/ > > > > But it's an improvement from the current stage, right? ;-) Somewhat, but only marginal. > > Can't we pretty please have clang parse the actual header files into IR > > and munge that into rust? So that we don't get to manually duplicate > > everything+dog. > > That won't always work, because some of our kernel APIs are defined as > macros, and I don't think it's a trivial job to translate a macro > definition into a function definition so that it can be translated to > something in IR. We will have to do the macro -> function mapping > ourselves somewhere, if we want to inline the API across languages. We can try and see how far we can get with moving a bunch of stuff into inlines. There's quite a bit of simple CPP that could be inlines or const objects I suppose. Things like the tracepoints are of course glorious CPP abuse and are never going to work. But perhaps you can have an explicit 'eval-CPP on this here' construct or whatnot. If I squint I see this paste! thingy (WTF's up with that ! operator?) to munge function names in the static_call thing. So something like apply CPP from over there on this here can also be done :-)
Re: [PATCH 1/3] rust: add static_call support
On Thu, Jun 06, 2024 at 09:09:00PM +0200, Miguel Ojeda wrote: > On Thu, Jun 6, 2024 at 7:19 PM Peter Zijlstra wrote: > > > > This is absolutely unreadable gibberish -- how am I supposed to keep > > this in sync with the rest of the static_call infrastructure? > > Yeah, they are macros, which look different from "normal" Rust code. Macros like CPP ? > Is there something we could do to help here? I think Alice and others > would be happy to explain how it works and/or help maintain it in the > future if you prefer. Write a new language that looks more like C -- pretty please ? :-) Mostly I would just really like you to just use arm/jump_label.h, they're inline functions and should be doable with IR, no weirdo CPP involved in this case.
[PATCH v3 2/4] tracing/timer: use __print_sym()
From: Johannes Berg Use the new __print_sym() in the timer tracing, just to show how to convert something. This adds ~80 bytes of .text for a saving of ~1.5K of data in my builds. Note the format changes from print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { (1 << 0), "POSIX_TIMER" }, { (1 << 1), "PERF_EVENTS" }, { (1 << 2), "SCHED" }, { (1 << 3), "CLOCK_UNSTABLE" }, { (1 << 4), "RCU" }, { (1 << 5), "RCU_EXP" }) to print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { 1, "POSIX_TIMER" }, { 2, "PERF_EVENTS" }, { 4, "SCHED" }, { 8, "CLOCK_UNSTABLE" }, { 16, "RCU" }, { 32, "RCU_EXP" }) since the values are now just printed in the show function as pure decimal values. Signed-off-by: Johannes Berg --- include/trace/events/timer.h | 22 +++--- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index 1ef58a04fc57..d483abffed78 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -402,26 +402,18 @@ TRACE_EVENT(itimer_expire, #undef tick_dep_mask_name #undef tick_dep_name_end -/* The MASK will convert to their bits and they need to be processed too */ -#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \ - TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); -#define tick_dep_name_end(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \ - TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); -/* NONE only has a mask defined for it */ -#define tick_dep_mask_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); - -TICK_DEP_NAMES - -#undef tick_dep_name -#undef tick_dep_mask_name -#undef tick_dep_name_end - #define tick_dep_name(sdep) { TICK_DEP_MASK_##sdep, #sdep }, #define tick_dep_mask_name(sdep) { TICK_DEP_MASK_##sdep, #sdep }, #define tick_dep_name_end(sdep) { TICK_DEP_MASK_##sdep, #sdep } +TRACE_DEFINE_SYM_LIST(tick_dep_names, TICK_DEP_NAMES); + +#undef tick_dep_name +#undef 
tick_dep_mask_name +#undef tick_dep_name_end + #define show_tick_dep_name(val)\ - __print_symbolic(val, TICK_DEP_NAMES) + __print_sym(val, tick_dep_names) TRACE_EVENT(tick_stop, -- 2.45.1
[PATCH v3 0/4] tracing: improve symbolic printing
Completely forgot about this again ... here's another respin. v2 was: - rebased on 6.9-rc1 - always search for __print_sym() and get rid of the DYNPRINT flag and associated code; I think ideally we'll just remove the older __print_symbolic() entirely - use ':' as the separator instead of "//" since that makes searching for it much easier and it's still not a valid char in an identifier - fix RCU v3: - fix #undef issues - fix drop_monitor default - rebase on linux-trace/for-next (there were no conflicts) - move net patches to 3/4 - clarify symbol name matching logic (and remove ")" from it) To recap, it's annoying to have irq/65-iwlwifi:-401 [000]22.79: kfree_skb: ... reason: 0x2 and much nicer to see irq/65-iwlwifi:-401 [000]22.79: kfree_skb: ... reason: RX_DROP_MONITOR but this doesn't work now because __print_symbolic() can only deal with a hard-coded list (which is actually really big.) So here's __print_sym() which doesn't build the list into the kernel image, but creates it at runtime. For userspace, it will look the same as __print_symbolic() (it literally shows __print_symbolic() to userspace) so no changes are needed, but the actual list of values exposed to userspace in there is built dynamically. For SKB drop reasons, this then has all the reasons known when userspace queries the trace format. I guess patch 3/4 should go through net-next, so not sure how to handle this patch series. Or perhaps, as this will not cause conflicts, in fact I've been rebasing it for a long time, go through tracing anyway with an Ack from netdev? But I can also just wait for the trace patch(es) to land and resubmit the net patches after. Assuming this looks good at all :-) Thanks, johannes %
[PATCH v3 3/4] net: dropreason: use new __print_sym() in tracing
From: Johannes Berg The __print_symbolic() could only ever print the core drop reasons, since that's the way the infrastructure works. Now that we have __print_sym() with all the advantages mentioned in that commit, convert to that and get all the drop reasons from all subsystems. As we already have a list of them, that's really easy. This is a little bit of .text (~100 bytes in my build) and saves a lot of .data (~17k). Signed-off-by: Johannes Berg --- include/net/dropreason.h | 5 + include/trace/events/skb.h | 16 +++--- net/core/skbuff.c | 43 ++ 3 files changed, 51 insertions(+), 13 deletions(-) diff --git a/include/net/dropreason.h b/include/net/dropreason.h index 56cb7be92244..c157070b5303 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -42,6 +42,11 @@ struct drop_reason_list { extern const struct drop_reason_list __rcu * drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM]; +#ifdef CONFIG_TRACEPOINTS +const char *drop_reason_lookup(unsigned long long value); +void drop_reason_show(struct seq_file *m); +#endif + void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys, const struct drop_reason_list *list); void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys); diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 07e0715628ec..8a1a63f9e796 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -8,15 +8,9 @@ #include #include #include +#include -#undef FN -#define FN(reason) TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason); -DEFINE_DROP_REASON(FN, FN) - -#undef FN -#undef FNe -#define FN(reason) { SKB_DROP_REASON_##reason, #reason }, -#define FNe(reason){ SKB_DROP_REASON_##reason, #reason } +TRACE_DEFINE_SYM_FNS(drop_reason, drop_reason_lookup, drop_reason_show); /* * Tracepoint for free an sk_buff: @@ -44,13 +38,9 @@ TRACE_EVENT(kfree_skb, TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s", __entry->skbaddr, __entry->protocol, __entry->location, - 
__print_symbolic(__entry->reason, - DEFINE_DROP_REASON(FN, FNe))) + __print_sym(__entry->reason, drop_reason )) ); -#undef FN -#undef FNe - TRACE_EVENT(consume_skb, TP_PROTO(struct sk_buff *skb, void *location), diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 466999a7515e..cd1ea6c3e0f8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -145,6 +145,49 @@ drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { }; EXPORT_SYMBOL(drop_reasons_by_subsys); +#ifdef CONFIG_TRACEPOINTS +const char *drop_reason_lookup(unsigned long long value) +{ + unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT; + u32 reason = value & ~SKB_DROP_REASON_SUBSYS_MASK; + const struct drop_reason_list *subsys; + + if (subsys_id >= SKB_DROP_REASON_SUBSYS_NUM) + return NULL; + + subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]); + if (!subsys) + return NULL; + if (reason >= subsys->n_reasons) + return NULL; + return subsys->reasons[reason]; +} + +void drop_reason_show(struct seq_file *m) +{ + u32 subsys_id; + + rcu_read_lock(); + for (subsys_id = 0; subsys_id < SKB_DROP_REASON_SUBSYS_NUM; subsys_id++) { + const struct drop_reason_list *subsys; + u32 i; + + subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]); + if (!subsys) + continue; + + for (i = 0; i < subsys->n_reasons; i++) { + if (!subsys->reasons[i]) + continue; + seq_printf(m, ", { %u, \"%s\" }", + (subsys_id << SKB_DROP_REASON_SUBSYS_SHIFT) | i, + subsys->reasons[i]); + } + } + rcu_read_unlock(); +} +#endif + /** * drop_reasons_register_subsys - register another drop reason subsystem * @subsys: the subsystem to register, must not be the core -- 2.45.1
[PATCH v3 1/4] tracing: add __print_sym() to replace __print_symbolic()
From: Johannes Berg The way __print_symbolic() works is limited and inefficient in multiple ways: - you can only use it with a static list of symbols, but e.g. the SKB dropreasons are now a dynamic list - it builds the list in memory _three_ times, so it takes a lot of memory: - The print_fmt contains the list (since it's passed to the macro there). This actually contains the names _twice_, which is fixed up at runtime. - TRACE_DEFINE_ENUM() puts a 24-byte struct trace_eval_map for every entry, plus the string pointed to by it, which cannot be deduplicated with the strings in the print_fmt - The in-kernel symbolic printing creates yet another list of struct trace_print_flags for trace_print_symbols_seq() - it also requires runtime fixup during init, which is a lot of string parsing due to the print_fmt fixup Introduce __print_sym() to - over time - replace the old one. We can easily extend this also to __print_flags later, but I cared only about the SKB dropreasons for now, which has only __print_symbolic(). This new __print_sym() requires only a single list of items, created by TRACE_DEFINE_SYM_LIST(), or can even use another already existing list by using TRACE_DEFINE_SYM_FNS() with lookup and show methods. Then, instead of doing an init-time fixup, just do this at the time when userspace reads the print_fmt. This way, dynamically updated lists are possible. For userspace, nothing actually changes, because the print_fmt is shown exactly the same way the old __print_symbolic() was. This adds about 4k .text in my test builds, but that'll be more than paid for by the actual conversions. 
Signed-off-by: Johannes Berg --- v2: - fix RCU - use ':' as separator to simplify the code, that's still not valid in a C identifier v3: - add missing #undef lines (reported by Simon Horman) - clarify name is not NUL-terminated and fix logic for the comparison for that --- include/asm-generic/vmlinux.lds.h | 3 +- include/linux/module.h | 2 + include/linux/trace_events.h | 7 ++ include/linux/tracepoint.h | 20 + include/trace/stages/init.h| 55 + include/trace/stages/stage2_data_offsets.h | 6 ++ include/trace/stages/stage3_trace_output.h | 9 ++ include/trace/stages/stage7_class_define.h | 3 + kernel/module/main.c | 3 + kernel/trace/trace_events.c| 95 +- kernel/trace/trace_output.c| 45 ++ 11 files changed, 245 insertions(+), 3 deletions(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 5703526d6ebf..8275a06bcaee 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -258,7 +258,8 @@ #define FTRACE_EVENTS() \ . = ALIGN(8); \ BOUNDED_SECTION(_ftrace_events) \ - BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) + BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) \ + BOUNDED_SECTION(_ftrace_sym_defs) #else #define FTRACE_EVENTS() #endif diff --git a/include/linux/module.h b/include/linux/module.h index ffa1c603163c..7256762d59e1 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -524,6 +524,8 @@ struct module { unsigned int num_trace_events; struct trace_eval_map **trace_evals; unsigned int num_trace_evals; + struct trace_sym_def **trace_sym_defs; + unsigned int num_trace_sym_defs; #endif #ifdef CONFIG_FTRACE_MCOUNT_RECORD unsigned int num_ftrace_callsites; diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 9df3e2973626..2743280c9a46 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -27,6 +27,13 @@ const char *trace_print_flags_seq(struct trace_seq *p, const char *delim, const char 
*trace_print_symbols_seq(struct trace_seq *p, unsigned long val, const struct trace_print_flags *symbol_array); +const char *trace_print_sym_seq(struct trace_seq *p, unsigned long long val, + const char *(*lookup)(unsigned long long val)); +const char *trace_sym_lookup(const struct trace_sym_entry *list, +size_t len, unsigned long long value); +void trace_sym_show(struct seq_file *m, + const struct trace_sym_entry *list, size_t len); + #if BITS_PER_LONG == 32 const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim, unsigned long long flags, diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 689b6d71590e..cc3b387953d1 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -31,6 +31,24 @@ struct trace_eval_map {
[PATCH v3 4/4] net: drop_monitor: use drop_reason_lookup()
From: Johannes Berg Now that we have drop_reason_lookup(), we can just use it for drop_monitor as well, rather than exporting the list itself. Signed-off-by: Johannes Berg --- v3: - look up SKB_DROP_REASON_NOT_SPECIFIED if initial lookup returns NULL, to preserve previous behaviour --- include/net/dropreason.h | 4 net/core/drop_monitor.c | 20 +--- net/core/skbuff.c| 6 +++--- 3 files changed, 8 insertions(+), 22 deletions(-) diff --git a/include/net/dropreason.h b/include/net/dropreason.h index c157070b5303..0e2195ccf2cd 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -38,10 +38,6 @@ struct drop_reason_list { size_t n_reasons; }; -/* Note: due to dynamic registrations, access must be under RCU */ -extern const struct drop_reason_list __rcu * -drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM]; - #ifdef CONFIG_TRACEPOINTS const char *drop_reason_lookup(unsigned long long value); void drop_reason_show(struct seq_file *m); diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 430ed18f8584..fddf6b68bf06 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -610,9 +610,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, size_t payload_len) { struct net_dm_skb_cb *cb = NET_DM_SKB_CB(skb); - const struct drop_reason_list *list = NULL; - unsigned int subsys, subsys_reason; char buf[NET_DM_MAX_SYMBOL_LEN]; + const char *reason_str; struct nlattr *attr; void *hdr; int rc; @@ -630,19 +629,10 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, goto nla_put_failure; rcu_read_lock(); - subsys = u32_get_bits(cb->reason, SKB_DROP_REASON_SUBSYS_MASK); - if (subsys < SKB_DROP_REASON_SUBSYS_NUM) - list = rcu_dereference(drop_reasons_by_subsys[subsys]); - subsys_reason = cb->reason & ~SKB_DROP_REASON_SUBSYS_MASK; - if (!list || - subsys_reason >= list->n_reasons || - !list->reasons[subsys_reason] || - strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) { - list 
= rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]); - subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED; - } - if (nla_put_string(msg, NET_DM_ATTR_REASON, - list->reasons[subsys_reason])) { + reason_str = drop_reason_lookup(cb->reason); + if (unlikely(!reason_str)) + reason_str = drop_reason_lookup(SKB_DROP_REASON_NOT_SPECIFIED); + if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) { rcu_read_unlock(); goto nla_put_failure; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index cd1ea6c3e0f8..bd4fb7410284 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -139,13 +139,11 @@ static const struct drop_reason_list drop_reasons_core = { .n_reasons = ARRAY_SIZE(drop_reasons), }; -const struct drop_reason_list __rcu * +static const struct drop_reason_list __rcu * drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { [SKB_DROP_REASON_SUBSYS_CORE] = RCU_INITIALIZER(&drop_reasons_core), }; -EXPORT_SYMBOL(drop_reasons_by_subsys); -#ifdef CONFIG_TRACEPOINTS const char *drop_reason_lookup(unsigned long long value) { unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT; @@ -162,7 +160,9 @@ const char *drop_reason_lookup(unsigned long long value) return NULL; return subsys->reasons[reason]; } +EXPORT_SYMBOL(drop_reason_lookup); +#ifdef CONFIG_TRACEPOINTS void drop_reason_show(struct seq_file *m) { u32 subsys_id; -- 2.45.1
Re: [PATCH v4] drivers: remoteproc: xlnx: add attach detach support
On Wed, 5 Jun 2024 at 11:47, Tanmay Shah wrote: > > > > On 6/4/24 3:23 PM, Bjorn Andersson wrote: > > On Mon, Jun 03, 2024 at 01:34:38PM -0700, Tanmay Shah wrote: > >> It is possible that remote processor is already running before > >> linux boot or remoteproc platform driver probe. Implement required > >> remoteproc framework ops to provide resource table address and > >> connect or disconnect with remote processor in such case. > >> > > > > I know if changes the current style for this driver, but could we drop > > "drivers: " from the subject prefix, to align changes to this driver > > with others? > > > > Ack. Will be fixed. Is it convention not to have "drivers" ? "drivers" is not needed. > If so, from now on, I will format subject prefix: /: > That will be just fine. > > Regards, > > Bjorn >
Re: [PATCH v4] drivers: remoteproc: xlnx: add attach detach support
On Wed, Jun 05, 2024 at 12:45:17PM -0500, Tanmay Shah wrote: > > > On 6/4/24 10:34 AM, Mathieu Poirier wrote: > > Hi Mathieu, > > Thanks for reviews. > Please find my comments below. > > > Hi Tanmay, > > > > On Mon, Jun 03, 2024 at 01:34:38PM -0700, Tanmay Shah wrote: > >> It is possible that remote processor is already running before > >> linux boot or remoteproc platform driver probe. Implement required > >> remoteproc framework ops to provide resource table address and > >> connect or disconnect with remote processor in such case. > >> > >> Signed-off-by: Tanmay Shah > >> > >> --- > >> Changes in v4: > >> - Move change log out of commit text > >> > >> Changes in v3: > >> > >> - Drop SRAM patch from the series > >> - Change type from "struct resource_table *" to void __iomem * > >> - Change comment format from /** to /* > >> - Remove unmap of resource table va address during detach, allowing > >> attach-detach-reattach use case. > >> - Unmap rsc_data_va after retrieving resource table data structure. > >> - Unmap resource table va during driver remove op > >> > >> Changes in v2: > >> > >> - Fix typecast warnings reported using sparse tool. 
> >> - Fix following sparse warnings: > >> > >> drivers/remoteproc/xlnx_r5_remoteproc.c:827:21: sparse: warning: incorrect > >> type in assignment (different address spaces) > >> drivers/remoteproc/xlnx_r5_remoteproc.c:844:18: sparse: warning: incorrect > >> type in assignment (different address spaces) > >> drivers/remoteproc/xlnx_r5_remoteproc.c:898:24: sparse: warning: incorrect > >> type in argument 1 (different address spaces) > >> drivers/remoteproc/xlnx_r5_remoteproc.c | 173 +++- > >> 1 file changed, 169 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c > >> b/drivers/remoteproc/xlnx_r5_remoteproc.c > >> index 84243d1dff9f..6898d4761566 100644 > >> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > >> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > >> @@ -25,6 +25,10 @@ > >> /* RX mailbox client buffer max length */ > >> #define MBOX_CLIENT_BUF_MAX (IPI_BUF_LEN_MAX + \ > >> sizeof(struct zynqmp_ipi_message)) > >> + > >> +#define RSC_TBL_XLNX_MAGIC((uint32_t)'x' << 24 | (uint32_t)'a' << > >> 16 | \ > >> + (uint32_t)'m' << 8 | (uint32_t)'p') > >> + > >> /* > >> * settings for RPU cluster mode which > >> * reflects possible values of xlnx,cluster-mode dt-property > >> @@ -73,6 +77,26 @@ struct mbox_info { > >>struct mbox_chan *rx_chan; > >> }; > >> > >> +/** > >> + * struct rsc_tbl_data > >> + * > >> + * Platform specific data structure used to sync resource table address. > >> + * It's important to maintain order and size of each field on remote side. > >> + * > >> + * @version: version of data structure > >> + * @magic_num: 32-bit magic number. > >> + * @comp_magic_num: complement of above magic number > >> + * @rsc_tbl_size: resource table size > >> + * @rsc_tbl: resource table address > >> + */ > >> +struct rsc_tbl_data { > >> + const int version; > >> + const u32 magic_num; > >> + const u32 comp_magic_num; > > > > I thought we agreed on making the magic number a u64 - did I get this wrong? 
> > > > Looks like I missed this comment from previous reviews, so didn't address it. > Thanks for pointing this. > > So I think having two 32-bit numbers with proper name, implies what is > expected and less chance of errors. > With 64-bit number, it's easy to create errors when assigning magic number. > > However, if 64-bit number is preferred, I will change it and test it. > Please let me know. I was under the impression we had agreed on a u64 but that may just be my own interpretation. Things can stay as they are now. > > > >> + const u32 rsc_tbl_size; > >> + const uintptr_t rsc_tbl; > >> +} __packed; > >> + > >> /* > >> * Hardcoded TCM bank values. This will stay in driver to maintain > >> backward > >> * compatibility with device-tree that does not have TCM information. > >> @@ -95,20 +119,24 @@ static const struct mem_bank_data > >> zynqmp_tcm_banks_lockstep[] = { > >> /** > >> * struct zynqmp_r5_core > >> * > >> + * @rsc_tbl_va: resource table virtual address > >> * @dev: device of RPU instance > >> * @np: device node of RPU instance > >> * @tcm_bank_count: number TCM banks accessible to this RPU > >> * @tcm_banks: array of each TCM bank data > >> * @rproc: rproc handle > >> + * @rsc_tbl_size: resource table size retrieved from remote > >> * @pm_domain_id: RPU CPU power domain id > >> * @ipi: pointer to mailbox information > >> */ > >> struct zynqmp_r5_core { > >> + void __iomem *rsc_tbl_va; > >>struct device *dev; > >>struct device_node *np; > >>int tcm_bank_count; > >>struct mem_bank_data **tcm_banks; > >>struct rproc *rproc; > >> + u32 rsc_tbl_size; > >>u32 pm_domain_id; > >>struct mbox_info *ipi; > >> }; >
[PATCH v3 01/13] ring-buffer: Allow mapped field to be set without mapping
From: "Steven Rostedt (Google)" In preparation for having the ring buffer mapped to a dedicated location, which will have the same restrictions as user space memory mapped buffers, allow it to use the "mapped" field of the ring_buffer_per_cpu structure without having the user space meta page mapping. When this starts using the mapped field, it will need to handle adding a user space mapping (and removing it) from a ring buffer that is using a dedicated memory range. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 28853966aa9a..78beaccf9c8c 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -5224,6 +5224,9 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer) { struct trace_buffer_meta *meta = cpu_buffer->meta_page; + if (!meta) + return; + meta->reader.read = cpu_buffer->reader_page->read; meta->reader.id = cpu_buffer->reader_page->id; meta->reader.lost_events = cpu_buffer->lost_events; @@ -6167,7 +6170,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu) mutex_lock(&cpu_buffer->mapping_lock); - if (!cpu_buffer->mapped) { + if (!cpu_buffer->mapped || !cpu_buffer->meta_page) { mutex_unlock(&cpu_buffer->mapping_lock); return ERR_PTR(-ENODEV); } @@ -6359,12 +6362,13 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu, */ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); rb_setup_ids_meta_page(cpu_buffer, subbuf_ids); + raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); err = __rb_map_vma(cpu_buffer, vma); if (!err) { raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - cpu_buffer->mapped = 1; + cpu_buffer->mapped++; raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); } else { kfree(cpu_buffer->subbuf_ids); @@ -6403,7 +6407,8 @@ int ring_buffer_unmap(struct trace_buffer *buffer, int cpu) mutex_lock(&buffer->mutex); 
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - cpu_buffer->mapped = 0; + WARN_ON_ONCE(!cpu_buffer->mapped); + cpu_buffer->mapped--; raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); -- 2.43.0
[PATCH v3 02/13] ring-buffer: Add ring_buffer_alloc_range()
From: "Steven Rostedt (Google)" In preparation to allowing the trace ring buffer to be allocated in a range of memory that is persistent across reboots, add ring_buffer_alloc_range(). It takes a contiguous range of memory and will split it up evenly for the per CPU ring buffers. If there's not enough memory to handle all CPUs with the minimum size, it will fail to allocate the ring buffer. Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 17 +++ kernel/trace/ring_buffer.c | 239 ++-- 2 files changed, 220 insertions(+), 36 deletions(-) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index 96d2140b471e..a50b0223b1d3 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -89,6 +89,11 @@ void ring_buffer_discard_commit(struct trace_buffer *buffer, struct trace_buffer * __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *key); +struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flags, + int order, unsigned long start, + unsigned long range_size, + struct lock_class_key *key); + /* * Because the ring buffer is generic, if other users of the ring buffer get * traced by ftrace, it can produce lockdep warnings. We need to keep each @@ -100,6 +105,18 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k __ring_buffer_alloc((size), (flags), &__key); \ }) +/* + * Because the ring buffer is generic, if other users of the ring buffer get + * traced by ftrace, it can produce lockdep warnings. We need to keep each + * ring buffer's lock class separate. 
+ */ +#define ring_buffer_alloc_range(size, flags, order, start, range_size) \ +({ \ + static struct lock_class_key __key; \ + __ring_buffer_alloc_range((size), (flags), (order), (start),\ + (range_size), &__key);\ +}) + typedef bool (*ring_buffer_cond_fn)(void *data); int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full, ring_buffer_cond_fn cond, void *data); diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 78beaccf9c8c..53abe7916f2b 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -42,6 +42,9 @@ static void update_pages_handler(struct work_struct *work); +struct ring_buffer_meta { +}; + /* * The ring buffer header is special. We must manually up keep it. */ @@ -342,7 +345,8 @@ struct buffer_page { local_t entries; /* entries on this page */ unsigned longreal_end; /* real end of data */ unsigned order; /* order of the page */ - u32 id;/* ID for external mapping */ + u32 id:30; /* ID for external mapping */ + u32 range:1; /* Mapped via a range */ struct buffer_data_page *page; /* Actual data page */ }; @@ -373,7 +377,9 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage) static void free_buffer_page(struct buffer_page *bpage) { - free_pages((unsigned long)bpage->page, bpage->order); + /* Range pages are not to be freed */ + if (!bpage->range) + free_pages((unsigned long)bpage->page, bpage->order); kfree(bpage); } @@ -523,6 +529,9 @@ struct trace_buffer { struct rb_irq_work irq_work; booltime_stamp_abs; + unsigned long range_addr_start; + unsigned long range_addr_end; + unsigned intsubbuf_size; unsigned intsubbuf_order; unsigned intmax_data_size; @@ -1490,9 +1499,70 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) } } +/* + * Take an address, add the meta data size as well as the array of + * array subbuffer indexes, then align it to a subbuffer size. + * + * This is used to help find the next per cpu subbuffer within a mapped range. 
+ */ +static unsigned long +rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs) +{ + addr += sizeof(struct ring_buffer_meta) + + sizeof(int) * nr_subbufs; + return ALIGN(addr, subbuf_size); +} + +/* + * Return a specific sub-buffer for a given @cpu defined by @idx. + */ +static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int nr_pages, int idx) +{ + unsigned long ptr; + int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE; + int nr_subbufs; + + /* Include the reader page */ + nr_subbufs = nr_pag
[PATCH v3 00/13] tracing: Persistent traces across a reboot or crash
This is a way to map a ring buffer instance across reboots. The requirement is that you have a memory region that is not erased. I tested this on a Debian VM running on qemu on a Debian server, and even tested it on a baremetal box running Fedora. I was surprised that it worked on the baremetal box, but it does so surprisingly consistently.

This series does not require the ring buffer mapping, but simply takes a physical address that has been reserved via memmap (on x86 only). An example of the kernel command line is:

  memmap=12M$0x28540 trace_instance=boot_mapped@0x28540:12M

The above will reserve 12M at physical address 0x28540 (done by the existing memmap command line option), and then the trace_instance option was extended to take an address and size (@0x28540:12M). It will then vmap() that address and allocate a ring buffer in it. If a ring buffer already exists, it will use it and expose the contents to user space.

The memory reserved is used by the ring buffer of this instance. It acts like a memory mapped instance, so it has some limitations. It does not allow snapshots, nor does it allow tracers which use a snapshot buffer (like the irqsoff and wakeup tracers).

On boot up, when setting up the ring buffer, it looks at the current content and does a vigorous test to see if the content is valid. It even walks the events in all the sub-buffers to make sure the ring buffer meta data is correct. If it determines that the content is valid, it will reconstruct the ring buffer to use the content it has found.

If the buffer is valid, on the next boot the boot_mapped instance will contain the data from the previous boot. You can cat the trace or trace_pipe file, or even run trace-cmd extract on it to make a trace.dat file that holds the data. This is much better than dealing with ftrace_dump_on_oops (I wish I had this a decade ago!).

There are still some limitations of this buffer. 
One is that it assumes that the kernel you are booting back into is the same one that crashed. At least the trace_events (like sched_switch and friends) all have the same ids. This would be true with the same kernel, as the ids are determined at link time. Module events could possibly be a problem, as the ids may not match.

This version of the patch series saves the address of a text function and of a data string in the persistent memory, and this is used to calculate the delta between the text and data addresses of the new boot up. Now function tracing and "%pS" still work across boots. Even the RCU trace events that point to static strings work as well! The delta is exported by a new file in the instance called "last_boot_info" that has something like this:

  # cat last_boot_info
  text delta:-268435456
  data delta:-268435456

This can be used by trace-cmd, which reads the trace_pipe_raw data and can now figure out how to map the print_formats and kallsyms to the raw data in the buffers.

This can be used to debug kernel shutdown. 
I ran the following:

  # trace-cmd start -B boot_mapped -p function
  # reboot

[after reboot]

  # trace-cmd show -B boot_mapped | tail -20
  swapper/0-1 [000] d..1. 63.479667: preempt_count_add <-delay_tsc
  swapper/0-1 [000] d..2. 63.479669: preempt_count_sub <-delay_tsc
  swapper/0-1 [000] d..1. 63.479671: disable_local_APIC <-native_stop_other_cpus
  swapper/0-1 [000] d..1. 63.479673: clear_local_APIC.part.0 <-disable_local_APIC
  swapper/0-1 [000] d..1. 63.479716: mcheck_cpu_clear <-native_stop_other_cpus
  swapper/0-1 [000] d..1. 63.479718: mce_intel_feature_clear <-native_stop_other_cpus
  swapper/0-1 [000] d..1. 63.479720: lmce_supported <-mce_intel_feature_clear
  swapper/0-1 [000] d..1. 63.479732: lapic_shutdown <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479735: disable_local_APIC <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479736: clear_local_APIC.part.0 <-disable_local_APIC
  swapper/0-1 [000] d..1. 63.479763: restore_boot_irq_mode <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479763: native_restore_boot_irq_mode <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479764: disconnect_bsp_APIC <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479777: hpet_disable <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479778: iommu_shutdown_noop <-native_machine_restart
  swapper/0-1 [000] d..1. 63.479779: native_machine_emergency_restart <-__do_sys_reboot
  swapper/0-1 [000] d..1. 63.479779: tboot_shutdown <-native_machine_emergency_restart
  swapper/0-1 [000] d..1. 63.479790: acpi_reboot <-native_machine_emergency_restart
  swapper/0-1 [000] d..1. 63.479791: acpi_reset <-acpi_reboot
  swapper/0-1 [000] d..1. 63.479791: acpi_os_write_p
[PATCH v3 04/13] tracing: Implement creating an instance based on a given memory region
From: "Steven Rostedt (Google)" Allow for creating a new instance by passing in an address and size to map the ring buffer for the instance to. This will allow features like a pstore memory mapped region to be used for an tracing instance ring buffer that can be retrieved from one boot to the next. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 50 +++- kernel/trace/trace.h | 4 2 files changed, 44 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 578a49ff5c32..ff2b504fbe00 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4921,6 +4921,11 @@ static int tracing_open(struct inode *inode, struct file *file) static bool trace_ok_for_array(struct tracer *t, struct trace_array *tr) { +#ifdef CONFIG_TRACER_SNAPSHOT + /* arrays with mapped buffer range do not have snapshots */ + if (tr->range_addr_start && t->use_max_tr) + return false; +#endif return (tr->flags & TRACE_ARRAY_FL_GLOBAL) || t->allow_instances; } @@ -8664,11 +8669,13 @@ tracing_init_tracefs_percpu(struct trace_array *tr, long cpu) tr, cpu, &tracing_entries_fops); #ifdef CONFIG_TRACER_SNAPSHOT - trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, - tr, cpu, &snapshot_fops); + if (!tr->range_addr_start) { + trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, + tr, cpu, &snapshot_fops); - trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu, - tr, cpu, &snapshot_raw_fops); + trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu, + tr, cpu, &snapshot_raw_fops); + } #endif } @@ -9205,7 +9212,18 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size buf->tr = tr; - buf->buffer = ring_buffer_alloc(size, rb_flags); + if (tr->range_addr_start && tr->range_addr_size) { + buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0, + tr->range_addr_start, + tr->range_addr_size); + /* +* This is basically the same as a mapped buffer, +* with the same restrictions. 
+*/ + tr->mapped++; + } else { + buf->buffer = ring_buffer_alloc(size, rb_flags); + } if (!buf->buffer) return -ENOMEM; @@ -9242,6 +9260,10 @@ static int allocate_trace_buffers(struct trace_array *tr, int size) return ret; #ifdef CONFIG_TRACER_MAX_TRACE + /* Fix mapped buffer trace arrays do not have snapshot buffers */ + if (tr->range_addr_start) + return 0; + ret = allocate_trace_buffer(tr, &tr->max_buffer, allocate_snapshot ? size : 1); if (MEM_FAIL(ret, "Failed to allocate trace buffer\n")) { @@ -9342,7 +9364,9 @@ static int trace_array_create_dir(struct trace_array *tr) } static struct trace_array * -trace_array_create_systems(const char *name, const char *systems) +trace_array_create_systems(const char *name, const char *systems, + unsigned long range_addr_start, + unsigned long range_addr_size) { struct trace_array *tr; int ret; @@ -9368,6 +9392,10 @@ trace_array_create_systems(const char *name, const char *systems) goto out_free_tr; } + /* Only for boot up memory mapped ring buffers */ + tr->range_addr_start = range_addr_start; + tr->range_addr_size = range_addr_size; + tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS; cpumask_copy(tr->tracing_cpumask, cpu_all_mask); @@ -9425,7 +9453,7 @@ trace_array_create_systems(const char *name, const char *systems) static struct trace_array *trace_array_create(const char *name) { - return trace_array_create_systems(name, NULL); + return trace_array_create_systems(name, NULL, 0, 0); } static int instance_mkdir(const char *name) @@ -9479,7 +9507,7 @@ struct trace_array *trace_array_get_by_name(const char *name, const char *system goto out_unlock; } - tr = trace_array_create_systems(name, systems); + tr = trace_array_create_systems(name, systems, 0, 0); if (IS_ERR(tr)) tr = NULL; @@ -9672,8 +9700,10 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) MEM_FAIL(1, "Could not allocate function filter files"); #ifdef CONFIG_TRACER_SNAPSHOT - trace_create_file("snapshot", TRACE_MODE_WRITE, 
d_tracer, - tr, &snapshot_fops); + if (!tr->ran
[PATCH v3 03/13] ring-buffer: Add ring_buffer_meta data
From: "Steven Rostedt (Google)" Populate the ring_buffer_meta array. It holds the pointer to the head_buffer (next to read), the commit_buffer (next to write) the size of the sub-buffers, number of sub-buffers and an array that keeps track of the order of the sub-buffers. This information will be stored in the persistent memory to help on reboot to reconstruct the ring buffer. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 209 - 1 file changed, 184 insertions(+), 25 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 53abe7916f2b..385dc1750fc7 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -43,6 +43,11 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long head_buffer; + unsigned long commit_buffer; + __u32 subbuf_size; + __u32 nr_subbufs; + int buffers[]; }; /* @@ -500,6 +505,7 @@ struct ring_buffer_per_cpu { struct mutexmapping_lock; unsigned long *subbuf_ids;/* ID to subbuf VA */ struct trace_buffer_meta*meta_page; + struct ring_buffer_meta *ring_meta; /* ring buffer pages to update, > 0 to add, < 0 to remove */ longnr_pages_to_update; @@ -1260,6 +1266,11 @@ static void rb_head_page_activate(struct ring_buffer_per_cpu *cpu_buffer) * Set the previous list pointer to have the HEAD flag. */ rb_set_list_to_head(head->list.prev); + + if (cpu_buffer->ring_meta) { + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + meta->head_buffer = (unsigned long)head->page; + } } static void rb_list_head_clear(struct list_head *list) @@ -1514,51 +1525,127 @@ rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs) } /* - * Return a specific sub-buffer for a given @cpu defined by @idx. + * Return the ring_buffer_meta for a given @cpu. 
*/ -static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int nr_pages, int idx) +static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu) { - unsigned long ptr; int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE; + unsigned long ptr = buffer->range_addr_start; + struct ring_buffer_meta *meta; int nr_subbufs; - /* Include the reader page */ - nr_subbufs = nr_pages + 1; + if (!ptr) + return NULL; + + /* When nr_pages passed in is zero, the first meta has already been initialized */ + if (!nr_pages) { + meta = (struct ring_buffer_meta *)ptr; + nr_subbufs = meta->nr_subbufs; + } else { + meta = NULL; + /* Include the reader page */ + nr_subbufs = nr_pages + 1; + } /* * The first chunk may not be subbuffer aligned, where as * the rest of the chunks are. */ - ptr = buffer->range_addr_start; - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); if (cpu) { - unsigned long p; - - ptr += subbuf_size * nr_subbufs; - - /* Save the beginning of this CPU chunk */ - p = ptr; - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); + ptr += subbuf_size * nr_subbufs; /* We can use multiplication to find chunks greater than 1 */ if (cpu > 1) { unsigned long size; + unsigned long p; + /* Save the beginning of this CPU chunk */ + p = ptr; + ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); ptr += subbuf_size * nr_subbufs; /* Now all chunks after this are the same size */ size = ptr - p; ptr += size * (cpu - 2); - - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); } } - if (ptr + subbuf_size * nr_subbufs > buffer->range_addr_end) + return (void *)ptr; +} + +/* Return the start of subbufs given the meta pointer */ +static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta) +{ + int subbuf_size = meta->subbuf_size; + unsigned long ptr; + + ptr = (unsigned long)meta; + ptr = rb_range_align_subbuf(ptr, subbuf_size, meta->nr_subbufs); + + return (void *)ptr; +} + +/* + * Return a specific sub-buffer for a given 
@cpu defined by @idx. + */ +static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx) +{ + struct ring_buffer_meta *meta; + unsigned long ptr; + int subbuf_size; + + meta =
[PATCH v3 06/13] ring-buffer: Add test if range of boot buffer is valid
From: "Steven Rostedt (Google)" Add a test against the ring buffer memory range to see if it has valid data. The ring_buffer_meta structure is given a new field called "first_buffer" which holds the address of the first sub-buffer. This is used to both determine if the other fields are valid as well as finding the offset between the old addresses of the sub-buffer from the previous boot to the new addresses of the current boot. Since the values for nr_subbufs and subbuf_size is to be the same, check if the values in the meta page match the values calculated. Take the range of the first_buffer and the total size of all the buffers and make sure the saved head_buffer and commit_buffer fall in the range. Iterate through all the sub-buffers to make sure that the values in the sub-buffer "commit" field (the field that holds the amount of data on the sub-buffer) is within the end of the sub-buffer. Also check the index array to make sure that all the indexes are within nr_subbufs. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 143 ++--- 1 file changed, 135 insertions(+), 8 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 093c73c617cc..aecd4a7d62be 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -45,6 +45,7 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long first_buffer; unsigned long head_buffer; unsigned long commit_buffer; __u32 subbuf_size; @@ -1617,21 +1618,103 @@ static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx) return (void *)ptr; } +/* + * See if the existing memory contains valid ring buffer data. + * As the previous kernel must be the same as this kernel, all + * the calculations (size of buffers and number of buffers) + * must be the same. 
+ */ +static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu, + struct trace_buffer *buffer, int nr_pages) +{ + int subbuf_size = PAGE_SIZE; + struct buffer_data_page *subbuf; + unsigned long buffers_start; + unsigned long buffers_end; + int i; + + /* The subbuffer's size and number of subbuffers must match */ + if (meta->subbuf_size != subbuf_size || + meta->nr_subbufs != nr_pages + 1) { + pr_info("Ring buffer boot meta [%d] mismatch of subbuf_size/nr_pages\n", cpu); + return false; + } + + buffers_start = meta->first_buffer; + buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs); + + /* Is the head and commit buffers within the range of buffers? */ + if (meta->head_buffer < buffers_start || + meta->head_buffer >= buffers_end) { + pr_info("Ring buffer boot meta [%d] head buffer out of range\n", cpu); + return false; + } + + if (meta->commit_buffer < buffers_start || + meta->commit_buffer >= buffers_end) { + pr_info("Ring buffer boot meta [%d] commit buffer out of range\n", cpu); + return false; + } + + subbuf = rb_subbufs_from_meta(meta); + + /* Is the meta buffers and the subbufs themselves have correct data? 
*/ + for (i = 0; i < meta->nr_subbufs; i++) { + if (meta->buffers[i] < 0 || + meta->buffers[i] >= meta->nr_subbufs) { + pr_info("Ring buffer boot meta [%d] array out of range\n", cpu); + return false; + } + + if ((unsigned)local_read(&subbuf->commit) > subbuf_size) { + pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu); + return false; + } + + subbuf = (void *)subbuf + subbuf_size; + } + + pr_info("Ring buffer meta is from previous boot!\n"); + return true; +} + static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) { struct ring_buffer_meta *meta; + unsigned long delta; void *subbuf; int cpu; int i; for (cpu = 0; cpu < nr_cpu_ids; cpu++) { + void *next_meta; + meta = rb_range_meta(buffer, nr_pages, cpu); + if (rb_meta_valid(meta, cpu, buffer, nr_pages)) { + /* Make the mappings match the current address */ + subbuf = rb_subbufs_from_meta(meta); + delta = (unsigned long)subbuf - meta->first_buffer; + meta->first_buffer += delta; + meta->head_buffer += delta; + meta->commit_buffer += delta; + continue; + } + + if (cpu < nr_cpu_ids - 1) + next_meta = rb_range_meta(buffer, nr_pages, cpu + 1); +
[PATCH v3 05/13] ring-buffer: Add output of ring buffer meta page
From: "Steven Rostedt (Google)" Add a buffer_meta per-cpu file for the trace instance that is mapped to boot memory. This shows the current meta-data and can be used by user space tools to record off the current mappings to help reconstruct the ring buffer after a reboot. It does not expose any virtual addresses, just indexes into the sub-buffer pages. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 77 ++ kernel/trace/trace.c | 30 ++- kernel/trace/trace.h | 2 + 3 files changed, 107 insertions(+), 2 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 385dc1750fc7..093c73c617cc 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -32,6 +32,8 @@ #include #include +#include "trace.h" + /* * The "absolute" timestamp in the buffer is only 59 bits. * If a clock has the 5 MSBs set, it needs to be saved and @@ -1646,6 +1648,81 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) } } +static void *rbm_start(struct seq_file *m, loff_t *pos) +{ + struct ring_buffer_per_cpu *cpu_buffer = m->private; + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + unsigned long val; + + if (!meta) + return NULL; + + if (*pos > meta->nr_subbufs) + return NULL; + + val = *pos; + val++; + + return (void *)val; +} + +static void *rbm_next(struct seq_file *m, void *v, loff_t *pos) +{ + (*pos)++; + + return rbm_start(m, pos); +} + +static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf); + +static int rbm_show(struct seq_file *m, void *v) +{ + struct ring_buffer_per_cpu *cpu_buffer = m->private; + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + unsigned long val = (unsigned long)v; + + if (val == 1) { + seq_printf(m, "head_buffer: %d\n", + rb_meta_subbuf_idx(meta, (void *)meta->head_buffer)); + seq_printf(m, "commit_buffer: %d\n", + rb_meta_subbuf_idx(meta, (void *)meta->commit_buffer)); + seq_printf(m, "subbuf_size: %d\n", meta->subbuf_size); + 
seq_printf(m, "nr_subbufs:%d\n", meta->nr_subbufs); + return 0; + } + + val -= 2; + seq_printf(m, "buffer[%ld]:%d\n", val, meta->buffers[val]); + + return 0; +} + +static void rbm_stop(struct seq_file *m, void *p) +{ +} + +static const struct seq_operations rb_meta_seq_ops = { + .start = rbm_start, + .next = rbm_next, + .show = rbm_show, + .stop = rbm_stop, +}; + +int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, int cpu) +{ + struct seq_file *m; + int ret; + + ret = seq_open(file, &rb_meta_seq_ops); + if (ret) + return ret; + + m = file->private_data; + m->private = buffer->buffers[cpu]; + + return 0; +} + static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, long nr_pages, struct list_head *pages) { diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index ff2b504fbe00..622fe670949d 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5018,7 +5018,7 @@ static int show_traces_open(struct inode *inode, struct file *file) return 0; } -static int show_traces_release(struct inode *inode, struct file *file) +static int tracing_seq_release(struct inode *inode, struct file *file) { struct trace_array *tr = inode->i_private; @@ -5059,7 +5059,7 @@ static const struct file_operations show_traces_fops = { .open = show_traces_open, .read = seq_read, .llseek = seq_lseek, - .release= show_traces_release, + .release= tracing_seq_release, }; static ssize_t @@ -6860,6 +6860,22 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf, return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); } +static int tracing_buffer_meta_open(struct inode *inode, struct file *filp) +{ + struct trace_array *tr = inode->i_private; + int cpu = tracing_get_cpu(inode); + int ret; + + ret = tracing_check_open_get_tr(tr); + if (ret) + return ret; + + ret = ring_buffer_meta_seq_init(filp, tr->array_buffer.buffer, cpu); + if (ret < 0) + __trace_array_put(tr); + return ret; +} + static ssize_t tracing_free_buffer_write(struct file 
*filp, const char __user *ubuf, size_t cnt, loff_t *ppos) @@ -7436,6 +7452,13 @@ static const struct file_operations tracing_entries_fops = { .release= tracing_release_generic_tr, }; +static const struct file_operations tracing_buffer_meta_fops = { + .open
[PATCH v3 08/13] tracing: Add option to use memmapped memory for trace boot instance
From: "Steven Rostedt (Google)" Add an option to the trace_instance kernel command line parameter that allows it to use the memory reserved by the memmap boot parameter:

  memmap=12M$0x28450 trace_instance=boot_mapped@0x28450:12M

The above reserves 12 megabytes at the physical address 0x28450. The second parameter will create a "boot_mapped" instance and use the reserved memory as the memory for the ring buffer. That will create an instance called "boot_mapped": /sys/kernel/tracing/instances/boot_mapped

Note, because the ring buffer is using a defined memory range, it will act just like a memory mapped ring buffer. It will not have a snapshot buffer, as it can't swap out the buffer. The snapshot files, as well as any tracers that use a snapshot, will not be present in the boot_mapped instance.

Cc: linux...@kvack.org Signed-off-by: Steven Rostedt (Google) --- .../admin-guide/kernel-parameters.txt | 9 +++ kernel/trace/trace.c | 75 +-- 2 files changed, 78 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index b600df82669d..ff26b6094e79 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6754,6 +6754,15 @@ the same thing would happen if it was left off). The irq_handler_entry event, and all events under the "initcall" system. + If memory has been reserved (see memmap for x86), the instance + can use that memory: + + memmap=12M$0x28450 trace_instance=boot_map@0x28450:12M + + The above will create a "boot_map" instance that uses the physical + memory at 0x28450 that is 12Megs. The per CPU buffers of that + instance will be split up accordingly. + trace_options=[option-list] [FTRACE] Enable or disable tracer options at boot. 
The option-list is a comma delimited list of options diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 622fe670949d..13e89023f33b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -9504,6 +9504,31 @@ static int instance_mkdir(const char *name) return ret; } +static u64 map_pages(u64 start, u64 size) +{ + struct page **pages; + phys_addr_t page_start; + unsigned int page_count; + unsigned int i; + void *vaddr; + + page_count = DIV_ROUND_UP(size, PAGE_SIZE); + + page_start = start; + pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL); + if (!pages) + return 0; + + for (i = 0; i < page_count; i++) { + phys_addr_t addr = page_start + i * PAGE_SIZE; + pages[i] = pfn_to_page(addr >> PAGE_SHIFT); + } + vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL); + kfree(pages); + + return (u64)(unsigned long)vaddr; +} + /** * trace_array_get_by_name - Create/Lookup a trace array, given its name. * @name: The name of the trace array to be looked up/created. @@ -10350,6 +10375,7 @@ __init static void enable_instances(void) { struct trace_array *tr; char *curr_str; + char *name; char *str; char *tok; @@ -10358,19 +10384,56 @@ __init static void enable_instances(void) str = boot_instance_info; while ((curr_str = strsep(&str, "\t"))) { + unsigned long start = 0; + unsigned long size = 0; + unsigned long addr = 0; tok = strsep(&curr_str, ","); + name = strsep(&tok, "@"); + if (tok) { + start = memparse(tok, &tok); + if (!start) { + pr_warn("Tracing: Invalid boot instance address for %s\n", + name); + continue; + } + } - if (IS_ENABLED(CONFIG_TRACER_MAX_TRACE)) - do_allocate_snapshot(tok); + if (start) { + if (*tok != ':') { + pr_warn("Tracing: No size specified for instance %s\n", name); + continue; + } + tok++; + size = memparse(tok, &tok); + if (!size) { + pr_warn("Tracing: Invalid boot instance size for %s\n", + name); + continue; + } + addr = map_pages(start, size); + if (addr) { + pr_info("Tracing: mapped
[PATCH v3 07/13] ring-buffer: Validate boot range memory events
From: "Steven Rostedt (Google)" Make sure all the events in each of the sub-buffers that were mapped in a memory region are valid. This moves the code that walks the buffers for time-stamp validation out of the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS ifdef block and is used to validate the content. Only the ring buffer event meta data and time stamps are checked and not the data load. This also has a second purpose. The buffer_page structure that points to the data sub-buffers has accounting that keeps track of the number of events that are on the sub-buffer. This updates that counter as well. That counter is used in reading the buffer and knowing if the ring buffer is empty or not. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 188 + 1 file changed, 151 insertions(+), 37 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index aecd4a7d62be..1e959ff71855 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1674,10 +1674,151 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu, subbuf = (void *)subbuf + subbuf_size; } - pr_info("Ring buffer meta is from previous boot!\n"); return true; } +static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf); + +static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu, + unsigned long long *timestamp, u64 *delta_ptr) +{ + struct ring_buffer_event *event; + u64 ts, delta; + int events = 0; + int e; + + *delta_ptr = 0; + *timestamp = 0; + + ts = dpage->time_stamp; + + for (e = 0; e < tail; e += rb_event_length(event)) { + + event = (struct ring_buffer_event *)(dpage->data + e); + + switch (event->type_len) { + + case RINGBUF_TYPE_TIME_EXTEND: + delta = rb_event_time_stamp(event); + ts += delta; + break; + + case RINGBUF_TYPE_TIME_STAMP: + delta = rb_event_time_stamp(event); + if (delta < ts) { + *delta_ptr = delta; + *timestamp = ts; + return -1; + } + ts = delta; + break; + + case 
RINGBUF_TYPE_PADDING: + if (event->time_delta == 1) + break; + fallthrough; + case RINGBUF_TYPE_DATA: + events++; + ts += event->time_delta; + break; + + default: + return -1; + } + } + *timestamp = ts; + return events; +} + +static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu) +{ + unsigned long long ts; + u64 delta; + int tail; + + tail = local_read(&dpage->commit); + return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta); +} + +/* If the meta data has been validated, now validate the events */ +static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer) +{ + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + struct buffer_page *head_page; + unsigned long entry_bytes = 0; + unsigned long entries = 0; + int ret; + int i; + + if (!meta || !meta->head_buffer) + return; + + /* Do the reader page first */ + ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu); + if (ret < 0) { + pr_info("Ring buffer reader page is invalid\n"); + goto invalid; + } + entries += ret; + entry_bytes += local_read(&cpu_buffer->reader_page->page->commit); + local_set(&cpu_buffer->reader_page->entries, ret); + + head_page = cpu_buffer->head_page; + + /* If both the head and commit are on the reader_page then we are done. */ + if (head_page == cpu_buffer->reader_page && + head_page == cpu_buffer->commit_page) + goto done; + + /* Iterate until finding the commit page */ + for (i = 0; i < meta->nr_subbufs + 1; i++, rb_inc_page(&head_page)) { + + /* Reader page has already been done */ + if (head_page == cpu_buffer->reader_page) + continue; + + ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu); + if (ret < 0) { + pr_info("Ring buffer meta [%d] invalid buffer page\n", + cpu_buffer->cpu); + goto invalid; + } + entries += ret; + entry_bytes += local_read(&head_page->page->commit); + local_set(&cpu_buffe
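The timestamp walk that rb_read_data_buffer() performs can be sketched in isolation. A userspace approximation (the event encoding is simplified to (type, delta) pairs; the real ring buffer packs deltas into the event header and also handles padding):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_type { SK_DATA, SK_TIME_EXTEND, SK_TIME_STAMP };

struct sketch_event {
	enum sketch_type type;
	uint64_t delta;	/* relative delta, or absolute ts for SK_TIME_STAMP */
};

/* Returns the number of data events, or -1 if time went backwards. */
static int sketch_walk(const struct sketch_event *ev, size_t n,
		       uint64_t *ts_out)
{
	uint64_t ts = 0;
	int events = 0;

	for (size_t e = 0; e < n; e++) {
		switch (ev[e].type) {
		case SK_TIME_EXTEND:
			ts += ev[e].delta;	/* large relative jump */
			break;
		case SK_TIME_STAMP:
			if (ev[e].delta < ts)
				return -1;	/* absolute stamp in the past */
			ts = ev[e].delta;
			break;
		case SK_DATA:
			events++;		/* counted toward ->entries */
			ts += ev[e].delta;
			break;
		}
	}
	*ts_out = ts;
	return events;
}
```

The data-event count is what the patch feeds back into the buffer_page accounting so reads know whether the restored buffer is empty.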
[PATCH v3 12/13] tracing: Update function tracing output for previous boot buffer
From: "Steven Rostedt (Google)" For a persistent ring buffer that is saved across boots, if function tracing was performed in the previous boot, it only saves the addresses of the functions and uses "%pS" to print their names. But in the current boot, those functions may be at different locations. The persistent meta-data saves the text delta between the two boots and can be used to find where the saved function is located in the current boot. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_output.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index d8b302d01083..b9d2c64c0648 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -990,8 +990,11 @@ enum print_line_t trace_nop_print(struct trace_iterator *iter, int flags, } static void print_fn_trace(struct trace_seq *s, unsigned long ip, - unsigned long parent_ip, int flags) + unsigned long parent_ip, long delta, int flags) { + ip += delta; + parent_ip += delta; + seq_print_ip_sym(s, ip, flags); if ((flags & TRACE_ITER_PRINT_PARENT) && parent_ip) { @@ -1009,7 +1012,7 @@ static enum print_line_t trace_fn_trace(struct trace_iterator *iter, int flags, trace_assign_type(field, iter->ent); - print_fn_trace(s, field->ip, field->parent_ip, flags); + print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags); trace_seq_putc(s, '\n'); return trace_handle_return(s); @@ -1674,7 +1677,7 @@ trace_func_repeats_print(struct trace_iterator *iter, int flags, trace_assign_type(field, iter->ent); - print_fn_trace(s, field->ip, field->parent_ip, flags); + print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags); trace_seq_printf(s, " (repeats: %u, last_ts:", field->count); trace_print_time(s, iter, iter->ts - FUNC_REPEATS_GET_DELTA_TS(field)); -- 2.43.0
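The fixup itself is plain modular arithmetic: the delta is computed once as (current address of a known symbol) minus (its address in the previous boot), then added to every saved ip. A userspace sketch with made-up addresses (not the kernel's actual layout):

```c
#include <assert.h>
#include <stdint.h>

/* Delta between boots, as rb_range_meta_init() computes it:
 * current address of a reference symbol minus its saved address. */
static int64_t sketch_text_delta(uint64_t this_text, uint64_t prev_text)
{
	return (int64_t)(this_text - prev_text);
}

/* Adjust an ip recorded in the previous boot so "%pS" resolves
 * against the current boot's symbol table. */
static uint64_t sketch_fixup_ip(uint64_t saved_ip, int64_t text_delta)
{
	return saved_ip + (uint64_t)text_delta;
}
```

If the kernel did not relocate between boots, the delta is zero and the fixup is a no-op.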
[PATCH v3 11/13] tracing: Handle old buffer mappings for event strings and functions
From: "Steven Rostedt (Google)" Use the saved text_delta and data_delta of a persistent memory mapped ring buffer that was saved from a previous boot, and use the delta in the trace event print output so that strings and functions show up normally. That is, for an event like trace_kmalloc() that prints the callsite via "%pS", if it used the address saved in the ring buffer it will not match the function that was saved in the previous boot if the kernel remaps itself between boots. For RCU events that point to saved static strings where only the address of the string is saved in the ring buffer, it too will be adjusted to point to where the string is on the current boot. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 42 +++--- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 7dbb20b31ae5..b641f6f1db6a 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -3671,8 +3671,11 @@ static void test_can_verify(void) void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, va_list ap) { + long text_delta = iter->tr->text_delta; + long data_delta = iter->tr->data_delta; const char *p = fmt; const char *str; + bool good; int i, j; if (WARN_ON_ONCE(!fmt)) @@ -3691,7 +3694,10 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, j = 0; - /* We only care about %s and variants */ + /* +* We only care about %s and variants +* as well as %p[sS] if delta is non-zero +*/ for (i = 0; p[i]; i++) { if (i + 1 >= iter->fmt_size) { /* @@ -3720,6 +3726,11 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, } if (p[i+j] == 's') break; + + if (text_delta && p[i+1] == 'p' && + ((p[i+2] == 's' || p[i+2] == 'S'))) + break; + star = false; } j = 0; @@ -3733,6 +3744,24 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, iter->fmt[i] = '\0'; trace_seq_vprintf(&iter->seq, iter->fmt, ap); + /* Add delta to %pS pointers */ + if (p[i+1] 
== 'p') { + unsigned long addr; + char fmt[4]; + + fmt[0] = '%'; + fmt[1] = 'p'; + fmt[2] = p[i+2]; /* Either %ps or %pS */ + fmt[3] = '\0'; + + addr = va_arg(ap, unsigned long); + addr += text_delta; + trace_seq_printf(&iter->seq, fmt, (void *)addr); + + p += i + 3; + continue; + } + /* * If iter->seq is full, the above call no longer guarantees * that ap is in sync with fmt processing, and further calls @@ -3751,6 +3780,14 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, /* The ap now points to the string data of the %s */ str = va_arg(ap, const char *); + good = trace_safe_str(iter, str, star, len); + + /* Could be from the last boot */ + if (data_delta && !good) { + str += data_delta; + good = trace_safe_str(iter, str, star, len); + } + /* * If you hit this warning, it is likely that the * trace event in question used %s on a string that @@ -3760,8 +3797,7 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, * instead. See samples/trace_events/trace-events-sample.h * for reference. */ - if (WARN_ONCE(!trace_safe_str(iter, str, star, len), - "fmt: '%s' current_buffer: '%s'", + if (WARN_ONCE(!good, "fmt: '%s' current_buffer: '%s'", fmt, seq_buf_str(&iter->seq.seq))) { int ret; -- 2.43.0
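The scan that trace_check_vprintf() extends is a format-string walk looking for conversions that carry last-boot pointers. A userspace approximation of spotting a "%ps"/"%pS" conversion (simplified: the kernel code also handles %s variants, field widths, and '*'):

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the '%' of the first "%ps"/"%pS" conversion,
 * or -1 if there is none. "%%" is a literal percent and is skipped. */
static int sketch_find_psym(const char *fmt)
{
	for (size_t i = 0; fmt[i]; i++) {
		if (fmt[i] != '%')
			continue;
		if (fmt[i + 1] == '%') {	/* literal percent */
			i++;
			continue;
		}
		if (fmt[i + 1] == 'p' &&
		    (fmt[i + 2] == 's' || fmt[i + 2] == 'S'))
			return (int)i;
	}
	return -1;
}
```

Once such a conversion is found, the patch pulls the pointer argument off the va_list, adds text_delta, and prints it itself before resuming normal processing.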
[PATCH v3 09/13] ring-buffer: Save text and data locations in mapped meta data
From: "Steven Rostedt (Google)" When a ring buffer is mapped to a specific address, save the address of a text function and some data. This will be used to determine the delta between the last boot and the current boot for pointers to functions as well as to data. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 16 1 file changed, 16 insertions(+) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 1e959ff71855..755ef278276b 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -45,6 +45,8 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long text_addr; + unsigned long data_addr; unsigned long first_buffer; unsigned long head_buffer; unsigned long commit_buffer; @@ -541,6 +543,9 @@ struct trace_buffer { unsigned long range_addr_start; unsigned long range_addr_end; + longlast_text_delta; + longlast_data_delta; + unsigned intsubbuf_size; unsigned intsubbuf_order; unsigned intmax_data_size; @@ -1819,10 +1824,15 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer) } } +/* Used to calculate data delta */ +static char rb_data_ptr[] = ""; + static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) { struct ring_buffer_meta *meta; unsigned long delta; + unsigned long this_text = (unsigned long)rb_range_meta_init; + unsigned long this_data = (unsigned long)rb_data_ptr; void *subbuf; int cpu; int i; @@ -1839,6 +1849,10 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) meta->first_buffer += delta; meta->head_buffer += delta; meta->commit_buffer += delta; + buffer->last_text_delta = this_text - meta->text_addr; + buffer->last_data_delta = this_data - meta->data_addr; + meta->text_addr = this_text; + meta->data_addr = this_data; continue; } @@ -1855,6 +1869,8 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) subbuf = rb_subbufs_from_meta(meta); meta->first_buffer = 
(unsigned long)subbuf; + meta->text_addr = this_text; + meta->data_addr = this_data; /* * The buffers[] array holds the order of the sub-buffers -- 2.43.0
[PATCH v3 13/13] tracing: Add last boot delta offset for stack traces
From: "Steven Rostedt (Google)" The addresses of a stack trace event are relative to the kallsyms. As that can change between boots, when printing the stack trace from a buffer that was from the last boot, it needs all the addresses to be added to the "text_delta" that gives the delta between the addresses of the functions for the current boot compared to the address of the last boot. Then it can be passed to kallsyms to find the function name, otherwise it just shows a useless list of addresses. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index b9d2c64c0648..48de93598897 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -1233,6 +1233,7 @@ static enum print_line_t trace_stack_print(struct trace_iterator *iter, struct trace_seq *s = &iter->seq; unsigned long *p; unsigned long *end; + long delta = iter->tr->text_delta; trace_assign_type(field, iter->ent); end = (unsigned long *)((long)iter->ent + iter->ent_size); @@ -1245,7 +1246,7 @@ static enum print_line_t trace_stack_print(struct trace_iterator *iter, break; trace_seq_puts(s, " => "); - seq_print_ip_sym(s, *p, flags); + seq_print_ip_sym(s, (*p) + delta, flags); trace_seq_putc(s, '\n'); } -- 2.43.0
[PATCH v3 10/13] tracing/ring-buffer: Add last_boot_info file to boot instance
From: "Steven Rostedt (Google)" If an instance is mapped to memory on boot up, create a new file called "last_boot_info" that will hold information that can be used to properly parse the raw data in the ring buffer. It will export the delta of the text and data addresses from what they were in the last boot. It does not expose actual addresses (unless you knew what the actual address was from the last boot). The output will look like: # cat last_boot_info text delta: -268435456 data delta: -268435456 The text and data are kept separate in case they are ever made different. Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 3 +++ kernel/trace/ring_buffer.c | 23 ++ kernel/trace/trace.c | 47 - kernel/trace/trace.h | 2 ++ 4 files changed, 74 insertions(+), 1 deletion(-) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index a50b0223b1d3..55de3798a9b9 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -94,6 +94,9 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag unsigned long range_size, struct lock_class_key *key); +bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text, +long *data); + /* * Because the ring buffer is generic, if other users of the ring buffer get * traced by ftrace, it can produce lockdep warnings. 
We need to keep each diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 755ef278276b..3a927c6ddd14 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2394,6 +2394,29 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag return alloc_buffer(size, flags, order, start, start + range_size, key); } +/** + * ring_buffer_last_boot_delta - return the delta offset from last boot + * @buffer: The buffer to return the delta from + * @text: Return text delta + * @data: Return data delta + * + * Returns: The true if the delta is non zero + */ +bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text, +long *data) +{ + if (!buffer) + return false; + + if (!buffer->last_text_delta) + return false; + + *text = buffer->last_text_delta; + *data = buffer->last_data_delta; + + return true; +} + /** * ring_buffer_free - free a ring buffer. * @buffer: the buffer to free. diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 13e89023f33b..7dbb20b31ae5 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6041,6 +6041,18 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr, return ret; } +static void update_last_data(struct trace_array *tr) +{ + if (!tr->text_delta && !tr->data_delta) + return; + + /* Clear old data */ + tracing_reset_online_cpus(&tr->array_buffer); + + /* Using current data now */ + tr->text_delta = 0; + tr->data_delta = 0; +} /** * tracing_update_buffers - used by tracing facility to expand ring buffers @@ -6058,6 +6070,9 @@ int tracing_update_buffers(struct trace_array *tr) int ret = 0; mutex_lock(&trace_types_lock); + + update_last_data(tr); + if (!tr->ring_buffer_expanded) ret = __tracing_resize_ring_buffer(tr, trace_buf_size, RING_BUFFER_ALL_CPUS); @@ -6113,6 +6128,8 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) mutex_lock(&trace_types_lock); + update_last_data(tr); + if (!tr->ring_buffer_expanded) { ret = 
__tracing_resize_ring_buffer(tr, trace_buf_size, RING_BUFFER_ALL_CPUS); @@ -6860,6 +6877,21 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf, return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); } +static ssize_t +tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) +{ + struct trace_array *tr = filp->private_data; + struct seq_buf seq; + char buf[64]; + + seq_buf_init(&seq, buf, 64); + + seq_buf_printf(&seq, "text delta:\t%ld\n", tr->text_delta); + seq_buf_printf(&seq, "data delta:\t%ld\n", tr->data_delta); + + return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq)); +} + static int tracing_buffer_meta_open(struct inode *inode, struct file *filp) { struct trace_array *tr = inode->i_private; @@ -7499,6 +7531,13 @@ static const struct file_operations trace_time_stamp_mode_fops = { .release= tracing_single_release_tr, }; +static const struct file_operations last_boot_fops = { + .open
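A consumer of the proposed last_boot_info file would read back the two tab-separated fields the patch formats with seq_buf_printf(). A userspace sketch of such a parser (the field names come from the patch; the parser itself is illustrative):

```c
#include <assert.h>
#include <stdio.h>

/* Parse "text delta:\t%ld\ndata delta:\t%ld\n" as emitted by the
 * proposed tracing_last_boot_read(). Returns 0 on success. */
static int sketch_parse_last_boot(const char *buf, long *text, long *data)
{
	if (sscanf(buf, "text delta:\t%ld\ndata delta:\t%ld", text, data) != 2)
		return -1;
	return 0;
}
```

A tool post-processing a trace.dat from the previous boot could use these two values to relocate saved text and data pointers before symbol lookup.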
Re: [PATCH v3 08/13] tracing: Add option to use memmapped memory for trace boot instance
Memory management folks. Please review this patch. Specifically the "map_pages()" function below. On Thu, 06 Jun 2024 17:17:43 -0400 Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > Add an option to the trace_instance kernel command line parameter that > allows it to use the reserved memory from the memmap boot parameter. > > memmap=12M$0x28450 trace_instance=boot_mapped@0x28450:12M > > The above will reserve 12 megs at the physical address 0x28450. > The second parameter will create a "boot_mapped" instance and use the > memory reserved as the memory for the ring buffer. > > That will create an instance called "boot_mapped": > > /sys/kernel/tracing/instances/boot_mapped > > Note, because the ring buffer is using a defined memory range, it will > act just like a memory mapped ring buffer. It will not have a snapshot > buffer, as it can't swap out the buffer. The snapshot files as well as any > tracers that use a snapshot will not be present in the boot_mapped > instance. > > Cc: linux...@kvack.org > Signed-off-by: Steven Rostedt (Google) > --- > .../admin-guide/kernel-parameters.txt | 9 +++ > kernel/trace/trace.c | 75 +-- > 2 files changed, 78 insertions(+), 6 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index b600df82669d..ff26b6094e79 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -6754,6 +6754,15 @@ > the same thing would happen if it was left off). The > irq_handler_entry > event, and all events under the "initcall" system. > > + If memory has been reserved (see memmap for x86), the > instance > + can use that memory: > + > + memmap=12M$0x28450 > trace_instance=boot_map@0x28450:12M > + > + The above will create a "boot_map" instance that uses > the physical > + memory at 0x28450 that is 12Megs. The per CPU > buffers of that > + instance will be split up accordingly. 
> + > trace_options=[option-list] > [FTRACE] Enable or disable tracer options at boot. > The option-list is a comma delimited list of options > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 622fe670949d..13e89023f33b 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -9504,6 +9504,31 @@ static int instance_mkdir(const char *name) > return ret; > } > > +static u64 map_pages(u64 start, u64 size) > +{ > + struct page **pages; > + phys_addr_t page_start; > + unsigned int page_count; > + unsigned int i; > + void *vaddr; > + > + page_count = DIV_ROUND_UP(size, PAGE_SIZE); > + > + page_start = start; > + pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL); > + if (!pages) > + return 0; > + > + for (i = 0; i < page_count; i++) { > + phys_addr_t addr = page_start + i * PAGE_SIZE; > + pages[i] = pfn_to_page(addr >> PAGE_SHIFT); > + } > + vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL); > + kfree(pages); > + > + return (u64)(unsigned long)vaddr; > +} If for some reason the memmap=nn$ss fails, but this still gets called, will the above just map over any memory. That is, is it possible that the kernel could have used this memory? Is there a way to detect this? That is, I don't want this to succeed if the memory location it's about to map to is used by the kernel, or will be used by user space. -- Steve > + > /** > * trace_array_get_by_name - Create/Lookup a trace array, given its name. > * @name: The name of the trace array to be looked up/created. 
> @@ -10350,6 +10375,7 @@ __init static void enable_instances(void) > { > struct trace_array *tr; > char *curr_str; > + char *name; > char *str; > char *tok; > > @@ -10358,19 +10384,56 @@ __init static void enable_instances(void) > str = boot_instance_info; > > while ((curr_str = strsep(&str, "\t"))) { > + unsigned long start = 0; > + unsigned long size = 0; > + unsigned long addr = 0; > > tok = strsep(&curr_str, ","); > + name = strsep(&tok, "@"); > + if (tok) { > + start = memparse(tok, &tok); > + if (!start) { > + pr_warn("Tracing: Invalid boot instance address > for %s\n", > + name); > + continue; > + } > + } > > - if (IS_ENABLED(CONFIG_TRACER_MAX_TRACE)) > - do_allocate_snapshot(tok); > + if (start) { > +
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 09:29:51PM +0200, Peter Zijlstra wrote: > On Thu, Jun 06, 2024 at 12:00:36PM -0700, Boqun Feng wrote: > > On Thu, Jun 06, 2024 at 07:35:44PM +0200, Peter Zijlstra wrote: > > > On Thu, Jun 06, 2024 at 08:49:06AM -0700, Boqun Feng wrote: > > > > > > > Long-term plan is to 1) compile the C helpers in some IR and 2) inline > > > > the helpers with Rust in IR-level, as what Gary has: > > > > > > > > > > > > https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ > > > > > > Urgh, that still needs us to maintain that silly list of helpers :-/ > > > > > > > But it's an improvement from the current stage, right? ;-) > > Somewhat, but only marginal. > > > > Can't we pretty please have clang parse the actual header files into IR > > > and munge that into rust? So that we don't get to manually duplicate > > > everything+dog. > > > > That won't always work, because some of our kernel APIs are defined as > > macros, and I don't think it's a trivial job to generate a macro > > definition to a function definition so that it can be translated to > > something in IR. We will have to do the macro -> function mapping > > ourselves somewhere, if we want to inline the API across languages. > > We can try and see how far we can get with moving a bunch of stuff into > inlines. There's quite a bit of simple CPP that could be inlines or > const objects I suppose. > We can, but I'd first stick with what we have, improve it and make it stable until we go to the next stage. Plus, there's a benefit to keeping an explicit helper list: it's clear what APIs are called by Rust, and moreover, it's easier to modify the helpers if you were to change an API, rather than chasing down where Rust code calls it. (Don't get me wrong, I'm happy if you want to do that ;-)) Regards, Boqun > Things like the tracepoints are of course glorious CPP abuse and are > never going to work. 
> But perhaps you can have an explicit 'eval-CPP on this here' construct > or whatnot. If I squint I see this paste! thingy (WTF's up with that ! > operator?) to munge function names in the static_call thing. So > something like apply CPP from over there on this here can also be done > :-)
Re: [PATCH v14 02/14] cgroup/misc: Add per resource callbacks for CSS events
I think it would be better if, when _misc_cg_res_alloc fails, it just called _misc_cg_res_free(cg, index) (adding an index parameter that marks where the iteration stopped), so it can avoid calling ->free() for resources whose ->alloc() was never called. And in misc_cg_free, just call _misc_cg_res_free(cg, MISC_CG_RES_TYPES) to free them all. Thanks Ridong On 2024/6/6 22:51, Haitao Huang wrote: On Thu, 06 Jun 2024 08:37:31 -0500, chenridong wrote: If _misc_cg_res_alloc fails, maybe some types do not call ->alloc(), but the ->free() callbacks of all types will be called, is that ok? Not sure I understand. Are you suggesting we ignore failures from the ->alloc() callback in _misc_cg_res_alloc() as it is per-resource, and have the ->free() callback and resource provider of the failing type handle the failure internally? IIUC, this failure only happens when a specific subcgroup is created (memory running out for allocation) so failing that subcgroup as a whole seems fine to me. Note the root node is static and no per-resource callbacks are invoked by misc for it. And the resource provider handles its own allocations for root. In the SGX case we too declare a static object for the corresponding root sgx_cgroup struct. Note also misc cgroup (except for setting capacity[res] = 0 at root) is all or nothing, so there is no mechanism to tell the user "this resource does not work but others are fine in this particular cgroup." Thanks Haitao
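The cleanup ordering under discussion, calling ->free() only for the resources whose ->alloc() actually ran, is the standard partial-unwind pattern. A userspace sketch with stubbed callbacks (misc-cgroup internals are not modeled; counters stand in for real work):

```c
#include <assert.h>

#define NR_RES 4

static int alloc_calls[NR_RES];
static int free_calls[NR_RES];
static int fail_at = -1;	/* index whose alloc fails; -1 = none */

static int res_alloc(int i)
{
	if (i == fail_at)
		return -1;
	alloc_calls[i]++;
	return 0;
}

/* Free only resources [0, end): those whose alloc succeeded. */
static void res_free_upto(int end)
{
	for (int i = 0; i < end; i++)
		free_calls[i]++;
}

/* On failure at index i, unwind exactly the first i resources. */
static int res_alloc_all(void)
{
	for (int i = 0; i < NR_RES; i++) {
		if (res_alloc(i) < 0) {
			res_free_upto(i);
			return -1;
		}
	}
	return 0;
}
```

Passing NR_RES as the end index from the teardown path frees everything, which matches the suggested _misc_cg_res_free(cg, MISC_CG_RES_TYPES) call.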
Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()
On 07/06/2024 00:27, Dave Jiang wrote: > > > On 6/3/24 8:16 PM, Li Zhijian wrote: >> Don't allocate devs again when it's a valid pointer which has pointed to >> the memory allocated above with size (count + 2 * sizeof(dev)). >> >> kmemleak reports: >> unreferenced object 0x88800dda1980 (size 16): >>comm "kworker/u10:5", pid 69, jiffies 4294671781 >>hex dump (first 16 bytes): >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>backtrace (crc 0): >> [] __kmalloc+0x32c/0x470 >> [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 >> [libnvdimm] >> [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm] >> [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm] >> [ ] really_probe+0xc6/0x390 >> [<129e2a69>] __driver_probe_device+0x78/0x150 >> [<2dfed28b>] driver_probe_device+0x1e/0x90 >> [ ] __device_attach_driver+0x85/0x110 >> [<32dca295>] bus_for_each_drv+0x85/0xe0 >> [<391c5a7d>] __device_attach+0xbe/0x1e0 >> [<26dabec0>] bus_probe_device+0x94/0xb0 >> [ ] device_add+0x656/0x870 >> [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm] >> [<3f4c52a4>] async_run_entry_fn+0x2e/0x110 >> [ ] process_one_work+0x1ee/0x600 >> [<6d90d5a9>] worker_thread+0x183/0x350 >> >> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation") >> Signed-off-by: Li Zhijian >> --- >> drivers/nvdimm/namespace_devs.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/nvdimm/namespace_devs.c >> b/drivers/nvdimm/namespace_devs.c >> index d6d558f94d6b..56b016dbe307 100644 >> --- a/drivers/nvdimm/namespace_devs.c >> +++ b/drivers/nvdimm/namespace_devs.c >> @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region >> *nd_region) >> /* Publish a zero-sized namespace for userspace to configure. 
*/ >> nd_mapping_free_labels(nd_mapping); >> >> -devs = kcalloc(2, sizeof(dev), GFP_KERNEL); >> +/* devs probably has been allocated */ >> +if (!devs) >> +devs = kcalloc(2, sizeof(dev), GFP_KERNEL); > > This changes the behavior of this code and possibly corrupts the previously > allocated memory at times when 'devs' is valid. If we reach here, that means count == 0 is true. So we can deduce that the previously allocated memory was not touched at all in the previous loop, and its size was (2 * sizeof(dev)) too. Was the 'devs' leaked out from the previous loop and should be freed instead? This option is also fine with me. We could also fix this by freeing devs before allocating it again: + kfree(devs); // kfree(NULL) is safe. devs = kcalloc(2, sizeof(dev), GFP_KERNEL); Thanks Zhijian > >> if (!devs) >> goto err; >> >
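Both candidate fixes, reusing the existing allocation or freeing it before reallocating, can be sketched in userspace (sizes and function names are illustrative only; the kernel code uses kcalloc/kfree):

```c
#include <stdlib.h>

/* Option 1: only allocate when the label loop allocated nothing,
 * so an array left over from the loop is reused, not leaked. */
static void **seed_devs_reuse(void **devs)
{
	if (!devs)
		devs = calloc(2, sizeof(*devs));
	return devs;
}

/* Option 2: unconditionally free before reallocating.
 * free(NULL) is safe, mirroring kfree(NULL). */
static void **seed_devs_realloc(void **devs)
{
	free(devs);
	return calloc(2, sizeof(*devs));
}
```

Option 1 matches the posted patch; option 2 matches the alternative Zhijian mentions in the reply. Either way the leftover pointer from the loop no longer leaks.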
Re: [PATCH] livepatch: introduce klp_func called interface
> On Jun 6, 2024, at 23:01, Joe Lawrence wrote: > > Hi Wardenjohn, > > To follow up, Red Hat kpatch QE pointed me toward this test: > > https://gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/tree/main/general/kpatch/kpatch-trace?ref_type=heads > > which reports a few interesting things via systemd service and ftrace: > > - Patched functions > - Traced functions > - Code that was modified, but didn't run > - Other code that ftrace saw, but is not listed in the sysfs directory > > The code looks to be built on top of the same basic ftrace commands that > I sent the other day. You may be able to reuse/copy/etc bits from the > scripts if they are handy. > > -- > Joe > Thank you so much, you really helped me, Joe! I will try to learn the script and make it suitable for our system. Again, thank you, Joe! Regards, Wardenjohn
Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()
On 07/06/2024 00:49, Ira Weiny wrote: > Li Zhijian wrote: >> Don't allocate devs again when it's a valid pointer which has pointed to >> the memory allocated above with size (count + 2 * sizeof(dev)). >> >> kmemleak reports: >> unreferenced object 0x88800dda1980 (size 16): >>comm "kworker/u10:5", pid 69, jiffies 4294671781 >>hex dump (first 16 bytes): >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>backtrace (crc 0): >> [] __kmalloc+0x32c/0x470 >> [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 >> [libnvdimm] >> [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm] >> [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm] >> [ ] really_probe+0xc6/0x390 >> [<129e2a69>] __driver_probe_device+0x78/0x150 >> [<2dfed28b>] driver_probe_device+0x1e/0x90 >> [ ] __device_attach_driver+0x85/0x110 >> [<32dca295>] bus_for_each_drv+0x85/0xe0 >> [<391c5a7d>] __device_attach+0xbe/0x1e0 >> [<26dabec0>] bus_probe_device+0x94/0xb0 >> [ ] device_add+0x656/0x870 >> [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm] >> [<3f4c52a4>] async_run_entry_fn+0x2e/0x110 >> [ ] process_one_work+0x1ee/0x600 >> [<6d90d5a9>] worker_thread+0x183/0x350 >> >> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation") >> Signed-off-by: Li Zhijian >> --- >> drivers/nvdimm/namespace_devs.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/nvdimm/namespace_devs.c >> b/drivers/nvdimm/namespace_devs.c >> index d6d558f94d6b..56b016dbe307 100644 >> --- a/drivers/nvdimm/namespace_devs.c >> +++ b/drivers/nvdimm/namespace_devs.c >> @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region >> *nd_region) >> /* Publish a zero-sized namespace for userspace to configure. */ >> nd_mapping_free_labels(nd_mapping); >> >> -devs = kcalloc(2, sizeof(dev), GFP_KERNEL); >> +/* devs probably has been allocated */ > > I don't think this is where the bug is. 
The loop above is processing the > known labels and should exit with a count > 0 if devs is not NULL. > > From what I can tell create_namespace_pmem() must be returning EAGAIN > which leaves devs allocated but fails to increment count. Thus there are > no valid labels but devs was not free'ed. Per this piece of the code, returning EAGAIN or ENODEV could cause this issue in theory. 1980 dev = create_namespace_pmem(nd_region, nd_mapping, nd_label); 1981 if (IS_ERR(dev)) { 1982 switch (PTR_ERR(dev)) { 1983 case -EAGAIN: 1984 /* skip invalid labels */ 1985 continue; 1986 case -ENODEV: 1987 /* fallthrough to seed creation */ 1988 break; 1989 default: 1990 goto err; 1991 } 1992 } else 1993 devs[count++] = dev; > > Can you trace the error you are seeing a bit more to see if this is the > case? I just tried, but I cannot reproduce this leak again. When it happened (it was 100% reproducible at that time), I probably had a corrupted LSA (I saw empty output from the command 'ndctl list'). It seemed the QEMU-emulated NVDIMM device was broken for some reason. Thanks Zhijian > > Thanks, > Ira > >> +if (!devs) >> +devs = kcalloc(2, sizeof(dev), GFP_KERNEL); >> if (!devs) >> goto err; >> >> -- >> 2.29.2 >> > >
Re: [PATCH 3/5] arm64: dts: qcom: sdx75: add missing qlink_logging reserved memory for mpss
On 6/6/2024 8:20 PM, Krzysztof Kozlowski wrote:
> On 06/06/2024 16:38, Naina Mehta wrote:
>> The qlink_logging memory region is also used by the modem firmware,
>> add it to reserved memory regions. Also split MPSS DSM region into
>> 2 separate regions.
>>
>> Signed-off-by: Naina Mehta
>> ---
>>  arch/arm64/boot/dts/qcom/sdx75.dtsi | 17 +
>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi b/arch/arm64/boot/dts/qcom/sdx75.dtsi
>> index 9b93f6501d55..9349b1c4e196 100644
>> --- a/arch/arm64/boot/dts/qcom/sdx75.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi
>> @@ -366,7 +366,12 @@
>>  			no-map;
>>  		};
>>
>> -		qdss_mem: qdss@8880 {
>> +		qdss_mem: qdss@8850 {
>> +			reg = <0x0 0x8850 0x0 0x30>;
>> +			no-map;
>> +		};
>> +
>> +		qlink_logging_mem: qlink_logging@8880 {
>
> Sorry, no downstream code. Please follow DTS coding style - no
> underscores in node names. This applies to all work sent upstream.

Thanks for pointing this out.
I will update in next revision.

Regards,
Naina

> Best regards,
> Krzysztof
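For reference, the DTS coding style Krzysztof points to uses hyphens in node names, while labels (left of the colon) may keep underscores. A sketch of how the reworked node might look — the unit address and sizes here are placeholders, not the real sdx75 values:

```dts
/* Sketch only: hyphenated node name, underscore kept in the label.
 * Address/size values are illustrative placeholders. */
reserved-memory {
	qlink_logging_mem: qlink-logging@88800000 {
		reg = <0x0 0x88800000 0x0 0x300000>;
		no-map;
	};
};
```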
Re: [PATCH] net: missing check
Hi Denis,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.10-rc2 next-20240607]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Denis-Arefev/net-missing-check/20240606-230540
base:   linus/master
patch link:    https://lore.kernel.org/r/20240606141450.44709-1-arefev%40swemel.ru
patch subject: [PATCH] net: missing check
config: x86_64-randconfig-121-20240607 (https://download.01.org/0day-ci/archive/20240607/202406071404.oilhfohm-...@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240607/202406071404.oilhfohm-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202406071404.oilhfohm-...@intel.com/

sparse warnings: (new ones prefixed by >>)

   drivers/net/virtio_net.c: note: in included file:
>> include/linux/virtio_net.h:103:58: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected unsigned long long [usertype] divisor @@ got restricted __virtio16 const [usertype] gso_size @@
   include/linux/virtio_net.h:103:58: sparse:     expected unsigned long long [usertype] divisor
   include/linux/virtio_net.h:103:58: sparse:     got restricted __virtio16 const [usertype] gso_size
>> include/linux/virtio_net.h:104:42: sparse: sparse: restricted __virtio16 degrades to integer

vim +103 include/linux/virtio_net.h

    49	
    50	static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
    51						const struct virtio_net_hdr *hdr,
    52						bool little_endian)
    53	{
    54		unsigned int nh_min_len = sizeof(struct iphdr);
    55		unsigned int gso_type = 0;
    56		unsigned int thlen = 0;
    57		unsigned int p_off = 0;
    58		unsigned int ip_proto;
    59		u64 ret, remainder;
    60	
    61		if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
    62			switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
    63			case VIRTIO_NET_HDR_GSO_TCPV4:
    64				gso_type = SKB_GSO_TCPV4;
    65				ip_proto = IPPROTO_TCP;
    66				thlen = sizeof(struct tcphdr);
    67				break;
    68			case VIRTIO_NET_HDR_GSO_TCPV6:
    69				gso_type = SKB_GSO_TCPV6;
    70				ip_proto = IPPROTO_TCP;
    71				thlen = sizeof(struct tcphdr);
    72				nh_min_len = sizeof(struct ipv6hdr);
    73				break;
    74			case VIRTIO_NET_HDR_GSO_UDP:
    75				gso_type = SKB_GSO_UDP;
    76				ip_proto = IPPROTO_UDP;
    77				thlen = sizeof(struct udphdr);
    78				break;
    79			case VIRTIO_NET_HDR_GSO_UDP_L4:
    80				gso_type = SKB_GSO_UDP_L4;
    81				ip_proto = IPPROTO_UDP;
    82				thlen = sizeof(struct udphdr);
    83				break;
    84			default:
    85				return -EINVAL;
    86			}
    87	
    88			if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
    89				gso_type |= SKB_GSO_TCP_ECN;
    90	
    91			if (hdr->gso_size == 0)
    92				return -EINVAL;
    93		}
    94	
    95		skb_reset_mac_header(skb);
    96	
    97		if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
    98			u32 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
    99			u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
   100			u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
   101	
   102			if (hdr->gso_size) {
 > 103				ret = div64_u64_rem(skb->len, hdr->gso_size, &remainder);
 > 104				if (!(ret && (hdr->gso_size > needed) &&
   105				      ((remainder > needed) || (remainder == 0)))) {
   106					return -EINVAL;
   107				}
   108				skb_s
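The sparse warnings flag that `hdr->gso_size` is a `__virtio16` — a possibly byte-swapped on-wire value — being passed straight into `div64_u64_rem()`, which expects a plain u64; elsewhere in the same function the code converts with `__virtio16_to_cpu()` first. A userspace sketch of why the raw value is wrong to use directly; the two helpers below are simplified stand-ins for the kernel's `__virtio16_to_cpu()` and `div64_u64_rem()`, not the real implementations:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for __virtio16_to_cpu(): if the field is not
 * already in native (little-endian) order, byte-swap it before use.
 * Assumes a little-endian host for illustration. */
static uint16_t virtio16_to_cpu(int little_endian, uint16_t v)
{
	if (little_endian)
		return v;
	return (uint16_t)((v >> 8) | (v << 8));
}

/* Stand-in for div64_u64_rem(): returns the quotient and stores the
 * remainder through *rem, matching the kernel helper's contract. */
static uint64_t div64_u64_rem_sketch(uint64_t dividend, uint64_t divisor,
				     uint64_t *rem)
{
	*rem = dividend % divisor;
	return dividend / divisor;
}
```

With a big-endian device, a gso_size of 1500 (0x05DC) reads back as 0xDC05 if used raw, so the skb->len divisibility check at line 103 would be performed against a garbage divisor unless the value is converted first.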