Re: [PATCH] tools/virtio: Use the __GFP_ZERO flag of kmalloc to complete the memory initialization.
On Wed, Jun 05, 2024 at 09:52:45PM +0800, cuitao wrote:
> Use the __GFP_ZERO flag of kmalloc to initialize memory while allocating it,
> without the need for an additional memset call.
>
> Signed-off-by: cuitao
> ---
>  tools/virtio/linux/kernel.h | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
> index 6702008f7f5c..9e401fb7c215 100644
> --- a/tools/virtio/linux/kernel.h
> +++ b/tools/virtio/linux/kernel.h
> @@ -66,10 +66,7 @@ static inline void *kmalloc_array(unsigned n, size_t s, gfp_t gfp)
>
>  static inline void *kzalloc(size_t s, gfp_t gfp)
>  {
> -	void *p = kmalloc(s, gfp);
> -
> -	memset(p, 0, s);
> -	return p;
> +	return kmalloc(s, gfp | __GFP_ZERO);
>  }

Why do we care? It's just here to make things compile.
The simpler the better.

>  static inline void *alloc_pages_exact(size_t s, gfp_t gfp)
> --
> 2.25.1
Re: [PATCH] tools/virtio: Use the __GFP_ZERO flag of kmalloc to complete the memory initialization.
Sorry, maybe I'm wrong. I wonder if the gfp parameter in
static inline void *kmalloc(size_t s, gfp_t gfp) can be deleted if it is
not used. Or would it be better to move the memset into kmalloc? Like this:

#define __GFP_ZERO 0x1

static inline void *kmalloc(size_t s, gfp_t gfp)
{
	void *p;

	if (__kmalloc_fake)
		return __kmalloc_fake;

	p = malloc(s);
	if (!p)
		return p;

	if (gfp & __GFP_ZERO)
		memset(p, 0, s);

	return p;
}

On 2024/6/6 15:18, Michael S. Tsirkin wrote:
> On Wed, Jun 05, 2024 at 09:52:45PM +0800, cuitao wrote:
> > Use the __GFP_ZERO flag of kmalloc to initialize memory while allocating it,
> > without the need for an additional memset call.
> >
> > Signed-off-by: cuitao
> [ ... snip ... ]
>
> Why do we care? It's just here to make things compile.
> The simpler the better.
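For reference, the behaviour under discussion — a userspace kmalloc() shim that honours __GFP_ZERO so kzalloc() can simply forward the flag — can be sketched and exercised outside the kernel tree as follows. This is a minimal sketch: the names mirror the patch, but the __kmalloc_fake test hook is omitted and the gfp_t typedef is a stand-in.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define __GFP_ZERO 0x1
typedef unsigned gfp_t;

/* Minimal stand-in for the tools/virtio kmalloc() shim: honour
 * __GFP_ZERO at allocation time instead of memset-ing in kzalloc(). */
static inline void *kmalloc(size_t s, gfp_t gfp)
{
	void *p = malloc(s);

	if (!p)
		return p;
	if (gfp & __GFP_ZERO)
		memset(p, 0, s);
	return p;
}

/* kzalloc() then reduces to forwarding the flag, as in the patch. */
static inline void *kzalloc(size_t s, gfp_t gfp)
{
	return kmalloc(s, gfp | __GFP_ZERO);
}
```

Either way the observable result is the same zero-filled buffer; the difference is only where the zeroing lives.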
Re: [PATCH v3 1/5] dt-bindings: remoteproc: qcom,sa8775p-pas: Document the SA8775p ADSP, CDSP and GPDSP
On 05/06/2024 18:56, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski
>
> Document the components used to boot the ADSP, CDSP0, CDSP1, GPDSP0 and
> GPDSP1 on the SA8775p SoC.
>
> Signed-off-by: Bartosz Golaszewski
> ---
>  .../bindings/remoteproc/qcom,sa8775p-pas.yaml | 160 +
>  1 file changed, 160 insertions(+)

Reviewed-by: Krzysztof Kozlowski

Best regards,
Krzysztof
[PATCH] remoteproc: imx_rproc: Adjust phandle parsing issue while remapping optional addresses in imx_rproc_addr_init()
In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just
counts the number of phandles. But phandles may be empty. So
of_parse_phandle() in the parsing loop (0 < a < nph) may return NULL
which is later dereferenced. Adjust this issue by adding a NULL-return
check.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc driver")
Signed-off-by: Aleksandr Mishin
---
 drivers/remoteproc/imx_rproc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
index 5a3fb902acc9..39eacd90af14 100644
--- a/drivers/remoteproc/imx_rproc.c
+++ b/drivers/remoteproc/imx_rproc.c
@@ -726,6 +726,8 @@ static int imx_rproc_addr_init(struct imx_rproc *priv,
 		struct resource res;
 
 		node = of_parse_phandle(np, "memory-region", a);
+		if (!node)
+			continue;
 		/* Not map vdevbuffer, vdevring region */
 		if (!strncmp(node->name, "vdev", strlen("vdev"))) {
 			of_node_put(node);
-- 
2.30.2
RE: [PATCH] remoteproc: imx_rproc: Adjust phandle parsing issue while remapping optional addresses in imx_rproc_addr_init()
Hi Aleksandr,

> Subject: [PATCH] remoteproc: imx_rproc: Adjust phandle parsing issue while
> remapping optional addresses in imx_rproc_addr_init()
>
> In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just counts
> number of phandles. But phandles may be empty. So of_parse_phandle() in
> the parsing loop (0 < a < nph) may return NULL which is later dereferenced.
> Adjust this issue by adding NULL-return check.

It is good to add a check here. But I am not sure whether this will
really happen.

> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc
> driver")
> Signed-off-by: Aleksandr Mishin

Anyway LGTM:

Reviewed-by: Peng Fan

Thanks,
Peng.
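The defensive pattern the patch applies — count entries up front, but tolerate holes when resolving each one — can be illustrated in isolation. In this sketch the stub table and parse_phandle() merely stand in for the devicetree and of_parse_phandle(); the names are illustrative, not the driver's real API.

```c
#include <stddef.h>
#include <assert.h>

/* Stand-in for a phandle list where some entries may be empty. */
static const char *fake_phandles[] = { "vdev0buffer", NULL, "rsc-table" };

/* Mimics of_parse_phandle(): may return NULL for an empty entry. */
static const char *parse_phandle(size_t idx, size_t nph)
{
	return idx < nph ? fake_phandles[idx] : NULL;
}

/* Mirrors the fixed loop in imx_rproc_addr_init(): skip NULL entries
 * instead of dereferencing them. */
static int count_valid_regions(size_t nph)
{
	int mapped = 0;

	for (size_t a = 0; a < nph; a++) {
		const char *node = parse_phandle(a, nph);

		if (!node)
			continue;	/* the added NULL-return check */
		mapped++;
	}
	return mapped;
}
```

Without the `continue`, the first empty entry would be dereferenced, which is exactly the crash the patch guards against.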
[PATCH] remoteproc: mediatek: Increase MT8188 SCP core0 DRAM size
From: Jason Chen

Increase MT8188 SCP core0 DRAM size for HEVC driver.

Signed-off-by: Jason Chen
---
 drivers/remoteproc/mtk_scp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
index b885a9a041e4..2119fc62c3f2 100644
--- a/drivers/remoteproc/mtk_scp.c
+++ b/drivers/remoteproc/mtk_scp.c
@@ -1390,7 +1390,7 @@ static const struct mtk_scp_sizes_data default_scp_sizes = {
 };
 
 static const struct mtk_scp_sizes_data mt8188_scp_sizes = {
-	.max_dram_size = 0x50,
+	.max_dram_size = 0x80,
 	.ipi_share_buffer_size = 600,
 };
-- 
2.34.1
[PATCH] arm64: dts: qcom: qcm6490-fairphone-fp5: Use .mbn firmware for IPA
Specify the file name for the squashed/non-split firmware with the .mbn
extension instead of the split .mdt. The kernel can load both but the
squashed version is preferred in dts nowadays.

Signed-off-by: Luca Weiss
---
 arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
index 8cd2fe80dbb2..6b4ffc585dcf 100644
--- a/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
+++ b/arch/arm64/boot/dts/qcom/qcm6490-fairphone-fp5.dts
@@ -551,7 +551,7 @@ &i2c9 {
 &ipa {
 	qcom,gsi-loader = "self";
 	memory-region = <&ipa_fw_mem>;
-	firmware-name = "qcom/qcm6490/fairphone5/ipa_fws.mdt";
+	firmware-name = "qcom/qcm6490/fairphone5/ipa_fws.mbn";
 	status = "okay";
 };

---
base-commit: ee78a17615ad0cfdbbc27182b1047cd36c9d4d5f
change-id: 20240606-fp5-ipa-mbn-d0466de6b5bf

Best regards,
-- 
Luca Weiss
Re: [PATCH v8 1/5] soc: qcom: pdr: protect locator_addr with the main mutex
On Thu, 6 Jun 2024 at 01:48, Chris Lew wrote:
>
> Hi Dmitry,
>
> On 5/11/2024 2:56 PM, Dmitry Baryshkov wrote:
> ...
> > @@ -76,12 +76,12 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
> >  					     locator_hdl);
> >  	struct pdr_service *pds;
> >
> > +	mutex_lock(&pdr->lock);
> >  	/* Create a local client port for QMI communication */
> >  	pdr->locator_addr.sq_family = AF_QIPCRTR;
> >  	pdr->locator_addr.sq_node = svc->node;
> >  	pdr->locator_addr.sq_port = svc->port;
> >
> > -	mutex_lock(&pdr->lock);
> >  	pdr->locator_init_complete = true;
> >  	mutex_unlock(&pdr->lock);
> >
> > @@ -104,10 +104,10 @@ static void pdr_locator_del_server(struct qmi_handle *qmi,
> >
> >  	mutex_lock(&pdr->lock);
> >  	pdr->locator_init_complete = false;
> > -	mutex_unlock(&pdr->lock);
> >
> >  	pdr->locator_addr.sq_node = 0;
> >  	pdr->locator_addr.sq_port = 0;
> > +	mutex_unlock(&pdr->lock);
> >  }
> >
> >  static const struct qmi_ops pdr_locator_ops = {
> > @@ -365,6 +365,7 @@ static int pdr_get_domain_list(struct servreg_get_domain_list_req *req,
> >  	if (ret < 0)
> >  		return ret;
> >
> > +	mutex_lock(&pdr->lock);
> >  	ret = qmi_send_request(&pdr->locator_hdl,
> >  			       &pdr->locator_addr,
> >  			       &txn, SERVREG_GET_DOMAIN_LIST_REQ,
> > @@ -373,15 +374,16 @@ static int pdr_get_domain_list(struct servreg_get_domain_list_req *req,
> >  			       req);
> >  	if (ret < 0) {
> >  		qmi_txn_cancel(&txn);
> > -		return ret;
> > +		goto err_unlock;
> >  	}
> >
> >  	ret = qmi_txn_wait(&txn, 5 * HZ);
> >  	if (ret < 0) {
> >  		pr_err("PDR: %s get domain list txn wait failed: %d\n",
> >  		       req->service_name, ret);
> > -		return ret;
> > +		goto err_unlock;
> >  	}
> > +	mutex_unlock(&pdr->lock);
>
> I'm not sure it is necessary to hold the mutex during the
> qmi_txn_wait() since the only variable we are trying to protect is
> locator_addr.
>
> Wouldn't this delay other work like new/del server notifications if this
> qmi service is delayed or non-responsive?

I've verified, the addr is stored inside the message data by the
enqueueing functions, so the locator_addr isn't referenced after the
function returns. I'll reduce the locking scope.

-- 
With best wishes
Dmitry
Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
On 11/05/2024 23:56, Dmitry Baryshkov wrote:
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware
> or for properly starting up the UCSI / altmode / battery manager
> support.
>
> The existing userspace implementation has several issues. It doesn't
> play well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files
> if the firmware location is changed (or if the firmware was not
> available at the time pd-mapper was started but the corresponding
> directory is mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide an in-kernel service implementing static
> per-platform data.
>
> To: Bjorn Andersson
> To: Konrad Dybcio
> To: Sibi Sankar
> To: Mathieu Poirier
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold
> Cc: Xilin Wu
> Cc: "Bryan O'Donoghue"
> Cc: Steev Klimaszewski
> Cc: Alexey Minnekhanov
>
> --
>
> Changes in v8:
> - Reworked pd-mapper to register as an rproc_subdev / auxdev
> - Dropped Tested-by from Steev and Alexey from the last patch since the
>   implementation was changed significantly.
> - Add sensors, cdsp and mpss_root domains to 660 config (Alexey
>   Minnekhanov)
> - Added platform entry for sm4250 (used for qrb4210 / RB2)
> - Added locking to the pdr_get_domain_list() (Chris Lew)
> - Remove the call to qmi_del_server() and corresponding API (Chris Lew)
> - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
> - Link to v7: https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6: https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5: https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held
>   (Chris Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4: https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3: https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2: https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (5):
>   soc: qcom: pdr: protect locator_addr with the main mutex
>   soc: qcom: pdr: fix parsing of domains lists
>   soc: qcom: pdr: extract PDR message marshalling data
>   soc: qcom: add pd-mapper implementation
>   remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/qcom_common.c    |  87 +
>  drivers/remoteproc/qcom_common.h    |  10 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
>  drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
>  drivers/soc/qcom/Kconfig            |  15 +
>  drivers/soc/qcom/Makefile           |   2 +
>  drivers/soc/qcom/pdr_interface.c    |  17 +-
>  drivers/soc/qcom/pdr_internal.h     | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 676 
>  drivers/soc/qcom/qcom_pdr_msg.c     | 353 +++
>  12 files changed, 1190 insertions(+), 300 deletions(-)
>
> base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,

Tested-by: Neil Armstrong # on SM8550-QRD
Tested-by: Neil Armstrong # on SM8550-HDK
Tested-by: Neil Armstrong # on SM8650-QRD

Thanks,
Neil
Re: [PATCH] arm64: dts: qcom: qcm6490-fairphone-fp5: Use .mbn firmware for IPA
On 6.06.2024 11:09 AM, Luca Weiss wrote:
> Specify the file name for the squashed/non-split firmware with the .mbn
> extension instead of the split .mdt. The kernel can load both but the
> squashed version is preferred in dts nowadays.
>
> Signed-off-by: Luca Weiss
> ---

Reviewed-by: Konrad Dybcio

Konrad
Re: [PATCH v2 2/6] livepatch: Add klp-convert tool
Hello Joe,

> > +#define KLP_RELOC_SYMBOL_POS(LP_OBJ_NAME, SYM_OBJ_NAME, SYM_NAME, SYM_POS) \
> > +	asm("\".klp.sym.rela." #LP_OBJ_NAME "." #SYM_OBJ_NAME "." #SYM_NAME "," #SYM_POS "\"")
>                                                                          ^^^
> I think I found a potential bug, or at least compatibility problem with
> including a comma "," character in this symbol format and older versions
> of the GNU assembler. The good news is that other delimiter characters
> like "." or "#" seem to work out fine.

Thank you for spotting this. I was using binutils 2.38, so I did not even
notice this problem.

Unfortunately, I was not able to make it work with "#" as a delimiter;
only "." worked. Additionally, any type of parenthesis apparently has
some special purpose even in labels, so they are also not an option.

> If you want to reproduce, you'll need a version of `as` like binutils
> 2.36.1 and try building the samples/livepatch/livepatch-extern-symbol.ko
> and you should get an error like:
>
>   Assembler messages:
>   Warning: missing closing '"'
>   Warning: missing closing '"'
>   Error: too many memory references for `movq'
>
> If you want to retrace my adventure, here are my steps:
>
> 1) Clone klp-convert-tree repo branch containing this patchset +
>    Petr's review comments + a few helpful things for klp-convert
>    development:
>
>   $ git clone \
>       --single-branch --branch=klp-convert-minimal-v1-review --depth=9 \
>       https://github.com/joe-lawrence/klp-convert-tree.git
>   [ ... snip ... ]
>   $ cd klp-convert-tree
>
> 2) Override .cross-dev defaults:
>
>   $ export BUILD_ARCHES=x86_64
>   $ export COMPILER=gcc-11.1.0
>   $ export URL=https://cdn.kernel.org/pub/tools/crosstool/files/bin/x86_64/
>   $ export OUTDIR_PREFIX=$(pwd)/build
>   $ export COMPILER_INSTALL_PATH=$(pwd)/cross-compiler
>
> 3) Setup x86_64 default .config (this will download and install the
>    gcc-11.1.0 compiler from cdn.kernel.org):
>
>   $ ./cross-dev make defconfig
>
>   x86_64 : make defconfig ...
>   Compiler will be installed in /root/klp-convert-tree/cross-compiler
>   [ ... snip ... ]
>
> 4) Add kernel livepatching configuration options:
>
>   $ ./cross-dev klp-config
>
>   Configuring x86_64 ...
>   [ ... snip ... ]
>
>   $ grep LIVEPATCH "$OUTDIR_PREFIX"-x86_64/.config
>   CONFIG_HAVE_LIVEPATCH=y
>   CONFIG_LIVEPATCH=y
>   CONFIG_SAMPLE_LIVEPATCH=m
>
> 5) Run the cross-compiler build until it hits a build error on
>    livepatch-extern-symbol.ko:
>
>   $ ./cross-dev make -j$(nproc)
>   [ ... snip ... ]
>   make: Target '__all' not remade because of errors.
>   [ x86_64 : make -j48 = FAIL ]
>
> 6) With pre-requisites already built, retry the external symbol sample
>    and add -save-temps to the KCFLAGS to keep the generated assembly file:
>
>   $ KCFLAGS="-save-temps=obj" ./cross-dev make samples/livepatch/livepatch-extern-symbol.ko
>   [ ... snip ... ]
>   samples/livepatch/livepatch-extern-symbol.s: Assembler messages:
>   samples/livepatch/livepatch-extern-symbol.s:103: Warning: missing closing '"'
>   samples/livepatch/livepatch-extern-symbol.s:103: Warning: missing closing '"'
>   samples/livepatch/livepatch-extern-symbol.s:103: Error: too many memory references for `movq'
>   [ ... snip ... ]
>
> 7) Which line is that?
>
>   $ awk 'NR==103' "$OUTDIR_PREFIX"-x86_64/samples/livepatch/livepatch-extern-symbol.s
>   	movq	".klp.sym.rela.vmlinux.vmlinux.saved_command_line,0"(%rip), %rdx
>
> You could alternatively poke at it through the compiler explorer service
> and toggle the source and binutils versions:
>
>   (error)   binutils 2.36.1 : https://godbolt.org/z/cGGs6rfWe
>   (success) binutils 2.38   : https://godbolt.org/z/ffzza3vYd

Thank you for those detailed step-by-step instructions to reproduce it!
It helped me a lot to understand the problem.

Lukas
Re: [PATCH] remoteproc: mediatek: Don't print error when optional scp handle is missing
On 05/06/24 21:35, Nícolas F. R. A. Prado wrote:
> The scp_get() helper has two users: the mtk-vcodec and the mtk-mdp3
> drivers. mdp3 considers the mediatek,scp phandle optional, and when it's
> missing mdp3 will directly look for the scp node based on compatible.
> For that reason printing an error in the helper when the handle is
> missing is wrong, as it's not always an actual error. Besides, the other
> user, vcodec, already prints an error message on its own when the helper
> fails so this message doesn't add that much information.
>
> Remove the error message from the helper. This gets rid of the deceptive
> error on MT8195-Tomato:
>
>   mtk-mdp3 14001000.dma-controller: can't get SCP node

I'm way more for giving it a SCP handle instead of silencing this print...
the SCP handle way *is* the correct one and I prefer it, as that'd also
be the only way to support Dual-Core SCP in MDP3.

I really want to see hardcoded stuff disappear and I really really
*really* want to see more consistency across DT nodes in MediaTek
platforms.

So, still a one-line change, but do it on the devicetree instead of here
please.

Cheers,
Angelo

> Signed-off-by: Nícolas F. R. A. Prado
> ---
>  drivers/remoteproc/mtk_scp.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
> index b885a9a041e4..f813117b6312 100644
> --- a/drivers/remoteproc/mtk_scp.c
> +++ b/drivers/remoteproc/mtk_scp.c
> @@ -37,10 +37,8 @@ struct mtk_scp *scp_get(struct platform_device *pdev)
>  	struct platform_device *scp_pdev;
>
>  	scp_node = of_parse_phandle(dev->of_node, "mediatek,scp", 0);
> -	if (!scp_node) {
> -		dev_err(dev, "can't get SCP node\n");
> +	if (!scp_node)
>  		return NULL;
> -	}
>
>  	scp_pdev = of_find_device_by_node(scp_node);
>  	of_node_put(scp_node);
>
> ---
> base-commit: d97496ca23a2d4ee80b7302849404859d9058bcd
> change-id: 20240605-mt8195-dma-scp-node-err-6a8dea26d4f5
>
> Best regards,
Re: [PATCH] remoteproc: mediatek: Increase MT8188 SCP core0 DRAM size
On 06/06/24 11:06, jason-ch chen wrote:
> From: Jason Chen
>
> Increase MT8188 SCP core0 DRAM size for HEVC driver.

So the second core only gets a maximum of 0x20 of SRAM? I'm not sure how
useful the secondary SCP core is, at this point, with so little available
SRAM but... okay, as you wish.

Reviewed-by: AngeloGioacchino Del Regno

> Signed-off-by: Jason Chen
> ---
>  drivers/remoteproc/mtk_scp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/remoteproc/mtk_scp.c b/drivers/remoteproc/mtk_scp.c
> index b885a9a041e4..2119fc62c3f2 100644
> --- a/drivers/remoteproc/mtk_scp.c
> +++ b/drivers/remoteproc/mtk_scp.c
> @@ -1390,7 +1390,7 @@ static const struct mtk_scp_sizes_data default_scp_sizes = {
>  };
>
>  static const struct mtk_scp_sizes_data mt8188_scp_sizes = {
> -	.max_dram_size = 0x50,
> +	.max_dram_size = 0x80,
>  	.ipi_share_buffer_size = 600,
>  };
[PATCH] function_graph: Rename BYTE_NUMBER to CHAR_NUMBER in selftests
From: "Steven Rostedt (Google)"

The function_graph selftests check various size variables to pass from
the entry of the function to the exit. It tests 1, 2, 4 and 8 byte words.
The 1 byte macro was called BYTE_NUMBER but that is used in the sh
architecture:

  arch/sh/include/asm/bitops-op32.h

Just rename the macro to CHAR_NUMBER.

Fixes: 47c3c70aa3697 ("function_graph: Add selftest for passing local variables")
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202406061744.rzdxfrrg-...@intel.com/
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace_selftest.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 369efc569238..adf0f436d84b 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -758,7 +758,7 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
-#define BYTE_NUMBER 123
+#define CHAR_NUMBER 123
 #define SHORT_NUMBER 12345
 #define WORD_NUMBER 1234567890
 #define LONG_NUMBER 1234567890123456789LL
@@ -789,7 +789,7 @@ static __init int store_entry(struct ftrace_graph_ent *trace,
 
 	switch (size) {
 	case 1:
-		*(char *)p = BYTE_NUMBER;
+		*(char *)p = CHAR_NUMBER;
 		break;
 	case 2:
 		*(short *)p = SHORT_NUMBER;
@@ -830,7 +830,7 @@ static __init void store_return(struct ftrace_graph_ret *trace,
 
 	switch (fixture->store_size) {
 	case 1:
-		expect = BYTE_NUMBER;
+		expect = CHAR_NUMBER;
 		found = *(char *)p;
 		break;
 	case 2:
-- 
2.43.0
Re: [PATCH v2 1/5] ftrace: Rename dup_hash() and comment it
On Wed, Jun 05, 2024 at 02:03:35PM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)"
>
> The name "dup_hash()" is a misnomer as it does not duplicate the hash that
> is passed in, but instead moves its entities from that hash to a newly
> allocated one. Rename it to "__move_hash()" (using starting underscores as
> it is an internal function), and add some comments about what it does.
>
> Signed-off-by: Steven Rostedt (Google)

Acked-by: Mark Rutland

Mark.

> ---
>  kernel/trace/ftrace.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index da7e6abf48b4..9dcdefe9d1aa 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1391,7 +1391,11 @@ ftrace_hash_rec_enable_modify(struct ftrace_ops *ops, int filter_hash);
>  static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
>  				       struct ftrace_hash *new_hash);
>
> -static struct ftrace_hash *dup_hash(struct ftrace_hash *src, int size)
> +/*
> + * Allocate a new hash and remove entries from @src and move them to the new hash.
> + * On success, the @src hash will be empty and should be freed.
> + */
> +static struct ftrace_hash *__move_hash(struct ftrace_hash *src, int size)
>  {
>  	struct ftrace_func_entry *entry;
>  	struct ftrace_hash *new_hash;
> @@ -1438,7 +1442,7 @@ __ftrace_hash_move(struct ftrace_hash *src)
>  	if (ftrace_hash_empty(src))
>  		return EMPTY_HASH;
>
> -	return dup_hash(src, size);
> +	return __move_hash(src, size);
>  }
>
>  static int
> -- 
> 2.43.0
Re: [PATCH v14 14/14] selftests/sgx: Add scripts for EPC cgroup testing
On Thu, 06 Jun 2024 00:32:55 -0500, Jarkko Sakkinen wrote:
> On Wed Jun 5, 2024 at 6:33 PM EEST, Haitao Huang wrote:
> > sgx_cgroup_try_charge() expects sgx_cg_wq, so it would break unless we
> > check and return 0, which was initially implemented in v12. But then
> > Kai had some concern that we expose all the interface files to allow
> > the user to set limits but we don't enforce. To keep it simple we
> > settled down
>
> Sure: "Keep it simple and corpse"
>
> > back to BUG_ON(). This would only happen rarely and the user can add a
> > command-line option to disable SGX if s/he really wants to start the
> > kernel in this case, just can't do SGX.
>
> Even disabling all of SGX would be a less catastrophic measure.
>
> > Yes I had a comment but Kai thought it was too obvious and I can't
> > think of a better one that's not obvious, so I removed it:
>
> Not great advice given. Please just document it. In the patch with that
> BUG_ON() I don't want to see my R-by, until I've reviewed an updated
> version.
>
> BR, Jarkko

Sure, that makes sense. So far I have the following changes. I did not do
the renaming for the functions as I am not sure it is still needed. Let
me know otherwise and if I missed any more.

--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -329,7 +329,7 @@ const struct misc_res_ops sgx_cgroup_ops = {
 	.free = sgx_cgroup_free,
 };
 
-void sgx_cgroup_init(void)
+int __init sgx_cgroup_init(void)
 {
 	/*
 	 * misc root always exists even if misc is disabled from command line.
@@ -345,7 +345,8 @@ void sgx_cgroup_init(void)
 	if (cgroup_subsys_enabled(misc_cgrp_subsys)) {
 		sgx_cg_wq = alloc_workqueue("sgx_cg_wq",
					    WQ_UNBOUND | WQ_FREEZABLE,
					    WQ_UNBOUND_MAX_ACTIVE);
-		BUG_ON(!sgx_cg_wq);
+		if (!sgx_cg_wq)
+			return -ENOMEM;
 	}
+
+	return 0;
 }

--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -99,7 +99,7 @@ bool sgx_cgroup_should_reclaim(struct sgx_cgroup *sgx_cg);
 struct misc_cg *sgx_cgroup_reclaim_pages(struct misc_cg *root,
					 struct misc_cg *next_cg,
					 struct mm_struct *charge_mm);
-void sgx_cgroup_init(void);
+int __init sgx_cgroup_init(void);
 
 #endif /* CONFIG_CGROUP_MISC */

--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -1045,7 +1045,7 @@ static int __init sgx_init(void)
 	if (!sgx_page_cache_init())
 		return -ENOMEM;
 
-	if (!sgx_page_reclaimer_init()) {
+	if (!sgx_page_reclaimer_init() || !sgx_cgroup_init()) {
 		ret = -ENOMEM;
 		goto err_page_cache;
 	}
@@ -1067,9 +1067,6 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
-	/* Setup cgroup if either the native or vepc driver is active */
-	sgx_cgroup_init();
-
 	return 0;
 
 err_provision:

--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -119,6 +119,7 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
  * cgroup.
  */
 struct sgx_epc_lru_list {
+	/* Use this lock to protect access from multiple reclamation worker threads */
 	spinlock_t lock;
 	struct list_head reclaimable;
 };

--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -446,7 +446,8 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
 	int ret;
 
 	if (!parent_css) {
-		parent_cg = cg = &root_cg;
+		parent_cg = &root_cg;
+		cg = &root_cg;
 	} else {
 		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
 		if (!cg)

Thanks
Haitao
[PATCH v2] ftrace: Add missing kerneldoc parameters to unregister_ftrace_direct()
Add the description to the parameters addr and free_filters of the
function unregister_ftrace_direct().

Signed-off-by: Marilene A Garcia
---
Hello,
Thank you for the feedback. The requested changes were applied in this
new version.

 kernel/trace/ftrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index da7e6abf48b4..1f79cf3c598e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5960,6 +5960,8 @@ EXPORT_SYMBOL_GPL(register_ftrace_direct);
  * unregister_ftrace_direct - Remove calls to custom trampoline
  * previously registered by register_ftrace_direct for @ops object.
  * @ops: The address of the struct ftrace_ops object
+ * @addr: The address of the direct function that is called by the @ops functions
+ * @free_filters: Set to true to remove all filters for the ftrace_ops, false otherwise
  *
  * This is used to remove a direct calls to @addr from the nop locations
  * of the functions registered in @ops (with by ftrace_set_filter_ip
-- 
2.34.1
[PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module
If a module is being loaded, and the .gnu.linkonce.this_module section
in the module's ELF file does not have the WRITE flag, the kernel will
map the finished module struct of that module as read-only. This causes
a kernel panic when the struct is written to the first time after it has
been marked read-only. Currently this happens in complete_formation in
kernel/module/main.c:2765, when the module's state is set to
MODULE_STATE_COMING, just after setting up the memory protections.

Down the line, this seems to lead to unpredictable freezes when trying
to load other modules - I guess this is due to some structures not being
cleaned up properly, but I didn't investigate this further.

A check already exists which verifies that .gnu.linkonce.this_module is
ALLOC. This patch simply adds an analogous check for WRITE.

Signed-off-by: Daniel Kirschten
---
 kernel/module/main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index d18a94b973e1..abba097551a2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1886,6 +1886,12 @@ static int elf_validity_cache_copy(struct load_info *info, int flags)
 		goto no_exec;
 	}
 
+	if (!(shdr->sh_flags & SHF_WRITE)) {
+		pr_err("module %s: .gnu.linkonce.this_module must be writable\n",
+		       info->name ?: "(missing .modinfo section or name field)");
+		goto no_exec;
+	}
+
 	if (shdr->sh_size != sizeof(struct module)) {
 		pr_err("module %s: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time\n",
 		       info->name ?: "(missing .modinfo section or name field)");
-- 
2.34.1
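The added validation is just a flags test on the section header. In isolation — using the standard <elf.h> definitions of a Linux/glibc toolchain rather than the kernel's load_info plumbing — it amounts to the sketch below; the helper name is illustrative, not the kernel's.

```c
#include <elf.h>
#include <stdbool.h>
#include <assert.h>

/* Sketch of the check the patch adds for .gnu.linkonce.this_module:
 * the section must be ALLOC (already verified by the existing check)
 * and WRITE, or the final struct module would be mapped read-only and
 * the later write to mod->state in complete_formation() would fault. */
static bool this_module_section_ok(const Elf64_Shdr *shdr)
{
	if (!(shdr->sh_flags & SHF_ALLOC))
		return false;
	return (shdr->sh_flags & SHF_WRITE) != 0;
}
```

Rejecting the module at ELF-validation time turns a late kernel panic into an early, well-reported load error.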
[PATCH] net: missing check
Two missing checks in virtio_net_hdr_to_skb() allowed syzbot to crash kernels again

1. After the skb_segment function the buffer may become non-linear
(nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere
the __skb_linearize function will not be executed, then the buffer will
remain non-linear. Then the condition (offset >= skb_headlen(skb))
becomes true, which causes WARN_ON_ONCE in skb_checksum_help.

2. The struct sk_buff and struct virtio_net_hdr members must be
mathematically related.
(gso_size) must be greater than (needed) otherwise WARN_ON_ONCE.
(remainder) must be greater than (needed) otherwise WARN_ON_ONCE.
(remainder) may be 0 if division is without remainder.

offset+2 (4191) > skb_headlen() (1116)
WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
Modules linked in:
CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303
Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef
RSP: 0018:c90003a9f338 EFLAGS: 00010286
RAX: RBX: 888025125780 RCX: 814db209
RDX: 888015393b80 RSI: 814db216 RDI: 0001
RBP: 8880251257f4 R08: 0001 R09:
R10: R11: 0001 R12: 045c
R13: 105f R14: 8880251257f0 R15: 105d
FS: 55c24380() GS:8880b990() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 2000f000 CR3: 23151000 CR4: 003506f0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
Call Trace:
 ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777
 ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584
 ip_finish_output_gso net/ipv4/ip_output.c:286 [inline]
 __ip_finish_output net/ipv4/ip_output.c:308 [inline]
 __ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295
 ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
 NF_HOOK_COND include/linux/netfilter.h:303 [inline]
 ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433
 dst_output include/net/dst.h:451 [inline]
 ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129
 iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
 ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline]
 sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076
 __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
 netdev_start_xmit include/linux/netdevice.h:4954 [inline]
 xmit_one net/core/dev.c:3545 [inline]
 dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561
 __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346
 dev_queue_xmit include/linux/netdevice.h:3134 [inline]
 packet_xmit+0x257/0x380 net/packet/af_packet.c:276
 packet_snd net/packet/af_packet.c:3087 [inline]
 packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0xd5/0x180 net/socket.c:745
 __sys_sendto+0x255/0x340 net/socket.c:2190
 __do_sys_sendto net/socket.c:2202 [inline]
 __se_sys_sendto net/socket.c:2198 [inline]
 __x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Signed-off-by: Denis Arefev
---
 include/linux/virtio_net.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 4dfa9b69ca8d..77ebe908d746 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -56,6 +56,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 	unsigned int thlen = 0;
 	unsigned int p_off = 0;
 	unsigned int ip_proto;
+	u64 ret, remainder;
 
 	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
 		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -98,6 +99,15 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 		u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
 		u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
 
+		if (hdr->gso_size) {
+			ret = div64_u64_rem(skb->len, hdr->gso_size, &remainder);
+			if (!(ret && (hdr->gso_size > needed) &&
+			      ((remainder > needed) || (remainder == 0)))) {
+				return -EINVAL;
+			}
+			skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG;
+		}
+
 		if (!pskb_may_pull(skb, needed))
 			return -EINVAL;
--
2.25.
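For readers following the logic of the added check, here is a small userspace sketch of the validation rule in plain C. The function name and test values are illustrative only; `div64_u64_rem()` is modelled with ordinary 64-bit division, and the non-zero `gso_size` guard from the kernel code is assumed to hold:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the patch's gso_size check: skb->len must contain at
 * least one full gso_size segment, gso_size must exceed the checksummed
 * header length ("needed"), and any trailing remainder must either be zero
 * or also large enough to hold the headers.  gso_size must be non-zero,
 * mirroring the `if (hdr->gso_size)` guard in the kernel code. */
static int gso_len_valid(uint64_t len, uint64_t gso_size, uint64_t needed)
{
	uint64_t segs = len / gso_size;       /* div64_u64_rem() quotient */
	uint64_t remainder = len % gso_size;  /* div64_u64_rem() remainder */

	if (!(segs && gso_size > needed &&
	      (remainder > needed || remainder == 0)))
		return 0;	/* the kernel returns -EINVAL here */
	return 1;
}
```

With, say, needed = 100: a 3000-byte buffer split into 1000-byte segments passes (remainder 0), while a 1050-byte buffer fails because the 50-byte tail could not hold the checksummed headers.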
Re: [PATCH] net: missing check
On Thu, Jun 6, 2024 at 4:14 PM Denis Arefev wrote: > > Two missing check in virtio_net_hdr_to_skb() allowed syzbot > to crash kernels again > > 1. After the skb_segment function the buffer may become non-linear > (nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere > the __skb_linearize function will not be executed, then the buffer will > remain non-linear. Then the condition (offset >= skb_headlen(skb)) > becomes true, which causes WARN_ON_ONCE in skb_checksum_help. > > 2. The struct sk_buff and struct virtio_net_hdr members must be > mathematically related. > (gso_size) must be greater than (needed) otherwise WARN_ON_ONCE. > (remainder) must be greater than (needed) otherwise WARN_ON_ONCE. > (remainder) may be 0 if division is without remainder. > > offset+2 (4191) > skb_headlen() (1116) > WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 > skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303 > Modules linked in: > CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted > 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0 > Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google > 11/10/2023 > RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303 > Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b > 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe > ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef > RSP: 0018:c90003a9f338 EFLAGS: 00010286 > RAX: RBX: 888025125780 RCX: 814db209 > RDX: 888015393b80 RSI: 814db216 RDI: 0001 > RBP: 8880251257f4 R08: 0001 R09: > R10: R11: 0001 R12: 045c > R13: 105f R14: 8880251257f0 R15: 105d > FS: 55c24380() GS:8880b990() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 2000f000 CR3: 23151000 CR4: 003506f0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > Call Trace: > > ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777 > ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584 > ip_finish_output_gso net/ipv4/ip_output.c:286 [inline] > 
__ip_finish_output net/ipv4/ip_output.c:308 [inline] > __ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295 > ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323 > NF_HOOK_COND include/linux/netfilter.h:303 [inline] > ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433 > dst_output include/net/dst.h:451 [inline] > ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129 > iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82 > ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline] > sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076 > __netdev_start_xmit include/linux/netdevice.h:4940 [inline] > netdev_start_xmit include/linux/netdevice.h:4954 [inline] > xmit_one net/core/dev.c:3545 [inline] > dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561 > __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346 > dev_queue_xmit include/linux/netdevice.h:3134 [inline] > packet_xmit+0x257/0x380 net/packet/af_packet.c:276 > packet_snd net/packet/af_packet.c:3087 [inline] > packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119 > sock_sendmsg_nosec net/socket.c:730 [inline] > __sock_sendmsg+0xd5/0x180 net/socket.c:745 > __sys_sendto+0x255/0x340 net/socket.c:2190 > __do_sys_sendto net/socket.c:2202 [inline] > __se_sys_sendto net/socket.c:2198 [inline] > __x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198 > do_syscall_x64 arch/x86/entry/common.c:51 [inline] > do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 > entry_SYSCALL_64_after_hwframe+0x63/0x6b > > Signed-off-by: Denis Arefev > --- > include/linux/virtio_net.h | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h > index 4dfa9b69ca8d..77ebe908d746 100644 > --- a/include/linux/virtio_net.h > +++ b/include/linux/virtio_net.h > @@ -56,6 +56,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, > unsigned int thlen = 0; > unsigned int p_off = 0; > unsigned int ip_proto; > + u64 ret, remainder; > > if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { > switch (hdr->gso_type & 
~VIRTIO_NET_HDR_GSO_ECN) { > @@ -98,6 +99,15 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff > *skb, > u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset); > u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16)); > > + if (hdr->gso_size) { > + ret = div64_u64_rem(skb->len, hdr->gso_size, > &remainder); > + if (!(ret && (hdr->gso_size > needed) && > + ((remainder > needed) || > (remainder == 0 { > +
[PATCH 0/5] Add MPSS remoteproc support for SDX75
Add modem support to SDX75 using the PAS remoteproc driver.
Also, add qlink_logging memory region and split MPSS DSM region into
2 separate regions.

These patches were co-authored by Rohit Agarwal while at Qualcomm.

Naina Mehta (5):
  dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
  remoteproc: qcom: pas: Add SDX75 remoteproc support
  arm64: dts: qcom: sdx75: add missing qlink_logging reserved memory for mpss
  arm64: dts: qcom: sdx75: Add remoteproc node
  arm64: dts: qcom: sdx75-idp: enable MPSS remoteproc node

 .../bindings/remoteproc/qcom,sm8550-pas.yaml |  1 +
 arch/arm64/boot/dts/qcom/sdx75-idp.dts       |  6 ++
 arch/arm64/boot/dts/qcom/sdx75.dtsi          | 64 +--
 drivers/remoteproc/qcom_q6v5_pas.c           |  1 +
 4 files changed, 68 insertions(+), 4 deletions(-)

--
2.17.1
[PATCH 1/5] dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
Document the MPSS Peripheral Authentication Service on SDX75 platform. Signed-off-by: Naina Mehta --- .../devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml index 73fda7565cd1..02e15b1f78ab 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml +++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml @@ -16,6 +16,7 @@ description: properties: compatible: enum: + - qcom,sdx75-mpss-pas - qcom,sm8550-adsp-pas - qcom,sm8550-cdsp-pas - qcom,sm8550-mpss-pas -- 2.17.1
[PATCH 2/5] remoteproc: qcom: pas: Add SDX75 remoteproc support
Add MPSS Peripheral Authentication Service support for SDX75 platform. Signed-off-by: Naina Mehta --- drivers/remoteproc/qcom_q6v5_pas.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c index 8458bcfe9e19..833e2f9c2c5e 100644 --- a/drivers/remoteproc/qcom_q6v5_pas.c +++ b/drivers/remoteproc/qcom_q6v5_pas.c @@ -1343,6 +1343,7 @@ static const struct of_device_id adsp_of_match[] = { { .compatible = "qcom,sdm845-cdsp-pas", .data = &sdm845_cdsp_resource_init}, { .compatible = "qcom,sdm845-slpi-pas", .data = &sdm845_slpi_resource_init}, { .compatible = "qcom,sdx55-mpss-pas", .data = &sdx55_mpss_resource}, + { .compatible = "qcom,sdx75-mpss-pas", .data = &sm8650_mpss_resource}, { .compatible = "qcom,sm6115-adsp-pas", .data = &adsp_resource_init}, { .compatible = "qcom,sm6115-cdsp-pas", .data = &cdsp_resource_init}, { .compatible = "qcom,sm6115-mpss-pas", .data = &sc8180x_mpss_resource}, -- 2.17.1
[PATCH 3/5] arm64: dts: qcom: sdx75: add missing qlink_logging reserved memory for mpss
The qlink_logging memory region is also used by the modem firmware, add it to reserved memory regions. Also split MPSS DSM region into 2 separate regions. Signed-off-by: Naina Mehta --- arch/arm64/boot/dts/qcom/sdx75.dtsi | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi b/arch/arm64/boot/dts/qcom/sdx75.dtsi index 9b93f6501d55..9349b1c4e196 100644 --- a/arch/arm64/boot/dts/qcom/sdx75.dtsi +++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi @@ -366,7 +366,12 @@ no-map; }; - qdss_mem: qdss@8880 { + qdss_mem: qdss@8850 { + reg = <0x0 0x8850 0x0 0x30>; + no-map; + }; + + qlink_logging_mem: qlink_logging@8880 { reg = <0x0 0x8880 0x0 0x30>; no-map; }; @@ -377,18 +382,22 @@ no-map; }; - mpss_dsmharq_mem: mpss-dsmharq@88f0 { - reg = <0x0 0x88f0 0x0 0x508>; + mpss_dsm_mem_2: mpss-dsmharq-2@88f0 { + reg = <0x0 0x88f0 0x0 0x250>; no-map; }; + mpss_dsm_mem: mpss-dsmharq@8b40 { + reg = <0x0 0x8b40 0x0 0x2b8>; + }; + q6_mpss_dtb_mem: q6-mpss-dtb@8df8 { reg = <0x0 0x8df8 0x0 0x8>; no-map; }; mpssadsp_mem: mpssadsp@8e00 { - reg = <0x0 0x8e00 0x0 0xf40>; + reg = <0x0 0x8e00 0x0 0xf10>; no-map; }; -- 2.17.1
[PATCH 5/5] arm64: dts: qcom: sdx75-idp: enable MPSS remoteproc node
Enable MPSS remoteproc node on sdx75-idp platform. Signed-off-by: Naina Mehta --- arch/arm64/boot/dts/qcom/sdx75-idp.dts | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/sdx75-idp.dts b/arch/arm64/boot/dts/qcom/sdx75-idp.dts index fde16308c7e2..f1bbe7ab01ab 100644 --- a/arch/arm64/boot/dts/qcom/sdx75-idp.dts +++ b/arch/arm64/boot/dts/qcom/sdx75-idp.dts @@ -282,6 +282,12 @@ status = "okay"; }; +&remoteproc_mpss { + firmware-name = "qcom/sdx75/modem.mbn", + "qcom/sdx75/modem_dtb.mbn"; + status = "okay"; +}; + &sdhc { cd-gpios = <&tlmm 103 GPIO_ACTIVE_LOW>; vmmc-supply = <®_2v95_vdd>; -- 2.17.1
[PATCH 4/5] arm64: dts: qcom: sdx75: Add remoteproc node
Add MPSS remoteproc node for SDX75 SoC. Signed-off-by: Naina Mehta --- arch/arm64/boot/dts/qcom/sdx75.dtsi | 47 + 1 file changed, 47 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi b/arch/arm64/boot/dts/qcom/sdx75.dtsi index 9349b1c4e196..25d3f110abe1 100644 --- a/arch/arm64/boot/dts/qcom/sdx75.dtsi +++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi @@ -890,6 +890,53 @@ reg = <0x0 0x01fc 0x0 0x3>; }; + remoteproc_mpss: remoteproc@408 { + compatible = "qcom,sdx75-mpss-pas"; + reg = <0 0x0408 0 0x4040>; + + interrupts-extended = <&intc GIC_SPI 250 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 0 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 1 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 2 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 3 IRQ_TYPE_EDGE_RISING>, + <&smp2p_modem_in 7 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "wdog", + "fatal", + "ready", + "handover", + "stop-ack", + "shutdown-ack"; + + clocks = <&rpmhcc RPMH_CXO_CLK>; + clock-names = "xo"; + + power-domains = <&rpmhpd RPMHPD_CX>, + <&rpmhpd RPMHPD_MSS>; + power-domain-names = "cx", +"mss"; + + memory-region = <&mpssadsp_mem>, <&q6_mpss_dtb_mem>, + <&mpss_dsm_mem>, <&mpss_dsm_mem_2>, + <&qlink_logging_mem>; + + qcom,qmp = <&aoss_qmp>; + + qcom,smem-states = <&smp2p_modem_out 0>; + qcom,smem-state-names = "stop"; + + status = "disabled"; + + glink-edge { + interrupts-extended = <&ipcc IPCC_CLIENT_MPSS + IPCC_MPROC_SIGNAL_PING + IRQ_TYPE_EDGE_RISING>; + mboxes = <&ipcc IPCC_CLIENT_MPSS + IPCC_MPROC_SIGNAL_PING>; + label = "mpss"; + qcom,remote-pid = <1>; + }; + }; + sdhc: mmc@8804000 { compatible = "qcom,sdx75-sdhci", "qcom,sdhci-msm-v5"; reg = <0x0 0x08804000 0x0 0x1000>; -- 2.17.1
Re: [PATCH 1/5] dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
On 06/06/2024 16:38, Naina Mehta wrote: > Document the MPSS Peripheral Authentication Service on SDX75 platform. > > Signed-off-by: Naina Mehta > --- > .../devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml | 1 + > 1 file changed, 1 insertion(+) > > diff --git > a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml > b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml > index 73fda7565cd1..02e15b1f78ab 100644 > --- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml > +++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8550-pas.yaml > @@ -16,6 +16,7 @@ description: > properties: >compatible: > enum: > + - qcom,sdx75-mpss-pas >- qcom,sm8550-adsp-pas >- qcom,sm8550-cdsp-pas >- qcom,sm8550-mpss-pas Missing updates to allOf constraints. Are you sure this is the binding for SDX75? Best regards, Krzysztof
Re: [PATCH v14 02/14] cgroup/misc: Add per resource callbacks for CSS events
On Thu, 06 Jun 2024 08:37:31 -0500, chenridong wrote:

> If _misc_cg_res_alloc fails, maybe some types do not call ->alloc(), but
> all types' ->free() callback will be called, is that ok?

Not sure I understand. Are you suggesting we ignore failures from the
->alloc() callback in _misc_cg_res_alloc() as it is per-resource, and have
the ->free() callback and resource provider of the failing type handle the
failure internally?

IIUC, this failure only happens when a specific subcgroup is created
(memory running out for allocation), so failing that subcgroup as a whole
seems fine to me.

Note the root node is static and no per-resource callbacks are invoked by
misc for it. The resource provider handles its own allocations for root.
In the SGX case we too declare a static object for the corresponding root
sgx_cgroup struct.

Note also misc cgroup (except for setting capacity[res] = 0 at root) is
all or nothing, so there is no mechanism to tell the user "this resource
does not work but others are fine in this particular cgroup."

Thanks
Haitao
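The unwind question above is about a common kernel pattern: when one per-resource ->alloc() fails midway, only the resources whose ->alloc() actually succeeded should see a ->free(). A minimal userspace sketch of that pattern (all names hypothetical — this is not the misc cgroup code, just the rollback idiom being discussed):

```c
#include <assert.h>
#include <stdbool.h>

#define NRES 3

static bool allocated[NRES];

/* Hypothetical per-resource ->alloc(): fails for index `fail_at`. */
static int res_alloc(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	allocated[i] = true;
	return 0;
}

/* Hypothetical per-resource ->free(). */
static void res_free(int i)
{
	allocated[i] = false;
}

/* On failure, unwind only the resources whose ->alloc() succeeded, so
 * ->free() is never invoked for a type that was never allocated. */
static int cg_alloc_all(int fail_at)
{
	for (int i = 0; i < NRES; i++) {
		if (res_alloc(i, fail_at) != 0) {
			while (i-- > 0)
				res_free(i);
			return -1;
		}
	}
	return 0;
}
```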
Re: [PATCH 3/5] arm64: dts: qcom: sdx75: add missing qlink_logging reserved memory for mpss
On 06/06/2024 16:38, Naina Mehta wrote: > The qlink_logging memory region is also used by the modem firmware, > add it to reserved memory regions. > Also split MPSS DSM region into 2 separate regions. > > Signed-off-by: Naina Mehta > --- > arch/arm64/boot/dts/qcom/sdx75.dtsi | 17 + > 1 file changed, 13 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi > b/arch/arm64/boot/dts/qcom/sdx75.dtsi > index 9b93f6501d55..9349b1c4e196 100644 > --- a/arch/arm64/boot/dts/qcom/sdx75.dtsi > +++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi > @@ -366,7 +366,12 @@ > no-map; > }; > > - qdss_mem: qdss@8880 { > + qdss_mem: qdss@8850 { > + reg = <0x0 0x8850 0x0 0x30>; > + no-map; > + }; > + > + qlink_logging_mem: qlink_logging@8880 { Sorry, no downstream code. Please follow DTS coding style - no underscores in node names. This applies to all work sent upstream. Best regards, Krzysztof
[PATCH v2 1/1] dt-bindings: remoteproc: imx_rproc: add minItems for power-domain
"fsl,imx8qxp-cm4" and "fsl,imx8qm-cm4" need minimum 2 power domains. Keep the same restriction for other compatible string. Signed-off-by: Frank Li --- Notes: Change from v1 to v2 - set minitem to 2 at top - Add imx8qm compatible string also - use not logic to handle difference compatible string restriction - update commit message. pass dt_binding_check. make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 dt_binding_check DT_SCHEMA_FILES=fsl,imx-rproc.yaml SCHEMA Documentation/devicetree/bindings/processed-schema.json CHKDT Documentation/devicetree/bindings LINTDocumentation/devicetree/bindings DTEX Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.example.dts DTC_CHK Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.example.dtb .../bindings/remoteproc/fsl,imx-rproc.yaml | 14 ++ 1 file changed, 14 insertions(+) diff --git a/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml b/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml index df36e29d974ca..da108a39df435 100644 --- a/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml +++ b/Documentation/devicetree/bindings/remoteproc/fsl,imx-rproc.yaml @@ -59,6 +59,7 @@ properties: maxItems: 32 power-domains: +minItems: 2 maxItems: 8 fsl,auto-boot: @@ -99,6 +100,19 @@ allOf: properties: fsl,iomuxc-gpr: false + - if: + properties: +compatible: + not: +contains: + enum: +- fsl,imx8qxp-cm4 +- fsl,imx8qm-cm4 +then: + properties: +power-domains: + minItems: 8 + additionalProperties: false examples: -- 2.34.1
Re: [PATCH] livepatch: introduce klp_func called interface
Hi Wardenjohn, To follow up, Red Hat kpatch QE pointed me toward this test: https://gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/tree/main/general/kpatch/kpatch-trace?ref_type=heads which reports a few interesting things via systemd service and ftrace: - Patched functions - Traced functions - Code that was modified, but didn't run - Other code that ftrace saw, but is not listed in the sysfs directory The code looks to be built on top of the same basic ftrace commands that I sent the other day. You may be able to reuse/copy/etc bits from the scripts if they are handy. -- Joe
[PATCH 0/3] Tracepoints and static branch/call in Rust
An important part of a production ready Linux kernel driver is
tracepoints. So to write production ready Linux kernel drivers in Rust,
we must be able to call tracepoints from Rust code. This patch series
adds support for calling tracepoints declared in C from Rust.

To use the tracepoint support, you must:

 1. Declare the tracepoint in a C header file as usual.
 2. Make sure that the header file is visible to bindgen so that Rust
    bindings are generated for the symbols that the tracepoint macro
    emits.
 3. Use the declare_trace! macro in your Rust code to generate Rust
    functions that call into the tracepoint.

For example, the kernel has a tracepoint called `sched_kthread_stop`.
It is declared like this:

  TRACE_EVENT(sched_kthread_stop,
	TP_PROTO(struct task_struct *t),
	TP_ARGS(t),
	TP_STRUCT__entry(
		__array(char, comm, TASK_COMM_LEN)
		__field(pid_t, pid)
	),
	TP_fast_assign(
		memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
		__entry->pid = t->pid;
	),
	TP_printk("comm=%s pid=%d", __entry->comm, __entry->pid)
  );

To call the above tracepoint from Rust code, you would add the relevant
header file to rust/bindings/bindings_helper.h and add the following
invocation somewhere in your Rust code:

  declare_trace! {
      fn sched_kthread_stop(task: *mut task_struct);
  }

This will define a Rust function of the given name that you can call
like any other Rust function. Since these tracepoints often take raw
pointers as arguments, it may be convenient to wrap it in a safe
wrapper:

  mod raw {
      declare_trace! {
          fn sched_kthread_stop(task: *mut task_struct);
      }
  }

  #[inline]
  pub fn trace_sched_kthread_stop(task: &Task) {
      // SAFETY: The pointer to `task` is valid.
      unsafe { raw::sched_kthread_stop(task.as_raw()) }
  }

A future expansion of the tracepoint support could generate these safe
versions automatically, but that is left as future work for now.

This is intended for use in the Rust Binder driver, which was
originally sent as an RFC [1].
The RFC did not include tracepoint support, but you can see how it will
be used in Rust Binder at [2]. The author has verified that the
tracepoint support works on Android devices.

This implementation implements support for static keys in Rust so that
the actual static branch will end up in the Rust object file. However,
it would also be possible to just wrap the trace_##name generated by
__DECLARE_TRACE in an extern C function and then call that from Rust.
This will simplify the Rust code by removing the need for static
branches and calls, but it places the static branch behind an external
call, which has performance implications.

A possible middle ground would be to place just the __DO_TRACE body in
an extern C function and to implement the Rust wrapper by doing the
static branch in Rust, and then calling into C the code that contains
__DO_TRACE when the tracepoint is active. However, this would need some
changes to include/linux/tracepoint.h to generate and export a function
containing the body of __DO_TRACE when the tracepoint should be
callable from Rust.

So in general, there is a tradeoff between placing parts of the
tracepoint (which is perf sensitive) behind an external call, and
having code duplicated in both C and Rust (which must be kept in sync
when changes are made). This is an important point that I would like
feedback on from the C maintainers.
Link: https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-0-08ba9197f...@google.com/ [1] Link: https://r.android.com/3110088 [2] Signed-off-by: Alice Ryhl --- Alice Ryhl (3): rust: add static_call support rust: add static_key_false rust: add tracepoint support rust/bindings/bindings_helper.h | 1 + rust/bindings/lib.rs| 15 +++ rust/helpers.c | 24 +++ rust/kernel/lib.rs | 3 ++ rust/kernel/static_call.rs | 92 + rust/kernel/static_key.rs | 87 ++ rust/kernel/tracepoint.rs | 92 + scripts/Makefile.build | 2 +- 8 files changed, 315 insertions(+), 1 deletion(-) --- base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0 change-id: 20240606-tracepoint-31e15b90e471 Best regards, -- Alice Ryhl
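To make the tradeoff in the cover letter concrete: __DO_TRACE essentially checks an "enabled" condition and then iterates over the registered probe functions. A rough userspace model of that dispatch is sketched below (all names hypothetical; the real macro additionally handles the static branch, the RCU dereference of the probe array, and the preemption/cpu-online checks the series adds helpers for):

```c
#include <assert.h>

#define MAX_PROBES 4

typedef void (*probe_fn)(int arg);

static probe_fn probes[MAX_PROBES];
static int nprobes;

static int probe_total;	/* state touched by the sample probe below */

static void counting_probe(int arg)
{
	probe_total += arg;
}

/* Model of tracepoint_probe_register(): append a callback. */
static int register_probe(probe_fn fn)
{
	if (nprobes >= MAX_PROBES)
		return -1;
	probes[nprobes++] = fn;
	return 0;
}

/* Model of the __DO_TRACE body: call every registered probe.  The design
 * question in the cover letter is whether the *enabled* check (the static
 * branch guarding this) lives inline in Rust or behind an extern call. */
static void do_trace_model(int arg)
{
	for (int i = 0; i < nprobes; i++)
		probes[i](arg);
}
```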
[PATCH 3/3] rust: add tracepoint support
Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Signed-off-by: Alice Ryhl --- rust/bindings/bindings_helper.h | 1 + rust/bindings/lib.rs| 15 +++ rust/helpers.c | 24 +++ rust/kernel/lib.rs | 1 + rust/kernel/tracepoint.rs | 92 + 5 files changed, 133 insertions(+) diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h index ddb5644d4fd9..d442f9ccfc2c 100644 --- a/rust/bindings/bindings_helper.h +++ b/rust/bindings/bindings_helper.h @@ -15,6 +15,7 @@ #include #include #include +#include #include #include diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs index 40ddaee50d8b..48856761d682 100644 --- a/rust/bindings/lib.rs +++ b/rust/bindings/lib.rs @@ -48,3 +48,18 @@ mod bindings_helper { } pub use bindings_raw::*; + +/// Rust version of the C macro `rcu_dereference_raw`. +/// +/// The rust helper only works with void pointers, but this wrapper method makes it work with any +/// pointer type using pointer casts. +/// +/// # Safety +/// +/// This method has the same safety requirements as the C macro of the same name. +#[inline(always)] +pub unsafe fn rcu_dereference_raw(p: *const *mut T) -> *mut T { +// SAFETY: This helper calls into the C macro, so the caller promises to uphold the safety +// requirements. 
+unsafe { __rcu_dereference_raw(p as *mut *mut _) as *mut T } +} diff --git a/rust/helpers.c b/rust/helpers.c index 2c37a0f5d7a8..0560cc2a512a 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags) } EXPORT_SYMBOL_GPL(rust_helper_krealloc); +void rust_helper_preempt_enable_notrace(void) +{ + preempt_enable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); + +void rust_helper_preempt_disable_notrace(void) +{ + preempt_disable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); + +bool rust_helper_current_cpu_online(void) +{ + return cpu_online(raw_smp_processor_id()); +} +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); + +void *rust_helper___rcu_dereference_raw(void **p) +{ + return rcu_dereference_raw(p); +} +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); + /* * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can * use it in contexts where Rust expects a `usize` like slice (array) indices. diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index 22e1fedd0774..3f3b280bb437 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -46,6 +46,7 @@ pub mod sync; pub mod task; pub mod time; +pub mod tracepoint; pub mod types; pub mod workqueue; diff --git a/rust/kernel/tracepoint.rs b/rust/kernel/tracepoint.rs new file mode 100644 index ..d628ae71fc58 --- /dev/null +++ b/rust/kernel/tracepoint.rs @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2024 Google LLC. + +//! Logic for tracepoints. + +/// Declare the Rust entry point for a tracepoint. +#[macro_export] +macro_rules! 
declare_trace { +($($(#[$attr:meta])* $pub:vis fn $name:ident($($argname:ident : $argtyp:ty),* $(,)?);)*) => {$( +$( #[$attr] )* +#[inline(always)] +$pub unsafe fn $name($($argname : $argtyp),*) { +#[cfg(CONFIG_TRACEPOINTS)] +{ +use $crate::bindings::*; + +// SAFETY: This macro only compiles if $name is a real tracepoint, and if it is a +// real tracepoint, then it is okay to query the static key. +let should_trace = unsafe { +$crate::macros::paste! { +$crate::static_key::static_key_false!( +[< __tracepoint_ $name >], +$crate::bindings::tracepoint, +key +) +} +}; + +if should_trace { +$crate::tracepoint::do_trace!($name($($argname : $argtyp),*), cond); +} +} + +#[cfg(not(CONFIG_TRACEPOINTS))] +{ +// If tracepoints are disabled, insert a trivial use of each argument +// to avoid unused argument warnings. +$( let _unused = $argname; )* +} +} +)*} +} + +#[doc(hidden)] +#[macro_export] +macro_rules! do_trace { +($name:ident($($argname:ident : $argtyp:ty),* $(,)?), $cond:expr) => {{ +if !$crate::bindings::current_cpu_online() { +return; +} + +// SAFETY: This call is balanced with the call below. +unsafe { $crate::bindings::preempt_disable_notrace() }; + +// SAFETY: This calls the trace
Re: [PATCH 3/3] rust: add tracepoint support
On 2024-06-06 11:05, Alice Ryhl wrote: Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Signed-off-by: Alice Ryhl [...] diff --git a/rust/helpers.c b/rust/helpers.c index 2c37a0f5d7a8..0560cc2a512a 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags) } EXPORT_SYMBOL_GPL(rust_helper_krealloc); +void rust_helper_preempt_enable_notrace(void) +{ + preempt_enable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); + +void rust_helper_preempt_disable_notrace(void) +{ + preempt_disable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); + +bool rust_helper_current_cpu_online(void) +{ + return cpu_online(raw_smp_processor_id()); +} +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); + +void *rust_helper___rcu_dereference_raw(void **p) +{ + return rcu_dereference_raw(p); +} +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); Ouch. Doing a function call for each of those small operations will have a rather large performance impact when tracing is active. If it is not possible to inline those in Rust, then implementing __DO_TRACE in a C function would at least allow Rust to only do a single call to C rather than go back and forth between Rust and C. What prevents inlining those helpers in Rust ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
[PATCH 1/3] rust: add static_call support
Add static_call support by mirroring how C does. When the platform does not support static calls (right now, that means that it is not x86), then the function pointer is loaded from a global and called. Otherwise, we generate a call to a trampoline function, and objtool is used to make these calls patchable at runtime. Signed-off-by: Alice Ryhl --- rust/kernel/lib.rs | 1 + rust/kernel/static_call.rs | 92 ++ 2 files changed, 93 insertions(+) diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index fbd91a48ff8b..d534b1178955 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -38,6 +38,7 @@ pub mod prelude; pub mod print; mod static_assert; +pub mod static_call; #[doc(hidden)] pub mod std_vendor; pub mod str; diff --git a/rust/kernel/static_call.rs b/rust/kernel/static_call.rs new file mode 100644 index ..f7b8ba7bf1fb --- /dev/null +++ b/rust/kernel/static_call.rs @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2024 Google LLC. + +//! Logic for static calls. + +#[macro_export] +#[doc(hidden)] +macro_rules! ty_underscore_for { +($arg:expr) => { +_ +}; +} + +#[doc(hidden)] +#[repr(transparent)] +pub struct AddressableStaticCallKey { +_ptr: *const bindings::static_call_key, +} +unsafe impl Sync for AddressableStaticCallKey {} +impl AddressableStaticCallKey { +pub const fn new(ptr: *const bindings::static_call_key) -> Self { +Self { _ptr: ptr } +} +} + +#[cfg(CONFIG_HAVE_STATIC_CALL)] +#[doc(hidden)] +#[macro_export] +macro_rules! _static_call { +($name:ident($($args:expr),* $(,)?)) => {{ +// Symbol mangling will give this symbol a unique name. +#[cfg(CONFIG_HAVE_STATIC_CALL_INLINE)] +#[link_section = ".discard.addressable"] +#[used] +static __ADDRESSABLE: $crate::static_call::AddressableStaticCallKey = unsafe { + $crate::static_call::AddressableStaticCallKey::new(::core::ptr::addr_of!( +$crate::macros::paste! 
{ $crate::bindings:: [<__SCK__ $name >]; } +)) +}; + +let fn_ptr: unsafe extern "C" fn($($crate::static_call::ty_underscore_for!($args)),*) -> _ = +$crate::macros::paste! { $crate::bindings:: [<__SCT__ $name >]; }; +(fn_ptr)($($args),*) +}}; +} + +#[cfg(not(CONFIG_HAVE_STATIC_CALL))] +#[doc(hidden)] +#[macro_export] +macro_rules! _static_call { +($name:ident($($args:expr),* $(,)?)) => {{ +let void_ptr_fn: *mut ::core::ffi::c_void = +$crate::macros::paste! { $crate::bindings:: [<__SCK__ $name >]; }.func; + +let fn_ptr: unsafe extern "C" fn($($crate::static_call::ty_underscore_for!($args)),*) -> _ = +if true { +::core::mem::transmute(void_ptr_fn) +} else { +// This is dead code, but it influences type inference on `fn_ptr` so that we +// transmute the function pointer to the right type. +$crate::macros::paste! { $crate::bindings:: [<__SCT__ $name >]; } +}; + +(fn_ptr)($($args),*) +}}; +} + +/// Statically call a global function. +/// +/// # Safety +/// +/// This macro will call the provided function. It is up to the caller to uphold the safety +/// guarantees of the function. +/// +/// # Examples +/// +/// ```ignore +/// fn call_static() { +/// unsafe { +/// static_call! { your_static_call() }; +/// } +/// } +/// ``` +#[macro_export] +macro_rules! static_call { +// Forward to the real implementation. Separated like this so that we don't have to duplicate +// the documentation. +($($args:tt)*) => { $crate::static_call::_static_call! { $($args)* } }; +} + +pub use {_static_call, static_call, ty_underscore_for}; -- 2.45.2.505.gda0bf45e8d-goog
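The `!CONFIG_HAVE_STATIC_CALL` path above boils down to "load a function pointer from the key and call it indirectly". A userspace model of that fallback semantics (names hypothetical; when static calls are available, the real implementation calls a trampoline whose text is patched instead of reading a pointer):

```c
#include <assert.h>

/* Fallback semantics of a static call: the target is an ordinary function
 * pointer held in a key and updated at runtime (static_call_update() in
 * the kernel).  No text patching happens in this model. */
struct static_call_key_model {
	int (*func)(int);
};

static int add_one(int x) { return x + 1; }
static int add_two(int x) { return x + 2; }

static struct static_call_key_model my_key = { .func = add_one };

/* Model of invoking the static call: indirect call through the key. */
static int my_static_call(int x)
{
	return my_key.func(x);
}

/* Model of static_call_update(): retarget the call at runtime. */
static void my_static_call_update(int (*fn)(int))
{
	my_key.func = fn;
}
```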
[PATCH 2/3] rust: add static_key_false
Add just enough support for static key so that we can use it from tracepoints. Tracepoints rely on `static_key_false` even though it is deprecated, so we add the same functionality to Rust. Signed-off-by: Alice Ryhl --- rust/kernel/lib.rs| 1 + rust/kernel/static_key.rs | 87 +++ scripts/Makefile.build| 2 +- 3 files changed, 89 insertions(+), 1 deletion(-) diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index d534b1178955..22e1fedd0774 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -39,6 +39,7 @@ pub mod print; mod static_assert; pub mod static_call; +pub mod static_key; #[doc(hidden)] pub mod std_vendor; pub mod str; diff --git a/rust/kernel/static_key.rs b/rust/kernel/static_key.rs new file mode 100644 index ..6c3dbe14c98a --- /dev/null +++ b/rust/kernel/static_key.rs @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2024 Google LLC. + +//! Logic for static keys. + +use crate::bindings::*; + +#[doc(hidden)] +#[macro_export] +#[cfg(target_arch = "x86_64")] +macro_rules! _static_key_false { +($key:path, $keytyp:ty, $field:ident) => {'my_label: { +core::arch::asm!( +r#" +1: .byte 0x0f,0x1f,0x44,0x00,0x00 + +.pushsection __jump_table, "aw" +.balign 8 +.long 1b - . +.long {0} - . +.quad {1} + {2} - . +.popsection +"#, +label { +break 'my_label true; +}, +sym $key, +const ::core::mem::offset_of!($keytyp, $field), +); + +break 'my_label false; +}}; +} + +#[doc(hidden)] +#[macro_export] +#[cfg(target_arch = "aarch64")] +macro_rules! _static_key_false { +($key:path, $keytyp:ty, $field:ident) => {'my_label: { +core::arch::asm!( +r#" +1: nop + +.pushsection __jump_table, "aw" +.align 3 +.long 1b - ., {0} - . +.quad {1} + {2} - . +.popsection +"#, +label { +break 'my_label true; +}, +sym $key, +const ::core::mem::offset_of!($keytyp, $field), +); + +break 'my_label false; +}}; +} + +/// Branch based on a static key. +/// +/// Takes three arguments: +/// +/// * `key` - the path to the static variable containing the `static_key`. 
+/// * `keytyp` - the type of `key`. +/// * `field` - the name of the field of `key` that contains the `static_key`. +#[macro_export] +macro_rules! static_key_false { +// Forward to the real implementation. Separated like this so that we don't have to duplicate +// the documentation. +($key:path, $keytyp:ty, $field:ident) => {{ +// Assert that `$key` has type `$keytyp` and that `$key.$field` has type `static_key`. +// +// SAFETY: We know that `$key` is a static because otherwise the inline assembly will not +// compile. The raw pointers created in this block are in-bounds of `$key`. +static _TY_ASSERT: () = unsafe { +let key: *const $keytyp = ::core::ptr::addr_of!($key); +let _: *const $crate::bindings::static_key = ::core::ptr::addr_of!((*key).$field); +}; + +$crate::static_key::_static_key_false! { $key, $keytyp, $field } +}}; +} + +pub use {_static_key_false, static_key_false}; diff --git a/scripts/Makefile.build b/scripts/Makefile.build index efacca63c897..60197c1c063f 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -263,7 +263,7 @@ $(obj)/%.lst: $(obj)/%.c FORCE # Compile Rust sources (.rs) # --- -rust_allowed_features := new_uninit +rust_allowed_features := asm_const,asm_goto,new_uninit # `--out-dir` is required to avoid temporaries being created by `rustc` in the # current working directory, which may be not accessible in the out-of-tree -- 2.45.2.505.gda0bf45e8d-goog
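Functionally, the inline assembly above emits a jump-table entry so the branch can be patched at runtime; the observable behaviour is "a branch that is false until the key is enabled". A userspace model of just that behaviour (note the real fast path never loads `key->enabled` — the nop/jump instruction itself is rewritten; this sketch only captures the semantics):

```c
#include <assert.h>
#include <stdbool.h>

struct static_key_model {
	int enabled;
};

static struct static_key_model key;	/* starts disabled */

static int trace_hits;

/* Model of static_key_false(): true only once the key is enabled. */
static bool static_key_false_model(struct static_key_model *k)
{
	return k->enabled > 0;
}

/* Model of static_key_enable(): the kernel patches the jump site here. */
static void static_branch_enable_model(struct static_key_model *k)
{
	k->enabled = 1;
}

static void maybe_trace(void)
{
	if (static_key_false_model(&key))
		trace_hits++;	/* the unlikely, traced path */
}
```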
Re: [PATCH 2/3] rust: add static_key_false
On 2024-06-06 11:05, Alice Ryhl wrote: Add just enough support for static key so that we can use it from tracepoints. Tracepoints rely on `static_key_false` even though it is deprecated, so we add the same functionality to Rust. Signed-off-by: Alice Ryhl --- rust/kernel/lib.rs| 1 + rust/kernel/static_key.rs | 87 +++ scripts/Makefile.build| 2 +- 3 files changed, 89 insertions(+), 1 deletion(-) diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index d534b1178955..22e1fedd0774 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -39,6 +39,7 @@ pub mod print; mod static_assert; pub mod static_call; +pub mod static_key; #[doc(hidden)] pub mod std_vendor; pub mod str; diff --git a/rust/kernel/static_key.rs b/rust/kernel/static_key.rs new file mode 100644 index ..6c3dbe14c98a --- /dev/null +++ b/rust/kernel/static_key.rs @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 This static key code is something that can be generally useful both in kernel and user-space. Is there anything that would prevent licensing this under MIT right away instead so it could eventually be re-used more broadly in userspace as well ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [RFC v3 net-next 1/7] net: add rx_sk to trace_kfree_skb
On Wed, Jun 5, 2024 at 6:57 PM Steven Rostedt wrote: > > On Tue, 4 Jun 2024 14:47:38 -0700 > Yan Zhai wrote: > > > skb does not include enough information to find out receiving > > sockets/services and netns/containers on packet drops. In theory > > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP > > stack for OOO packet lookup. Similarly, skb->sk often identifies a local > > sender, and tells nothing about a receiver. > > > > Allow passing an extra receiving socket to the tracepoint to improve > > the visibility on receiving drops. > > > > Signed-off-by: Yan Zhai > > --- > > v2->v3: fixed drop_monitor function prototype > > --- > > include/trace/events/skb.h | 11 +++ > > net/core/dev.c | 2 +- > > net/core/drop_monitor.c| 9 ++--- > > net/core/skbuff.c | 2 +- > > 4 files changed, 15 insertions(+), 9 deletions(-) > > > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > > index 07e0715628ec..aa6b46b6172c 100644 > > --- a/include/trace/events/skb.h > > +++ b/include/trace/events/skb.h > > @@ -24,15 +24,16 @@ DEFINE_DROP_REASON(FN, FN) > > TRACE_EVENT(kfree_skb, > > > > TP_PROTO(struct sk_buff *skb, void *location, > > - enum skb_drop_reason reason), > > + enum skb_drop_reason reason, struct sock *rx_sk), > > > > - TP_ARGS(skb, location, reason), > > + TP_ARGS(skb, location, reason, rx_sk), > > > > TP_STRUCT__entry( > > __field(void *, skbaddr) > > __field(void *, location) > > __field(unsigned short, protocol) > > __field(enum skb_drop_reason, reason) > > + __field(void *, rx_skaddr) > > Please add the pointer after the other pointers: > > __field(void *, skbaddr) > __field(void *, location) > + __field(void *, rx_skaddr) > __field(unsigned short, protocol) > __field(enum skb_drop_reason, reason) > > otherwise you are adding holes in the ring buffer event. > > The TP_STRUCT__entry() is a structure that is saved in the ring buffer. We > want to avoid alignment holes. 
> I also question having a short before the
> enum, if the enum is 4 bytes. The short should be at the end.
>
> In fact, looking at the format file, there is a 2 byte hole:
>
> # cat /sys/kernel/tracing/events/skb/kfree_skb/format
> name: kfree_skb
> ID: 1799
> format:
>     field:unsigned short common_type;         offset:0;  size:2; signed:0;
>     field:unsigned char common_flags;         offset:2;  size:1; signed:0;
>     field:unsigned char common_preempt_count; offset:3;  size:1; signed:0;
>     field:int common_pid;                     offset:4;  size:4; signed:1;
>
>     field:void * skbaddr;                     offset:8;  size:8; signed:0;
>     field:void * location;                    offset:16; size:8; signed:0;
>     field:unsigned short protocol;            offset:24; size:2; signed:0;
>     field:enum skb_drop_reason reason;        offset:28; size:4; signed:0;
>
> Notice that "protocol" is 2 bytes in size at offset 24, but "reason" starts
> at offset 28. This means at offset 26, there's a 2 byte hole.

The reason I added the pointer as the last argument was to minimize surprise for existing TP users: for common ABIs it is fine to omit later arguments when defining a function, but it needs a change and recompilation if the order of arguments changes.
Looking at the actual format after the change, it does not add a new hole since protocol and reason are already packed into the same 8-byte block, so rx_skaddr starts at an 8-byte aligned offset:

# cat /sys/kernel/debug/tracing/events/skb/kfree_skb/format
name: kfree_skb
ID: 2260
format:
    field:unsigned short common_type;         offset:0;  size:2; signed:0;
    field:unsigned char common_flags;         offset:2;  size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3;  size:1; signed:0;
    field:int common_pid;                     offset:4;  size:4; signed:1;

    field:void * skbaddr;                     offset:8;  size:8; signed:0;
    field:void * location;                    offset:16; size:8; signed:0;
    field:unsigned short protocol;            offset:24; size:2; signed:0;
    field:enum skb_drop_reason reason;        offset:28; size:4; signed:0;
    field:void * rx_skaddr;                   offset:32; size:8; signed:0;

Do you think we still need to change the order?

Yan

> -- Steve
>
> > >   ),
> > >
> > >   TP_fast_assign(
> > > @@ -40,12 +41,14 @@ TRACE_EVENT(kfree_skb,
> > >           __entry->location = location;
> > >           __entry->protocol = ntohs(skb->protocol);
> > >           __entry->reason = reason;
> > > +         __entry->rx_skaddr = rx_sk;
> > >   ),
> > >
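The hole Steven describes, and the layout in the format dump above, can be reproduced in userspace with `offsetof`. A sketch mirroring the event record — this assumes an LP64 target where pointers are 8 bytes and the enum is a 4-byte int, as on x86-64:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace mirror of the kfree_skb ring-buffer record from the
 * format file discussed above (LP64 assumed). The 4-byte `reason`
 * must be 4-aligned, so it cannot start at offset 26 right after
 * the 2-byte `protocol`; the compiler inserts a 2-byte hole.
 */
struct kfree_skb_record {
	unsigned short common_type;          /* offset 0,  size 2 */
	unsigned char  common_flags;         /* offset 2,  size 1 */
	unsigned char  common_preempt_count; /* offset 3,  size 1 */
	int            common_pid;           /* offset 4,  size 4 */
	void *skbaddr;                       /* offset 8,  size 8 */
	void *location;                      /* offset 16, size 8 */
	unsigned short protocol;             /* offset 24, size 2 */
	int reason;                          /* offset 28: hole at 26-27 */
	void *rx_skaddr;                     /* offset 32: already 8-aligned */
};
```

This also shows why appending `rx_skaddr` adds no new hole: `protocol` + padding + `reason` already fill the 24-31 block, so the pointer lands at the next 8-byte boundary either way.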
Re: [PATCH 0/3] Tracepoints and static branch/call in Rust
lication of binary code (instructions).

Thanks,

Mathieu

Link: https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-0-08ba9197f...@google.com/ [1]
Link: https://r.android.com/3110088 [2]
Signed-off-by: Alice Ryhl
---
Alice Ryhl (3):
      rust: add static_call support
      rust: add static_key_false
      rust: add tracepoint support

 rust/bindings/bindings_helper.h |  1 +
 rust/bindings/lib.rs            | 15 +++
 rust/helpers.c                  | 24 +++
 rust/kernel/lib.rs              |  3 ++
 rust/kernel/static_call.rs      | 92 +
 rust/kernel/static_key.rs       | 87 ++
 rust/kernel/tracepoint.rs       | 92 +
 scripts/Makefile.build          |  2 +-
 8 files changed, 315 insertions(+), 1 deletion(-)
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20240606-tracepoint-31e15b90e471

Best regards,
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [PATCH 0/3] Tracepoints and static branch/call in Rust
On Thu, Jun 6, 2024 at 5:25 PM Mathieu Desnoyers wrote: > > On 2024-06-06 11:05, Alice Ryhl wrote: > > This implementation implements support for static keys in Rust so that > > the actual static branch will end up in the Rust object file. However, > > it would also be possible to just wrap the trace_##name generated by > > __DECLARE_TRACE in an extern C function and then call that from Rust. > > This will simplify the Rust code by removing the need for static > > branches and calls, but it places the static branch behind an external > > call, which has performance implications. > > The tracepoints try very hard to minimize overhead of dormant tracepoints > so it is not frowned-upon to have them built into production binaries. > This is needed to make sure distribution vendors keep those tracepoints > in the kernel binaries that reach end-users. > > Adding a function call before evaluation of the static branch goes against > this major goal. > > > > > A possible middle ground would be to place just the __DO_TRACE body in > > an extern C function and to implement the Rust wrapper by doing the > > static branch in Rust, and then calling into C the code that contains > > __DO_TRACE when the tracepoint is active. However, this would need some > > changes to include/linux/tracepoint.h to generate and export a function > > containing the body of __DO_TRACE when the tracepoint should be callable > > from Rust. > > This tradeoff is more acceptable than having a function call before > evaluation of the static branch, but I wonder what is the upside of > this tradeoff compared to inlining the whole __DO_TRACE in Rust ? > > > So in general, there is a tradeoff between placing parts of the > > tracepoint (which is perf sensitive) behind an external call, and having > > code duplicated in both C and Rust (which must be kept in sync when > > changes are made). This is an important point that I would like feedback > > on from the C maintainers. 
> > I don't see how the duplication happens there: __DO_TRACE is meant to be > inlined into each C tracepoint caller site, so the code is already meant > to be duplicated. Having an explicit function wrapping the tracepoint > for Rust would just create an extra instance of __DO_TRACE if it happens > to be also inlined into C code. > > Or do you meant you would like to prevent having to duplicate the > implementation of __DO_TRACE in both C and Rust ? > > I'm not sure if you mean to prevent source code duplication between > C and Rust or duplication of binary code (instructions). It's a question of maintenance burden. If you change how __DO_TRACE is implemented, then those changes must also be reflected in the Rust version. There's no issue in the binary code. Alice
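The performance concern in this exchange can be made concrete with a userspace analogue: keeping the cheap enabled-check at the call site means a dormant tracepoint costs one predictable branch and no function call, whereas hiding the check behind an extern function pays a call even when tracing is off. A minimal sketch, with a plain `bool` standing in for the static key and made-up names — the kernel's real machinery is `static_key_false` plus `__DO_TRACE`:

```c
#include <stdbool.h>

/*
 * Userspace analogue of the tradeoff discussed above. The guard is
 * evaluated inline at the call site (as the static branch would be),
 * and only the slow path is an out-of-line call. Illustrative only.
 */
static bool trace_foo_enabled;        /* stands in for the static key */
static unsigned long foo_hits;

static void __do_trace_foo(int arg)   /* stands in for __DO_TRACE */
{
	foo_hits += arg;
}

static inline void trace_foo(int arg)
{
	if (trace_foo_enabled)        /* branch first, call only if active */
		__do_trace_foo(arg);
}
```

Moving the `if` inside an extern `trace_foo` would preserve behavior but turn every dormant call site into a real function call, which is exactly what the tracepoint design tries to avoid.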
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 11:30:03AM -0400, Mathieu Desnoyers wrote: > On 2024-06-06 11:05, Alice Ryhl wrote: > > Make it possible to have Rust code call into tracepoints defined by C > > code. It is still required that the tracepoint is declared in a C > > header, and that this header is included in the input to bindgen. > > > > Signed-off-by: Alice Ryhl > > [...] > > > diff --git a/rust/helpers.c b/rust/helpers.c > > index 2c37a0f5d7a8..0560cc2a512a 100644 > > --- a/rust/helpers.c > > +++ b/rust/helpers.c > > @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t > > new_size, gfp_t flags) > > } > > EXPORT_SYMBOL_GPL(rust_helper_krealloc); > > +void rust_helper_preempt_enable_notrace(void) > > +{ > > + preempt_enable_notrace(); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); > > + > > +void rust_helper_preempt_disable_notrace(void) > > +{ > > + preempt_disable_notrace(); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); > > + > > +bool rust_helper_current_cpu_online(void) > > +{ > > + return cpu_online(raw_smp_processor_id()); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); > > + > > +void *rust_helper___rcu_dereference_raw(void **p) > > +{ > > + return rcu_dereference_raw(p); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); > > Ouch. Doing a function call for each of those small operations will > have a rather large performance impact when tracing is active. If it is Long-term plan is to 1) compile the C helpers in some IR and 2) inline the helpers with Rust in IR-level, as what Gary has: https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ and I use it for the upcoming atomic API support: https://github.com/fbq/linux/tree/dev/rust/atomic-rfc and it works very well. 
Regards, Boqun > not possible to inline those in Rust, then implementing __DO_TRACE in > a C function would at least allow Rust to only do a single call to C > rather than go back and forth between Rust and C. > > What prevents inlining those helpers in Rust ? > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >
Re: [PATCH] ACPI: NFIT: add missing MODULE_DESCRIPTION() macro
On 6/3/24 6:30 AM, Jeff Johnson wrote: > make allmodconfig && make W=1 C=1 reports: > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/acpi/nfit/nfit.o > > Add the missing invocation of the MODULE_DESCRIPTION() macro. > > Signed-off-by: Jeff Johnson Reviewed-by: Dave Jiang > --- > drivers/acpi/nfit/core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c > index d4595d1985b1..e8520fb8af4f 100644 > --- a/drivers/acpi/nfit/core.c > +++ b/drivers/acpi/nfit/core.c > @@ -3531,5 +3531,6 @@ static __exit void nfit_exit(void) > > module_init(nfit_init); > module_exit(nfit_exit); > +MODULE_DESCRIPTION("ACPI NVDIMM Firmware Interface Table (NFIT) module"); > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Intel Corporation"); > > --- > base-commit: a693b9c95abd4947c2d06e05733de5d470ab6586 > change-id: 20240603-md-drivers-acpi-nfit-e032bad0b189 >
Re: [PATCH RESEND] nvdimm: add missing MODULE_DESCRIPTION() macros
On 5/26/24 10:07 AM, Jeff Johnson wrote: > Fix the 'make W=1' warnings: > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/libnvdimm.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_pmem.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_btt.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_e820.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/of_pmem.o > WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvdimm/nd_virtio.o > > Signed-off-by: Jeff Johnson Reviewed-by: Dave Jiang > --- > drivers/nvdimm/btt.c | 1 + > drivers/nvdimm/core.c | 1 + > drivers/nvdimm/e820.c | 1 + > drivers/nvdimm/nd_virtio.c | 1 + > drivers/nvdimm/of_pmem.c | 1 + > drivers/nvdimm/pmem.c | 1 + > 6 files changed, 6 insertions(+) > > diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c > index 1e5aedaf8c7b..a47acc5d05df 100644 > --- a/drivers/nvdimm/btt.c > +++ b/drivers/nvdimm/btt.c > @@ -1721,6 +1721,7 @@ static void __exit nd_btt_exit(void) > > MODULE_ALIAS_ND_DEVICE(ND_DEVICE_BTT); > MODULE_AUTHOR("Vishal Verma "); > +MODULE_DESCRIPTION("NVDIMM Block Translation Table"); > MODULE_LICENSE("GPL v2"); > module_init(nd_btt_init); > module_exit(nd_btt_exit); > diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c > index 2023a661bbb0..f4b6fb4b9828 100644 > --- a/drivers/nvdimm/core.c > +++ b/drivers/nvdimm/core.c > @@ -540,6 +540,7 @@ static __exit void libnvdimm_exit(void) > nvdimm_devs_exit(); > } > > +MODULE_DESCRIPTION("NVDIMM (Non-Volatile Memory Device) core module"); > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Intel Corporation"); > subsys_initcall(libnvdimm_init); > diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c > index 4cd18be9d0e9..008b9aae74ff 100644 > --- a/drivers/nvdimm/e820.c > +++ b/drivers/nvdimm/e820.c > @@ -69,5 +69,6 @@ static struct platform_driver e820_pmem_driver = { > module_platform_driver(e820_pmem_driver); > > 
MODULE_ALIAS("platform:e820_pmem*"); > +MODULE_DESCRIPTION("NVDIMM support for e820 type-12 memory"); > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Intel Corporation"); > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c > index 1f8c667c6f1e..35c8fbbba10e 100644 > --- a/drivers/nvdimm/nd_virtio.c > +++ b/drivers/nvdimm/nd_virtio.c > @@ -123,4 +123,5 @@ int async_pmem_flush(struct nd_region *nd_region, struct > bio *bio) > return 0; > }; > EXPORT_SYMBOL_GPL(async_pmem_flush); > +MODULE_DESCRIPTION("Virtio Persistent Memory Driver"); > MODULE_LICENSE("GPL"); > diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c > index d3fca0ab6290..5134a8d08bf9 100644 > --- a/drivers/nvdimm/of_pmem.c > +++ b/drivers/nvdimm/of_pmem.c > @@ -111,5 +111,6 @@ static struct platform_driver of_pmem_region_driver = { > > module_platform_driver(of_pmem_region_driver); > MODULE_DEVICE_TABLE(of, of_pmem_region_match); > +MODULE_DESCRIPTION("NVDIMM Device Tree support"); > MODULE_LICENSE("GPL"); > MODULE_AUTHOR("IBM Corporation"); > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c > index 598fe2e89bda..57cb30f8a3b8 100644 > --- a/drivers/nvdimm/pmem.c > +++ b/drivers/nvdimm/pmem.c > @@ -768,4 +768,5 @@ static struct nd_device_driver nd_pmem_driver = { > module_nd_driver(nd_pmem_driver); > > MODULE_AUTHOR("Ross Zwisler "); > +MODULE_DESCRIPTION("NVDIMM Persistent Memory Driver"); > MODULE_LICENSE("GPL v2"); > > --- > base-commit: 416ff45264d50a983c3c0b99f0da6ee59f9acd68 > change-id: 20240526-md-drivers-nvdimm-121215a4b93f
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 6, 2024 at 5:29 PM Mathieu Desnoyers wrote: > > On 2024-06-06 11:05, Alice Ryhl wrote: > > Make it possible to have Rust code call into tracepoints defined by C > > code. It is still required that the tracepoint is declared in a C > > header, and that this header is included in the input to bindgen. > > > > Signed-off-by: Alice Ryhl > > [...] > > > diff --git a/rust/helpers.c b/rust/helpers.c > > index 2c37a0f5d7a8..0560cc2a512a 100644 > > --- a/rust/helpers.c > > +++ b/rust/helpers.c > > @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t > > new_size, gfp_t flags) > > } > > EXPORT_SYMBOL_GPL(rust_helper_krealloc); > > > > +void rust_helper_preempt_enable_notrace(void) > > +{ > > + preempt_enable_notrace(); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); > > + > > +void rust_helper_preempt_disable_notrace(void) > > +{ > > + preempt_disable_notrace(); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); > > + > > +bool rust_helper_current_cpu_online(void) > > +{ > > + return cpu_online(raw_smp_processor_id()); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); > > + > > +void *rust_helper___rcu_dereference_raw(void **p) > > +{ > > + return rcu_dereference_raw(p); > > +} > > +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); > > Ouch. Doing a function call for each of those small operations will > have a rather large performance impact when tracing is active. If it is > not possible to inline those in Rust, then implementing __DO_TRACE in > a C function would at least allow Rust to only do a single call to C > rather than go back and forth between Rust and C. > > What prevents inlining those helpers in Rust ? There's nothing fundamental that prevents it. But they contain a large amount of architecture specific code, which makes them a significant amount of work to reimplement in Rust. For example, rcu_dereference_raw calls into READ_ONCE. 
READ_ONCE is usually a volatile load, but under arm64+LTO, you get a bunch of inline assembly that relies on ALTERNATIVE for feature detection: https://elixir.bootlin.com/linux/v6.9/source/arch/arm64/include/asm/rwonce.h#L36 And preempt_enable_notrace has a similar story. The solution that Boqun mentions is nice, but it relies on rustc and clang using the same version of LLVM. You are unlikely to have compilers with matching LLVMs unless you intentionally take steps to make that happen. But yes, perhaps these helpers are an argument to have a single call for __DO_TRACE instead. Alice
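For context, the generic `READ_ONCE` that most architectures use is just a marked volatile access, which would be straightforward to mirror in Rust; it is the arm64+LTO inline-assembly variant linked above that makes a port hard. A simplified sketch of the generic form — the kernel's version in `include/asm-generic/rwonce.h` additionally has a compile-time size check to rule out tearing on oversized types:

```c
/*
 * Sketch of the generic READ_ONCE: force a single volatile load so
 * the compiler can neither cache, duplicate, nor tear the access.
 * Simplified from the kernel's asm-generic definition; __typeof__
 * is the GCC/Clang extension the kernel relies on.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))
```

The arm64+LTO build replaces this with `asm volatile` loads (via ALTERNATIVE-patched sequences) so that LTO cannot reorder them past dependency barriers, and that architecture-specific assembly is what a Rust reimplementation would have to duplicate.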
Re: [PATCH 3/3] rust: add tracepoint support
On 2024-06-06 11:49, Boqun Feng wrote: On Thu, Jun 06, 2024 at 11:30:03AM -0400, Mathieu Desnoyers wrote: On 2024-06-06 11:05, Alice Ryhl wrote: Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Signed-off-by: Alice Ryhl [...] diff --git a/rust/helpers.c b/rust/helpers.c index 2c37a0f5d7a8..0560cc2a512a 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags) } EXPORT_SYMBOL_GPL(rust_helper_krealloc); +void rust_helper_preempt_enable_notrace(void) +{ + preempt_enable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); + +void rust_helper_preempt_disable_notrace(void) +{ + preempt_disable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); + +bool rust_helper_current_cpu_online(void) +{ + return cpu_online(raw_smp_processor_id()); +} +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); + +void *rust_helper___rcu_dereference_raw(void **p) +{ + return rcu_dereference_raw(p); +} +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); Ouch. Doing a function call for each of those small operations will have a rather large performance impact when tracing is active. If it is Long-term plan is to 1) compile the C helpers in some IR and 2) inline the helpers with Rust in IR-level, as what Gary has: https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ and I use it for the upcoming atomic API support: https://github.com/fbq/linux/tree/dev/rust/atomic-rfc and it works very well. Thanks for the pointers, it makes sense. Thanks, Mathieu Regards, Boqun not possible to inline those in Rust, then implementing __DO_TRACE in a C function would at least allow Rust to only do a single call to C rather than go back and forth between Rust and C. What prevents inlining those helpers in Rust ? 
Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [PATCH 3/3] rust: add tracepoint support
On 2024-06-06 12:16, Alice Ryhl wrote: On Thu, Jun 6, 2024 at 5:29 PM Mathieu Desnoyers wrote: On 2024-06-06 11:05, Alice Ryhl wrote: Make it possible to have Rust code call into tracepoints defined by C code. It is still required that the tracepoint is declared in a C header, and that this header is included in the input to bindgen. Signed-off-by: Alice Ryhl [...] diff --git a/rust/helpers.c b/rust/helpers.c index 2c37a0f5d7a8..0560cc2a512a 100644 --- a/rust/helpers.c +++ b/rust/helpers.c @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, gfp_t flags) } EXPORT_SYMBOL_GPL(rust_helper_krealloc); +void rust_helper_preempt_enable_notrace(void) +{ + preempt_enable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); + +void rust_helper_preempt_disable_notrace(void) +{ + preempt_disable_notrace(); +} +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); + +bool rust_helper_current_cpu_online(void) +{ + return cpu_online(raw_smp_processor_id()); +} +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); + +void *rust_helper___rcu_dereference_raw(void **p) +{ + return rcu_dereference_raw(p); +} +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); Ouch. Doing a function call for each of those small operations will have a rather large performance impact when tracing is active. If it is not possible to inline those in Rust, then implementing __DO_TRACE in a C function would at least allow Rust to only do a single call to C rather than go back and forth between Rust and C. What prevents inlining those helpers in Rust ? There's nothing fundamental that prevents it. But they contain a large amount of architecture specific code, which makes them a significant amount of work to reimplement in Rust. For example, rcu_dereference_raw calls into READ_ONCE. 
READ_ONCE is usually a volatile load, but under arm64+LTO, you get a bunch of inline assembly that relies on ALTERNATIVE for feature detection: https://elixir.bootlin.com/linux/v6.9/source/arch/arm64/include/asm/rwonce.h#L36 And preempt_enable_notrace has a similar story. The solution that Boqun mentions is nice, but it relies on rustc and clang using the same version of LLVM. You are unlikely to have compilers with matching LLVMs unless you intentionally take steps to make that happen. But yes, perhaps these helpers are an argument to have a single call for __DO_TRACE instead. If those helpers end up being inlined into Rust with the solution pointed to by Boqun, then it makes sense to implement __DO_TRACE in Rust. Otherwise doing a single call to C would be more efficient than calling each of the helpers individually. Thanks, Mathieu Alice -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [PATCH 2/3] rust: add static_key_false
On Thu, Jun 6, 2024 at 5:37 PM Mathieu Desnoyers wrote: > > On 2024-06-06 11:05, Alice Ryhl wrote: > > Add just enough support for static key so that we can use it from > > tracepoints. Tracepoints rely on `static_key_false` even though it is > > deprecated, so we add the same functionality to Rust. > > > > Signed-off-by: Alice Ryhl > > --- > > rust/kernel/lib.rs| 1 + > > rust/kernel/static_key.rs | 87 > > +++ > > scripts/Makefile.build| 2 +- > > 3 files changed, 89 insertions(+), 1 deletion(-) > > > > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs > > index d534b1178955..22e1fedd0774 100644 > > --- a/rust/kernel/lib.rs > > +++ b/rust/kernel/lib.rs > > @@ -39,6 +39,7 @@ > > pub mod print; > > mod static_assert; > > pub mod static_call; > > +pub mod static_key; > > #[doc(hidden)] > > pub mod std_vendor; > > pub mod str; > > diff --git a/rust/kernel/static_key.rs b/rust/kernel/static_key.rs > > new file mode 100644 > > index ..6c3dbe14c98a > > --- /dev/null > > +++ b/rust/kernel/static_key.rs > > @@ -0,0 +1,87 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > This static key code is something that can be generally useful > both in kernel and user-space. Is there anything that would prevent > licensing this under MIT right away instead so it could eventually be > re-used more broadly in userspace as well ? I would not mind. Alice
Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()
On 6/3/24 8:16 PM, Li Zhijian wrote:
> Don't allocate devs again when it is a valid pointer that has pointed to
> the memory allocated above with size (count + 2 * sizeof(dev)).
>
> A kmemleak reports:
> unreferenced object 0x88800dda1980 (size 16):
>   comm "kworker/u10:5", pid 69, jiffies 4294671781
>   hex dump (first 16 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   backtrace (crc 0):
>     [] __kmalloc+0x32c/0x470
>     [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 [libnvdimm]
>     [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm]
>     [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm]
>     [ ] really_probe+0xc6/0x390
>     [<129e2a69>] __driver_probe_device+0x78/0x150
>     [<2dfed28b>] driver_probe_device+0x1e/0x90
>     [ ] __device_attach_driver+0x85/0x110
>     [<32dca295>] bus_for_each_drv+0x85/0xe0
>     [<391c5a7d>] __device_attach+0xbe/0x1e0
>     [<26dabec0>] bus_probe_device+0x94/0xb0
>     [ ] device_add+0x656/0x870
>     [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm]
>     [<3f4c52a4>] async_run_entry_fn+0x2e/0x110
>     [ ] process_one_work+0x1ee/0x600
>     [<6d90d5a9>] worker_thread+0x183/0x350
>
> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation")
> Signed-off-by: Li Zhijian
> ---
>  drivers/nvdimm/namespace_devs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index d6d558f94d6b..56b016dbe307 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region *nd_region)
>  	/* Publish a zero-sized namespace for userspace to configure. */
>  	nd_mapping_free_labels(nd_mapping);
>
> -	devs = kcalloc(2, sizeof(dev), GFP_KERNEL);
> +	/* devs probably has been allocated */
> +	if (!devs)
> +		devs = kcalloc(2, sizeof(dev), GFP_KERNEL);

This changes the behavior of this code and possibly corrupts the previously allocated memory at times when 'devs' is valid.
Was the 'devs' leaked out from the previous loop and should be freed instead? > if (!devs) > goto err; >
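The patch under discussion only falls back to allocation when `devs` is still NULL, so a buffer allocated earlier in scan_labels() is kept rather than leaked by an unconditional `kcalloc`. The pattern in isolation, as a userspace sketch with `calloc` standing in for `kcalloc` and an illustrative function name:

```c
#include <stdlib.h>

/*
 * Userspace sketch of the scan_labels() fix: reuse the array if one
 * was already allocated above, and only allocate the zero-sized
 * placeholder when nothing exists yet. Allocating unconditionally
 * here would drop (and leak) the earlier buffer - the kmemleak
 * report in the patch. Illustrative names, not the kernel's code.
 */
static void **publish_zero_namespace(void **devs)
{
	if (!devs)                        /* devs may already be valid */
		devs = calloc(2, sizeof(*devs));
	return devs;
}
```

Dave's question remains valid either way: one must decide whether the earlier `devs` should be reused here or freed on the error path instead, since the guard changes which pointer the caller ends up publishing.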
Re: [PATCH 0/3] Tracepoints and static branch/call in Rust
On 2024-06-06 11:46, Alice Ryhl wrote: On Thu, Jun 6, 2024 at 5:25 PM Mathieu Desnoyers wrote: On 2024-06-06 11:05, Alice Ryhl wrote: This implementation implements support for static keys in Rust so that the actual static branch will end up in the Rust object file. However, it would also be possible to just wrap the trace_##name generated by __DECLARE_TRACE in an extern C function and then call that from Rust. This will simplify the Rust code by removing the need for static branches and calls, but it places the static branch behind an external call, which has performance implications. The tracepoints try very hard to minimize overhead of dormant tracepoints so it is not frowned-upon to have them built into production binaries. This is needed to make sure distribution vendors keep those tracepoints in the kernel binaries that reach end-users. Adding a function call before evaluation of the static branch goes against this major goal. A possible middle ground would be to place just the __DO_TRACE body in an extern C function and to implement the Rust wrapper by doing the static branch in Rust, and then calling into C the code that contains __DO_TRACE when the tracepoint is active. However, this would need some changes to include/linux/tracepoint.h to generate and export a function containing the body of __DO_TRACE when the tracepoint should be callable from Rust. This tradeoff is more acceptable than having a function call before evaluation of the static branch, but I wonder what is the upside of this tradeoff compared to inlining the whole __DO_TRACE in Rust ? So in general, there is a tradeoff between placing parts of the tracepoint (which is perf sensitive) behind an external call, and having code duplicated in both C and Rust (which must be kept in sync when changes are made). This is an important point that I would like feedback on from the C maintainers. 
I don't see how the duplication happens there: __DO_TRACE is meant to be inlined into each C tracepoint caller site, so the code is already meant to be duplicated. Having an explicit function wrapping the tracepoint for Rust would just create an extra instance of __DO_TRACE if it happens to be also inlined into C code. Or do you meant you would like to prevent having to duplicate the implementation of __DO_TRACE in both C and Rust ? I'm not sure if you mean to prevent source code duplication between C and Rust or duplication of binary code (instructions). It's a question of maintenance burden. If you change how __DO_TRACE is implemented, then those changes must also be reflected in the Rust version. There's no issue in the binary code. As long as it is only __DO_TRACE that is duplicated between C and Rust, I don't see it as a large burden. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com
Re: [RFC bpf-next 01/10] uprobe: Add session callbacks to uprobe_consumer
On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote: > On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote: > > On 06/05, Andrii Nakryiko wrote: > > > > > > so any such > > > limitations will cause problems, issue reports, investigation, etc. > > > > Agreed... > > > > > As one possible solution, what if we do > > > > > > struct return_instance { > > > ... > > > u64 session_cookies[]; > > > }; > > > > > > and allocate sizeof(struct return_instance) + 8 * > > > and then at runtime pass > > > &session_cookies[i] as data pointer to session-aware callbacks? > > > > I too thought about this, but I guess it is not that simple. > > > > Just for example. Suppose we have 2 session-consumers C1 and C2. > > What if uprobe_unregister(C1) comes before the probed function > > returns? > > > > We need something like map_cookie_to_consumer(). > > I guess we could have hash table in return_instance that gets 'consumer -> > cookie' ? ok, hash table is probably too big for this.. I guess some solution that would iterate consumers and cookies made sure it matches would be fine jirka > > return instance is freed after the consumers' return handlers are executed, > so there's no leak if some consumer gets unregistered before that > > > > > > > + /* The handler_session callback return value controls execution > > > > of > > > > +* the return uprobe and ret_handler_session callback. > > > > +* 0 on success > > > > +* 1 on failure, DO NOT install/execute the return uprobe > > > > +*console warning for anything else > > > > +*/ > > > > + int (*handler_session)(struct uprobe_consumer *self, struct > > > > pt_regs *regs, > > > > + unsigned long *data); > > > > + int (*ret_handler_session)(struct uprobe_consumer *self, > > > > unsigned long func, > > > > + struct pt_regs *regs, unsigned long > > > > *data); > > > > + > > > > > > We should try to avoid an alternative set of callbacks, IMO. Let's > > > extend existing ones with `unsigned long *data`, > > > > Oh yes, agreed. 
> > > > And the comment about the return value looks confusing too. I mean, the > > logic doesn't differ from the ret-code from ->handler(). > > > > "DO NOT install/execute the return uprobe" is not true if another > > non-session-consumer returns 0. > > well they are meant to be exclusive, so there'd be no other > non-session-consumer > > jirka
Re: [RFC bpf-next 01/10] uprobe: Add session callbacks to uprobe_consumer
On Thu, Jun 6, 2024 at 9:46 AM Jiri Olsa wrote: > > On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote: > > On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote: > > > On 06/05, Andrii Nakryiko wrote: > > > > > > > > so any such > > > > limitations will cause problems, issue reports, investigation, etc. > > > > > > Agreed... > > > > > > > As one possible solution, what if we do > > > > > > > > struct return_instance { > > > > ... > > > > u64 session_cookies[]; > > > > }; > > > > > > > > and allocate sizeof(struct return_instance) + 8 * > > > > and then at runtime pass > > > > &session_cookies[i] as data pointer to session-aware callbacks? > > > > > > I too thought about this, but I guess it is not that simple. > > > > > > Just for example. Suppose we have 2 session-consumers C1 and C2. > > > What if uprobe_unregister(C1) comes before the probed function > > > returns? > > > > > > We need something like map_cookie_to_consumer(). > > > > I guess we could have hash table in return_instance that gets 'consumer -> > > cookie' ? > > ok, hash table is probably too big for this.. I guess some solution that > would iterate consumers and cookies made sure it matches would be fine > Yes, I was hoping to avoid hash tables for this, and in the common case have no added overhead. > jirka > > > > > return instance is freed after the consumers' return handlers are executed, > > so there's no leak if some consumer gets unregistered before that > > > > > > > > > > + /* The handler_session callback return value controls > > > > > execution of > > > > > +* the return uprobe and ret_handler_session callback. 
> > > > > +* 0 on success > > > > > +* 1 on failure, DO NOT install/execute the return uprobe > > > > > +*console warning for anything else > > > > > +*/ > > > > > + int (*handler_session)(struct uprobe_consumer *self, struct > > > > > pt_regs *regs, > > > > > + unsigned long *data); > > > > > + int (*ret_handler_session)(struct uprobe_consumer *self, > > > > > unsigned long func, > > > > > + struct pt_regs *regs, unsigned > > > > > long *data); > > > > > + > > > > > > > > We should try to avoid an alternative set of callbacks, IMO. Let's > > > > extend existing ones with `unsigned long *data`, > > > > > > Oh yes, agreed. > > > > > > And the comment about the return value looks confusing too. I mean, the > > > logic doesn't differ from the ret-code from ->handler(). > > > > > > "DO NOT install/execute the return uprobe" is not true if another > > > non-session-consumer returns 0. > > > > well they are meant to be exclusive, so there'd be no other > > non-session-consumer > > > > jirka
Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()
Li Zhijian wrote: > Don't allocate devs again when it's valid pointer which has pionted to > the memory allocated above with size (count + 2 * sizeof(dev)). > > A kmemleak reports: > unreferenced object 0x88800dda1980 (size 16): > comm "kworker/u10:5", pid 69, jiffies 4294671781 > hex dump (first 16 bytes): > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > backtrace (crc 0): > [] __kmalloc+0x32c/0x470 > [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 > [libnvdimm] > [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm] > [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm] > [ ] really_probe+0xc6/0x390 > [<129e2a69>] __driver_probe_device+0x78/0x150 > [<2dfed28b>] driver_probe_device+0x1e/0x90 > [ ] __device_attach_driver+0x85/0x110 > [<32dca295>] bus_for_each_drv+0x85/0xe0 > [<391c5a7d>] __device_attach+0xbe/0x1e0 > [<26dabec0>] bus_probe_device+0x94/0xb0 > [ ] device_add+0x656/0x870 > [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm] > [<3f4c52a4>] async_run_entry_fn+0x2e/0x110 > [ ] process_one_work+0x1ee/0x600 > [<6d90d5a9>] worker_thread+0x183/0x350 > > Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation") > Signed-off-by: Li Zhijian > --- > drivers/nvdimm/namespace_devs.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c > index d6d558f94d6b..56b016dbe307 100644 > --- a/drivers/nvdimm/namespace_devs.c > +++ b/drivers/nvdimm/namespace_devs.c > @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region > *nd_region) > /* Publish a zero-sized namespace for userspace to configure. */ > nd_mapping_free_labels(nd_mapping); > > - devs = kcalloc(2, sizeof(dev), GFP_KERNEL); > + /* devs probably has been allocated */ I don't think this is where the bug is. The loop above is processing the known labels and should exit with a count > 0 if devs is not NULL. 
From what I can tell create_namespace_pmem() must be returning EAGAIN which leaves devs allocated but fails to increment count. Thus there are no valid labels but devs was not free'ed. Can you trace the error you are seeing a bit more to see if this is the case? Thanks, Ira > + if (!devs) > + devs = kcalloc(2, sizeof(dev), GFP_KERNEL); > if (!devs) > goto err; > > -- > 2.29.2 >
Re: [PATCH 1/3] rust: add static_call support
On Thu, Jun 06, 2024 at 03:05:24PM +, Alice Ryhl wrote: > Add static_call support by mirroring how C does. When the platform does > not support static calls (right now, that means that it is not x86), > then the function pointer is loaded from a global and called. Otherwise, > we generate a call to a trampoline function, and objtool is used to make > these calls patchable at runtime. This is absolutely unreadable gibberish -- how am I supposed to keep this in sync with the rest of the static_call infrastructure? > Signed-off-by: Alice Ryhl > --- > rust/kernel/lib.rs | 1 + > rust/kernel/static_call.rs | 92 > ++ > 2 files changed, 93 insertions(+) > > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs > index fbd91a48ff8b..d534b1178955 100644 > --- a/rust/kernel/lib.rs > +++ b/rust/kernel/lib.rs > @@ -38,6 +38,7 @@ > pub mod prelude; > pub mod print; > mod static_assert; > +pub mod static_call; > #[doc(hidden)] > pub mod std_vendor; > pub mod str; > diff --git a/rust/kernel/static_call.rs b/rust/kernel/static_call.rs > new file mode 100644 > index ..f7b8ba7bf1fb > --- /dev/null > +++ b/rust/kernel/static_call.rs > @@ -0,0 +1,92 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +// Copyright (C) 2024 Google LLC. > + > +//! Logic for static calls. > + > +#[macro_export] > +#[doc(hidden)] > +macro_rules! ty_underscore_for { > +($arg:expr) => { > +_ > +}; > +} > + > +#[doc(hidden)] > +#[repr(transparent)] > +pub struct AddressableStaticCallKey { > +_ptr: *const bindings::static_call_key, > +} > +unsafe impl Sync for AddressableStaticCallKey {} > +impl AddressableStaticCallKey { > +pub const fn new(ptr: *const bindings::static_call_key) -> Self { > +Self { _ptr: ptr } > +} > +} > + > +#[cfg(CONFIG_HAVE_STATIC_CALL)] > +#[doc(hidden)] > +#[macro_export] > +macro_rules! _static_call { > +($name:ident($($args:expr),* $(,)?)) => {{ > +// Symbol mangling will give this symbol a unique name. 
> +#[cfg(CONFIG_HAVE_STATIC_CALL_INLINE)] > +#[link_section = ".discard.addressable"] > +#[used] > +static __ADDRESSABLE: $crate::static_call::AddressableStaticCallKey > = unsafe { > + > $crate::static_call::AddressableStaticCallKey::new(::core::ptr::addr_of!( > +$crate::macros::paste! { $crate::bindings:: [<__SCK__ $name > >]; } > +)) > +}; > + > +let fn_ptr: unsafe extern "C" > fn($($crate::static_call::ty_underscore_for!($args)),*) -> _ = > +$crate::macros::paste! { $crate::bindings:: [<__SCT__ $name >]; > }; > +(fn_ptr)($($args),*) > +}}; > +} > + > +#[cfg(not(CONFIG_HAVE_STATIC_CALL))] > +#[doc(hidden)] > +#[macro_export] > +macro_rules! _static_call { > +($name:ident($($args:expr),* $(,)?)) => {{ > +let void_ptr_fn: *mut ::core::ffi::c_void = > +$crate::macros::paste! { $crate::bindings:: [<__SCK__ $name >]; > }.func; > + > +let fn_ptr: unsafe extern "C" > fn($($crate::static_call::ty_underscore_for!($args)),*) -> _ = > +if true { > +::core::mem::transmute(void_ptr_fn) > +} else { > +// This is dead code, but it influences type inference on > `fn_ptr` so that we > +// transmute the function pointer to the right type. > +$crate::macros::paste! { $crate::bindings:: [<__SCT__ $name > >]; } > +}; > + > +(fn_ptr)($($args),*) > +}}; > +} > + > +/// Statically call a global function. > +/// > +/// # Safety > +/// > +/// This macro will call the provided function. It is up to the caller to > uphold the safety > +/// guarantees of the function. > +/// > +/// # Examples > +/// > +/// ```ignore > +/// fn call_static() { > +/// unsafe { > +/// static_call! { your_static_call() }; > +/// } > +/// } > +/// ``` > +#[macro_export] > +macro_rules! static_call { > +// Forward to the real implementation. Separated like this so that we > don't have to duplicate > +// the documentation. > +($($args:tt)*) => { $crate::static_call::_static_call! { $($args)* } }; > +} > + > +pub use {_static_call, static_call, ty_underscore_for}; > > -- > 2.45.2.505.gda0bf45e8d-goog >
Re: [PATCH 2/3] rust: add static_key_false
On Thu, Jun 06, 2024 at 03:05:25PM +, Alice Ryhl wrote: > Add just enough support for static key so that we can use it from > tracepoints. Tracepoints rely on `static_key_false` even though it is > deprecated, so we add the same functionality to Rust. Urgh, more unreadable gibberish :-( Can we please get bindgen to translate asm/jump_table.h instead of randomly duplicating random bits. I really think that whoever created rust was an esoteric language freak. Hideous crap :-(.
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 03:05:26PM +, Alice Ryhl wrote: > Make it possible to have Rust code call into tracepoints defined by C > code. It is still required that the tracepoint is declared in a C > header, and that this header is included in the input to bindgen. > > Signed-off-by: Alice Ryhl > --- > rust/bindings/bindings_helper.h | 1 + > rust/bindings/lib.rs| 15 +++ > rust/helpers.c | 24 +++ > rust/kernel/lib.rs | 1 + > rust/kernel/tracepoint.rs | 92 > + > 5 files changed, 133 insertions(+) > > diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h > index ddb5644d4fd9..d442f9ccfc2c 100644 > --- a/rust/bindings/bindings_helper.h > +++ b/rust/bindings/bindings_helper.h > @@ -15,6 +15,7 @@ > #include > #include > #include > +#include > #include > #include > > diff --git a/rust/bindings/lib.rs b/rust/bindings/lib.rs > index 40ddaee50d8b..48856761d682 100644 > --- a/rust/bindings/lib.rs > +++ b/rust/bindings/lib.rs > @@ -48,3 +48,18 @@ mod bindings_helper { > } > > pub use bindings_raw::*; > + > +/// Rust version of the C macro `rcu_dereference_raw`. > +/// > +/// The rust helper only works with void pointers, but this wrapper method > makes it work with any > +/// pointer type using pointer casts. > +/// > +/// # Safety > +/// > +/// This method has the same safety requirements as the C macro of the same > name. > +#[inline(always)] > +pub unsafe fn rcu_dereference_raw(p: *const *mut T) -> *mut T { > +// SAFETY: This helper calls into the C macro, so the caller promises to > uphold the safety > +// requirements. 
> +unsafe { __rcu_dereference_raw(p as *mut *mut _) as *mut T } > +} > diff --git a/rust/helpers.c b/rust/helpers.c > index 2c37a0f5d7a8..0560cc2a512a 100644 > --- a/rust/helpers.c > +++ b/rust/helpers.c > @@ -165,6 +165,30 @@ rust_helper_krealloc(const void *objp, size_t new_size, > gfp_t flags) > } > EXPORT_SYMBOL_GPL(rust_helper_krealloc); > > +void rust_helper_preempt_enable_notrace(void) > +{ > + preempt_enable_notrace(); > +} > +EXPORT_SYMBOL_GPL(rust_helper_preempt_enable_notrace); > + > +void rust_helper_preempt_disable_notrace(void) > +{ > + preempt_disable_notrace(); > +} > +EXPORT_SYMBOL_GPL(rust_helper_preempt_disable_notrace); A notrace wrapper that is tracable, lol. > +bool rust_helper_current_cpu_online(void) > +{ > + return cpu_online(raw_smp_processor_id()); > +} > +EXPORT_SYMBOL_GPL(rust_helper_current_cpu_online); > + > +void *rust_helper___rcu_dereference_raw(void **p) > +{ > + return rcu_dereference_raw(p); > +} > +EXPORT_SYMBOL_GPL(rust_helper___rcu_dereference_raw); I'm going to keep yelling and objecting to these wrappers. Fix bindgen already. Or whatever is needed to get it to interoperate with C inline / asm. NAK NAK NAK
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 08:49:06AM -0700, Boqun Feng wrote: > Long-term plan is to 1) compile the C helpers in some IR and 2) inline > the helpers with Rust in IR-level, as what Gary has: > > > https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ Urgh, that still needs us to maintain that silly list of helpers :-/ Can't we pretty please have clang parse the actual header files into IR and munge that into rust? So that we don't get to manually duplicate everything+dog.
Re: [PATCH v2 2/5] ftrace: Remove "ftrace_hash" parameter from __ftrace_hash_rec_update()
On Wed, Jun 05, 2024 at 02:03:36PM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > While adding comments to the function __ftrace_hash_rec_update() and > trying to describe in detail what the parameter for "ftrace_hash" does, I > realized that it basically does exactly the same thing (but differently) > if it is set or not! Typo: "ftrace_hash" should be "filter_hash", and likewise in the commit title. > If it is set, the idea was the ops->filter_hash was being updated, and the > code should focus on the functions that are in the ops->filter_hash and > add them. But it still had to pay attention to the functions in the > ops->notrace_hash, to ignore them. > > If it was cleared, it focused on the ops->notrace_hash, and would add > functions that were not in the ops->notrace_hash but would still keep > functions in the "ops->filter_hash". Basically doing the same thing. > > In reality, the __ftrace_hash_rec_update() only needs to either remove the > functions associated to the give ops (if "inc" is set) or remove them (if > "inc" is cleared). It has to pay attention to both the filter_hash and > notrace_hash regardless. AFAICT, we actually remove a latent bug here, but that bug wasn't reachable because we never call __ftrace_hash_rec_update() with "filter_hash" clear. Before this patch, if we did call __ftrace_hash_rec_update() with "filter_hash" clear, we'd setup: all = false; ... if (filter_hash) { ... } else { hash = ops->func_hash->notrace_hash; other_hash = ops->func_hash->filter_hash; } ... and then at the tail of the loop where we do: count++; [...] /* Shortcut, if we handled all records, we are done. */ if (!all && count == hash->count) { pr_info("HARK: stopping after %d recs\n", count); return update; } ... we'd be checking whether we've updated notrace_hash->count entries, when that could be smaller than the number of entries we actually need to update (e.g. if the notrace hash contains a single entry, but the filter_hash contained more). 
As above, we never actually hit that because we never call with "filter_hash" clear, so it might be good to explicitly say that before this patch we never actually call __ftrace_hash_rec_update() with "filter_hash" clear, and this is removing dead (and potentially broken) code. > Remove the "filter_hash" parameter from __filter_hash_rec_update() and > comment the function for what it really is doing. > > Signed-off-by: Steven Rostedt (Google) FWIW, this looks good to me, and works in testing, so: Reviewed-by: Mark Rutland Tested-by: Mark Rutland I have one comment below, but the above stands regardless. [...] > +/* > + * This is the main engine to the ftrace updates to the dyn_ftrace records. > + * > + * It will iterate through all the available ftrace functions > + * (the ones that ftrace can have callbacks to) and set the flags > + * in the associated dyn_ftrace records. > + * > + * @inc: If true, the functions associated to @ops are added to > + * the dyn_ftrace records, otherwise they are removed. > + */ > static bool __ftrace_hash_rec_update(struct ftrace_ops *ops, > - int filter_hash, >bool inc) > { > struct ftrace_hash *hash; > - struct ftrace_hash *other_hash; > + struct ftrace_hash *notrace_hash; > struct ftrace_page *pg; > struct dyn_ftrace *rec; > bool update = false; > @@ -1718,35 +1725,16 @@ static bool __ftrace_hash_rec_update(struct > ftrace_ops *ops, > return false; > > /* > - * In the filter_hash case: >* If the count is zero, we update all records. >* Otherwise we just update the items in the hash. > - * > - * In the notrace_hash case: > - * We enable the update in the hash. > - * As disabling notrace means enabling the tracing, > - * and enabling notrace means disabling, the inc variable > - * gets inversed. 
>*/ > - if (filter_hash) { > - hash = ops->func_hash->filter_hash; > - other_hash = ops->func_hash->notrace_hash; > - if (ftrace_hash_empty(hash)) > - all = true; > - } else { > - inc = !inc; > - hash = ops->func_hash->notrace_hash; > - other_hash = ops->func_hash->filter_hash; > - /* > - * If the notrace hash has no items, > - * then there's nothing to do. > - */ > - if (ftrace_hash_empty(hash)) > - return false; > - } > + hash = ops->func_hash->filter_hash; > + notrace_hash = ops->func_hash->notrace_hash; > + if (ftrace_hash_empty(hash)) > + all = true; > > do_for_each_ftrace_rec(pg, rec) { > - int in_other_hash =
Re: [PATCH v2 4/5] ftrace: Convert "inc" parameter to bool in ftrace_hash_rec_update_modify()
On Wed, Jun 05, 2024 at 02:03:38PM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > The parameter "inc" in the function ftrace_hash_rec_update_modify() is > boolean. Change it to be such. > > Also add documentation to what the function does. > > Signed-off-by: Steven Rostedt (Google) Acked-by: Mark Rutland Mark. > --- > kernel/trace/ftrace.c | 23 --- > 1 file changed, 20 insertions(+), 3 deletions(-) > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index 1a2444199740..83f23f8fc26d 100644 > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -1882,7 +1882,24 @@ static bool ftrace_hash_rec_enable(struct ftrace_ops > *ops) > return __ftrace_hash_rec_update(ops, true); > } > > -static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops, int inc) > +/* > + * This function will update what functions @ops traces when its filter > + * changes. > + * > + * The @inc states if the @ops callbacks are going to be added or removed. > + * When one of the @ops hashes are updated to a "new_hash" the dyn_ftrace > + * records are update via: > + * > + * ftrace_hash_rec_disable_modify(ops); > + * ops->hash = new_hash > + * ftrace_hash_rec_enable_modify(ops); > + * > + * Where the @ops is removed from all the records it is tracing using > + * its old hash. The @ops hash is updated to the new hash, and then > + * the @ops is added back to the records so that it is tracing all > + * the new functions. 
> + */ > +static void ftrace_hash_rec_update_modify(struct ftrace_ops *ops, bool inc) > { > struct ftrace_ops *op; > > @@ -1906,12 +1923,12 @@ static void ftrace_hash_rec_update_modify(struct > ftrace_ops *ops, int inc) > > static void ftrace_hash_rec_disable_modify(struct ftrace_ops *ops) > { > - ftrace_hash_rec_update_modify(ops, 0); > + ftrace_hash_rec_update_modify(ops, false); > } > > static void ftrace_hash_rec_enable_modify(struct ftrace_ops *ops) > { > - ftrace_hash_rec_update_modify(ops, 1); > + ftrace_hash_rec_update_modify(ops, true); > } > > /* > -- > 2.43.0 > >
Re: [PATCH v2 5/5] ftrace: Add comments to ftrace_hash_move() and friends
On Wed, Jun 05, 2024 at 02:03:39PM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > Describe what ftrace_hash_move() does and add some more comments to some > other functions to make it easier to understand. > > Signed-off-by: Steven Rostedt (Google) Acked-by: Mark Rutland Mark. > --- > kernel/trace/ftrace.c | 24 +++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index 83f23f8fc26d..ae1603e771c5 100644 > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -169,6 +169,7 @@ static inline void ftrace_ops_init(struct ftrace_ops *ops) > #endif > } > > +/* Call this function for when a callback filters on set_ftrace_pid */ > static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip, > struct ftrace_ops *op, struct ftrace_regs *fregs) > { > @@ -1317,7 +1318,7 @@ static struct ftrace_hash *alloc_ftrace_hash(int > size_bits) > return hash; > } > > - > +/* Used to save filters on functions for modules not loaded yet */ > static int ftrace_add_mod(struct trace_array *tr, > const char *func, const char *module, > int enable) > @@ -1429,6 +1430,7 @@ static struct ftrace_hash *__move_hash(struct > ftrace_hash *src, int size) > return new_hash; > } > > +/* Move the @src entries to a newly allocated hash */ > static struct ftrace_hash * > __ftrace_hash_move(struct ftrace_hash *src) > { > @@ -1443,6 +1445,26 @@ __ftrace_hash_move(struct ftrace_hash *src) > return __move_hash(src, size); > } > > +/** > + * ftrace_hash_move - move a new hash to a filter and do updates > + * @ops: The ops with the hash that @dst points to > + * @enable: True if for the filter hash, false for the notrace hash > + * @dst: Points to the @ops hash that should be updated > + * @src: The hash to update @dst with > + * > + * This is called when an ftrace_ops hash is being updated and the > + * the kernel needs to reflect this. 
Note, this only updates the kernel > + * function callbacks if the @ops is enabled (not to be confused with > + * @enable above). If the @ops is enabled, its hash determines what > + * callbacks get called. This function gets called when the @ops hash > + * is updated and it requires new callbacks. > + * > + * On success the elements of @src is moved to @dst, and @dst is updated > + * properly, as well as the functions determined by the @ops hashes > + * are now calling the @ops callback function. > + * > + * Regardless of return type, @src should be freed with free_ftrace_hash(). > + */ > static int > ftrace_hash_move(struct ftrace_ops *ops, int enable, >struct ftrace_hash **dst, struct ftrace_hash *src) > -- > 2.43.0 > >
Re: [PATCH v2 2/5] ftrace: Remove "ftrace_hash" parameter from __ftrace_hash_rec_update()
On Thu, 6 Jun 2024 18:53:12 +0100 Mark Rutland wrote: > On Wed, Jun 05, 2024 at 02:03:36PM -0400, Steven Rostedt wrote: > > From: "Steven Rostedt (Google)" > > > > While adding comments to the function __ftrace_hash_rec_update() and > > trying to describe in detail what the parameter for "ftrace_hash" does, I > > realized that it basically does exactly the same thing (but differently) > > if it is set or not! > > Typo: "ftrace_hash" should be "filter_hash", and likewise in the commit > title. Let me go fix up linux-next :-p > > > If it is set, the idea was the ops->filter_hash was being updated, and the > > code should focus on the functions that are in the ops->filter_hash and > > add them. But it still had to pay attention to the functions in the > > ops->notrace_hash, to ignore them. > > > > If it was cleared, it focused on the ops->notrace_hash, and would add > > functions that were not in the ops->notrace_hash but would still keep > > functions in the "ops->filter_hash". Basically doing the same thing. > > > > In reality, the __ftrace_hash_rec_update() only needs to either remove the > > functions associated to the give ops (if "inc" is set) or remove them (if > > "inc" is cleared). It has to pay attention to both the filter_hash and > > notrace_hash regardless. > > AFAICT, we actually remove a latent bug here, but that bug wasn't > reachable because we never call __ftrace_hash_rec_update() with > "filter_hash" clear. > > Before this patch, if we did call __ftrace_hash_rec_update() with > "filter_hash" clear, we'd setup: > > all = false; > ... > if (filter_hash) { > ... > } else { > hash = ops->func_hash->notrace_hash; > other_hash = ops->func_hash->filter_hash; > } > > ... and then at the tail of the loop where we do: > > count++; > > [...] > > /* Shortcut, if we handled all records, we are done. */ > if (!all && count == hash->count) { > pr_info("HARK: stopping after %d recs\n", count); > return update; > } > > ... 
we'd be checking whether we've updated notrace_hash->count entries, > when that could be smaller than the number of entries we actually need > to update (e.g. if the notrace hash contains a single entry, but the > filter_hash contained more). I noticed this too as well as: if (filter_hash) { hash = ops->func_hash->filter_hash; other_hash = ops->func_hash->notrace_hash; if (ftrace_hash_empty(hash)) all = true; } else { inc = !inc; hash = ops->func_hash->notrace_hash; other_hash = ops->func_hash->filter_hash; /* * If the notrace hash has no items, * then there's nothing to do. */ if (ftrace_hash_empty(hash)) return false; } That "return false" is actually a mistake as well. But I tried to hit it, but found that I could not. I think I updated the code due to bugs where I prevent that from happening, but the real fix would have been this patch. :-p > > As above, we never actually hit that because we never call with > "filter_hash" clear, so it might be good to explicitly say that before > this patch we never actually call __ftrace_hash_rec_update() with > "filter_hash" clear, and this is removing dead (and potentially broken) > code. Right. > > > Remove the "filter_hash" parameter from __filter_hash_rec_update() and > > comment the function for what it really is doing. > > > > Signed-off-by: Steven Rostedt (Google) > > FWIW, this looks good to me, and works in testing, so: > > Reviewed-by: Mark Rutland > Tested-by: Mark Rutland > > I have one comment below, but the above stands regardless. > > [...] > > > +/* > > + * This is the main engine to the ftrace updates to the dyn_ftrace records. > > + * > > + * It will iterate through all the available ftrace functions > > + * (the ones that ftrace can have callbacks to) and set the flags > > + * in the associated dyn_ftrace records. > > + * > > + * @inc: If true, the functions associated to @ops are added to > > + * the dyn_ftrace records, otherwise they are removed. 
> > + */ > > static bool __ftrace_hash_rec_update(struct ftrace_ops *ops, > > -int filter_hash, > > bool inc) > > { > > struct ftrace_hash *hash; > > - struct ftrace_hash *other_hash; > > + struct ftrace_hash *notrace_hash; > > struct ftrace_page *pg; > > struct dyn_ftrace *rec; > > bool update = false; > > @@ -1718,35 +1725,16 @@ static bool __ftrace_hash_rec_update(struct > > ftrace_ops *ops, > > return false; > > > > /* > > -* In the filter_hash case: > > * If the count is zero, we update all records. > > * Otherwise we just update the item
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 07:35:44PM +0200, Peter Zijlstra wrote: > On Thu, Jun 06, 2024 at 08:49:06AM -0700, Boqun Feng wrote: > > > Long-term plan is to 1) compile the C helpers in some IR and 2) inline > > the helpers with Rust in IR-level, as what Gary has: > > > > > > https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ > > Urgh, that still needs us to maintain that silly list of helpers :-/ > But it's an improvement from the current stage, right? ;-) > Can't we pretty please have clang parse the actual header files into IR > and munge that into rust? So that we don't get to manually duplicate > everything+dog. That won't always work, because some of our kernel APIs are defined as macros, and I don't think it's a trivial job to generate a macro definition to a function definition so that it can be translated to something in IR. We will have to do the macro -> function mapping ourselves somewhere, if we want to inline the API across languages. Regards, Boqun
[PATCH v2] rpmsg: qcom_smd: Improve error handling for qcom_smd_parse_edge
When the mailbox driver has not probed yet, the error message "failed to parse smd edge" is just going to confuse users, so improve the error prints a bit. Cover the last remaining exits from qcom_smd_parse_edge with proper error prints, especially the one for the mbox_chan deserved dev_err_probe to handle EPROBE_DEFER nicely. And add one for ipc_regmap also to be complete. With this done, we can remove the outer print completely. Signed-off-by: Luca Weiss --- Changes in v2: - Rebase on qcom for-next, drop dts patches which have been applied - Improve error printing situation (Bjorn) - Link to v1: https://lore.kernel.org/r/20240424-apcs-mboxes-v1-0-6556c47cb...@z3ntu.xyz --- drivers/rpmsg/qcom_smd.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/rpmsg/qcom_smd.c b/drivers/rpmsg/qcom_smd.c index 43f601c84b4f..06e6ba653ea1 100644 --- a/drivers/rpmsg/qcom_smd.c +++ b/drivers/rpmsg/qcom_smd.c @@ -1369,7 +1369,8 @@ static int qcom_smd_parse_edge(struct device *dev, edge->mbox_chan = mbox_request_channel(&edge->mbox_client, 0); if (IS_ERR(edge->mbox_chan)) { if (PTR_ERR(edge->mbox_chan) != -ENODEV) { - ret = PTR_ERR(edge->mbox_chan); + ret = dev_err_probe(dev, PTR_ERR(edge->mbox_chan), + "failed to acquire IPC mailbox\n"); goto put_node; } @@ -1386,6 +1387,7 @@ static int qcom_smd_parse_edge(struct device *dev, of_node_put(syscon_np); if (IS_ERR(edge->ipc_regmap)) { ret = PTR_ERR(edge->ipc_regmap); + dev_err(dev, "failed to get regmap from syscon: %d\n", ret); goto put_node; } @@ -1501,10 +1503,8 @@ struct qcom_smd_edge *qcom_smd_register_edge(struct device *parent, } ret = qcom_smd_parse_edge(&edge->dev, node, edge); - if (ret) { - dev_err(&edge->dev, "failed to parse smd edge\n"); + if (ret) goto unregister_dev; - } ret = qcom_smd_create_chrdev(edge); if (ret) { --- base-commit: 2c79712cc83b172ce26c3086ced1c1fae087d8fb change-id: 20240423-apcs-mboxes-12ee6c01a5b3 Best regards, -- Luca Weiss
Re: [PATCH v3 4/6] module: script to generate offset ranges for builtin modules
On Tue, May 21, 2024 at 01:53:27AM +0900, Masahiro Yamada wrote: > On Fri, May 17, 2024 at 1:31???PM Kris Van Hees > wrote: > > > > The offset range data for builtin modules is generated using: > > - modules.builtin.modinfo: associates object files with module names > > - vmlinux.map: provides load order of sections and offset of first member > > per section > > - vmlinux.o.map: provides offset of object file content per section > > - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME > > > > The generated data will look like: > > > > .text - = _text > > .text baf0-cb10 amd_uncore > > .text 0009bd10-0009c8e0 iosf_mbi > > ... > > .text 008e6660-008e9630 snd_soc_wcd_mbhc > > .text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x > > .text 008ea610-008ea780 snd_soc_wcd9335 > > ... > > .data - = _sdata > > .data f020-f680 amd_uncore > > > > For each ELF section, it lists the offset of the first symbol. This can > > be used to determine the base address of the section at runtime. > > > > Next, it lists (in strict ascending order) offset ranges in that section > > that cover the symbols of one or more builtin modules. Multiple ranges > > can apply to a single module, and ranges can be shared between modules. 
> > > > Signed-off-by: Kris Van Hees > > Reviewed-by: Nick Alcock > > --- > > Changes since v2: > > - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo > > - Switched from using modules.builtin.objs to parsing .*.cmd files > > - Parse data from .*.cmd in generate_builtin_ranges.awk > > --- > > scripts/generate_builtin_ranges.awk | 232 > > 1 file changed, 232 insertions(+) > > create mode 100755 scripts/generate_builtin_ranges.awk > > > > diff --git a/scripts/generate_builtin_ranges.awk > > b/scripts/generate_builtin_ranges.awk > > new file mode 100755 > > index 0..6975a9c7266d9 > > --- /dev/null > > +++ b/scripts/generate_builtin_ranges.awk > > @@ -0,0 +1,232 @@ > > +#!/usr/bin/gawk -f > > +# SPDX-License-Identifier: GPL-2.0 > > +# generate_builtin_ranges.awk: Generate address range data for builtin > > modules > > +# Written by Kris Van Hees > > +# > > +# Usage: generate_builtin_ranges.awk modules.builtin.modinfo vmlinux.map \ > > +# vmlinux.o.map > modules.builtin.ranges > > +# > > + > > +BEGIN { > > + # modules.builtin.modinfo uses \0 as record separator > > + # All other files use \n. > > + RS = "[\n\0]"; > > +} > > > Why do you want to parse modules.builtin.modinfo > instead of modules.builtin? > > modules.builtin uses \n separator. Oh my, I completely overlooked modules.builtin. Thank you! That is indeed much easier. > > + > > +# Return the module name(s) (if any) associated with the given object. > > +# > > +# If we have seen this object before, return information from the cache. > > +# Otherwise, retrieve it from the corresponding .cmd file. > > +# > > +function get_module_info(fn, mod, obj, mfn, s) { > > > There are 5 arguments, while the caller passes only 1 argument > ( get_module_info($4) ) That is the way to be able to have local variables in an AWK function - every variable mentioned in the function declaration is local to the function. It is an oddity in AWK. 
> > + if (fn in omod) > > + return omod[fn]; > > + > > + if (match(fn, /\/[^/]+$/) == 0) > > + return ""; > > + > > + obj = fn; > > + mod = ""; > > + mfn = ""; > > + fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd"; > > + if (getline s > + if (match(s, /DKBUILD_MODNAME=[^ ]+/) > 0) { > > + mod = substr(s, RSTART + 17, RLENGTH - 17); > > + gsub(/['"]/, "", mod); > > + gsub(/:/, " ", mod); > > + } > > + > > + if (match(s, /DKBUILD_MODFILE=[^ ]+/) > 0) { > > + mfn = substr(s, RSTART + 17, RLENGTH - 17); > > + gsub(/['"]/, "", mfn); > > + gsub(/:/, " ", mfn); > > + } > > + } > > + close(fn); > > + > > +# tmp = $0; > > +# $0 = s; > > +# print mod " " mfn " " obj " " $NF; > > +# $0 = tmp; > > + > > + # A single module (common case) also reflects objects that are not > > part > > + # of a module. Some of those objects have names that are also a > > module > > + # name (e.g. core). We check the associated module file name, and > > if > > + # they do not match, the object is not part of a module. > > > You do not need to use KBUILD_MODNAME. > > Just use KBUILD_MODFILE only. > If the same path is found in modules.builtin, > it is a built-in module. > > Its basename is modname. Yes, that is true. I can do all this based on KBUILD_MODFILE. Thank you. Adjusting the patch that way. > One more question in a corner case. > > How does your c
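[Editor's note: for readers unfamiliar with the AWK idiom Kris describes above — `get_module_info(fn, mod, obj, mfn, s)` declaring five parameters but being called with one — here is a minimal, hypothetical sketch. The function and variable names are illustrative, not taken from the script under review. AWK has no `local` keyword, so by convention any extra formal parameters the caller never supplies act as function-local variables, initialized to the empty string / zero.]

```shell
# Extra, never-passed parameters are AWK's conventional way to get local
# variables (often set off by extra whitespace in the parameter list,
# exactly as in get_module_info(fn, mod, obj, mfn, s) above).
awk '
function add_two(a,    b) {   # "b" is a local scratch variable, not a real argument
	b = 2;
	return a + b;
}
BEGIN {
	print add_two(40);    # caller passes only the real argument; prints 42
	print (b == "");      # global namespace untouched by the local: prints 1
}'
```

Without this trick, `b` would be a global and would leak out of the function, which is why the convention matters in a larger script.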
Re: [PATCH 1/3] rust: add static_call support
On Thu, Jun 6, 2024 at 7:19 PM Peter Zijlstra wrote: > > This is absolutely unreadable gibberish -- how am I supposed to keep > this in sync with the rest of the static_call infrastructure? Yeah, they are macros, which look different from "normal" Rust code. Is there something we could do to help here? I think Alice and others would be happy to explain how it works and/or help maintain it in the future if you prefer. Cheers, Miguel
Re: [PATCH] remoteproc: mediatek: Don't print error when optional scp handle is missing
On Thu, Jun 06, 2024 at 12:50:56PM +0200, AngeloGioacchino Del Regno wrote: > Il 05/06/24 21:35, Nícolas F. R. A. Prado ha scritto: > > The scp_get() helper has two users: the mtk-vcodec and the mtk-mdp3 > > drivers. mdp3 considers the mediatek,scp phandle optional, and when it's > > missing mdp3 will directly look for the scp node based on compatible. > > > > For that reason printing an error in the helper when the handle is > > missing is wrong, as it's not always an actual error. Besides, the other > > user, vcodec, already prints an error message on its own when the helper > > fails so this message doesn't add that much information. > > > > Remove the error message from the helper. This gets rid of the deceptive > > error on MT8195-Tomato: > > > >mtk-mdp3 14001000.dma-controller: can't get SCP node > > I'm way more for giving it a SCP handle instead of silencing this print... the > SCP handle way *is* the correct one and I prefer it, as that'd also be the > only > way to support Dual-Core SCP in MDP3. > > I really want to see hardcoded stuff disappear and I really really *really* > want > to see more consistency across DT nodes in MediaTek platforms. > > So, still a one-line change, but do it on the devicetree instead of here > please. Sure. So the missing scp phandle case is maintained exclusively for backwards-compatibility. And we're ok that the old DTs will keep getting the error. I'll send a v2 adding the phandle in the DT instead. Thanks, Nícolas
[PATCH v2 1/2] dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc
The qcom,ipc-N properties are essentially providing a reference to a mailbox, so allow using the mboxes property to do the same in a more structured way. Since multiple SMSM hosts are supported, we need to be able to provide the correct mailbox for each host. The old qcom,ipc-N properties map to the mboxes property by index, starting at 0 since that's a valid SMSM host also. Mark the older qcom,ipc-N as deprecated and update the example with mboxes. Signed-off-by: Luca Weiss --- .../devicetree/bindings/soc/qcom/qcom,smsm.yaml| 30 +++--- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml index db67cf043256..4900215f26af 100644 --- a/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smsm.yaml @@ -33,6 +33,14 @@ properties: specifier of the column in the subscription matrix representing the local processor. + mboxes: +minItems: 1 +maxItems: 5 +description: + Reference to the mailbox representing the outgoing doorbell in APCS for + this client. Each entry represents the N:th remote processor by index + (0-indexed). + '#size-cells': const: 0 @@ -47,6 +55,7 @@ patternProperties: description: Three entries specifying the outgoing ipc bit used for signaling the N:th remote processor. +deprecated: true "@[0-9a-f]$": type: object @@ -98,15 +107,18 @@ required: - '#address-cells' - '#size-cells' -anyOf: +oneOf: - required: - - qcom,ipc-1 - - required: - - qcom,ipc-2 - - required: - - qcom,ipc-3 - - required: - - qcom,ipc-4 + - mboxes + - anyOf: + - required: + - qcom,ipc-1 + - required: + - qcom,ipc-2 + - required: + - qcom,ipc-3 + - required: + - qcom,ipc-4 additionalProperties: false @@ -122,7 +134,7 @@ examples: compatible = "qcom,smsm"; #address-cells = <1>; #size-cells = <0>; -qcom,ipc-3 = <&apcs 8 19>; +mboxes = <0>, <0>, <0>, <&apcs 19>; apps_smsm: apps@0 { reg = <0>; -- 2.45.2
[PATCH v2 2/2] soc: qcom: smsm: Support using mailbox interface
Add support for using the mbox interface instead of manually writing to the syscon. With this change the driver will attempt to get the mailbox first, and if that fails it will fall back to the existing way of using qcom,ipc-* properties and converting to syscon. Reviewed-by: Konrad Dybcio Signed-off-by: Luca Weiss --- drivers/soc/qcom/smsm.c | 51 - 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/drivers/soc/qcom/smsm.c b/drivers/soc/qcom/smsm.c index e7c7e9a640a6..ffe78ae34386 100644 --- a/drivers/soc/qcom/smsm.c +++ b/drivers/soc/qcom/smsm.c @@ -5,6 +5,7 @@ */ #include +#include #include #include #include @@ -71,6 +72,7 @@ struct smsm_host; * @lock: spinlock for read-modify-write of the outgoing state * @entries: context for each of the entries * @hosts: context for each of the hosts + * @mbox_client: mailbox client handle */ struct qcom_smsm { struct device *dev; @@ -88,6 +90,8 @@ struct qcom_smsm { struct smsm_entry *entries; struct smsm_host *hosts; + + struct mbox_client mbox_client; }; /** @@ -120,11 +124,14 @@ struct smsm_entry { * @ipc_regmap:regmap for outgoing interrupt * @ipc_offset:offset in @ipc_regmap for outgoing interrupt * @ipc_bit: bit in @ipc_regmap + @ipc_offset for outgoing interrupt + * @mbox_chan: apcs ipc mailbox channel handle */ struct smsm_host { struct regmap *ipc_regmap; int ipc_offset; int ipc_bit; + + struct mbox_chan *mbox_chan; }; /** @@ -172,7 +179,13 @@ static int smsm_update_bits(void *data, u32 mask, u32 value) hostp = &smsm->hosts[host]; val = readl(smsm->subscription + host); - if (val & changes && hostp->ipc_regmap) { + if (!(val & changes)) + continue; + + if (hostp->mbox_chan) { + mbox_send_message(hostp->mbox_chan, NULL); + mbox_client_txdone(hostp->mbox_chan, 0); + } else if (hostp->ipc_regmap) { regmap_write(hostp->ipc_regmap, hostp->ipc_offset, BIT(hostp->ipc_bit)); @@ -352,6 +365,28 @@ static const struct irq_domain_ops smsm_irq_ops = { .xlate = irq_domain_xlate_twocell, }; +/** + * smsm_parse_mbox() - 
requests an mbox channel + * @smsm: smsm driver context + * @host_id: index of the remote host to be resolved + * + * Requests the desired channel using the mbox interface which is needed for + * sending the outgoing interrupts to a remote host - identified by @host_id. + */ +static int smsm_parse_mbox(struct qcom_smsm *smsm, unsigned int host_id) +{ + struct smsm_host *host = &smsm->hosts[host_id]; + int ret = 0; + + host->mbox_chan = mbox_request_channel(&smsm->mbox_client, host_id); + if (IS_ERR(host->mbox_chan)) { + ret = PTR_ERR(host->mbox_chan); + host->mbox_chan = NULL; + } + + return ret; +} + /** * smsm_parse_ipc() - parses a qcom,ipc-%d device tree property * @smsm: smsm driver context @@ -521,8 +556,16 @@ static int qcom_smsm_probe(struct platform_device *pdev) "qcom,local-host", &smsm->local_host); + smsm->mbox_client.dev = &pdev->dev; + smsm->mbox_client.knows_txdone = true; + /* Parse the host properties */ for (id = 0; id < smsm->num_hosts; id++) { + /* Try using mbox interface first, otherwise fall back to syscon */ + ret = smsm_parse_mbox(smsm, id); + if (!ret) + continue; + ret = smsm_parse_ipc(smsm, id); if (ret < 0) goto out_put; @@ -609,6 +652,9 @@ static int qcom_smsm_probe(struct platform_device *pdev) qcom_smem_state_unregister(smsm->state); out_put: + for (id = 0; id < smsm->num_hosts; id++) + mbox_free_channel(smsm->hosts[id].mbox_chan); + of_node_put(local_node); return ret; } @@ -622,6 +668,9 @@ static void qcom_smsm_remove(struct platform_device *pdev) if (smsm->entries[id].domain) irq_domain_remove(smsm->entries[id].domain); + for (id = 0; id < smsm->num_hosts; id++) + mbox_free_channel(smsm->hosts[id].mbox_chan); + qcom_smem_state_unregister(smsm->state); } -- 2.45.2
[PATCH v2 0/2] Support mailbox interface in qcom,smsm driver
Take a shot at converting the last driver that requires direct "qcom,ipc*" syscon references in devicetree by allowing the smsm driver to use the mailbox interface to achieve the same effect. Still not sure if the devicetree bindings are the prettiest but they're functional. One alternative I'm thinking of is to use mbox-names to not have <0> elements in dt, and reference the items by name from the driver? e.g. this change for msm8226 could be represented differently. - qcom,ipc-1 = <&apcs 8 13>; - qcom,ipc-2 = <&apcs 8 9>; - qcom,ipc-3 = <&apcs 8 19>; + mboxes = <0>, <&apcs 13>, <&apcs 9>, <&apcs 19>; vs. for example: - qcom,ipc-1 = <&apcs 8 13>; - qcom,ipc-2 = <&apcs 8 9>; - qcom,ipc-3 = <&apcs 8 19>; + mboxes = <&apcs 13>, <&apcs 9>, <&apcs 19>; + mbox-names = "ipc-1", "ipc-2", "ipc-3"; But also here the name with 'ipc-N' is probably not particularly fitting? Please let me know your thoughts and any suggestions. Signed-off-by: Luca Weiss --- Changes in v2: - Mark qcom,ipc-N as deprecated - Update & expand description for mboxes property - Don't duplicate example, just update existing one since qcom,ipc-N is deprecated now anyways - Pick up tags - Link to v1: https://lore.kernel.org/r/20240424-smsm-mbox-v1-0-555f3f442...@z3ntu.xyz --- Luca Weiss (2): dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc soc: qcom: smsm: Support using mailbox interface .../devicetree/bindings/soc/qcom/qcom,smsm.yaml| 30 + drivers/soc/qcom/smsm.c| 51 +- 2 files changed, 71 insertions(+), 10 deletions(-) --- base-commit: ee78a17615ad0cfdbbc27182b1047cd36c9d4d5f change-id: 20240424-smsm-mbox-0666f35eae44 Best regards, -- Luca Weiss
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 12:00:36PM -0700, Boqun Feng wrote: > On Thu, Jun 06, 2024 at 07:35:44PM +0200, Peter Zijlstra wrote: > > On Thu, Jun 06, 2024 at 08:49:06AM -0700, Boqun Feng wrote: > > > > > Long-term plan is to 1) compile the C helpers in some IR and 2) inline > > > the helpers with Rust in IR-level, as what Gary has: > > > > > > > > > https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ > > > > Urgh, that still needs us to maintain that silly list of helpers :-/ > > > > But it's an improvement from the current stage, right? ;-) Somewhat, but only marginal. > > Can't we pretty please have clang parse the actual header files into IR > > and munge that into rust? So that we don't get to manually duplicate > > everything+dog. > > That won't always work, because some of our kernel APIs are defined as > macros, and I don't think it's a trivial job to translate a macro > definition into a function definition so that it can be translated to > something in IR. We will have to do the macro -> function mapping > ourselves somewhere, if we want to inline the API across languages. We can try and see how far we can get with moving a bunch of stuff into inlines. There's quite a bit of simple CPP that could be inlines or const objects I suppose. Things like the tracepoints are of course glorious CPP abuse and are never going to work. But perhaps you can have an explicit 'eval-CPP on this here' construct or whatnot. If I squint I see this paste! thingy (WTF's up with that ! operator?) to munge function names in the static_call thing. So something like apply CPP from over there on this here can also be done :-)
Re: [PATCH 1/3] rust: add static_call support
On Thu, Jun 06, 2024 at 09:09:00PM +0200, Miguel Ojeda wrote: > On Thu, Jun 6, 2024 at 7:19 PM Peter Zijlstra wrote: > > > > This is absolutely unreadable gibberish -- how am I supposed to keep > > this in sync with the rest of the static_call infrastructure? > > Yeah, they are macros, which look different from "normal" Rust code. Macros like CPP ? > Is there something we could do to help here? I think Alice and others > would be happy to explain how it works and/or help maintain it in the > future if you prefer. Write a new language that looks more like C -- pretty please ? :-) Mostly I would just really like you to just use arm/jump_label.h, they're inline functions and should be doable with IR, no weirdo CPP involved in this case.
[PATCH v3 2/4] tracing/timer: use __print_sym()
From: Johannes Berg Use the new __print_sym() in the timer tracing, just to show how to convert something. This adds ~80 bytes of .text for a saving of ~1.5K of data in my builds. Note the format changes from print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { (1 << 0), "POSIX_TIMER" }, { (1 << 1), "PERF_EVENTS" }, { (1 << 2), "SCHED" }, { (1 << 3), "CLOCK_UNSTABLE" }, { (1 << 4), "RCU" }, { (1 << 5), "RCU_EXP" }) to print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { 1, "POSIX_TIMER" }, { 2, "PERF_EVENTS" }, { 4, "SCHED" }, { 8, "CLOCK_UNSTABLE" }, { 16, "RCU" }, { 32, "RCU_EXP" }) since the values are now just printed in the show function as pure decimal values. Signed-off-by: Johannes Berg --- include/trace/events/timer.h | 22 +++--- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index 1ef58a04fc57..d483abffed78 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -402,26 +402,18 @@ TRACE_EVENT(itimer_expire, #undef tick_dep_mask_name #undef tick_dep_name_end -/* The MASK will convert to their bits and they need to be processed too */ -#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \ - TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); -#define tick_dep_name_end(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \ - TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); -/* NONE only has a mask defined for it */ -#define tick_dep_mask_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); - -TICK_DEP_NAMES - -#undef tick_dep_name -#undef tick_dep_mask_name -#undef tick_dep_name_end - #define tick_dep_name(sdep) { TICK_DEP_MASK_##sdep, #sdep }, #define tick_dep_mask_name(sdep) { TICK_DEP_MASK_##sdep, #sdep }, #define tick_dep_name_end(sdep) { TICK_DEP_MASK_##sdep, #sdep } +TRACE_DEFINE_SYM_LIST(tick_dep_names, TICK_DEP_NAMES); + +#undef tick_dep_name +#undef 
tick_dep_mask_name +#undef tick_dep_name_end + #define show_tick_dep_name(val)\ - __print_symbolic(val, TICK_DEP_NAMES) + __print_sym(val, tick_dep_names) TRACE_EVENT(tick_stop, -- 2.45.1
[PATCH v3 0/4] tracing: improve symbolic printing
Completely forgot about this again ... here's another respin. v2 was: - rebased on 6.9-rc1 - always search for __print_sym() and get rid of the DYNPRINT flag and associated code; I think ideally we'll just remove the older __print_symbolic() entirely - use ':' as the separator instead of "//" since that makes searching for it much easier and it's still not a valid char in an identifier - fix RCU v3: - fix #undef issues - fix drop_monitor default - rebase on linux-trace/for-next (there were no conflicts) - move net patches to 3/4 - clarify symbol name matching logic (and remove ")" from it) To recap, it's annoying to have irq/65-iwlwifi:-401 [000]22.79: kfree_skb: ... reason: 0x2 and much nicer to see irq/65-iwlwifi:-401 [000]22.79: kfree_skb: ... reason: RX_DROP_MONITOR but this doesn't work now because __print_symbolic() can only deal with a hard-coded list (which is actually really big.) So here's __print_sym() which doesn't build the list into the kernel image, but creates it at runtime. For userspace, it will look the same as __print_symbolic() (it literally shows __print_symbolic() to userspace) so no changes are needed, but the actual list of values exposed to userspace in there is built dynamically. For SKB drop reasons, this then has all the reasons known when userspace queries the trace format. I guess patch 3/4 should go through net-next, so not sure how to handle this patch series. Or perhaps, as this will not cause conflicts, in fact I've been rebasing it for a long time, go through tracing anyway with an Ack from netdev? But I can also just wait for the trace patch(es) to land and resubmit the net patches after. Assuming this looks good at all :-) Thanks, johannes %
[PATCH v3 3/4] net: dropreason: use new __print_sym() in tracing
From: Johannes Berg The __print_symbolic() could only ever print the core drop reasons, since that's the way the infrastructure works. Now that we have __print_sym() with all the advantages mentioned in that commit, convert to that and get all the drop reasons from all subsystems. As we already have a list of them, that's really easy. This is a little bit of .text (~100 bytes in my build) and saves a lot of .data (~17k). Signed-off-by: Johannes Berg --- include/net/dropreason.h | 5 + include/trace/events/skb.h | 16 +++--- net/core/skbuff.c | 43 ++ 3 files changed, 51 insertions(+), 13 deletions(-) diff --git a/include/net/dropreason.h b/include/net/dropreason.h index 56cb7be92244..c157070b5303 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -42,6 +42,11 @@ struct drop_reason_list { extern const struct drop_reason_list __rcu * drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM]; +#ifdef CONFIG_TRACEPOINTS +const char *drop_reason_lookup(unsigned long long value); +void drop_reason_show(struct seq_file *m); +#endif + void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys, const struct drop_reason_list *list); void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys); diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 07e0715628ec..8a1a63f9e796 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -8,15 +8,9 @@ #include #include #include +#include -#undef FN -#define FN(reason) TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason); -DEFINE_DROP_REASON(FN, FN) - -#undef FN -#undef FNe -#define FN(reason) { SKB_DROP_REASON_##reason, #reason }, -#define FNe(reason){ SKB_DROP_REASON_##reason, #reason } +TRACE_DEFINE_SYM_FNS(drop_reason, drop_reason_lookup, drop_reason_show); /* * Tracepoint for free an sk_buff: @@ -44,13 +38,9 @@ TRACE_EVENT(kfree_skb, TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s", __entry->skbaddr, __entry->protocol, __entry->location, - 
__print_symbolic(__entry->reason, - DEFINE_DROP_REASON(FN, FNe))) + __print_sym(__entry->reason, drop_reason )) ); -#undef FN -#undef FNe - TRACE_EVENT(consume_skb, TP_PROTO(struct sk_buff *skb, void *location), diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 466999a7515e..cd1ea6c3e0f8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -145,6 +145,49 @@ drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { }; EXPORT_SYMBOL(drop_reasons_by_subsys); +#ifdef CONFIG_TRACEPOINTS +const char *drop_reason_lookup(unsigned long long value) +{ + unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT; + u32 reason = value & ~SKB_DROP_REASON_SUBSYS_MASK; + const struct drop_reason_list *subsys; + + if (subsys_id >= SKB_DROP_REASON_SUBSYS_NUM) + return NULL; + + subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]); + if (!subsys) + return NULL; + if (reason >= subsys->n_reasons) + return NULL; + return subsys->reasons[reason]; +} + +void drop_reason_show(struct seq_file *m) +{ + u32 subsys_id; + + rcu_read_lock(); + for (subsys_id = 0; subsys_id < SKB_DROP_REASON_SUBSYS_NUM; subsys_id++) { + const struct drop_reason_list *subsys; + u32 i; + + subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]); + if (!subsys) + continue; + + for (i = 0; i < subsys->n_reasons; i++) { + if (!subsys->reasons[i]) + continue; + seq_printf(m, ", { %u, \"%s\" }", + (subsys_id << SKB_DROP_REASON_SUBSYS_SHIFT) | i, + subsys->reasons[i]); + } + } + rcu_read_unlock(); +} +#endif + /** * drop_reasons_register_subsys - register another drop reason subsystem * @subsys: the subsystem to register, must not be the core -- 2.45.1
[PATCH v3 1/4] tracing: add __print_sym() to replace __print_symbolic()
From: Johannes Berg The way __print_symbolic() works is limited and inefficient in multiple ways: - you can only use it with a static list of symbols, but e.g. the SKB dropreasons are now a dynamic list - it builds the list in memory _three_ times, so it takes a lot of memory: - The print_fmt contains the list (since it's passed to the macro there). This actually contains the names _twice_, which is fixed up at runtime. - TRACE_DEFINE_ENUM() puts a 24-byte struct trace_eval_map for every entry, plus the string pointed to by it, which cannot be deduplicated with the strings in the print_fmt - The in-kernel symbolic printing creates yet another list of struct trace_print_flags for trace_print_symbols_seq() - it also requires runtime fixup during init, which is a lot of string parsing due to the print_fmt fixup Introduce __print_sym() to - over time - replace the old one. We can easily extend this also to __print_flags later, but I cared only about the SKB dropreasons for now, which has only __print_symbolic(). This new __print_sym() requires only a single list of items, created by TRACE_DEFINE_SYM_LIST(), or can even use another already existing list by using TRACE_DEFINE_SYM_FNS() with lookup and show methods. Then, instead of doing an init-time fixup, just do this at the time when userspace reads the print_fmt. This way, dynamically updated lists are possible. For userspace, nothing actually changes, because the print_fmt is shown exactly the same way the old __print_symbolic() was. This adds about 4k .text in my test builds, but that'll be more than paid for by the actual conversions. 
Signed-off-by: Johannes Berg --- v2: - fix RCU - use ':' as separator to simplify the code, that's still not valid in a C identifier v3: - add missing #undef lines (reported by Simon Horman) - clarify name is not NUL-terminated and fix logic for the comparison for that --- include/asm-generic/vmlinux.lds.h | 3 +- include/linux/module.h | 2 + include/linux/trace_events.h | 7 ++ include/linux/tracepoint.h | 20 + include/trace/stages/init.h| 55 + include/trace/stages/stage2_data_offsets.h | 6 ++ include/trace/stages/stage3_trace_output.h | 9 ++ include/trace/stages/stage7_class_define.h | 3 + kernel/module/main.c | 3 + kernel/trace/trace_events.c| 95 +- kernel/trace/trace_output.c| 45 ++ 11 files changed, 245 insertions(+), 3 deletions(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 5703526d6ebf..8275a06bcaee 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -258,7 +258,8 @@ #define FTRACE_EVENTS() \ . = ALIGN(8); \ BOUNDED_SECTION(_ftrace_events) \ - BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) + BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) \ + BOUNDED_SECTION(_ftrace_sym_defs) #else #define FTRACE_EVENTS() #endif diff --git a/include/linux/module.h b/include/linux/module.h index ffa1c603163c..7256762d59e1 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -524,6 +524,8 @@ struct module { unsigned int num_trace_events; struct trace_eval_map **trace_evals; unsigned int num_trace_evals; + struct trace_sym_def **trace_sym_defs; + unsigned int num_trace_sym_defs; #endif #ifdef CONFIG_FTRACE_MCOUNT_RECORD unsigned int num_ftrace_callsites; diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 9df3e2973626..2743280c9a46 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -27,6 +27,13 @@ const char *trace_print_flags_seq(struct trace_seq *p, const char *delim, const char 
*trace_print_symbols_seq(struct trace_seq *p, unsigned long val, const struct trace_print_flags *symbol_array); +const char *trace_print_sym_seq(struct trace_seq *p, unsigned long long val, + const char *(*lookup)(unsigned long long val)); +const char *trace_sym_lookup(const struct trace_sym_entry *list, +size_t len, unsigned long long value); +void trace_sym_show(struct seq_file *m, + const struct trace_sym_entry *list, size_t len); + #if BITS_PER_LONG == 32 const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim, unsigned long long flags, diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 689b6d71590e..cc3b387953d1 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -31,6 +31,24 @@ struct trace_eval_map {
[PATCH v3 4/4] net: drop_monitor: use drop_reason_lookup()
From: Johannes Berg Now that we have drop_reason_lookup(), we can just use it for drop_monitor as well, rather than exporting the list itself. Signed-off-by: Johannes Berg --- v3: - look up SKB_DROP_REASON_NOT_SPECIFIED if initial lookup returns NULL, to preserve previous behaviour --- include/net/dropreason.h | 4 net/core/drop_monitor.c | 20 +--- net/core/skbuff.c| 6 +++--- 3 files changed, 8 insertions(+), 22 deletions(-) diff --git a/include/net/dropreason.h b/include/net/dropreason.h index c157070b5303..0e2195ccf2cd 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -38,10 +38,6 @@ struct drop_reason_list { size_t n_reasons; }; -/* Note: due to dynamic registrations, access must be under RCU */ -extern const struct drop_reason_list __rcu * -drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM]; - #ifdef CONFIG_TRACEPOINTS const char *drop_reason_lookup(unsigned long long value); void drop_reason_show(struct seq_file *m); diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 430ed18f8584..fddf6b68bf06 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -610,9 +610,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, size_t payload_len) { struct net_dm_skb_cb *cb = NET_DM_SKB_CB(skb); - const struct drop_reason_list *list = NULL; - unsigned int subsys, subsys_reason; char buf[NET_DM_MAX_SYMBOL_LEN]; + const char *reason_str; struct nlattr *attr; void *hdr; int rc; @@ -630,19 +629,10 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, goto nla_put_failure; rcu_read_lock(); - subsys = u32_get_bits(cb->reason, SKB_DROP_REASON_SUBSYS_MASK); - if (subsys < SKB_DROP_REASON_SUBSYS_NUM) - list = rcu_dereference(drop_reasons_by_subsys[subsys]); - subsys_reason = cb->reason & ~SKB_DROP_REASON_SUBSYS_MASK; - if (!list || - subsys_reason >= list->n_reasons || - !list->reasons[subsys_reason] || - strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) { - list 
= rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]); - subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED; - } - if (nla_put_string(msg, NET_DM_ATTR_REASON, - list->reasons[subsys_reason])) { + reason_str = drop_reason_lookup(cb->reason); + if (unlikely(!reason_str)) + reason_str = drop_reason_lookup(SKB_DROP_REASON_NOT_SPECIFIED); + if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) { rcu_read_unlock(); goto nla_put_failure; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index cd1ea6c3e0f8..bd4fb7410284 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -139,13 +139,11 @@ static const struct drop_reason_list drop_reasons_core = { .n_reasons = ARRAY_SIZE(drop_reasons), }; -const struct drop_reason_list __rcu * +static const struct drop_reason_list __rcu * drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { [SKB_DROP_REASON_SUBSYS_CORE] = RCU_INITIALIZER(&drop_reasons_core), }; -EXPORT_SYMBOL(drop_reasons_by_subsys); -#ifdef CONFIG_TRACEPOINTS const char *drop_reason_lookup(unsigned long long value) { unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT; @@ -162,7 +160,9 @@ const char *drop_reason_lookup(unsigned long long value) return NULL; return subsys->reasons[reason]; } +EXPORT_SYMBOL(drop_reason_lookup); +#ifdef CONFIG_TRACEPOINTS void drop_reason_show(struct seq_file *m) { u32 subsys_id; -- 2.45.1
Re: [PATCH v4] drivers: remoteproc: xlnx: add attach detach support
On Wed, 5 Jun 2024 at 11:47, Tanmay Shah wrote: > > > > On 6/4/24 3:23 PM, Bjorn Andersson wrote: > > On Mon, Jun 03, 2024 at 01:34:38PM -0700, Tanmay Shah wrote: > >> It is possible that remote processor is already running before > >> linux boot or remoteproc platform driver probe. Implement required > >> remoteproc framework ops to provide resource table address and > >> connect or disconnect with remote processor in such case. > >> > > > > I know if changes the current style for this driver, but could we drop > > "drivers: " from the subject prefix, to align changes to this driver > > with others? > > > > Ack. Will be fixed. Is it convention not to have "drivers" ? "drivers" is not needed. > If so, from now on, I will format subject prefix: /: > That will be just fine. > > Regards, > > Bjorn >
Re: [PATCH v4] drivers: remoteproc: xlnx: add attach detach support
On Wed, Jun 05, 2024 at 12:45:17PM -0500, Tanmay Shah wrote: > > > On 6/4/24 10:34 AM, Mathieu Poirier wrote: > > Hi Mathieu, > > Thanks for reviews. > Please find my comments below. > > > Hi Tanmay, > > > > On Mon, Jun 03, 2024 at 01:34:38PM -0700, Tanmay Shah wrote: > >> It is possible that remote processor is already running before > >> linux boot or remoteproc platform driver probe. Implement required > >> remoteproc framework ops to provide resource table address and > >> connect or disconnect with remote processor in such case. > >> > >> Signed-off-by: Tanmay Shah > >> > >> --- > >> Changes in v4: > >> - Move change log out of commit text > >> > >> Changes in v3: > >> > >> - Drop SRAM patch from the series > >> - Change type from "struct resource_table *" to void __iomem * > >> - Change comment format from /** to /* > >> - Remove unmap of resource table va address during detach, allowing > >> attach-detach-reattach use case. > >> - Unmap rsc_data_va after retrieving resource table data structure. > >> - Unmap resource table va during driver remove op > >> > >> Changes in v2: > >> > >> - Fix typecast warnings reported using sparse tool. 
> >> - Fix following sparse warnings: > >> > >> drivers/remoteproc/xlnx_r5_remoteproc.c:827:21: sparse: warning: incorrect > >> type in assignment (different address spaces) > >> drivers/remoteproc/xlnx_r5_remoteproc.c:844:18: sparse: warning: incorrect > >> type in assignment (different address spaces) > >> drivers/remoteproc/xlnx_r5_remoteproc.c:898:24: sparse: warning: incorrect > >> type in argument 1 (different address spaces) > >> drivers/remoteproc/xlnx_r5_remoteproc.c | 173 +++- > >> 1 file changed, 169 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c > >> b/drivers/remoteproc/xlnx_r5_remoteproc.c > >> index 84243d1dff9f..6898d4761566 100644 > >> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > >> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > >> @@ -25,6 +25,10 @@ > >> /* RX mailbox client buffer max length */ > >> #define MBOX_CLIENT_BUF_MAX (IPI_BUF_LEN_MAX + \ > >> sizeof(struct zynqmp_ipi_message)) > >> + > >> +#define RSC_TBL_XLNX_MAGIC((uint32_t)'x' << 24 | (uint32_t)'a' << > >> 16 | \ > >> + (uint32_t)'m' << 8 | (uint32_t)'p') > >> + > >> /* > >> * settings for RPU cluster mode which > >> * reflects possible values of xlnx,cluster-mode dt-property > >> @@ -73,6 +77,26 @@ struct mbox_info { > >>struct mbox_chan *rx_chan; > >> }; > >> > >> +/** > >> + * struct rsc_tbl_data > >> + * > >> + * Platform specific data structure used to sync resource table address. > >> + * It's important to maintain order and size of each field on remote side. > >> + * > >> + * @version: version of data structure > >> + * @magic_num: 32-bit magic number. > >> + * @comp_magic_num: complement of above magic number > >> + * @rsc_tbl_size: resource table size > >> + * @rsc_tbl: resource table address > >> + */ > >> +struct rsc_tbl_data { > >> + const int version; > >> + const u32 magic_num; > >> + const u32 comp_magic_num; > > > > I thought we agreed on making the magic number a u64 - did I get this wrong? 
> > > > Looks like I missed this comment from previous reviews, so didn't address it. > Thanks for pointing this. > > So I think having two 32-bit numbers with proper name, implies what is > expected and less chance of errors. > With 64-bit number, it's easy to create errors when assigning magic number. > > However, if 64-bit number is preferred, I will change it and test it. > Please let me know. I was under the impression we had agreed on a u64 but that may just be my own interpretation. Things can stay as they are now. > > > >> + const u32 rsc_tbl_size; > >> + const uintptr_t rsc_tbl; > >> +} __packed; > >> + > >> /* > >> * Hardcoded TCM bank values. This will stay in driver to maintain > >> backward > >> * compatibility with device-tree that does not have TCM information. > >> @@ -95,20 +119,24 @@ static const struct mem_bank_data > >> zynqmp_tcm_banks_lockstep[] = { > >> /** > >> * struct zynqmp_r5_core > >> * > >> + * @rsc_tbl_va: resource table virtual address > >> * @dev: device of RPU instance > >> * @np: device node of RPU instance > >> * @tcm_bank_count: number TCM banks accessible to this RPU > >> * @tcm_banks: array of each TCM bank data > >> * @rproc: rproc handle > >> + * @rsc_tbl_size: resource table size retrieved from remote > >> * @pm_domain_id: RPU CPU power domain id > >> * @ipi: pointer to mailbox information > >> */ > >> struct zynqmp_r5_core { > >> + void __iomem *rsc_tbl_va; > >>struct device *dev; > >>struct device_node *np; > >>int tcm_bank_count; > >>struct mem_bank_data **tcm_banks; > >>struct rproc *rproc; > >> + u32 rsc_tbl_size; > >>u32 pm_domain_id; > >>struct mbox_info *ipi; > >> }; >
[PATCH v3 01/13] ring-buffer: Allow mapped field to be set without mapping
From: "Steven Rostedt (Google)" In preparation for having the ring buffer mapped to a dedicated location, which will have the same restrictions as user space memory mapped buffers, allow it to use the "mapped" field of the ring_buffer_per_cpu structure without having the user space meta page mapping. When this starts using the mapped field, it will need to handle adding a user space mapping (and removing it) from a ring buffer that is using a dedicated memory range. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 28853966aa9a..78beaccf9c8c 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -5224,6 +5224,9 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer) { struct trace_buffer_meta *meta = cpu_buffer->meta_page; + if (!meta) + return; + meta->reader.read = cpu_buffer->reader_page->read; meta->reader.id = cpu_buffer->reader_page->id; meta->reader.lost_events = cpu_buffer->lost_events; @@ -6167,7 +6170,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu) mutex_lock(&cpu_buffer->mapping_lock); - if (!cpu_buffer->mapped) { + if (!cpu_buffer->mapped || !cpu_buffer->meta_page) { mutex_unlock(&cpu_buffer->mapping_lock); return ERR_PTR(-ENODEV); } @@ -6359,12 +6362,13 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu, */ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); rb_setup_ids_meta_page(cpu_buffer, subbuf_ids); + raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); err = __rb_map_vma(cpu_buffer, vma); if (!err) { raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - cpu_buffer->mapped = 1; + cpu_buffer->mapped++; raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); } else { kfree(cpu_buffer->subbuf_ids); @@ -6403,7 +6407,8 @@ int ring_buffer_unmap(struct trace_buffer *buffer, int cpu) mutex_lock(&buffer->mutex); 
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - cpu_buffer->mapped = 0; + WARN_ON_ONCE(!cpu_buffer->mapped); + cpu_buffer->mapped--; raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); -- 2.43.0
[PATCH v3 02/13] ring-buffer: Add ring_buffer_alloc_range()
From: "Steven Rostedt (Google)" In preparation to allowing the trace ring buffer to be allocated in a range of memory that is persistent across reboots, add ring_buffer_alloc_range(). It takes a contiguous range of memory and will split it up evenly for the per CPU ring buffers. If there's not enough memory to handle all CPUs with the minimum size, it will fail to allocate the ring buffer. Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 17 +++ kernel/trace/ring_buffer.c | 239 ++-- 2 files changed, 220 insertions(+), 36 deletions(-) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index 96d2140b471e..a50b0223b1d3 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -89,6 +89,11 @@ void ring_buffer_discard_commit(struct trace_buffer *buffer, struct trace_buffer * __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *key); +struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flags, + int order, unsigned long start, + unsigned long range_size, + struct lock_class_key *key); + /* * Because the ring buffer is generic, if other users of the ring buffer get * traced by ftrace, it can produce lockdep warnings. We need to keep each @@ -100,6 +105,18 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k __ring_buffer_alloc((size), (flags), &__key); \ }) +/* + * Because the ring buffer is generic, if other users of the ring buffer get + * traced by ftrace, it can produce lockdep warnings. We need to keep each + * ring buffer's lock class separate. 
+ */ +#define ring_buffer_alloc_range(size, flags, order, start, range_size) \ +({ \ + static struct lock_class_key __key; \ + __ring_buffer_alloc_range((size), (flags), (order), (start),\ + (range_size), &__key);\ +}) + typedef bool (*ring_buffer_cond_fn)(void *data); int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full, ring_buffer_cond_fn cond, void *data); diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 78beaccf9c8c..53abe7916f2b 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -42,6 +42,9 @@ static void update_pages_handler(struct work_struct *work); +struct ring_buffer_meta { +}; + /* * The ring buffer header is special. We must manually up keep it. */ @@ -342,7 +345,8 @@ struct buffer_page { local_t entries; /* entries on this page */ unsigned longreal_end; /* real end of data */ unsigned order; /* order of the page */ - u32 id;/* ID for external mapping */ + u32 id:30; /* ID for external mapping */ + u32 range:1; /* Mapped via a range */ struct buffer_data_page *page; /* Actual data page */ }; @@ -373,7 +377,9 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage) static void free_buffer_page(struct buffer_page *bpage) { - free_pages((unsigned long)bpage->page, bpage->order); + /* Range pages are not to be freed */ + if (!bpage->range) + free_pages((unsigned long)bpage->page, bpage->order); kfree(bpage); } @@ -523,6 +529,9 @@ struct trace_buffer { struct rb_irq_work irq_work; booltime_stamp_abs; + unsigned long range_addr_start; + unsigned long range_addr_end; + unsigned intsubbuf_size; unsigned intsubbuf_order; unsigned intmax_data_size; @@ -1490,9 +1499,70 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) } } +/* + * Take an address, add the meta data size as well as the array of + * array subbuffer indexes, then align it to a subbuffer size. + * + * This is used to help find the next per cpu subbuffer within a mapped range. 
+ */ +static unsigned long +rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs) +{ + addr += sizeof(struct ring_buffer_meta) + + sizeof(int) * nr_subbufs; + return ALIGN(addr, subbuf_size); +} + +/* + * Return a specific sub-buffer for a given @cpu defined by @idx. + */ +static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int nr_pages, int idx) +{ + unsigned long ptr; + int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE; + int nr_subbufs; + + /* Include the reader page */ + nr_subbufs = nr_pag
[PATCH v3 00/13] tracing: Persistent traces across a reboot or crash
This is a way to map a ring buffer instance across reboots. The requirement is that you have a memory region that is not erased. I tested this on a Debian VM running on qemu on a Debian server, and even tested it on a baremetal box running Fedora. I was surprised that it worked on the baremetal box, but it does so surprisingly consistently.

This series does not require the ring buffer mapping, but simply takes a physical address that has been reserved via memmap (on x86 only). An example of the kernel command line is:

  memmap=12M$0x28540 trace_instance=boot_mapped@0x28540:12M

The above will reserve 12M at physical address 0x28540 (done by the existing memmap command line option), and then the trace_instance option was extended to take an address and size (@0x28540:12M). It will then vmap() that address and allocate a ring buffer in it. If a ring buffer already exists, it will use it and expose the contents to user space.

The memory reserved is used by the ring buffer of this instance. It acts like a memory mapped instance, so it has some limitations. It does not allow snapshots, nor does it allow tracers which use a snapshot buffer (like the irqsoff and wakeup tracers).

On boot up, when setting up the ring buffer, it looks at the current content and does a vigorous test to see if the content is valid. It even walks the events in all the sub-buffers to make sure the ring buffer meta data is correct. If it determines that the content is valid, it will reconstruct the ring buffer to use the content it has found.

If the buffer is valid, on the next boot the boot_mapped instance will contain the data from the previous boot. You can cat the trace or trace_pipe file, or even run trace-cmd extract on it to make a trace.dat file that holds the data. This is much better than dealing with ftrace_dump_on_oops (I wish I had this a decade ago!).

There are still some limitations of this buffer. 
One is that it assumes that the kernel you are booting back into is the same one that crashed. At least the trace_events (like sched_switch and friends) all have the same ids. This would be true with the same kernel, as the ids are determined at link time. Module events could possibly be a problem, as the ids may not match.

This version of the patch series saves the address of a text function and of a data string in the persistent memory, and this is used to calculate the delta between the text and data addresses of the new boot up. Now function tracing and "%pS" still work across boots. Even the RCU trace events that point to static strings work as well! The delta is exported by a new file in the instance called "last_boot_info" that has something like this:

  # cat last_boot_info
  text delta:-268435456
  data delta:-268435456

This can be used by trace-cmd, which reads the trace_pipe_raw data and can now figure out how to map the print_formats and kallsyms to the raw data in the buffers.

This can be used to debug kernel shutdown. 
I ran the following:

  # trace-cmd start -B boot_mapped -p function
  # reboot

[after reboot]

  # trace-cmd show -B boot_mapped | tail -20
  swapper/0-1 [000] d..1. 63.479667: preempt_count_add <-delay_tsc
  swapper/0-1 [000] d..2. 63.479669: preempt_count_sub <-delay_tsc
  swapper/0-1 [000] d..1. 63.479671: disable_local_APIC <-native_stop_other_cpus
  swapper/0-1 [000] d..1. 63.479673: clear_local_APIC.part.0 <-disable_local_APIC
  swapper/0-1 [000] d..1. 63.479716: mcheck_cpu_clear <-native_stop_other_cpus
  swapper/0-1 [000] d..1. 63.479718: mce_intel_feature_clear <-native_stop_other_cpus
  swapper/0-1 [000] d..1. 63.479720: lmce_supported <-mce_intel_feature_clear
  swapper/0-1 [000] d..1. 63.479732: lapic_shutdown <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479735: disable_local_APIC <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479736: clear_local_APIC.part.0 <-disable_local_APIC
  swapper/0-1 [000] d..1. 63.479763: restore_boot_irq_mode <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479763: native_restore_boot_irq_mode <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479764: disconnect_bsp_APIC <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479777: hpet_disable <-native_machine_shutdown
  swapper/0-1 [000] d..1. 63.479778: iommu_shutdown_noop <-native_machine_restart
  swapper/0-1 [000] d..1. 63.479779: native_machine_emergency_restart <-__do_sys_reboot
  swapper/0-1 [000] d..1. 63.479779: tboot_shutdown <-native_machine_emergency_restart
  swapper/0-1 [000] d..1. 63.479790: acpi_reboot <-native_machine_emergency_restart
  swapper/0-1 [000] d..1. 63.479791: acpi_reset <-acpi_reboot
  swapper/0-1 [000] d..1. 63.479791: acpi_os_write_p
[PATCH v3 04/13] tracing: Implement creating an instance based on a given memory region
From: "Steven Rostedt (Google)" Allow for creating a new instance by passing in an address and size to map the ring buffer for the instance to. This will allow features like a pstore memory mapped region to be used for an tracing instance ring buffer that can be retrieved from one boot to the next. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 50 +++- kernel/trace/trace.h | 4 2 files changed, 44 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 578a49ff5c32..ff2b504fbe00 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4921,6 +4921,11 @@ static int tracing_open(struct inode *inode, struct file *file) static bool trace_ok_for_array(struct tracer *t, struct trace_array *tr) { +#ifdef CONFIG_TRACER_SNAPSHOT + /* arrays with mapped buffer range do not have snapshots */ + if (tr->range_addr_start && t->use_max_tr) + return false; +#endif return (tr->flags & TRACE_ARRAY_FL_GLOBAL) || t->allow_instances; } @@ -8664,11 +8669,13 @@ tracing_init_tracefs_percpu(struct trace_array *tr, long cpu) tr, cpu, &tracing_entries_fops); #ifdef CONFIG_TRACER_SNAPSHOT - trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, - tr, cpu, &snapshot_fops); + if (!tr->range_addr_start) { + trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu, + tr, cpu, &snapshot_fops); - trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu, - tr, cpu, &snapshot_raw_fops); + trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu, + tr, cpu, &snapshot_raw_fops); + } #endif } @@ -9205,7 +9212,18 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size buf->tr = tr; - buf->buffer = ring_buffer_alloc(size, rb_flags); + if (tr->range_addr_start && tr->range_addr_size) { + buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0, + tr->range_addr_start, + tr->range_addr_size); + /* +* This is basically the same as a mapped buffer, +* with the same restrictions. 
+*/ + tr->mapped++; + } else { + buf->buffer = ring_buffer_alloc(size, rb_flags); + } if (!buf->buffer) return -ENOMEM; @@ -9242,6 +9260,10 @@ static int allocate_trace_buffers(struct trace_array *tr, int size) return ret; #ifdef CONFIG_TRACER_MAX_TRACE + /* Fix mapped buffer trace arrays do not have snapshot buffers */ + if (tr->range_addr_start) + return 0; + ret = allocate_trace_buffer(tr, &tr->max_buffer, allocate_snapshot ? size : 1); if (MEM_FAIL(ret, "Failed to allocate trace buffer\n")) { @@ -9342,7 +9364,9 @@ static int trace_array_create_dir(struct trace_array *tr) } static struct trace_array * -trace_array_create_systems(const char *name, const char *systems) +trace_array_create_systems(const char *name, const char *systems, + unsigned long range_addr_start, + unsigned long range_addr_size) { struct trace_array *tr; int ret; @@ -9368,6 +9392,10 @@ trace_array_create_systems(const char *name, const char *systems) goto out_free_tr; } + /* Only for boot up memory mapped ring buffers */ + tr->range_addr_start = range_addr_start; + tr->range_addr_size = range_addr_size; + tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS; cpumask_copy(tr->tracing_cpumask, cpu_all_mask); @@ -9425,7 +9453,7 @@ trace_array_create_systems(const char *name, const char *systems) static struct trace_array *trace_array_create(const char *name) { - return trace_array_create_systems(name, NULL); + return trace_array_create_systems(name, NULL, 0, 0); } static int instance_mkdir(const char *name) @@ -9479,7 +9507,7 @@ struct trace_array *trace_array_get_by_name(const char *name, const char *system goto out_unlock; } - tr = trace_array_create_systems(name, systems); + tr = trace_array_create_systems(name, systems, 0, 0); if (IS_ERR(tr)) tr = NULL; @@ -9672,8 +9700,10 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) MEM_FAIL(1, "Could not allocate function filter files"); #ifdef CONFIG_TRACER_SNAPSHOT - trace_create_file("snapshot", TRACE_MODE_WRITE, 
d_tracer, - tr, &snapshot_fops); + if (!tr->ran
[PATCH v3 03/13] ring-buffer: Add ring_buffer_meta data
From: "Steven Rostedt (Google)" Populate the ring_buffer_meta array. It holds the pointer to the head_buffer (next to read), the commit_buffer (next to write) the size of the sub-buffers, number of sub-buffers and an array that keeps track of the order of the sub-buffers. This information will be stored in the persistent memory to help on reboot to reconstruct the ring buffer. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 209 - 1 file changed, 184 insertions(+), 25 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 53abe7916f2b..385dc1750fc7 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -43,6 +43,11 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long head_buffer; + unsigned long commit_buffer; + __u32 subbuf_size; + __u32 nr_subbufs; + int buffers[]; }; /* @@ -500,6 +505,7 @@ struct ring_buffer_per_cpu { struct mutexmapping_lock; unsigned long *subbuf_ids;/* ID to subbuf VA */ struct trace_buffer_meta*meta_page; + struct ring_buffer_meta *ring_meta; /* ring buffer pages to update, > 0 to add, < 0 to remove */ longnr_pages_to_update; @@ -1260,6 +1266,11 @@ static void rb_head_page_activate(struct ring_buffer_per_cpu *cpu_buffer) * Set the previous list pointer to have the HEAD flag. */ rb_set_list_to_head(head->list.prev); + + if (cpu_buffer->ring_meta) { + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + meta->head_buffer = (unsigned long)head->page; + } } static void rb_list_head_clear(struct list_head *list) @@ -1514,51 +1525,127 @@ rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs) } /* - * Return a specific sub-buffer for a given @cpu defined by @idx. + * Return the ring_buffer_meta for a given @cpu. 
*/ -static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int nr_pages, int idx) +static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu) { - unsigned long ptr; int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE; + unsigned long ptr = buffer->range_addr_start; + struct ring_buffer_meta *meta; int nr_subbufs; - /* Include the reader page */ - nr_subbufs = nr_pages + 1; + if (!ptr) + return NULL; + + /* When nr_pages passed in is zero, the first meta has already been initialized */ + if (!nr_pages) { + meta = (struct ring_buffer_meta *)ptr; + nr_subbufs = meta->nr_subbufs; + } else { + meta = NULL; + /* Include the reader page */ + nr_subbufs = nr_pages + 1; + } /* * The first chunk may not be subbuffer aligned, where as * the rest of the chunks are. */ - ptr = buffer->range_addr_start; - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); if (cpu) { - unsigned long p; - - ptr += subbuf_size * nr_subbufs; - - /* Save the beginning of this CPU chunk */ - p = ptr; - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); + ptr += subbuf_size * nr_subbufs; /* We can use multiplication to find chunks greater than 1 */ if (cpu > 1) { unsigned long size; + unsigned long p; + /* Save the beginning of this CPU chunk */ + p = ptr; + ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); ptr += subbuf_size * nr_subbufs; /* Now all chunks after this are the same size */ size = ptr - p; ptr += size * (cpu - 2); - - ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs); } } - if (ptr + subbuf_size * nr_subbufs > buffer->range_addr_end) + return (void *)ptr; +} + +/* Return the start of subbufs given the meta pointer */ +static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta) +{ + int subbuf_size = meta->subbuf_size; + unsigned long ptr; + + ptr = (unsigned long)meta; + ptr = rb_range_align_subbuf(ptr, subbuf_size, meta->nr_subbufs); + + return (void *)ptr; +} + +/* + * Return a specific sub-buffer for a given 
@cpu defined by @idx. + */ +static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx) +{ + struct ring_buffer_meta *meta; + unsigned long ptr; + int subbuf_size; + + meta =
[PATCH v3 06/13] ring-buffer: Add test if range of boot buffer is valid
From: "Steven Rostedt (Google)" Add a test against the ring buffer memory range to see if it has valid data. The ring_buffer_meta structure is given a new field called "first_buffer" which holds the address of the first sub-buffer. This is used to both determine if the other fields are valid as well as finding the offset between the old addresses of the sub-buffer from the previous boot to the new addresses of the current boot. Since the values for nr_subbufs and subbuf_size is to be the same, check if the values in the meta page match the values calculated. Take the range of the first_buffer and the total size of all the buffers and make sure the saved head_buffer and commit_buffer fall in the range. Iterate through all the sub-buffers to make sure that the values in the sub-buffer "commit" field (the field that holds the amount of data on the sub-buffer) is within the end of the sub-buffer. Also check the index array to make sure that all the indexes are within nr_subbufs. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 143 ++--- 1 file changed, 135 insertions(+), 8 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 093c73c617cc..aecd4a7d62be 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -45,6 +45,7 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long first_buffer; unsigned long head_buffer; unsigned long commit_buffer; __u32 subbuf_size; @@ -1617,21 +1618,103 @@ static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx) return (void *)ptr; } +/* + * See if the existing memory contains valid ring buffer data. + * As the previous kernel must be the same as this kernel, all + * the calculations (size of buffers and number of buffers) + * must be the same. 
+ */ +static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu, + struct trace_buffer *buffer, int nr_pages) +{ + int subbuf_size = PAGE_SIZE; + struct buffer_data_page *subbuf; + unsigned long buffers_start; + unsigned long buffers_end; + int i; + + /* The subbuffer's size and number of subbuffers must match */ + if (meta->subbuf_size != subbuf_size || + meta->nr_subbufs != nr_pages + 1) { + pr_info("Ring buffer boot meta [%d] mismatch of subbuf_size/nr_pages\n", cpu); + return false; + } + + buffers_start = meta->first_buffer; + buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs); + + /* Is the head and commit buffers within the range of buffers? */ + if (meta->head_buffer < buffers_start || + meta->head_buffer >= buffers_end) { + pr_info("Ring buffer boot meta [%d] head buffer out of range\n", cpu); + return false; + } + + if (meta->commit_buffer < buffers_start || + meta->commit_buffer >= buffers_end) { + pr_info("Ring buffer boot meta [%d] commit buffer out of range\n", cpu); + return false; + } + + subbuf = rb_subbufs_from_meta(meta); + + /* Is the meta buffers and the subbufs themselves have correct data? 
*/ + for (i = 0; i < meta->nr_subbufs; i++) { + if (meta->buffers[i] < 0 || + meta->buffers[i] >= meta->nr_subbufs) { + pr_info("Ring buffer boot meta [%d] array out of range\n", cpu); + return false; + } + + if ((unsigned)local_read(&subbuf->commit) > subbuf_size) { + pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu); + return false; + } + + subbuf = (void *)subbuf + subbuf_size; + } + + pr_info("Ring buffer meta is from previous boot!\n"); + return true; +} + static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) { struct ring_buffer_meta *meta; + unsigned long delta; void *subbuf; int cpu; int i; for (cpu = 0; cpu < nr_cpu_ids; cpu++) { + void *next_meta; + meta = rb_range_meta(buffer, nr_pages, cpu); + if (rb_meta_valid(meta, cpu, buffer, nr_pages)) { + /* Make the mappings match the current address */ + subbuf = rb_subbufs_from_meta(meta); + delta = (unsigned long)subbuf - meta->first_buffer; + meta->first_buffer += delta; + meta->head_buffer += delta; + meta->commit_buffer += delta; + continue; + } + + if (cpu < nr_cpu_ids - 1) + next_meta = rb_range_meta(buffer, nr_pages, cpu + 1); +
[PATCH v3 05/13] ring-buffer: Add output of ring buffer meta page
From: "Steven Rostedt (Google)" Add a buffer_meta per-cpu file for the trace instance that is mapped to boot memory. This shows the current meta-data and can be used by user space tools to record off the current mappings to help reconstruct the ring buffer after a reboot. It does not expose any virtual addresses, just indexes into the sub-buffer pages. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 77 ++ kernel/trace/trace.c | 30 ++- kernel/trace/trace.h | 2 + 3 files changed, 107 insertions(+), 2 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 385dc1750fc7..093c73c617cc 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -32,6 +32,8 @@ #include #include +#include "trace.h" + /* * The "absolute" timestamp in the buffer is only 59 bits. * If a clock has the 5 MSBs set, it needs to be saved and @@ -1646,6 +1648,81 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) } } +static void *rbm_start(struct seq_file *m, loff_t *pos) +{ + struct ring_buffer_per_cpu *cpu_buffer = m->private; + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + unsigned long val; + + if (!meta) + return NULL; + + if (*pos > meta->nr_subbufs) + return NULL; + + val = *pos; + val++; + + return (void *)val; +} + +static void *rbm_next(struct seq_file *m, void *v, loff_t *pos) +{ + (*pos)++; + + return rbm_start(m, pos); +} + +static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf); + +static int rbm_show(struct seq_file *m, void *v) +{ + struct ring_buffer_per_cpu *cpu_buffer = m->private; + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + unsigned long val = (unsigned long)v; + + if (val == 1) { + seq_printf(m, "head_buffer: %d\n", + rb_meta_subbuf_idx(meta, (void *)meta->head_buffer)); + seq_printf(m, "commit_buffer: %d\n", + rb_meta_subbuf_idx(meta, (void *)meta->commit_buffer)); + seq_printf(m, "subbuf_size: %d\n", meta->subbuf_size); + 
seq_printf(m, "nr_subbufs:%d\n", meta->nr_subbufs); + return 0; + } + + val -= 2; + seq_printf(m, "buffer[%ld]:%d\n", val, meta->buffers[val]); + + return 0; +} + +static void rbm_stop(struct seq_file *m, void *p) +{ +} + +static const struct seq_operations rb_meta_seq_ops = { + .start = rbm_start, + .next = rbm_next, + .show = rbm_show, + .stop = rbm_stop, +}; + +int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, int cpu) +{ + struct seq_file *m; + int ret; + + ret = seq_open(file, &rb_meta_seq_ops); + if (ret) + return ret; + + m = file->private_data; + m->private = buffer->buffers[cpu]; + + return 0; +} + static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, long nr_pages, struct list_head *pages) { diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index ff2b504fbe00..622fe670949d 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5018,7 +5018,7 @@ static int show_traces_open(struct inode *inode, struct file *file) return 0; } -static int show_traces_release(struct inode *inode, struct file *file) +static int tracing_seq_release(struct inode *inode, struct file *file) { struct trace_array *tr = inode->i_private; @@ -5059,7 +5059,7 @@ static const struct file_operations show_traces_fops = { .open = show_traces_open, .read = seq_read, .llseek = seq_lseek, - .release= show_traces_release, + .release= tracing_seq_release, }; static ssize_t @@ -6860,6 +6860,22 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf, return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); } +static int tracing_buffer_meta_open(struct inode *inode, struct file *filp) +{ + struct trace_array *tr = inode->i_private; + int cpu = tracing_get_cpu(inode); + int ret; + + ret = tracing_check_open_get_tr(tr); + if (ret) + return ret; + + ret = ring_buffer_meta_seq_init(filp, tr->array_buffer.buffer, cpu); + if (ret < 0) + __trace_array_put(tr); + return ret; +} + static ssize_t tracing_free_buffer_write(struct file 
*filp, const char __user *ubuf, size_t cnt, loff_t *ppos) @@ -7436,6 +7452,13 @@ static const struct file_operations tracing_entries_fops = { .release= tracing_release_generic_tr, }; +static const struct file_operations tracing_buffer_meta_fops = { + .open
[PATCH v3 08/13] tracing: Add option to use memmapped memory for trace boot instance
From: "Steven Rostedt (Google)" Add an option to the trace_instance kernel command line parameter that allows it to use the memory reserved by the memmap boot parameter:

  memmap=12M$0x28450 trace_instance=boot_mapped@0x28450:12M

The above reserves 12 megabytes at the physical address 0x28450. The second parameter will create a "boot_mapped" instance and use the reserved memory as the memory for the ring buffer. That will create an instance called "boot_mapped": /sys/kernel/tracing/instances/boot_mapped

Note, because the ring buffer is using a defined memory range, it will act just like a memory mapped ring buffer. It will not have a snapshot buffer, as it can't swap out the buffer. The snapshot files, as well as any tracers that use a snapshot, will not be present in the boot_mapped instance.

Cc: linux...@kvack.org Signed-off-by: Steven Rostedt (Google) --- .../admin-guide/kernel-parameters.txt | 9 +++ kernel/trace/trace.c | 75 +-- 2 files changed, 78 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index b600df82669d..ff26b6094e79 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6754,6 +6754,15 @@ the same thing would happen if it was left off). The irq_handler_entry event, and all events under the "initcall" system. + If memory has been reserved (see memmap for x86), the instance + can use that memory: + + memmap=12M$0x28450 trace_instance=boot_map@0x28450:12M + + The above will create a "boot_map" instance that uses the physical + memory at 0x28450 that is 12Megs. The per CPU buffers of that + instance will be split up accordingly. + trace_options=[option-list] [FTRACE] Enable or disable tracer options at boot. 
The option-list is a comma delimited list of options diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 622fe670949d..13e89023f33b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -9504,6 +9504,31 @@ static int instance_mkdir(const char *name) return ret; } +static u64 map_pages(u64 start, u64 size) +{ + struct page **pages; + phys_addr_t page_start; + unsigned int page_count; + unsigned int i; + void *vaddr; + + page_count = DIV_ROUND_UP(size, PAGE_SIZE); + + page_start = start; + pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL); + if (!pages) + return 0; + + for (i = 0; i < page_count; i++) { + phys_addr_t addr = page_start + i * PAGE_SIZE; + pages[i] = pfn_to_page(addr >> PAGE_SHIFT); + } + vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL); + kfree(pages); + + return (u64)(unsigned long)vaddr; +} + /** * trace_array_get_by_name - Create/Lookup a trace array, given its name. * @name: The name of the trace array to be looked up/created. @@ -10350,6 +10375,7 @@ __init static void enable_instances(void) { struct trace_array *tr; char *curr_str; + char *name; char *str; char *tok; @@ -10358,19 +10384,56 @@ __init static void enable_instances(void) str = boot_instance_info; while ((curr_str = strsep(&str, "\t"))) { + unsigned long start = 0; + unsigned long size = 0; + unsigned long addr = 0; tok = strsep(&curr_str, ","); + name = strsep(&tok, "@"); + if (tok) { + start = memparse(tok, &tok); + if (!start) { + pr_warn("Tracing: Invalid boot instance address for %s\n", + name); + continue; + } + } - if (IS_ENABLED(CONFIG_TRACER_MAX_TRACE)) - do_allocate_snapshot(tok); + if (start) { + if (*tok != ':') { + pr_warn("Tracing: No size specified for instance %s\n", name); + continue; + } + tok++; + size = memparse(tok, &tok); + if (!size) { + pr_warn("Tracing: Invalid boot instance size for %s\n", + name); + continue; + } + addr = map_pages(start, size); + if (addr) { + pr_info("Tracing: mapped
[PATCH v3 07/13] ring-buffer: Validate boot range memory events
From: "Steven Rostedt (Google)" Make sure all the events in each of the sub-buffers that were mapped in a memory region are valid. This moves the code that walks the buffers for time-stamp validation out of the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS ifdef block and is used to validate the content. Only the ring buffer event meta data and time stamps are checked and not the data load. This also has a second purpose. The buffer_page structure that points to the data sub-buffers has accounting that keeps track of the number of events that are on the sub-buffer. This updates that counter as well. That counter is used in reading the buffer and knowing if the ring buffer is empty or not. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 188 + 1 file changed, 151 insertions(+), 37 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index aecd4a7d62be..1e959ff71855 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1674,10 +1674,151 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu, subbuf = (void *)subbuf + subbuf_size; } - pr_info("Ring buffer meta is from previous boot!\n"); return true; } +static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf); + +static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu, + unsigned long long *timestamp, u64 *delta_ptr) +{ + struct ring_buffer_event *event; + u64 ts, delta; + int events = 0; + int e; + + *delta_ptr = 0; + *timestamp = 0; + + ts = dpage->time_stamp; + + for (e = 0; e < tail; e += rb_event_length(event)) { + + event = (struct ring_buffer_event *)(dpage->data + e); + + switch (event->type_len) { + + case RINGBUF_TYPE_TIME_EXTEND: + delta = rb_event_time_stamp(event); + ts += delta; + break; + + case RINGBUF_TYPE_TIME_STAMP: + delta = rb_event_time_stamp(event); + if (delta < ts) { + *delta_ptr = delta; + *timestamp = ts; + return -1; + } + ts = delta; + break; + + case 
RINGBUF_TYPE_PADDING: + if (event->time_delta == 1) + break; + fallthrough; + case RINGBUF_TYPE_DATA: + events++; + ts += event->time_delta; + break; + + default: + return -1; + } + } + *timestamp = ts; + return events; +} + +static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu) +{ + unsigned long long ts; + u64 delta; + int tail; + + tail = local_read(&dpage->commit); + return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta); +} + +/* If the meta data has been validated, now validate the events */ +static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer) +{ + struct ring_buffer_meta *meta = cpu_buffer->ring_meta; + struct buffer_page *head_page; + unsigned long entry_bytes = 0; + unsigned long entries = 0; + int ret; + int i; + + if (!meta || !meta->head_buffer) + return; + + /* Do the reader page first */ + ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu); + if (ret < 0) { + pr_info("Ring buffer reader page is invalid\n"); + goto invalid; + } + entries += ret; + entry_bytes += local_read(&cpu_buffer->reader_page->page->commit); + local_set(&cpu_buffer->reader_page->entries, ret); + + head_page = cpu_buffer->head_page; + + /* If both the head and commit are on the reader_page then we are done. */ + if (head_page == cpu_buffer->reader_page && + head_page == cpu_buffer->commit_page) + goto done; + + /* Iterate until finding the commit page */ + for (i = 0; i < meta->nr_subbufs + 1; i++, rb_inc_page(&head_page)) { + + /* Reader page has already been done */ + if (head_page == cpu_buffer->reader_page) + continue; + + ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu); + if (ret < 0) { + pr_info("Ring buffer meta [%d] invalid buffer page\n", + cpu_buffer->cpu); + goto invalid; + } + entries += ret; + entry_bytes += local_read(&head_page->page->commit); + local_set(&cpu_buffe
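The timestamp walk that rb_read_data_buffer() performs can be sketched in isolation. A userspace approximation (the event encoding is simplified to (type, delta) pairs; the real ring buffer packs deltas into the event header and also handles padding):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_type { SK_DATA, SK_TIME_EXTEND, SK_TIME_STAMP };

struct sketch_event {
	enum sketch_type type;
	uint64_t delta;	/* relative delta, or absolute ts for SK_TIME_STAMP */
};

/* Returns the number of data events, or -1 if time went backwards. */
static int sketch_walk(const struct sketch_event *ev, size_t n,
		       uint64_t *ts_out)
{
	uint64_t ts = 0;
	int events = 0;

	for (size_t e = 0; e < n; e++) {
		switch (ev[e].type) {
		case SK_TIME_EXTEND:
			ts += ev[e].delta;	/* large relative jump */
			break;
		case SK_TIME_STAMP:
			if (ev[e].delta < ts)
				return -1;	/* absolute stamp in the past */
			ts = ev[e].delta;
			break;
		case SK_DATA:
			events++;		/* counted toward ->entries */
			ts += ev[e].delta;
			break;
		}
	}
	*ts_out = ts;
	return events;
}
```

The data-event count is what the patch feeds back into the buffer_page accounting so reads know whether the restored buffer is empty.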
[PATCH v3 12/13] tracing: Update function tracing output for previous boot buffer
From: "Steven Rostedt (Google)" For a persistent ring buffer that is saved across boots, if function tracing was performed in the previous boot, it only saves the addresses of the functions and uses "%pS" to print their names. But in the current boot, those functions may be at different locations. The persistent meta-data saves the text delta between the two boots and can be used to find where the saved function is located in the current boot. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_output.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index d8b302d01083..b9d2c64c0648 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -990,8 +990,11 @@ enum print_line_t trace_nop_print(struct trace_iterator *iter, int flags, } static void print_fn_trace(struct trace_seq *s, unsigned long ip, - unsigned long parent_ip, int flags) + unsigned long parent_ip, long delta, int flags) { + ip += delta; + parent_ip += delta; + seq_print_ip_sym(s, ip, flags); if ((flags & TRACE_ITER_PRINT_PARENT) && parent_ip) { @@ -1009,7 +1012,7 @@ static enum print_line_t trace_fn_trace(struct trace_iterator *iter, int flags, trace_assign_type(field, iter->ent); - print_fn_trace(s, field->ip, field->parent_ip, flags); + print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags); trace_seq_putc(s, '\n'); return trace_handle_return(s); @@ -1674,7 +1677,7 @@ trace_func_repeats_print(struct trace_iterator *iter, int flags, trace_assign_type(field, iter->ent); - print_fn_trace(s, field->ip, field->parent_ip, flags); + print_fn_trace(s, field->ip, field->parent_ip, iter->tr->text_delta, flags); trace_seq_printf(s, " (repeats: %u, last_ts:", field->count); trace_print_time(s, iter, iter->ts - FUNC_REPEATS_GET_DELTA_TS(field)); -- 2.43.0
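The fixup itself is plain modular arithmetic: the delta is computed once as (current address of a known symbol) minus (its address in the previous boot), then added to every saved ip. A userspace sketch with made-up addresses (not the kernel's actual layout):

```c
#include <assert.h>
#include <stdint.h>

/* Delta between boots, as rb_range_meta_init() computes it:
 * current address of a reference symbol minus its saved address. */
static int64_t sketch_text_delta(uint64_t this_text, uint64_t prev_text)
{
	return (int64_t)(this_text - prev_text);
}

/* Adjust an ip recorded in the previous boot so "%pS" resolves
 * against the current boot's symbol table. */
static uint64_t sketch_fixup_ip(uint64_t saved_ip, int64_t text_delta)
{
	return saved_ip + (uint64_t)text_delta;
}
```

If the kernel did not relocate between boots, the delta is zero and the fixup is a no-op.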
[PATCH v3 11/13] tracing: Handle old buffer mappings for event strings and functions
From: "Steven Rostedt (Google)" Use the saved text_delta and data_delta of a persistent memory mapped ring buffer that was saved from a previous boot, and use the delta in the trace event print output so that strings and functions show up normally. That is, for an event like trace_kmalloc() that prints the callsite via "%pS", if it used the address saved in the ring buffer it will not match the function that was saved in the previous boot if the kernel remaps itself between boots. For RCU events that point to saved static strings where only the address of the string is saved in the ring buffer, it too will be adjusted to point to where the string is on the current boot. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 42 +++--- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 7dbb20b31ae5..b641f6f1db6a 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -3671,8 +3671,11 @@ static void test_can_verify(void) void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, va_list ap) { + long text_delta = iter->tr->text_delta; + long data_delta = iter->tr->data_delta; const char *p = fmt; const char *str; + bool good; int i, j; if (WARN_ON_ONCE(!fmt)) @@ -3691,7 +3694,10 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, j = 0; - /* We only care about %s and variants */ + /* +* We only care about %s and variants +* as well as %p[sS] if delta is non-zero +*/ for (i = 0; p[i]; i++) { if (i + 1 >= iter->fmt_size) { /* @@ -3720,6 +3726,11 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, } if (p[i+j] == 's') break; + + if (text_delta && p[i+1] == 'p' && + ((p[i+2] == 's' || p[i+2] == 'S'))) + break; + star = false; } j = 0; @@ -3733,6 +3744,24 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, iter->fmt[i] = '\0'; trace_seq_vprintf(&iter->seq, iter->fmt, ap); + /* Add delta to %pS pointers */ + if (p[i+1] 
== 'p') { + unsigned long addr; + char fmt[4]; + + fmt[0] = '%'; + fmt[1] = 'p'; + fmt[2] = p[i+2]; /* Either %ps or %pS */ + fmt[3] = '\0'; + + addr = va_arg(ap, unsigned long); + addr += text_delta; + trace_seq_printf(&iter->seq, fmt, (void *)addr); + + p += i + 3; + continue; + } + /* * If iter->seq is full, the above call no longer guarantees * that ap is in sync with fmt processing, and further calls @@ -3751,6 +3780,14 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, /* The ap now points to the string data of the %s */ str = va_arg(ap, const char *); + good = trace_safe_str(iter, str, star, len); + + /* Could be from the last boot */ + if (data_delta && !good) { + str += data_delta; + good = trace_safe_str(iter, str, star, len); + } + /* * If you hit this warning, it is likely that the * trace event in question used %s on a string that @@ -3760,8 +3797,7 @@ void trace_check_vprintf(struct trace_iterator *iter, const char *fmt, * instead. See samples/trace_events/trace-events-sample.h * for reference. */ - if (WARN_ONCE(!trace_safe_str(iter, str, star, len), - "fmt: '%s' current_buffer: '%s'", + if (WARN_ONCE(!good, "fmt: '%s' current_buffer: '%s'", fmt, seq_buf_str(&iter->seq.seq))) { int ret; -- 2.43.0
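The scan that trace_check_vprintf() extends is a format-string walk looking for conversions that carry last-boot pointers. A userspace approximation of spotting a "%ps"/"%pS" conversion (simplified: the kernel code also handles %s variants, field widths, and '*'):

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the '%' of the first "%ps"/"%pS" conversion,
 * or -1 if there is none. "%%" is a literal percent and is skipped. */
static int sketch_find_psym(const char *fmt)
{
	for (size_t i = 0; fmt[i]; i++) {
		if (fmt[i] != '%')
			continue;
		if (fmt[i + 1] == '%') {	/* literal percent */
			i++;
			continue;
		}
		if (fmt[i + 1] == 'p' &&
		    (fmt[i + 2] == 's' || fmt[i + 2] == 'S'))
			return (int)i;
	}
	return -1;
}
```

Once such a conversion is found, the patch pulls the pointer argument off the va_list, adds text_delta, and prints it itself before resuming normal processing.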
[PATCH v3 09/13] ring-buffer: Save text and data locations in mapped meta data
From: "Steven Rostedt (Google)" When a ring buffer is mapped to a specific address, save the address of a text function and some data. This will be used to determine the delta between the last boot and the current boot for pointers to functions as well as to data. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 16 1 file changed, 16 insertions(+) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 1e959ff71855..755ef278276b 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -45,6 +45,8 @@ static void update_pages_handler(struct work_struct *work); struct ring_buffer_meta { + unsigned long text_addr; + unsigned long data_addr; unsigned long first_buffer; unsigned long head_buffer; unsigned long commit_buffer; @@ -541,6 +543,9 @@ struct trace_buffer { unsigned long range_addr_start; unsigned long range_addr_end; + longlast_text_delta; + longlast_data_delta; + unsigned intsubbuf_size; unsigned intsubbuf_order; unsigned intmax_data_size; @@ -1819,10 +1824,15 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer) } } +/* Used to calculate data delta */ +static char rb_data_ptr[] = ""; + static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) { struct ring_buffer_meta *meta; unsigned long delta; + unsigned long this_text = (unsigned long)rb_range_meta_init; + unsigned long this_data = (unsigned long)rb_data_ptr; void *subbuf; int cpu; int i; @@ -1839,6 +1849,10 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) meta->first_buffer += delta; meta->head_buffer += delta; meta->commit_buffer += delta; + buffer->last_text_delta = this_text - meta->text_addr; + buffer->last_data_delta = this_data - meta->data_addr; + meta->text_addr = this_text; + meta->data_addr = this_data; continue; } @@ -1855,6 +1869,8 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages) subbuf = rb_subbufs_from_meta(meta); meta->first_buffer = 
(unsigned long)subbuf; + meta->text_addr = this_text; + meta->data_addr = this_data; /* * The buffers[] array holds the order of the sub-buffers -- 2.43.0
[PATCH v3 13/13] tracing: Add last boot delta offset for stack traces
From: "Steven Rostedt (Google)" The addresses of a stack trace event are relative to the kallsyms. As that can change between boots, when printing the stack trace from a buffer that was from the last boot, it needs all the addresses to be added to the "text_delta" that gives the delta between the addresses of the functions for the current boot compared to the address of the last boot. Then it can be passed to kallsyms to find the function name, otherwise it just shows a useless list of addresses. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index b9d2c64c0648..48de93598897 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -1233,6 +1233,7 @@ static enum print_line_t trace_stack_print(struct trace_iterator *iter, struct trace_seq *s = &iter->seq; unsigned long *p; unsigned long *end; + long delta = iter->tr->text_delta; trace_assign_type(field, iter->ent); end = (unsigned long *)((long)iter->ent + iter->ent_size); @@ -1245,7 +1246,7 @@ static enum print_line_t trace_stack_print(struct trace_iterator *iter, break; trace_seq_puts(s, " => "); - seq_print_ip_sym(s, *p, flags); + seq_print_ip_sym(s, (*p) + delta, flags); trace_seq_putc(s, '\n'); } -- 2.43.0
[PATCH v3 10/13] tracing/ring-buffer: Add last_boot_info file to boot instance
From: "Steven Rostedt (Google)" If an instance is mapped to memory on boot up, create a new file called "last_boot_info" that will hold information that can be used to properly parse the raw data in the ring buffer. It will export the delta of the text and data addresses from what they were in the last boot. It does not expose actual addresses (unless you knew what the actual address was from the last boot). The output will look like: # cat last_boot_info text delta: -268435456 data delta: -268435456 The text and data are kept separate in case they are ever made different. Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 3 +++ kernel/trace/ring_buffer.c | 23 ++ kernel/trace/trace.c | 47 - kernel/trace/trace.h | 2 ++ 4 files changed, 74 insertions(+), 1 deletion(-) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index a50b0223b1d3..55de3798a9b9 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -94,6 +94,9 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag unsigned long range_size, struct lock_class_key *key); +bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text, +long *data); + /* * Because the ring buffer is generic, if other users of the ring buffer get * traced by ftrace, it can produce lockdep warnings. 
We need to keep each diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 755ef278276b..3a927c6ddd14 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2394,6 +2394,29 @@ struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned flag return alloc_buffer(size, flags, order, start, start + range_size, key); } +/** + * ring_buffer_last_boot_delta - return the delta offset from last boot + * @buffer: The buffer to return the delta from + * @text: Return text delta + * @data: Return data delta + * + * Returns: The true if the delta is non zero + */ +bool ring_buffer_last_boot_delta(struct trace_buffer *buffer, long *text, +long *data) +{ + if (!buffer) + return false; + + if (!buffer->last_text_delta) + return false; + + *text = buffer->last_text_delta; + *data = buffer->last_data_delta; + + return true; +} + /** * ring_buffer_free - free a ring buffer. * @buffer: the buffer to free. diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 13e89023f33b..7dbb20b31ae5 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6041,6 +6041,18 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr, return ret; } +static void update_last_data(struct trace_array *tr) +{ + if (!tr->text_delta && !tr->data_delta) + return; + + /* Clear old data */ + tracing_reset_online_cpus(&tr->array_buffer); + + /* Using current data now */ + tr->text_delta = 0; + tr->data_delta = 0; +} /** * tracing_update_buffers - used by tracing facility to expand ring buffers @@ -6058,6 +6070,9 @@ int tracing_update_buffers(struct trace_array *tr) int ret = 0; mutex_lock(&trace_types_lock); + + update_last_data(tr); + if (!tr->ring_buffer_expanded) ret = __tracing_resize_ring_buffer(tr, trace_buf_size, RING_BUFFER_ALL_CPUS); @@ -6113,6 +6128,8 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf) mutex_lock(&trace_types_lock); + update_last_data(tr); + if (!tr->ring_buffer_expanded) { ret = 
__tracing_resize_ring_buffer(tr, trace_buf_size, RING_BUFFER_ALL_CPUS); @@ -6860,6 +6877,21 @@ tracing_total_entries_read(struct file *filp, char __user *ubuf, return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); } +static ssize_t +tracing_last_boot_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) +{ + struct trace_array *tr = filp->private_data; + struct seq_buf seq; + char buf[64]; + + seq_buf_init(&seq, buf, 64); + + seq_buf_printf(&seq, "text delta:\t%ld\n", tr->text_delta); + seq_buf_printf(&seq, "data delta:\t%ld\n", tr->data_delta); + + return simple_read_from_buffer(ubuf, cnt, ppos, buf, seq_buf_used(&seq)); +} + static int tracing_buffer_meta_open(struct inode *inode, struct file *filp) { struct trace_array *tr = inode->i_private; @@ -7499,6 +7531,13 @@ static const struct file_operations trace_time_stamp_mode_fops = { .release= tracing_single_release_tr, }; +static const struct file_operations last_boot_fops = { + .open
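A consumer of the proposed last_boot_info file would read back the two tab-separated fields the patch formats with seq_buf_printf(). A userspace sketch of such a parser (the field names come from the patch; the parser itself is illustrative):

```c
#include <assert.h>
#include <stdio.h>

/* Parse "text delta:\t%ld\ndata delta:\t%ld\n" as emitted by the
 * proposed tracing_last_boot_read(). Returns 0 on success. */
static int sketch_parse_last_boot(const char *buf, long *text, long *data)
{
	if (sscanf(buf, "text delta:\t%ld\ndata delta:\t%ld", text, data) != 2)
		return -1;
	return 0;
}
```

A tool post-processing a trace.dat from the previous boot could use these two values to relocate saved text and data pointers before symbol lookup.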
Re: [PATCH v3 08/13] tracing: Add option to use memmapped memory for trace boot instance
Memory management folks. Please review this patch. Specifically the "map_pages()" function below. On Thu, 06 Jun 2024 17:17:43 -0400 Steven Rostedt wrote: > From: "Steven Rostedt (Google)" > > Add an option to the trace_instance kernel command line parameter that > allows it to use the reserved memory from the memmap boot parameter. > > memmap=12M$0x28450 trace_instance=boot_mapped@0x28450:12M > > The above will reserve 12 megs at the physical address 0x28450. > The second parameter will create a "boot_mapped" instance and use the > memory reserved as the memory for the ring buffer. > > That will create an instance called "boot_mapped": > > /sys/kernel/tracing/instances/boot_mapped > > Note, because the ring buffer is using a defined memory range, it will > act just like a memory mapped ring buffer. It will not have a snapshot > buffer, as it can't swap out the buffer. The snapshot files as well as any > tracers that use a snapshot will not be present in the boot_mapped > instance. > > Cc: linux...@kvack.org > Signed-off-by: Steven Rostedt (Google) > --- > .../admin-guide/kernel-parameters.txt | 9 +++ > kernel/trace/trace.c | 75 +-- > 2 files changed, 78 insertions(+), 6 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index b600df82669d..ff26b6094e79 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -6754,6 +6754,15 @@ > the same thing would happen if it was left off). The > irq_handler_entry > event, and all events under the "initcall" system. > > + If memory has been reserved (see memmap for x86), the > instance > + can use that memory: > + > + memmap=12M$0x28450 > trace_instance=boot_map@0x28450:12M > + > + The above will create a "boot_map" instance that uses > the physical > + memory at 0x28450 that is 12Megs. The per CPU > buffers of that > + instance will be split up accordingly. 
> + > trace_options=[option-list] > [FTRACE] Enable or disable tracer options at boot. > The option-list is a comma delimited list of options > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 622fe670949d..13e89023f33b 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -9504,6 +9504,31 @@ static int instance_mkdir(const char *name) > return ret; > } > > +static u64 map_pages(u64 start, u64 size) > +{ > + struct page **pages; > + phys_addr_t page_start; > + unsigned int page_count; > + unsigned int i; > + void *vaddr; > + > + page_count = DIV_ROUND_UP(size, PAGE_SIZE); > + > + page_start = start; > + pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL); > + if (!pages) > + return 0; > + > + for (i = 0; i < page_count; i++) { > + phys_addr_t addr = page_start + i * PAGE_SIZE; > + pages[i] = pfn_to_page(addr >> PAGE_SHIFT); > + } > + vaddr = vmap(pages, page_count, VM_MAP, PAGE_KERNEL); > + kfree(pages); > + > + return (u64)(unsigned long)vaddr; > +} If for some reason the memmap=nn$ss fails, but this still gets called, will the above just map over any memory. That is, is it possible that the kernel could have used this memory? Is there a way to detect this? That is, I don't want this to succeed if the memory location it's about to map to is used by the kernel, or will be used by user space. -- Steve > + > /** > * trace_array_get_by_name - Create/Lookup a trace array, given its name. > * @name: The name of the trace array to be looked up/created. 
> @@ -10350,6 +10375,7 @@ __init static void enable_instances(void) > { > struct trace_array *tr; > char *curr_str; > + char *name; > char *str; > char *tok; > > @@ -10358,19 +10384,56 @@ __init static void enable_instances(void) > str = boot_instance_info; > > while ((curr_str = strsep(&str, "\t"))) { > + unsigned long start = 0; > + unsigned long size = 0; > + unsigned long addr = 0; > > tok = strsep(&curr_str, ","); > + name = strsep(&tok, "@"); > + if (tok) { > + start = memparse(tok, &tok); > + if (!start) { > + pr_warn("Tracing: Invalid boot instance address > for %s\n", > + name); > + continue; > + } > + } > > - if (IS_ENABLED(CONFIG_TRACER_MAX_TRACE)) > - do_allocate_snapshot(tok); > + if (start) { > +
Re: [PATCH 3/3] rust: add tracepoint support
On Thu, Jun 06, 2024 at 09:29:51PM +0200, Peter Zijlstra wrote: > On Thu, Jun 06, 2024 at 12:00:36PM -0700, Boqun Feng wrote: > > On Thu, Jun 06, 2024 at 07:35:44PM +0200, Peter Zijlstra wrote: > > > On Thu, Jun 06, 2024 at 08:49:06AM -0700, Boqun Feng wrote: > > > > > > > Long-term plan is to 1) compile the C helpers in some IR and 2) inline > > > > the helpers with Rust in IR-level, as what Gary has: > > > > > > > > > > > > https://lore.kernel.org/rust-for-linux/20240529202817.3641974-1-g...@garyguo.net/ > > > > > > Urgh, that still needs us to maintain that silly list of helpers :-/ > > > > > > > But it's an improvement from the current stage, right? ;-) > > Somewhat, but only marginal. > > > > Can't we pretty please have clang parse the actual header files into IR > > > and munge that into rust? So that we don't get to manually duplicate > > > everything+dog. > > > > That won't always work, because some of our kernel APIs are defined as > > macros, and I don't think it's a trivial job to generate a macro > > definition to a function definition so that it can be translated to > > something in IR. We will have to do the macro -> function mapping > > ourselves somewhere, if we want to inline the API across languages. > > We can try and see how far we can get with moving a bunch of stuff into > inlines. There's quite a bit of simple CPP that could be inlines or > const objects I suppose. > We can, but I'd first stick with what we have, improve it and make it stable until we go to the next stage. Plus, there's a benefit to keeping an explicit helper list: it's clear what APIs are called by Rust, and moreover, it's easier to modify the helpers if you were to change an API, rather than chasing down where Rust code calls it. (Don't get me wrong, I'm happy if you want to do that ;-)) Regards, Boqun > Things like the tracepoints are of course glorious CPP abuse and are > never going to work. 
> But perhaps you can have an explicit 'eval-CPP on this here' construct > or whatnot. If I squint I see this paste! thingy (WTF's up with that ! > operator?) to munge function names in the static_call thing. So > something like apply CPP from over there on this here can also be done > :-)
Re: [PATCH v14 02/14] cgroup/misc: Add per resource callbacks for CSS events
I think it would be better if, when _misc_cg_res_alloc fails, it just called _misc_cg_res_free(cg, index) (adding an index parameter that marks where the iteration stopped), so it can avoid calling ->free() for resources whose ->alloc() was never called. And in misc_cg_free, just call _misc_cg_res_free(cg, MISC_CG_RES_TYPES) to free them all. Thanks Ridong On 2024/6/6 22:51, Haitao Huang wrote: On Thu, 06 Jun 2024 08:37:31 -0500, chenridong wrote: If _misc_cg_res_alloc fails, maybe some types do not call ->alloc(), but the ->free() callbacks of all types will be called, is that ok? Not sure I understand. Are you suggesting we ignore failures from the ->alloc() callback in _misc_cg_res_alloc() as it is per-resource, and have the ->free() callback and resource provider of the failing type handle the failure internally? IIUC, this failure only happens when a specific subcgroup is created (memory running out for allocation) so failing that subcgroup as a whole seems fine to me. Note the root node is static and no per-resource callbacks are invoked by misc for it. And the resource provider handles its own allocations for root. In the SGX case we too declare a static object for the corresponding root sgx_cgroup struct. Note also misc cgroup (except for setting capacity[res] = 0 at root) is all or nothing, so there is no mechanism to tell the user "this resource does not work but others are fine in this particular cgroup." Thanks Haitao
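The cleanup ordering under discussion, calling ->free() only for the resources whose ->alloc() actually ran, is the standard partial-unwind pattern. A userspace sketch with stubbed callbacks (misc-cgroup internals are not modeled; counters stand in for real work):

```c
#include <assert.h>

#define NR_RES 4

static int alloc_calls[NR_RES];
static int free_calls[NR_RES];
static int fail_at = -1;	/* index whose alloc fails; -1 = none */

static int res_alloc(int i)
{
	if (i == fail_at)
		return -1;
	alloc_calls[i]++;
	return 0;
}

/* Free only resources [0, end): those whose alloc succeeded. */
static void res_free_upto(int end)
{
	for (int i = 0; i < end; i++)
		free_calls[i]++;
}

/* On failure at index i, unwind exactly the first i resources. */
static int res_alloc_all(void)
{
	for (int i = 0; i < NR_RES; i++) {
		if (res_alloc(i) < 0) {
			res_free_upto(i);
			return -1;
		}
	}
	return 0;
}
```

Passing NR_RES as the end index from the teardown path frees everything, which matches the suggested _misc_cg_res_free(cg, MISC_CG_RES_TYPES) call.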
Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()
On 07/06/2024 00:27, Dave Jiang wrote: > > > On 6/3/24 8:16 PM, Li Zhijian wrote: >> Don't allocate devs again when it's a valid pointer which has pointed to >> the memory allocated above with size (count + 2 * sizeof(dev)). >> >> kmemleak reports: >> unreferenced object 0x88800dda1980 (size 16): >>comm "kworker/u10:5", pid 69, jiffies 4294671781 >>hex dump (first 16 bytes): >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>backtrace (crc 0): >> [] __kmalloc+0x32c/0x470 >> [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 >> [libnvdimm] >> [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm] >> [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm] >> [ ] really_probe+0xc6/0x390 >> [<129e2a69>] __driver_probe_device+0x78/0x150 >> [<2dfed28b>] driver_probe_device+0x1e/0x90 >> [ ] __device_attach_driver+0x85/0x110 >> [<32dca295>] bus_for_each_drv+0x85/0xe0 >> [<391c5a7d>] __device_attach+0xbe/0x1e0 >> [<26dabec0>] bus_probe_device+0x94/0xb0 >> [ ] device_add+0x656/0x870 >> [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm] >> [<3f4c52a4>] async_run_entry_fn+0x2e/0x110 >> [ ] process_one_work+0x1ee/0x600 >> [<6d90d5a9>] worker_thread+0x183/0x350 >> >> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation") >> Signed-off-by: Li Zhijian >> --- >> drivers/nvdimm/namespace_devs.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/nvdimm/namespace_devs.c >> b/drivers/nvdimm/namespace_devs.c >> index d6d558f94d6b..56b016dbe307 100644 >> --- a/drivers/nvdimm/namespace_devs.c >> +++ b/drivers/nvdimm/namespace_devs.c >> @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region >> *nd_region) >> /* Publish a zero-sized namespace for userspace to configure. 
*/ >> nd_mapping_free_labels(nd_mapping); >> >> -devs = kcalloc(2, sizeof(dev), GFP_KERNEL); >> +/* devs probably has been allocated */ >> +if (!devs) >> +devs = kcalloc(2, sizeof(dev), GFP_KERNEL); > > This changes the behavior of this code and possibly corrupts the previously > allocated memory at times when 'devs' is valid. If we reach here, that means count == 0 is true. So we can deduce that the previously allocated memory was not touched at all in the previous loop, and its size was (2 * sizeof(dev)) too. Was the 'devs' leaked out from the previous loop and should be freed instead? This option is also fine with me. We could also fix this by freeing devs before allocating it again: + kfree(devs); // kfree(NULL) is safe. devs = kcalloc(2, sizeof(dev), GFP_KERNEL); Thanks Zhijian > >> if (!devs) >> goto err; >> >
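Both candidate fixes, reusing the existing allocation or freeing it before reallocating, can be sketched in userspace (sizes and function names are illustrative only; the kernel code uses kcalloc/kfree):

```c
#include <stdlib.h>

/* Option 1: only allocate when the label loop allocated nothing,
 * so an array left over from the loop is reused, not leaked. */
static void **seed_devs_reuse(void **devs)
{
	if (!devs)
		devs = calloc(2, sizeof(*devs));
	return devs;
}

/* Option 2: unconditionally free before reallocating.
 * free(NULL) is safe, mirroring kfree(NULL). */
static void **seed_devs_realloc(void **devs)
{
	free(devs);
	return calloc(2, sizeof(*devs));
}
```

Option 1 matches the posted patch; option 2 matches the alternative Zhijian mentions in the reply. Either way the leftover pointer from the loop no longer leaks.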
Re: [PATCH] livepatch: introduce klp_func called interface
> On Jun 6, 2024, at 23:01, Joe Lawrence wrote: > > Hi Wardenjohn, > > To follow up, Red Hat kpatch QE pointed me toward this test: > > https://gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/tree/main/general/kpatch/kpatch-trace?ref_type=heads > > which reports a few interesting things via systemd service and ftrace: > > - Patched functions > - Traced functions > - Code that was modified, but didn't run > - Other code that ftrace saw, but is not listed in the sysfs directory > > The code looks to be built on top of the same basic ftrace commands that > I sent the other day. You may be able to reuse/copy/etc bits from the > scripts if they are handy. > > -- > Joe > Thank you so much, you really helped me, Joe! I will try to learn the script and make it suitable for our system. Again, thank you, Joe! Regards, Wardenjohn
Re: [PATCH] nvdimm: Fix devs leaks in scan_labels()
On 07/06/2024 00:49, Ira Weiny wrote: > Li Zhijian wrote: >> Don't allocate devs again when it's a valid pointer which has pointed to >> the memory allocated above with size (count + 2 * sizeof(dev)). >> >> kmemleak reports: >> unreferenced object 0x88800dda1980 (size 16): >>comm "kworker/u10:5", pid 69, jiffies 4294671781 >>hex dump (first 16 bytes): >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>backtrace (crc 0): >> [] __kmalloc+0x32c/0x470 >> [<9ed43c83>] nd_region_register_namespaces+0x6fb/0x1120 >> [libnvdimm] >> [<0e07a65c>] nd_region_probe+0xfe/0x210 [libnvdimm] >> [<7b79ce5f>] nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm] >> [ ] really_probe+0xc6/0x390 >> [<129e2a69>] __driver_probe_device+0x78/0x150 >> [<2dfed28b>] driver_probe_device+0x1e/0x90 >> [ ] __device_attach_driver+0x85/0x110 >> [<32dca295>] bus_for_each_drv+0x85/0xe0 >> [<391c5a7d>] __device_attach+0xbe/0x1e0 >> [<26dabec0>] bus_probe_device+0x94/0xb0 >> [ ] device_add+0x656/0x870 >> [<3d69bfaa>] nd_async_device_register+0xe/0x50 [libnvdimm] >> [<3f4c52a4>] async_run_entry_fn+0x2e/0x110 >> [ ] process_one_work+0x1ee/0x600 >> [<6d90d5a9>] worker_thread+0x183/0x350 >> >> Fixes: 1b40e09a1232 ("libnvdimm: blk labels and namespace instantiation") >> Signed-off-by: Li Zhijian >> --- >> drivers/nvdimm/namespace_devs.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/nvdimm/namespace_devs.c >> b/drivers/nvdimm/namespace_devs.c >> index d6d558f94d6b..56b016dbe307 100644 >> --- a/drivers/nvdimm/namespace_devs.c >> +++ b/drivers/nvdimm/namespace_devs.c >> @@ -1994,7 +1994,9 @@ static struct device **scan_labels(struct nd_region >> *nd_region) >> /* Publish a zero-sized namespace for userspace to configure. */ >> nd_mapping_free_labels(nd_mapping); >> >> -devs = kcalloc(2, sizeof(dev), GFP_KERNEL); >> +/* devs probably has been allocated */ > > I don't think this is where the bug is. 
The loop above is processing the > known labels and should exit with a count > 0 if devs is not NULL. > > From what I can tell create_namespace_pmem() must be returning EAGAIN > which leaves devs allocated but fails to increment count. Thus there are > no valid labels but devs was not free'ed. Per this piece of the code, returning EAGAIN or ENODEV could cause this issue in theory. 1980 dev = create_namespace_pmem(nd_region, nd_mapping, nd_label); 1981 if (IS_ERR(dev)) { 1982 switch (PTR_ERR(dev)) { 1983 case -EAGAIN: 1984 /* skip invalid labels */ 1985 continue; 1986 case -ENODEV: 1987 /* fallthrough to seed creation */ 1988 break; 1989 default: 1990 goto err; 1991 } 1992 } else 1993 devs[count++] = dev; > > Can you trace the error you are seeing a bit more to see if this is the > case? I just tried, but I cannot reproduce this leak again. When it happened (it was 100% reproducible at that time), I probably had a corrupted LSA (I saw empty output from the command 'ndctl list'). It seemed the QEMU-emulated NVDIMM device was broken for some reason. Thanks Zhijian > > Thanks, > Ira > >> +if (!devs) >> +devs = kcalloc(2, sizeof(dev), GFP_KERNEL); >> if (!devs) >> goto err; >> >> -- >> 2.29.2 >> > >
Re: [PATCH 3/5] arm64: dts: qcom: sdx75: add missing qlink_logging reserved memory for mpss
On 6/6/2024 8:20 PM, Krzysztof Kozlowski wrote:
> On 06/06/2024 16:38, Naina Mehta wrote:
>> The qlink_logging memory region is also used by the modem firmware,
>> add it to reserved memory regions. Also split MPSS DSM region into
>> 2 separate regions.
>>
>> Signed-off-by: Naina Mehta
>> ---
>>  arch/arm64/boot/dts/qcom/sdx75.dtsi | 17 +
>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/sdx75.dtsi b/arch/arm64/boot/dts/qcom/sdx75.dtsi
>> index 9b93f6501d55..9349b1c4e196 100644
>> --- a/arch/arm64/boot/dts/qcom/sdx75.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sdx75.dtsi
>> @@ -366,7 +366,12 @@
>>  			no-map;
>>  		};
>>
>> -		qdss_mem: qdss@8880 {
>> +		qdss_mem: qdss@8850 {
>> +			reg = <0x0 0x8850 0x0 0x30>;
>> +			no-map;
>> +		};
>> +
>> +		qlink_logging_mem: qlink_logging@8880 {
>
> Sorry, no downstream code. Please follow DTS coding style - no
> underscores in node names. This applies to all work sent upstream.

Thanks for pointing this out.
I will update in next revision.

Regards,
Naina

> Best regards,
> Krzysztof
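For reference, the DTS coding style Krzysztof points to uses hyphens in node names, while labels (left of the colon) may keep underscores. A sketch of how the reworked node might look — the unit address and sizes here are placeholders, not the real sdx75 values:

```dts
/* Sketch only: hyphenated node name, underscore kept in the label.
 * Address/size values are illustrative placeholders. */
reserved-memory {
	qlink_logging_mem: qlink-logging@88800000 {
		reg = <0x0 0x88800000 0x0 0x300000>;
		no-map;
	};
};
```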
Re: [PATCH] net: missing check
Hi Denis,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.10-rc2 next-20240607]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Denis-Arefev/net-missing-check/20240606-230540
base:   linus/master
patch link:    https://lore.kernel.org/r/20240606141450.44709-1-arefev%40swemel.ru
patch subject: [PATCH] net: missing check
config: x86_64-randconfig-121-20240607 (https://download.01.org/0day-ci/archive/20240607/202406071404.oilhfohm-...@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240607/202406071404.oilhfohm-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202406071404.oilhfohm-...@intel.com/

sparse warnings: (new ones prefixed by >>)

   drivers/net/virtio_net.c: note: in included file:
>> include/linux/virtio_net.h:103:58: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected unsigned long long [usertype] divisor @@ got restricted __virtio16 const [usertype] gso_size @@
   include/linux/virtio_net.h:103:58: sparse:     expected unsigned long long [usertype] divisor
   include/linux/virtio_net.h:103:58: sparse:     got restricted __virtio16 const [usertype] gso_size
>> include/linux/virtio_net.h:104:42: sparse: sparse: restricted __virtio16 degrades to integer

vim +103 include/linux/virtio_net.h

    49	
    50	static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
    51						const struct virtio_net_hdr *hdr,
    52						bool little_endian)
    53	{
    54		unsigned int nh_min_len = sizeof(struct iphdr);
    55		unsigned int gso_type = 0;
    56		unsigned int thlen = 0;
    57		unsigned int p_off = 0;
    58		unsigned int ip_proto;
    59		u64 ret, remainder;
    60	
    61		if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
    62			switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
    63			case VIRTIO_NET_HDR_GSO_TCPV4:
    64				gso_type = SKB_GSO_TCPV4;
    65				ip_proto = IPPROTO_TCP;
    66				thlen = sizeof(struct tcphdr);
    67				break;
    68			case VIRTIO_NET_HDR_GSO_TCPV6:
    69				gso_type = SKB_GSO_TCPV6;
    70				ip_proto = IPPROTO_TCP;
    71				thlen = sizeof(struct tcphdr);
    72				nh_min_len = sizeof(struct ipv6hdr);
    73				break;
    74			case VIRTIO_NET_HDR_GSO_UDP:
    75				gso_type = SKB_GSO_UDP;
    76				ip_proto = IPPROTO_UDP;
    77				thlen = sizeof(struct udphdr);
    78				break;
    79			case VIRTIO_NET_HDR_GSO_UDP_L4:
    80				gso_type = SKB_GSO_UDP_L4;
    81				ip_proto = IPPROTO_UDP;
    82				thlen = sizeof(struct udphdr);
    83				break;
    84			default:
    85				return -EINVAL;
    86			}
    87	
    88			if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
    89				gso_type |= SKB_GSO_TCP_ECN;
    90	
    91			if (hdr->gso_size == 0)
    92				return -EINVAL;
    93		}
    94	
    95		skb_reset_mac_header(skb);
    96	
    97		if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
    98			u32 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
    99			u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
   100			u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
   101	
   102			if (hdr->gso_size) {
 > 103				ret = div64_u64_rem(skb->len, hdr->gso_size, &remainder);
 > 104				if (!(ret && (hdr->gso_size > needed) &&
   105				      ((remainder > needed) || (remainder == 0)))) {
   106					return -EINVAL;
   107				}
   108				skb_s
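The sparse warnings flag that `hdr->gso_size` is a `__virtio16` — a possibly byte-swapped on-wire value — being passed straight into `div64_u64_rem()`, which expects a plain u64; elsewhere in the same function the code converts with `__virtio16_to_cpu()` first. A userspace sketch of why the raw value is wrong to use directly; the two helpers below are simplified stand-ins for the kernel's `__virtio16_to_cpu()` and `div64_u64_rem()`, not the real implementations:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for __virtio16_to_cpu(): if the field is not
 * already in native (little-endian) order, byte-swap it before use.
 * Assumes a little-endian host for illustration. */
static uint16_t virtio16_to_cpu(int little_endian, uint16_t v)
{
	if (little_endian)
		return v;
	return (uint16_t)((v >> 8) | (v << 8));
}

/* Stand-in for div64_u64_rem(): returns the quotient and stores the
 * remainder through *rem, matching the kernel helper's contract. */
static uint64_t div64_u64_rem_sketch(uint64_t dividend, uint64_t divisor,
				     uint64_t *rem)
{
	*rem = dividend % divisor;
	return dividend / divisor;
}
```

With a big-endian device, a gso_size of 1500 (0x05DC) reads back as 0xDC05 if used raw, so the skb->len divisibility check at line 103 would be performed against a garbage divisor unless the value is converted first.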