Re: [PATCH] virtio_net: Support RX hash XDP hint

2024-01-24 Thread Liang Chen
On Wed, Jan 24, 2024 at 10:12 AM Jason Wang  wrote:
>
> On Mon, Jan 22, 2024 at 6:23 PM Liang Chen  wrote:
> >
> > The RSS hash report is a feature that's part of the virtio specification.
> > Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost
> > (still a work in progress as per [1]) support this feature. While the
> > capability to obtain the RSS hash has been enabled in the normal path,
> > it's currently missing in the XDP path. Therefore, we are introducing XDP
> > hints through kfuncs to allow XDP programs to access the RSS hash.
> >
> > Signed-off-by: Liang Chen 
> > ---
> >  drivers/net/virtio_net.c | 56 
> >  1 file changed, 56 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index d7ce4a1011ea..1463a4709e3c 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -4579,6 +4579,60 @@ static void virtnet_set_big_packets(struct 
> > virtnet_info *vi, const int mtu)
> > }
> >  }
> >
> > +static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
> > +  enum xdp_rss_hash_type *rss_type)
> > +{
> > +   const struct xdp_buff *xdp = (void *)_ctx;
> > +   struct virtio_net_hdr_v1_hash *hdr_hash;
> > +   struct virtnet_info *vi;
> > +
> > +   if (!(xdp->rxq->dev->features & NETIF_F_RXHASH))
> > +   return -ENODATA;
> > +
> > +   vi = netdev_priv(xdp->rxq->dev);
> > +   hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - 
> > vi->hdr_len);
>
> Is there a guarantee that the hdr is not modified?
>

We cannot guarantee the hdr is not modified by the XDP prog, So the
idea is to save it in a wrapper structure before running the xdp prog.
Patch is coming soon.

Thanks,
Liang

> Thanks
>



Re: [PATCH] virtio_net: Support RX hash XDP hint

2024-01-24 Thread Liang Chen
On Wed, Jan 24, 2024 at 2:06 PM Xuan Zhuo  wrote:
>
> On Wed, 24 Jan 2024 10:04:51 +0800, Liang Chen  
> wrote:
> > On Mon, Jan 22, 2024 at 7:10 PM Heng Qi  wrote:
> > >
> > > Hi Liang Chen,
> > >
> > > 在 2024/1/22 下午6:22, Liang Chen 写道:
> > > > The RSS hash report is a feature that's part of the virtio 
> > > > specification.
> > > > Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost
> > > > (still a work in progress as per [1]) support this feature. While the
> > > > capability to obtain the RSS hash has been enabled in the normal path,
> > > > it's currently missing in the XDP path. Therefore, we are introducing 
> > > > XDP
> > > > hints through kfuncs to allow XDP programs to access the RSS hash.
> > > >
> > > > Signed-off-by: Liang Chen 
> > > > ---
> > > >   drivers/net/virtio_net.c | 56 
> > > >   1 file changed, 56 insertions(+)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index d7ce4a1011ea..1463a4709e3c 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -4579,6 +4579,60 @@ static void virtnet_set_big_packets(struct 
> > > > virtnet_info *vi, const int mtu)
> > > >   }
> > > >   }
> > > >
> > > > +static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
> > > > +enum xdp_rss_hash_type *rss_type)
> > > > +{
> > > > + const struct xdp_buff *xdp = (void *)_ctx;
> > > > + struct virtio_net_hdr_v1_hash *hdr_hash;
> > > > + struct virtnet_info *vi;
> > > > +
> > > > + if (!(xdp->rxq->dev->features & NETIF_F_RXHASH))
> > >
> > > I think 'vi->has_rss_hash_report' should be used here.
> > > NETIF_F_RXHASH cannot guarantee that the hash report feature is 
> > > negotiated,
> > > and accessing hash_report and hash_value is unsafe at this time.
> > >
> >
> > My understanding is that rxhash in dev->features is turned on only if
> > the corresponding hw feature is set. We will double check on this.
> >
> > > > + return -ENODATA;
> > > > +
> > > > + vi = netdev_priv(xdp->rxq->dev);
> > > > + hdr_hash = (struct virtio_net_hdr_v1_hash *)(xdp->data - 
> > > > vi->hdr_len);
> > >
> > > If the virtio-net-hdr is overrided by the XDP prog, how can this be done
> > > correctly and as expected?
> > >
> >
> > Yeah, thanks for bringing up this concern. We are actively working on
> > a fix of this issue(possibly with a wrapper structure of xdp_buff).
>
> Are there some places to save the hash before run xdp?
>

Yeah, it will be saved in a wrapper structure. Thanks.

> Thanks.
>
>
> >
> > Thanks,
> > Liang
> >
> > > Thanks,
> > > Heng
> > >
> > > > +
> > > > + switch (__le16_to_cpu(hdr_hash->hash_report)) {
> > > > + case VIRTIO_NET_HASH_REPORT_TCPv4:
> > > > + *rss_type = XDP_RSS_TYPE_L4_IPV4_TCP;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_UDPv4:
> > > > + *rss_type = XDP_RSS_TYPE_L4_IPV4_UDP;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_TCPv6:
> > > > + *rss_type = XDP_RSS_TYPE_L4_IPV6_TCP;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_UDPv6:
> > > > + *rss_type = XDP_RSS_TYPE_L4_IPV6_UDP;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_TCPv6_EX:
> > > > + *rss_type = XDP_RSS_TYPE_L4_IPV6_TCP_EX;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_UDPv6_EX:
> > > > + *rss_type = XDP_RSS_TYPE_L4_IPV6_UDP_EX;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_IPv4:
> > > > + *rss_type = XDP_RSS_TYPE_L3_IPV4;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_IPv6:
> > > > + *rss_type = XDP_RSS_TYPE_L3_IPV6;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_IPv6_EX:
> > > > + *rss_type = XDP_RSS_TYPE_L3_IPV6_EX;
> > > > + break;
> > > > + case VIRTIO_NET_HASH_REPORT_NONE:
> > > > + default:
> > > > + *rss_type = XDP_RSS_TYPE_NONE;
> > > > + }
> > > > +
> > > > + *hash = __le32_to_cpu(hdr_hash->hash_value);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = {
> > > > + .xmo_rx_hash= virtnet_xdp_rx_hash,
> > > > +};
> > > > +
> > > >   static int virtnet_probe(struct virtio_device *vdev)
> > > >   {
> > > >   int i, err = -ENOMEM;
> > > > @@ -4613,6 +4667,8 @@ static int virtnet_probe(struct virtio_device 
> > > > *vdev)
> > > >   dev->ethtool_ops = &virtnet_ethtool_ops;
> > > >   SET_NETDEV_DEV(dev, &vdev->dev);
> > > >
> > > > + d

Re: [PATCH] percpu: improve percpu_alloc_percpu_fail event trace

2024-01-24 Thread Dennis Zhou
Hello,

On Mon, Jan 22, 2024 at 08:55:39PM -0500, Steven Rostedt wrote:
> On Tue, 23 Jan 2024 09:44:43 +0800
> George Guo  wrote:
> 
> > There are two reasons of percpu_alloc failed without warnings: 
> > 
> > 1. do_warn is false
> > 2. do_warn is true and warn_limit is reached the limit.
> 
> Yes I know the reasons.
> 
> > 
> > Showing do_warn and warn_limit makes things simple, maybe dont need
> > kprobe again.
> 
> It's up to the maintainers of that code to decide if it's worth it or not,
> but honestly, my opinion it is not.
> 

I agree, I don't think this is a worthwhile change. If we do change
this, I'd like it to be more actionable in some way and as a result
something we can fix or tune accordingly.

George is this a common problem you're seeing?

> The trace event in question is to trace that percpu_alloc failed and why.
> It's not there to determine why it did not produce a printk message.
> 
> -- Steve

Thanks,
Dennis



[PATCH v2 0/3] virtio_net: Support the RX hash XDP hint

2024-01-24 Thread Liang Chen
The RSS hash report is a feature that's part of the virtio specification.
Currently, virtio backends like qemu and vdpa (mlx5) support it(potentially
vhost). While the capability to obtain the RSS hash has been enabled in the
normal path, it's currently missing in the XDP path. 

Changes from v1:
- introduce a wrapper structure to preserve virtio header

Liang Chen (3):
  virtio_net: Preserve virtio header before XDP program execution
  virtio_net: Add missing virtio header in skb for XDP_PASS
  virtio_net: Support RX hash XDP hint

 drivers/net/virtio_net.c | 102 ++-
 1 file changed, 90 insertions(+), 12 deletions(-)

-- 
2.40.1




[PATCH v2 1/3] virtio_net: Preserve virtio header before XDP program execution

2024-01-24 Thread Liang Chen
The xdp program may overwrite the inline virtio header. To ensure the
integrity of the virtio header, it is saved in a data structure that
wraps both the xdp_buff and the header before running the xdp program.

Signed-off-by: Liang Chen 
---
 drivers/net/virtio_net.c | 43 +---
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d7ce4a1011ea..b56828804e5f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -349,6 +349,11 @@ struct virtio_net_common_hdr {
};
 };
 
+struct virtnet_xdp_buff {
+   struct xdp_buff xdp;
+   struct virtio_net_common_hdr hdr;
+};
+
 static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
 
 static bool is_xdp_frame(void *ptr)
@@ -1199,9 +1204,10 @@ static struct sk_buff *receive_small_xdp(struct 
net_device *dev,
unsigned int headroom = vi->hdr_len + header_offset;
struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
struct page *page = virt_to_head_page(buf);
+   struct virtnet_xdp_buff virtnet_xdp;
struct page *xdp_page;
+   struct xdp_buff *xdp;
unsigned int buflen;
-   struct xdp_buff xdp;
struct sk_buff *skb;
unsigned int metasize = 0;
u32 act;
@@ -1233,17 +1239,23 @@ static struct sk_buff *receive_small_xdp(struct 
net_device *dev,
page = xdp_page;
}
 
-   xdp_init_buff(&xdp, buflen, &rq->xdp_rxq);
-   xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
+   xdp = &virtnet_xdp.xdp;
+   xdp_init_buff(xdp, buflen, &rq->xdp_rxq);
+   xdp_prepare_buff(xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
 xdp_headroom, len, true);
 
-   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
+   /* Copy out the virtio header, as it may be overwritten by the
+* xdp program.
+*/
+   memcpy(&virtnet_xdp.hdr, hdr, vi->hdr_len);
+
+   act = virtnet_xdp_handler(xdp_prog, xdp, dev, xdp_xmit, stats);
 
switch (act) {
case XDP_PASS:
/* Recalculate length in case bpf program changed it */
-   len = xdp.data_end - xdp.data;
-   metasize = xdp.data - xdp.data_meta;
+   len = xdp->data_end - xdp->data;
+   metasize = xdp->data - xdp->data_meta;
break;
 
case XDP_TX:
@@ -1254,7 +1266,7 @@ static struct sk_buff *receive_small_xdp(struct 
net_device *dev,
goto err_xdp;
}
 
-   skb = virtnet_build_skb(buf, buflen, xdp.data - buf, len);
+   skb = virtnet_build_skb(buf, buflen, xdp->data - buf, len);
if (unlikely(!skb))
goto err;
 
@@ -1591,10 +1603,11 @@ static struct sk_buff *receive_mergeable_xdp(struct 
net_device *dev,
int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
struct page *page = virt_to_head_page(buf);
int offset = buf - page_address(page);
+   struct virtnet_xdp_buff virtnet_xdp;
unsigned int xdp_frags_truesz = 0;
struct sk_buff *head_skb;
unsigned int frame_sz;
-   struct xdp_buff xdp;
+   struct xdp_buff *xdp;
void *data;
u32 act;
int err;
@@ -1604,16 +1617,22 @@ static struct sk_buff *receive_mergeable_xdp(struct 
net_device *dev,
if (unlikely(!data))
goto err_xdp;
 
-   err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
+   xdp = &virtnet_xdp.xdp;
+   err = virtnet_build_xdp_buff_mrg(dev, vi, rq, xdp, data, len, frame_sz,
 &num_buf, &xdp_frags_truesz, stats);
if (unlikely(err))
goto err_xdp;
 
-   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
+   /* Copy out the virtio header, as it may be overwritten by the
+* xdp program.
+*/
+   memcpy(&virtnet_xdp.hdr, hdr, vi->hdr_len);
+
+   act = virtnet_xdp_handler(xdp_prog, xdp, dev, xdp_xmit, stats);
 
switch (act) {
case XDP_PASS:
-   head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, 
xdp_frags_truesz);
+   head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
xdp_frags_truesz);
if (unlikely(!head_skb))
break;
return head_skb;
@@ -1626,7 +1645,7 @@ static struct sk_buff *receive_mergeable_xdp(struct 
net_device *dev,
break;
}
 
-   put_xdp_frags(&xdp);
+   put_xdp_frags(xdp);
 
 err_xdp:
put_page(page);
-- 
2.40.1




[PATCH v2 2/3] virtio_net: Add missing virtio header in skb for XDP_PASS

2024-01-24 Thread Liang Chen
For the XDP_PASS scenario of the XDP path, the skb constructed with
xdp_buff does not include the virtio header. Adding the virtio header
information back when creating the skb.

Signed-off-by: Liang Chen 
---
 drivers/net/virtio_net.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b56828804e5f..2de46eb4c661 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1270,6 +1270,9 @@ static struct sk_buff *receive_small_xdp(struct 
net_device *dev,
if (unlikely(!skb))
goto err;
 
+   /* Store the original virtio header for subsequent use by the driver. */
+   memcpy(skb_vnet_common_hdr(skb), &virtnet_xdp.hdr, vi->hdr_len);
+
if (metasize)
skb_metadata_set(skb, metasize);
 
@@ -1635,6 +1638,9 @@ static struct sk_buff *receive_mergeable_xdp(struct 
net_device *dev,
head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
xdp_frags_truesz);
if (unlikely(!head_skb))
break;
+   /* Store the original virtio header for subsequent use by the 
driver. */
+   memcpy(skb_vnet_common_hdr(head_skb), &virtnet_xdp.hdr, 
vi->hdr_len);
+
return head_skb;
 
case XDP_TX:
-- 
2.40.1




[PATCH v2 3/3] virtio_net: Support RX hash XDP hint

2024-01-24 Thread Liang Chen
The RSS hash report is a feature that's part of the virtio specification.
Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost
(still a work in progress as per [1]) support this feature. While the
capability to obtain the RSS hash has been enabled in the normal path,
it's currently missing in the XDP path. Therefore, we are introducing XDP
hints through kfuncs to allow XDP programs to access the RSS hash.

1.
https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r

Signed-off-by: Liang Chen 
---
 drivers/net/virtio_net.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2de46eb4c661..ed6a4aa3fe04 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -4604,6 +4604,58 @@ static void virtnet_set_big_packets(struct virtnet_info 
*vi, const int mtu)
}
 }
 
+static int virtnet_xdp_rx_hash(const struct xdp_md *_ctx, u32 *hash,
+  enum xdp_rss_hash_type *rss_type)
+{
+   const struct virtnet_xdp_buff *virtnet_xdp = (void *)_ctx;
+   struct virtio_net_hdr_v1_hash *hdr_hash;
+
+   if (!(virtnet_xdp->xdp.rxq->dev->features & NETIF_F_RXHASH))
+   return -ENODATA;
+
+   hdr_hash = (struct virtio_net_hdr_v1_hash *)(&virtnet_xdp->hdr);
+
+   switch (__le16_to_cpu(hdr_hash->hash_report)) {
+   case VIRTIO_NET_HASH_REPORT_TCPv4:
+   *rss_type = XDP_RSS_TYPE_L4_IPV4_TCP;
+   break;
+   case VIRTIO_NET_HASH_REPORT_UDPv4:
+   *rss_type = XDP_RSS_TYPE_L4_IPV4_UDP;
+   break;
+   case VIRTIO_NET_HASH_REPORT_TCPv6:
+   *rss_type = XDP_RSS_TYPE_L4_IPV6_TCP;
+   break;
+   case VIRTIO_NET_HASH_REPORT_UDPv6:
+   *rss_type = XDP_RSS_TYPE_L4_IPV6_UDP;
+   break;
+   case VIRTIO_NET_HASH_REPORT_TCPv6_EX:
+   *rss_type = XDP_RSS_TYPE_L4_IPV6_TCP_EX;
+   break;
+   case VIRTIO_NET_HASH_REPORT_UDPv6_EX:
+   *rss_type = XDP_RSS_TYPE_L4_IPV6_UDP_EX;
+   break;
+   case VIRTIO_NET_HASH_REPORT_IPv4:
+   *rss_type = XDP_RSS_TYPE_L3_IPV4;
+   break;
+   case VIRTIO_NET_HASH_REPORT_IPv6:
+   *rss_type = XDP_RSS_TYPE_L3_IPV6;
+   break;
+   case VIRTIO_NET_HASH_REPORT_IPv6_EX:
+   *rss_type = XDP_RSS_TYPE_L3_IPV6_EX;
+   break;
+   case VIRTIO_NET_HASH_REPORT_NONE:
+   default:
+   *rss_type = XDP_RSS_TYPE_NONE;
+   }
+
+   *hash = __le32_to_cpu(hdr_hash->hash_value);
+   return 0;
+}
+
+static const struct xdp_metadata_ops virtnet_xdp_metadata_ops = {
+   .xmo_rx_hash= virtnet_xdp_rx_hash,
+};
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
int i, err = -ENOMEM;
@@ -4729,6 +4781,7 @@ static int virtnet_probe(struct virtio_device *vdev)
  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX);
 
dev->hw_features |= NETIF_F_RXHASH;
+   dev->xdp_metadata_ops = &virtnet_xdp_metadata_ops;
}
 
if (vi->has_rss_hash_report)
-- 
2.40.1




Re: [PATCH v2 1/3] virtio_net: Preserve virtio header before XDP program execution

2024-01-24 Thread Xuan Zhuo
On Wed, 24 Jan 2024 16:57:19 +0800, Liang Chen  
wrote:
> The xdp program may overwrite the inline virtio header. To ensure the
> integrity of the virtio header, it is saved in a data structure that
> wraps both the xdp_buff and the header before running the xdp program.
>
> Signed-off-by: Liang Chen 
> ---
>  drivers/net/virtio_net.c | 43 +---
>  1 file changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d7ce4a1011ea..b56828804e5f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -349,6 +349,11 @@ struct virtio_net_common_hdr {
>   };
>  };
>
> +struct virtnet_xdp_buff {
> + struct xdp_buff xdp;
> + struct virtio_net_common_hdr hdr;

Not all items of the hdr are useful, we can just save the useful items.

> +};
> +
>  static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
>
>  static bool is_xdp_frame(void *ptr)
> @@ -1199,9 +1204,10 @@ static struct sk_buff *receive_small_xdp(struct 
> net_device *dev,
>   unsigned int headroom = vi->hdr_len + header_offset;
>   struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
>   struct page *page = virt_to_head_page(buf);
> + struct virtnet_xdp_buff virtnet_xdp;
>   struct page *xdp_page;
> + struct xdp_buff *xdp;
>   unsigned int buflen;
> - struct xdp_buff xdp;
>   struct sk_buff *skb;
>   unsigned int metasize = 0;
>   u32 act;
> @@ -1233,17 +1239,23 @@ static struct sk_buff *receive_small_xdp(struct 
> net_device *dev,
>   page = xdp_page;
>   }
>
> - xdp_init_buff(&xdp, buflen, &rq->xdp_rxq);
> - xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
> + xdp = &virtnet_xdp.xdp;
> + xdp_init_buff(xdp, buflen, &rq->xdp_rxq);
> + xdp_prepare_buff(xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
>xdp_headroom, len, true);
>
> - act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
> + /* Copy out the virtio header, as it may be overwritten by the
> +  * xdp program.
> +  */
> + memcpy(&virtnet_xdp.hdr, hdr, vi->hdr_len);

Can we put this into virtnet_xdp_handler?

And just do that when the hash is negotiated.

> +
> + act = virtnet_xdp_handler(xdp_prog, xdp, dev, xdp_xmit, stats);
>
>   switch (act) {
>   case XDP_PASS:
>   /* Recalculate length in case bpf program changed it */
> - len = xdp.data_end - xdp.data;
> - metasize = xdp.data - xdp.data_meta;
> + len = xdp->data_end - xdp->data;
> + metasize = xdp->data - xdp->data_meta;
>   break;
>
>   case XDP_TX:
> @@ -1254,7 +1266,7 @@ static struct sk_buff *receive_small_xdp(struct 
> net_device *dev,
>   goto err_xdp;
>   }
>
> - skb = virtnet_build_skb(buf, buflen, xdp.data - buf, len);
> + skb = virtnet_build_skb(buf, buflen, xdp->data - buf, len);
>   if (unlikely(!skb))
>   goto err;
>
> @@ -1591,10 +1603,11 @@ static struct sk_buff *receive_mergeable_xdp(struct 
> net_device *dev,
>   int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>   struct page *page = virt_to_head_page(buf);
>   int offset = buf - page_address(page);
> + struct virtnet_xdp_buff virtnet_xdp;
>   unsigned int xdp_frags_truesz = 0;
>   struct sk_buff *head_skb;
>   unsigned int frame_sz;
> - struct xdp_buff xdp;
> + struct xdp_buff *xdp;
>   void *data;
>   u32 act;
>   int err;
> @@ -1604,16 +1617,22 @@ static struct sk_buff *receive_mergeable_xdp(struct 
> net_device *dev,
>   if (unlikely(!data))
>   goto err_xdp;
>
> - err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
> + xdp = &virtnet_xdp.xdp;
> + err = virtnet_build_xdp_buff_mrg(dev, vi, rq, xdp, data, len, frame_sz,
>&num_buf, &xdp_frags_truesz, stats);
>   if (unlikely(err))
>   goto err_xdp;
>
> - act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
> + /* Copy out the virtio header, as it may be overwritten by the
> +  * xdp program.
> +  */
> + memcpy(&virtnet_xdp.hdr, hdr, vi->hdr_len);
> +
> + act = virtnet_xdp_handler(xdp_prog, xdp, dev, xdp_xmit, stats);
>
>   switch (act) {
>   case XDP_PASS:
> - head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, 
> xdp_frags_truesz);
> + head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
> xdp_frags_truesz);
>   if (unlikely(!head_skb))
>   break;
>   return head_skb;
> @@ -1626,7 +1645,7 @@ static struct sk_buff *receive_mergeable_xdp(struct 
> net_device *dev,
>   break;
>   }
>
> - put_xdp_frags(&xdp);
> + put_xdp_frags(xdp);
>
>  err_xdp:
>   put_page(page);
> --
> 2.40.1
>



Re: [PATCH v2 2/3] virtio_net: Add missing virtio header in skb for XDP_PASS

2024-01-24 Thread Xuan Zhuo
On Wed, 24 Jan 2024 16:57:20 +0800, Liang Chen  
wrote:
> For the XDP_PASS scenario of the XDP path, the skb constructed with
> xdp_buff does not include the virtio header. Adding the virtio header
> information back when creating the skb.
>
> Signed-off-by: Liang Chen 
> ---
>  drivers/net/virtio_net.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b56828804e5f..2de46eb4c661 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1270,6 +1270,9 @@ static struct sk_buff *receive_small_xdp(struct 
> net_device *dev,
>   if (unlikely(!skb))
>   goto err;
>
> + /* Store the original virtio header for subsequent use by the driver. */
> + memcpy(skb_vnet_common_hdr(skb), &virtnet_xdp.hdr, vi->hdr_len);

About this, a spec is waiting for voting.

This may change the logic of the csum offset and so on.

Please not do this.

Thanks.


> +
>   if (metasize)
>   skb_metadata_set(skb, metasize);
>
> @@ -1635,6 +1638,9 @@ static struct sk_buff *receive_mergeable_xdp(struct 
> net_device *dev,
>   head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
> xdp_frags_truesz);
>   if (unlikely(!head_skb))
>   break;
> + /* Store the original virtio header for subsequent use by the 
> driver. */
> + memcpy(skb_vnet_common_hdr(head_skb), &virtnet_xdp.hdr, 
> vi->hdr_len);
> +
>   return head_skb;
>
>   case XDP_TX:
> --
> 2.40.1
>



[PATCH net-next 0/2] tun: AF_XDP Rx zero-copy support

2024-01-24 Thread Yunjian Wang
Now, some drivers support the zero-copy feature of AF_XDP sockets,
which can significantly reduce CPU utilization for XDP programs.

This patch set enables tun to also support the AF_XDP Rx zero-copy
feature, with the following changes:
1. The unnecessary 'dma_page' check has been removed when binding
the AF_XDP socket to the device, as it is not needed for the vNIC.
2. A check has been added when consuming the buffer in
vhost_net_buf_produce(), as the vq's rx_array may be empty after
AF_XDP Rx zero-copy is enabled.
3. The peek_len function is now used to consume a xsk->desc and
retrieve its length.
4. AF_XDP Rx zero-copy support has been added for tun.

This patchset is based on Linux 6.4.0+(openEuler 23.09) and has
successfully passed Netperf and Netserver stress testing with
multiple streams between VM A and VM B, using AF_XDP and OVS.

With this patch applied, the system CPU usage for OVS' PMD has
decreased from 59.8% to 8.8% while forwarding 70pps. 

Yunjian Wang (2):
  xsk: Remove non-zero 'dma_page' check in xp_assign_dev
  tun: AF_XDP Rx zero-copy support

 drivers/net/tun.c   | 165 +++-
 drivers/vhost/net.c |  18 +++--
 net/xdp/xsk_buff_pool.c |   7 --
 3 files changed, 176 insertions(+), 14 deletions(-)

-- 
2.33.0




[PATCH net-next 1/2] xsk: Remove non-zero 'dma_page' check in xp_assign_dev

2024-01-24 Thread Yunjian Wang
Now dma mappings are used by the physical NICs. However the vNIC
maybe do not need them. So remove non-zero 'dma_page' check in
xp_assign_dev.

Signed-off-by: Yunjian Wang 
---
 net/xdp/xsk_buff_pool.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 28711cc44ced..939b6e7b59ff 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -219,16 +219,9 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
if (err)
goto err_unreg_pool;
 
-   if (!pool->dma_pages) {
-   WARN(1, "Driver did not DMA map zero-copy buffers");
-   err = -EINVAL;
-   goto err_unreg_xsk;
-   }
pool->umem->zc = true;
return 0;
 
-err_unreg_xsk:
-   xp_disable_drv_zc(pool);
 err_unreg_pool:
if (!force_zc)
err = 0; /* fallback to copy mode */
-- 
2.33.0




[PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-24 Thread Yunjian Wang
Now the zero-copy feature of AF_XDP socket is supported by some
drivers, which can reduce CPU utilization on the xdp program.
This patch set allows tun to support AF_XDP Rx zero-copy feature.

This patch tries to address this by:
- Use peek_len to consume a xsk->desc and get xsk->desc length.
- When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
So add a check for empty vq's array in vhost_net_buf_produce().
- add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
- add tun_put_user_desc function to copy the Rx data to VM

Signed-off-by: Yunjian Wang 
---
 drivers/net/tun.c   | 165 +++-
 drivers/vhost/net.c |  18 +++--
 2 files changed, 176 insertions(+), 7 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afa5497f7c35..248b0f8e07d1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -77,6 +77,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -145,6 +146,10 @@ struct tun_file {
struct tun_struct *detached;
struct ptr_ring tx_ring;
struct xdp_rxq_info xdp_rxq;
+   struct xdp_desc desc;
+   /* protects xsk pool */
+   spinlock_t pool_lock;
+   struct xsk_buff_pool *pool;
 };
 
 struct tun_page {
@@ -208,6 +213,8 @@ struct tun_struct {
struct bpf_prog __rcu *xdp_prog;
struct tun_prog __rcu *steering_prog;
struct tun_prog __rcu *filter_prog;
+   /* tracks AF_XDP ZC enabled queues */
+   unsigned long *af_xdp_zc_qps;
struct ethtool_link_ksettings link_ksettings;
/* init args */
struct file *file;
@@ -795,6 +802,8 @@ static int tun_attach(struct tun_struct *tun, struct file 
*file,
 
tfile->queue_index = tun->numqueues;
tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
+   tfile->desc.len = 0;
+   tfile->pool = NULL;
 
if (tfile->detached) {
/* Re-attach detached tfile, updating XDP queue_index */
@@ -989,6 +998,13 @@ static int tun_net_init(struct net_device *dev)
return err;
}
 
+   tun->af_xdp_zc_qps = bitmap_zalloc(MAX_TAP_QUEUES, GFP_KERNEL);
+   if (!tun->af_xdp_zc_qps) {
+   security_tun_dev_free_security(tun->security);
+   free_percpu(dev->tstats);
+   return -ENOMEM;
+   }
+
tun_flow_init(tun);
 
dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
@@ -1009,6 +1025,7 @@ static int tun_net_init(struct net_device *dev)
tun_flow_uninit(tun);
security_tun_dev_free_security(tun->security);
free_percpu(dev->tstats);
+   bitmap_free(tun->af_xdp_zc_qps);
return err;
}
return 0;
@@ -1222,11 +1239,77 @@ static int tun_xdp_set(struct net_device *dev, struct 
bpf_prog *prog,
return 0;
 }
 
+static int tun_xsk_pool_enable(struct net_device *netdev,
+  struct xsk_buff_pool *pool,
+  u16 qid)
+{
+   struct tun_struct *tun = netdev_priv(netdev);
+   struct tun_file *tfile;
+   unsigned long flags;
+
+   rcu_read_lock();
+   tfile = rtnl_dereference(tun->tfiles[qid]);
+   if (!tfile) {
+   rcu_read_unlock();
+   return -ENODEV;
+   }
+
+   spin_lock_irqsave(&tfile->pool_lock, flags);
+   xsk_pool_set_rxq_info(pool, &tfile->xdp_rxq);
+   tfile->pool = pool;
+   spin_unlock_irqrestore(&tfile->pool_lock, flags);
+
+   rcu_read_unlock();
+   set_bit(qid, tun->af_xdp_zc_qps);
+
+   return 0;
+}
+
+static int tun_xsk_pool_disable(struct net_device *netdev, u16 qid)
+{
+   struct tun_struct *tun = netdev_priv(netdev);
+   struct tun_file *tfile;
+   unsigned long flags;
+
+   if (!test_bit(qid, tun->af_xdp_zc_qps))
+   return 0;
+
+   clear_bit(qid, tun->af_xdp_zc_qps);
+
+   rcu_read_lock();
+   tfile = rtnl_dereference(tun->tfiles[qid]);
+   if (!tfile) {
+   rcu_read_unlock();
+   return 0;
+   }
+
+   spin_lock_irqsave(&tfile->pool_lock, flags);
+   if (tfile->desc.len) {
+   xsk_tx_completed(tfile->pool, 1);
+   tfile->desc.len = 0;
+   }
+   tfile->pool = NULL;
+   spin_unlock_irqrestore(&tfile->pool_lock, flags);
+
+   rcu_read_unlock();
+   return 0;
+}
+
+int tun_xsk_pool_setup(struct net_device *dev, struct xsk_buff_pool *pool,
+  u16 qid)
+{
+   return pool ? tun_xsk_pool_enable(dev, pool, qid) :
+   tun_xsk_pool_disable(dev, qid);
+}
+
 static int tun_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
switch (xdp->command) {
case XDP_SETUP_PROG:
return tun_xdp_set(dev, xdp->prog, xdp->extack);
+   case XDP_SETUP_XSK_POOL:
+   return tun_xsk_pool_setup(dev, xdp->xsk.pool,
+  xdp->xsk.queue_id);
 

Re: [PATCH] tracing: Include PPIN in mce_record tracepoint

2024-01-24 Thread Borislav Petkov
On Tue, Jan 23, 2024 at 08:38:53PM -0500, Steven Rostedt wrote:
> Yes, rasdaemon uses libtraceevent (or a copy of it internally) that
> reads the format file to find fields. You can safely add fields to the
> middle of the event structure and the parsing will be just fine.

Should we worry about tools who consume the event "blindly", without the
lib?

I guess no until we break some use case and then we will have to revert.
At least this is what we've done in the past...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [syzbot] [virtualization?] KMSAN: uninit-value in virtqueue_add (4)

2024-01-24 Thread Alexander Potapenko
On Thu, Jan 4, 2024 at 9:45 PM Stefan Hajnoczi  wrote:
>
> On Tue, Jan 02, 2024 at 08:03:46AM -0500, Michael S. Tsirkin wrote:
> > On Mon, Jan 01, 2024 at 05:38:24AM -0800, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit:fbafc3e621c3 Merge tag 'for_linus' of 
> > > git://git.kernel.org..
> > > git tree:   upstream
> > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=173df3e9e8
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=e0c7078a6b901aa3
> > > dashboard link: 
> > > https://syzkaller.appspot.com/bug?extid=d7521c1e3841ed075a42
> > > compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for 
> > > Debian) 2.40
> > > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1300b4a1e8
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=130b0379e8
> > >
> > > Downloadable assets:
> > > disk image: 
> > > https://storage.googleapis.com/syzbot-assets/1520f7b6daa4/disk-fbafc3e6.raw.xz
> > > vmlinux: 
> > > https://storage.googleapis.com/syzbot-assets/8b490af009d5/vmlinux-fbafc3e6.xz
> > > kernel image: 
> > > https://storage.googleapis.com/syzbot-assets/202ca200f4a4/bzImage-fbafc3e6.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the 
> > > commit:
> > > Reported-by: syzbot+d7521c1e3841ed075...@syzkaller.appspotmail.com
> > >
> > > =
>
> Hi Alexander,
> Please take a look at this KMSAN failure. The uninitialized memory was
> created for the purpose of writing a coredump. vring_map_one_sg() should
> have direction=DMA_TO_DEVICE.
>
Hi Stefan,

I took a closer look, and am pretty confident this is a false positive.
I tried adding memset(..., 0xab, PAGE_SIZE << order) to alloc_pages()
and never saw
the 0xab pattern in the buffers for which KMSAN reported an error.

This probably isn't an error in 88938359e2df ("virtio: kmsan:
check/unpoison scatterlist in
vring_map_one_sg()"), which by itself should be doing a sane thing:
report an error if an
uninitialized buffer is passed to it. It is more likely that we're
missing some initialization that
happens in coredump.c

Does anyone have an idea where coredump.c is supposed to be
initializing these pages?
Maybe there are some inline assembly functions involved in copying the data?



Re: [PATCH v2 2/3] virtio_net: Add missing virtio header in skb for XDP_PASS

2024-01-24 Thread Heng Qi




在 2024/1/24 下午4:57, Liang Chen 写道:

For the XDP_PASS scenario of the XDP path, the skb constructed with
xdp_buff does not include the virtio header. Adding the virtio header
information back when creating the skb.

Signed-off-by: Liang Chen 
---
  drivers/net/virtio_net.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b56828804e5f..2de46eb4c661 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1270,6 +1270,9 @@ static struct sk_buff *receive_small_xdp(struct 
net_device *dev,
if (unlikely(!skb))
goto err;
  
+	/* Store the original virtio header for subsequent use by the driver. */

+   memcpy(skb_vnet_common_hdr(skb), &virtnet_xdp.hdr, vi->hdr_len);


If xdp push or xdp pull modifies xdp_buff, will the original header 
still apply to the modified data?


Thanks,
Heng


+
if (metasize)
skb_metadata_set(skb, metasize);
  
@@ -1635,6 +1638,9 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,

head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
xdp_frags_truesz);
if (unlikely(!head_skb))
break;
+   /* Store the original virtio header for subsequent use by the 
driver. */
+   memcpy(skb_vnet_common_hdr(head_skb), &virtnet_xdp.hdr, 
vi->hdr_len);
+
return head_skb;
  
  	case XDP_TX:





Re: [PATCH 2/3] clk: qcom: gcc-msm8953: add MDSS_BCR reset

2024-01-24 Thread Konrad Dybcio




On 1/23/24 22:03, Luca Weiss wrote:

From: Vladimir Lypak 

Add an entry in the gcc driver for the MDSS_BCR reset found on MSM8953.

Signed-off-by: Vladimir Lypak 
[luca: expand commit message, move entry]
Signed-off-by: Luca Weiss 
---


I found some more definitions in lk2nd

88:#define  GCC_CRYPTO_BCR(CLK_CTL_BASE + 0x16000)
106:#define SDCC1_BCR  (CLK_CTL_BASE + 0x42000) /* 
block reset*/
125:#define SDCC2_BCR  (CLK_CTL_BASE + 0x43000) /* 
block reset */
150:#define USB_HS_BCR (CLK_CTL_BASE + 0x41000)
155:#define GCC_QUSB2_PHY_BCR  (CLK_CTL_BASE + 0x4103C)
168:#define USB_30_BCR  (CLK_CTL_BASE + 0x3F070)
189:#define USB3_PHY_BCR(CLK_CTL_BASE + 0x3F034)
190:#define USB3PHY_PHY_BCR (CLK_CTL_BASE + 0x3F03C)

Couldn't find this one though, did you confirm that MDSS goes off
when you assert it?

Konrad



Re: [PATCH v4 3/7] tracing/probes: support '%pd' type for print struct dentry's name

2024-01-24 Thread Google
On Wed, 24 Jan 2024 10:46:10 +0800
"yebin (H)"  wrote:

> 
> 
> On 2024/1/23 22:40, Masami Hiramatsu (Google) wrote:
> > On Tue, 23 Jan 2024 17:21:35 +0800
> > Ye Bin  wrote:
> >
> >> Similar to '%pd' for printk, use '%pd' for print struct dentry's name.
> >>
> >> Signed-off-by: Ye Bin 
> >> ---
> >>   kernel/trace/trace_kprobe.c | 6 ++
> >>   kernel/trace/trace_probe.h  | 1 +
> >>   2 files changed, 7 insertions(+)
> >>
> >> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> >> index c4c6e0e0068b..00b74530fbad 100644
> >> --- a/kernel/trace/trace_kprobe.c
> >> +++ b/kernel/trace/trace_kprobe.c
> >> @@ -779,6 +779,7 @@ static int __trace_kprobe_create(int argc, const char 
> >> *argv[])
> >>char buf[MAX_EVENT_NAME_LEN];
> >>char gbuf[MAX_EVENT_NAME_LEN];
> >>char abuf[MAX_BTF_ARGS_LEN];
> >> +  char dbuf[MAX_DENTRY_ARGS_LEN];
> > Hmm, no, I don't like to expand stack anymore. Please allocate it
> > from heap.
> Do I need to change the other buffers on the stacks to allocate memory 
> from heap?

No, that is not needed for this series, but if you want, you can :)

> >>struct traceprobe_parse_context ctx = { .flags = TPARG_FL_KERNEL };
> >>   
> >>switch (argv[0][0]) {
> >> @@ -930,6 +931,11 @@ static int __trace_kprobe_create(int argc, const char 
> >> *argv[])
> >>argv = new_argv;
> >>}
> >>   
> >> +  ret = traceprobe_expand_dentry_args(argc, argv, dbuf,
> >> +  MAX_DENTRY_ARGS_LEN);
> >> +  if (ret)
> >> +  goto out;
> > And calling this here will not cover the trace_fprobe.
> >
> > Could you call this from traceprobe_expand_meta_args() instead of
> > calling it directly from trace_kprobe? Then it can be used from
> > fprobe_event too.
> >
> > Thank you,
> At first I wanted to implement the extension logic in 
> traceprobe_expand_meta_args(),
> but I found that the code was difficult to understand when I started 
> writing.

Yeah, I also found that is a bit different usage.

> If fprobe_event
> wants to support this function, is traceprobe_expand_dentry_args() also 
> called?

Yes, it is for expanding '$arg*' into '$arg1 $arg2 ...'

> Or re-encapsulate
> an interface to include the logic of different extensions. In this way, 
> the same buffer is used for
> the entire extension process, and the extension function needs to return 
> the information about
> the length of the buffer.

OK, I confirmed that will be too much complicated. Then can you just call
it from where the traceprobe_expand_meta_args() is called, which is 
__trace_fprobe_create()@trace_fprobe.c ?

But those should be simplified later.

Thank you,

-- 
Masami Hiramatsu (Google) 



Re: [PATCH 3/3] arm64: dts: qcom: msm8953: add reset for display subsystem

2024-01-24 Thread Konrad Dybcio




On 1/23/24 22:03, Luca Weiss wrote:

From: Vladimir Lypak 

With this reset we can avoid situations like IRQ storms from DSI host
before it even started probing (because boot-loader left DSI IRQs on).

Signed-off-by: Vladimir Lypak 
Signed-off-by: Luca Weiss 
---


Reviewed-by: Konrad Dybcio 

Konrad



Re: [PATCH] tracing: Include PPIN in mce_record tracepoint

2024-01-24 Thread Steven Rostedt
On Wed, 24 Jan 2024 10:57:08 +0100
Borislav Petkov  wrote:

> On Tue, Jan 23, 2024 at 08:38:53PM -0500, Steven Rostedt wrote:
> > Yes, rasdaemon uses libtraceevent (or a copy of it internally) that
> > reads the format file to find fields. You can safely add fields to the
> > middle of the event structure and the parsing will be just fine.  
> 
> Should we worry about tools who consume the event "blindly", without the
> lib?

I don't think that's a worry anymore. The offsets can change based on
kernel config. PowerTop needed to have the library ported to it because
it use to hardcode the offsets but then it broke when running the 32bit
version on a 64bit kernel.

> 
> I guess no until we break some use case and then we will have to revert.
> At least this is what we've done in the past...
> 

But that revert was reverted when we converted PowerTop to use libtraceevent.

-- Steve



Re: [PATCH v12 3/6] tracing: Add snapshot refcount

2024-01-24 Thread Google
On Tue, 23 Jan 2024 11:07:54 +
Vincent Donnefort  wrote:

[...]
> @@ -6592,8 +6641,11 @@ int tracing_set_tracer(struct trace_array *tr, const 
> char *buf)
>  
>   if (t->init) {
>   ret = tracer_init(t, tr);
> - if (ret)
> + if (ret) {
> + if (t->use_max_tr)
> + tracing_disarm_snapshot_locked(tr);

This part is out of CONFIG_TRACER_MAX_TRACE, so it may cause a compile error
if CONFIG_TRACER_MAX_TRACE is not set.

>   goto out;
> + }
>   }
>  
>   tr->current_trace = t;
[...]
> diff --git a/kernel/trace/trace_events_trigger.c 
> b/kernel/trace/trace_events_trigger.c
> index 46439e3bcec4..d41bf64741e2 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -597,20 +597,9 @@ static int register_trigger(char *glob,
>   return ret;
>  }
>  
> -/**
> - * unregister_trigger - Generic event_command @unreg implementation
> - * @glob: The raw string used to register the trigger
> - * @test: Trigger-specific data used to find the trigger to remove
> - * @file: The trace_event_file associated with the event
> - *
> - * Common implementation for event trigger unregistration.
> - *
> - * Usually used directly as the @unreg method in event command
> - * implementations.
> - */
> -static void unregister_trigger(char *glob,
> -struct event_trigger_data *test,
> -struct trace_event_file *file)

OK, so __unregister_trigger returns true if data exists, but
unregister_trigger() ignores results. (I want some comment here)

> +static bool __unregister_trigger(char *glob,
> +  struct event_trigger_data *test,
> +  struct trace_event_file *file)
>  {
>   struct event_trigger_data *data = NULL, *iter;
>  
> @@ -626,8 +615,32 @@ static void unregister_trigger(char *glob,
>   }
>   }
>  
> - if (data && data->ops->free)
> - data->ops->free(data);
> + if (data) {
> + if (data->ops->free)
> + data->ops->free(data);
> +
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/**
> + * unregister_trigger - Generic event_command @unreg implementation
> + * @glob: The raw string used to register the trigger
> + * @test: Trigger-specific data used to find the trigger to remove
> + * @file: The trace_event_file associated with the event
> + *
> + * Common implementation for event trigger unregistration.
> + *
> + * Usually used directly as the @unreg method in event command
> + * implementations.
> + */
> +static void unregister_trigger(char *glob,
> +struct event_trigger_data *test,
> +struct trace_event_file *file)
> +{
> + __unregister_trigger(glob, test, file);
>  }
>  
>  /*
> @@ -1470,12 +1483,20 @@ register_snapshot_trigger(char *glob,
> struct event_trigger_data *data,
> struct trace_event_file *file)
>  {
> - if (tracing_alloc_snapshot_instance(file->tr) != 0)
> + if (tracing_arm_snapshot(file->tr))
>   return 0;

BTW, is this return value correct? It seems that the register_*_trigger()
will return error code when it fails.

Thanks,

>  
>   return register_trigger(glob, data, file);
>  }
>  
> +static void unregister_snapshot_trigger(char *glob,
> + struct event_trigger_data *data,
> + struct trace_event_file *file)
> +{
> + if (__unregister_trigger(glob, data, file))
> + tracing_disarm_snapshot(file->tr);
> +}
> +
>  static int
>  snapshot_trigger_print(struct seq_file *m, struct event_trigger_data *data)
>  {
> @@ -1508,7 +1529,7 @@ static struct event_command trigger_snapshot_cmd = {
>   .trigger_type   = ETT_SNAPSHOT,
>   .parse  = event_trigger_parse,
>   .reg= register_snapshot_trigger,
> - .unreg  = unregister_trigger,
> + .unreg  = unregister_snapshot_trigger,
>   .get_trigger_ops= snapshot_get_trigger_ops,
>   .set_filter = set_trigger_filter,
>  };
> -- 
> 2.43.0.429.g432eaa2c6b-goog
> 


-- 
Masami Hiramatsu (Google) 



[PATCH] arm64: dts: qcom: sm6350: Add tsens thermal zones

2024-01-24 Thread Luca Weiss
   type = "critical";
+   };
+   };
+
+   cooling-maps {
+   map0 {
+   trip = <&gpuss1_alert0>;
+   cooling-device = <&gpu THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>;
+   };
+   };
+   };
+
+   modem-core0-thermal {
+   polling-delay-passive = <0>;
+   polling-delay = <0>;
+
+   thermal-sensors = <&tsens1 6>;
+
+   trips {
+   modem-core0-crit {
+   temperature = <125000>;
+   hysteresis = <0>;
+   type = "critical";
+   };
+   };
+   };
+
+   modem-core1-thermal {
+   polling-delay-passive = <0>;
+   polling-delay = <0>;
+
+   thermal-sensors = <&tsens1 7>;
+
+   trips {
+   modem-core1-crit {
+   temperature = <125000>;
+   hysteresis = <0>;
+   type = "critical";
+   };
+   };
+   };
+
+   modem-scl-thermal {
+   polling-delay-passive = <0>;
+   polling-delay = <0>;
+
+   thermal-sensors = <&tsens1 9>;
+
+   trips {
+   modem-scl-crit {
+   temperature = <125000>;
+   hysteresis = <0>;
+   type = "critical";
+   };
+   };
+   };
+
+   modem-vec-thermal {
+   polling-delay-passive = <0>;
+   polling-delay = <0>;
+
+   thermal-sensors = <&tsens1 8>;
+
+   trips {
+   modem-vec-crit {
+   temperature = <125000>;
+   hysteresis = <0>;
+   type = "critical";
+   };
+   };
+   };
+
+   npu-thermal {
+   polling-delay-passive = <0>;
+   polling-delay = <0>;
+
+   thermal-sensors = <&tsens1 10>;
+
+   trips {
+   npu-crit {
+   temperature = <125000>;
+   hysteresis = <0>;
+   type = "critical";
+   };
+   };
+   };
+
+   q6-hvx-thermal {
+   polling-delay-passive = <0>;
+   polling-delay = <0>;
+
+   thermal-sensors = <&tsens1 4>;
+
+   trips {
+   q6-hvx-crit {
+   temperature = <125000>;
+   hysteresis = <0>;
+   type = "critical";
+   };
+   };
+   };
+
+   video-thermal {
+   polling-delay-passive = <0>;
+   polling-delay = <0>;
+
+   thermal-sensors = <&tsens1 11>;
+
+   trips {
+   video-crit {
+   temperature = <125000>;
+   hysteresis = <0>;
+   type = "critical";
+   };
+   };
+   };
+   };
+
timer {
compatible = "arm,armv8-timer";
clock-frequency = <1920>;

---
base-commit: 6fd66bc55f7ed910980b9cc4afe31e1bb066c7e0
change-id: 20240124-sm6350-tsens-c17692ffad3b

Best regards,
-- 
Luca Weiss 




Re: general protection fault in ath9k_wmi_event_tasklet

2024-01-24 Thread Toke Høiland-Jørgensen
"Ubisectech Sirius"  writes:

> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> Recently, our team has discovered a issue in Linux kernel 
> 6.7.0-g9d1694dc91ce. Attached to the email were a POC file of the issue.
> Stack dump:
> general protection fault, probably for non-canonical address 
> 0xdc38:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x01c0-0x01c7]
> CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.7.0-g9d1694dc91ce #20
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
> 04/01/2014
> RIP: 0010:__queue_work+0x9d/0x1160 kernel/workqueue.c:1727
> Call Trace:
> 
> queue_work_on+0xf2/0x110 kernel/workqueue.c:1837
> queue_work include/linux/workqueue.h:548 [inline]
> ieee80211_queue_work net/mac80211/util.c:898 [inline]
> ieee80211_queue_work+0x111/0x180 net/mac80211/util.c:891
> ath9k_wmi_event_tasklet+0x327/0x450 drivers/net/wireless/ath/ath9k/wmi.c:168
> tasklet_action_common.constprop.0+0x229/0x390 kernel/softirq.c:780
> __do_softirq+0x1d4/0x85e kernel/softirq.c:553
> run_ksoftirqd kernel/softirq.c:921 [inline]
> run_ksoftirqd+0x31/0x60 kernel/softirq.c:913
> smpboot_thread_fn+0x63c/0x9f0 kernel/smpboot.c:164
> kthread+0x2cc/0x3b0 kernel/kthread.c:388
> ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
> ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
> 
> Modules linked in:
> ---[ end trace  ]---
> RIP: 0010:__queue_work+0x9d/0x1160 kernel/workqueue.c:1727
> Thank you for taking the time to read this email and we look forward
> to working with you further.

Hmm, so from eyeballing the code in question, this looks like it is
another initialisation race along the lines of the one fixed in commit:
8b3046abc99e ("ath9k_htc: fix NULL pointer dereference at 
ath9k_htc_tx_get_packet()")

Could you please test the patch below and see if you can still reproduce
this issue with that applied?

-Toke



diff --git a/drivers/net/wireless/ath/ath9k/htc.h 
b/drivers/net/wireless/ath/ath9k/htc.h
index 237f4ec2cffd..6c33e898b300 100644
--- a/drivers/net/wireless/ath/ath9k/htc.h
+++ b/drivers/net/wireless/ath/ath9k/htc.h
@@ -306,7 +306,6 @@ struct ath9k_htc_tx {
DECLARE_BITMAP(tx_slot, MAX_TX_BUF_NUM);
struct timer_list cleanup_timer;
spinlock_t tx_lock;
-   bool initialized;
 };
 
 struct ath9k_htc_tx_ctl {
@@ -515,6 +514,7 @@ struct ath9k_htc_priv {
unsigned long ps_usecount;
bool ps_enabled;
bool ps_idle;
+   bool initialized;
 
 #ifdef CONFIG_MAC80211_LEDS
enum led_brightness brightness;
diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_init.c 
b/drivers/net/wireless/ath/ath9k/htc_drv_init.c
index dae3d9c7b640..fc339079ee8c 100644
--- a/drivers/net/wireless/ath/ath9k/htc_drv_init.c
+++ b/drivers/net/wireless/ath/ath9k/htc_drv_init.c
@@ -966,6 +966,10 @@ int ath9k_htc_probe_device(struct htc_target *htc_handle, 
struct device *dev,
 
htc_handle->drv_priv = priv;
 
+   /* Allow ath9k_wmi_event_tasklet() to operate. */
+   smp_wmb();
+   priv->initialized = true;
+
return 0;
 
 err_init:
diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c 
b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
index 672789e3c55d..768ed8ea5c9f 100644
--- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
+++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
@@ -814,10 +814,6 @@ int ath9k_tx_init(struct ath9k_htc_priv *priv)
skb_queue_head_init(&priv->tx.data_vo_queue);
skb_queue_head_init(&priv->tx.tx_failed);
 
-   /* Allow ath9k_wmi_event_tasklet(WMI_TXSTATUS_EVENTID) to operate. */
-   smp_wmb();
-   priv->tx.initialized = true;
-
return 0;
 }
 
diff --git a/drivers/net/wireless/ath/ath9k/wmi.c 
b/drivers/net/wireless/ath/ath9k/wmi.c
index 1476b42b52a9..805ad31edba2 100644
--- a/drivers/net/wireless/ath/ath9k/wmi.c
+++ b/drivers/net/wireless/ath/ath9k/wmi.c
@@ -155,6 +155,12 @@ void ath9k_wmi_event_tasklet(struct tasklet_struct *t)
}
spin_unlock_irqrestore(&wmi->wmi_lock, flags);
 
+   /* Check if ath9k_htc_probe_device() completed. */
+   if (!data_race(priv->initialized)) {
+   kfree_skb(skb);
+   continue;
+   }
+
hdr = (struct wmi_cmd_hdr *) skb->data;
cmd_id = be16_to_cpu(hdr->command_id);
wmi_event = skb_pull(skb, sizeof(struct wmi_cmd_hdr));
@@ -169,10 +175,6 @@ void ath9k_wmi_event_tasklet(struct tasklet_struct *t)
 &wmi->drv_priv->fatal_work);
break;
case WMI_TXSTATUS_EVENTID:
-   /* Check if ath9k_tx_init() completed. */
-   if (!data_race(priv->tx.initialized))
-   break;
-
spin_lock_bh(&priv->tx.tx_lock);

Re: Front camera on pinephone

2024-01-24 Thread Pavel Machek
Hi!

> Isn't this simply the case of picking the gc2145 bits from Megis tree?
> 
> https://megous.com/git/linux/tree/arch/arm64/boot/dts/allwinner/sun50i-a64-pinephone.dtsi?h=orange-pi-5.10#n410

Well, that, adjusting dts bits to the new driver, testing and
submitting the patch. And it looks someone did the work and patches
are floating around. GOod :-).

I'm fighting with kexec crashing as soon as I pass dtb. Do I need to
modify the dtb somehow?

Best regards,
Pavel



Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-24 Thread Willem de Bruijn
Yunjian Wang wrote:
> Now the zero-copy feature of AF_XDP socket is supported by some
> drivers, which can reduce CPU utilization on the xdp program.
> This patch set allows tun to support AF_XDP Rx zero-copy feature.
> 
> This patch tries to address this by:
> - Use peek_len to consume a xsk->desc and get xsk->desc length.
> - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> So add a check for empty vq's array in vhost_net_buf_produce().
> - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> - add tun_put_user_desc function to copy the Rx data to VM
> 
> Signed-off-by: Yunjian Wang 

I don't fully understand the higher level design of this feature yet.

But some initial comments at the code level.

> ---
>  drivers/net/tun.c   | 165 +++-
>  drivers/vhost/net.c |  18 +++--
>  2 files changed, 176 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index afa5497f7c35..248b0f8e07d1 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -77,6 +77,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -145,6 +146,10 @@ struct tun_file {
>   struct tun_struct *detached;
>   struct ptr_ring tx_ring;
>   struct xdp_rxq_info xdp_rxq;
> + struct xdp_desc desc;
> + /* protects xsk pool */
> + spinlock_t pool_lock;
> + struct xsk_buff_pool *pool;
>  };
>  
>  struct tun_page {
> @@ -208,6 +213,8 @@ struct tun_struct {
>   struct bpf_prog __rcu *xdp_prog;
>   struct tun_prog __rcu *steering_prog;
>   struct tun_prog __rcu *filter_prog;
> + /* tracks AF_XDP ZC enabled queues */
> + unsigned long *af_xdp_zc_qps;
>   struct ethtool_link_ksettings link_ksettings;
>   /* init args */
>   struct file *file;
> @@ -795,6 +802,8 @@ static int tun_attach(struct tun_struct *tun, struct file 
> *file,
>  
>   tfile->queue_index = tun->numqueues;
>   tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
> + tfile->desc.len = 0;
> + tfile->pool = NULL;
>  
>   if (tfile->detached) {
>   /* Re-attach detached tfile, updating XDP queue_index */
> @@ -989,6 +998,13 @@ static int tun_net_init(struct net_device *dev)
>   return err;
>   }
>  
> + tun->af_xdp_zc_qps = bitmap_zalloc(MAX_TAP_QUEUES, GFP_KERNEL);
> + if (!tun->af_xdp_zc_qps) {
> + security_tun_dev_free_security(tun->security);
> + free_percpu(dev->tstats);
> + return -ENOMEM;
> + }
> +
>   tun_flow_init(tun);
>  
>   dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> @@ -1009,6 +1025,7 @@ static int tun_net_init(struct net_device *dev)
>   tun_flow_uninit(tun);
>   security_tun_dev_free_security(tun->security);
>   free_percpu(dev->tstats);
> + bitmap_free(tun->af_xdp_zc_qps);

Please release state in inverse order of acquire.

>   return err;
>   }
>   return 0;
> @@ -1222,11 +1239,77 @@ static int tun_xdp_set(struct net_device *dev, struct 
> bpf_prog *prog,
>   return 0;
>  }
>  
> +static int tun_xsk_pool_enable(struct net_device *netdev,
> +struct xsk_buff_pool *pool,
> +u16 qid)
> +{
> + struct tun_struct *tun = netdev_priv(netdev);
> + struct tun_file *tfile;
> + unsigned long flags;
> +
> + rcu_read_lock();
> + tfile = rtnl_dereference(tun->tfiles[qid]);
> + if (!tfile) {
> + rcu_read_unlock();
> + return -ENODEV;
> + }

No need for rcu_read_lock with rtnl_dereference.

Consider ASSERT_RTNL() if unsure whether this patch could be reached
without the rtnl held.

> +
> + spin_lock_irqsave(&tfile->pool_lock, flags);
> + xsk_pool_set_rxq_info(pool, &tfile->xdp_rxq);
> + tfile->pool = pool;
> + spin_unlock_irqrestore(&tfile->pool_lock, flags);
> +
> + rcu_read_unlock();
> + set_bit(qid, tun->af_xdp_zc_qps);

What are the concurrency semantics: there's a spinlock to make
the update to xdp_rxq and pool a critical section, but the bitmap
is not part of this? Please also then document why the irqsave.

> +
> + return 0;
> +}
> +
> +static int tun_xsk_pool_disable(struct net_device *netdev, u16 qid)
> +{
> + struct tun_struct *tun = netdev_priv(netdev);
> + struct tun_file *tfile;
> + unsigned long flags;
> +
> + if (!test_bit(qid, tun->af_xdp_zc_qps))
> + return 0;
> +
> + clear_bit(qid, tun->af_xdp_zc_qps);

Time of check to time of use race between test and clear? Or is
there no race because anything that clears will hold the RTNL? If so,
please add a comment.

> +
> + rcu_read_lock();
> + tfile = rtnl_dereference(tun->tfiles[qid]);
> + if (!tfile) {
> + rcu_read_unlock();
> + return 0;
> + }
> +
> + spin_lock_irqsave(&tfile->pool_lock, flags);
> + if (tfile->desc.len) {
> +   

[PATCH net-next v1] vsock/test: print type for SOCK_SEQPACKET

2024-01-24 Thread Arseniy Krasnov
SOCK_SEQPACKET is supported for virtio transport, so do not interpret
such type of socket as unknown.

Signed-off-by: Arseniy Krasnov 
---
 tools/testing/vsock/vsock_diag_test.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/vsock/vsock_diag_test.c 
b/tools/testing/vsock/vsock_diag_test.c
index 5e6049226b77..17aeba7cbd14 100644
--- a/tools/testing/vsock/vsock_diag_test.c
+++ b/tools/testing/vsock/vsock_diag_test.c
@@ -39,6 +39,8 @@ static const char *sock_type_str(int type)
return "DGRAM";
case SOCK_STREAM:
return "STREAM";
+   case SOCK_SEQPACKET:
+   return "SEQPACKET";
default:
return "INVALID TYPE";
}
-- 
2.25.1




[PATCH v7 0/5] Add DAX ABI for memmap_on_memory

2024-01-24 Thread Vishal Verma
This series adds sysfs ABI to control memmap_on_memory behavior for DAX
devices.

Patch 1 replaces incorrect device_lock() usage with a local rwsem - this
was identified during review.

Patch 2 is also a preparatory patch that replaces sprintf() for sysfs
operations with sysfs_emit()

Patch 3 adds the missing documentation for the sysfs ABI for DAX regions
and Dax devices.

Patch 4 exports mhp_supports_memmap_on_memory().

Patch 5 adds the new ABI for toggling memmap_on_memory semantics for dax
devices.

---
Changes in v7:
- Rebase to v6.8-rc1
- Remove an unnecessary 'size' variable. (Matthew)
- Replace device lock (ab)use in dax/bus.c with local rwsems (Greg)
- Replace sprintf() usage with sysfs_emit() (Greg)
- Link to v6: 
https://lore.kernel.org/r/20231214-vv-dax_abi-v6-0-ad900d698...@intel.com

Changes in v6:
- Use sysfs_emit() in memmap_on_memory_show() (Greg)
- Change the ABI documentation date for memmap_on_memory to January 2024
  as that's likely when the 6.8 merge window will fall (Greg)
- Fix dev->driver check (Ying)
- Link to v5: 
https://lore.kernel.org/r/20231214-vv-dax_abi-v5-0-3f7b00696...@intel.com

Changes in v5:
- Export and check mhp_supports_memmap_on_memory() in the DAX sysfs ABI
  (David)
- Obtain dax_drv under the device lock (Ying)
- Check dax_drv for NULL before dereferencing it (Ying)
- Clean up some repetition in sysfs-bus-dax documentation entries
  (Jonathan)
- A few additional cleanups enabled by guard(device) (Jonathan)
- Drop the DEFINE_GUARD() part of patch 2, add dependency on Dan's patch
  above so it can be backported / applied separately (Jonathan, Dan)
- Link to v4: 
https://lore.kernel.org/r/20231212-vv-dax_abi-v4-0-1351758f0...@intel.com

Changes in v4:
- Hold the device lock when checking if the dax_dev is bound to kmem
  (Ying, Dan)
- Remove dax region checks (and locks) as they were unnecessary.
- Introduce guard(device) for device lock/unlock (Dan)
- Convert the rest of drivers/dax/bus.c to guard(device)
- Link to v3: 
https://lore.kernel.org/r/20231211-vv-dax_abi-v3-0-acf6cc1bd...@intel.com

Changes in v3:
- Fix typo in ABI docs (Zhijian Li)
- Add kernel config and module parameter dependencies to the ABI docs
  entry (David Hildenbrand)
- Ensure kmem isn't active when setting the sysfs attribute (Ying
  Huang)
- Simplify returning from memmap_on_memory_store()
- Link to v2: 
https://lore.kernel.org/r/20231206-vv-dax_abi-v2-0-f4f4f2336...@intel.com

Changes in v2:
- Fix CC lists, patch 1/2 didn't get sent correctly in v1
- Link to v1: 
https://lore.kernel.org/r/20231206-vv-dax_abi-v1-0-474eb88e2...@intel.com

Cc: 
Cc: 
Cc: 
Cc: David Hildenbrand 
Cc: Dave Hansen 
Cc: Huang Ying 
Cc: Greg Kroah-Hartman 
Cc: Matthew Wilcox 
Cc: 
To: Dan Williams 
To: Vishal Verma 
To: Dave Jiang 
To: Andrew Morton 
To: Oscar Salvador 

---
Vishal Verma (5):
  dax/bus.c: replace driver-core lock usage by a local rwsem
  dax/bus.c: replace several sprintf() with sysfs_emit()
  Documentatiion/ABI: Add ABI documentation for sys-bus-dax
  mm/memory_hotplug: export mhp_supports_memmap_on_memory()
  dax: add a sysfs knob to control memmap_on_memory behavior

 include/linux/memory_hotplug.h  |   6 +
 drivers/dax/bus.c   | 295 +++-
 mm/memory_hotplug.c |  17 +-
 Documentation/ABI/testing/sysfs-bus-dax | 153 +
 4 files changed, 381 insertions(+), 90 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20231025-vv-dax_abi-17a219c46076

Best regards,
-- 
Vishal Verma 




[PATCH v7 1/5] dax/bus.c: replace driver-core lock usage by a local rwsem

2024-01-24 Thread Vishal Verma
The dax driver incorrectly used driver-core device locks to protect
internal dax region and dax device configuration structures. Replace the
device lock usage with a local rwsem, one each for dax region
configuration and dax device configuration. As a result of this
conversion, no device_lock() usage remains in dax/bus.c.

Cc: Dan Williams 
Reported-by: Greg Kroah-Hartman 
Signed-off-by: Vishal Verma 
---
 drivers/dax/bus.c | 220 ++
 1 file changed, 157 insertions(+), 63 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 1ff1ab5fa105..cb148f74ceda 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -12,6 +12,18 @@
 
 static DEFINE_MUTEX(dax_bus_lock);
 
+/*
+ * All changes to the dax region configuration occur with this lock held
+ * for write.
+ */
+DECLARE_RWSEM(dax_region_rwsem);
+
+/*
+ * All changes to the dax device configuration occur with this lock held
+ * for write.
+ */
+DECLARE_RWSEM(dax_dev_rwsem);
+
 #define DAX_NAME_LEN 30
 struct dax_id {
struct list_head list;
@@ -180,7 +192,7 @@ static u64 dev_dax_size(struct dev_dax *dev_dax)
u64 size = 0;
int i;
 
-   device_lock_assert(&dev_dax->dev);
+   WARN_ON_ONCE(!rwsem_is_locked(&dax_dev_rwsem));
 
for (i = 0; i < dev_dax->nr_range; i++)
size += range_len(&dev_dax->ranges[i].range);
@@ -194,8 +206,15 @@ static int dax_bus_probe(struct device *dev)
struct dev_dax *dev_dax = to_dev_dax(dev);
struct dax_region *dax_region = dev_dax->region;
int rc;
+   u64 size;
 
-   if (dev_dax_size(dev_dax) == 0 || dev_dax->id < 0)
+   rc = down_read_interruptible(&dax_dev_rwsem);
+   if (rc)
+   return rc;
+   size = dev_dax_size(dev_dax);
+   up_read(&dax_dev_rwsem);
+
+   if (size == 0 || dev_dax->id < 0)
return -ENXIO;
 
rc = dax_drv->probe(dev_dax);
@@ -283,7 +302,7 @@ static unsigned long long dax_region_avail_size(struct 
dax_region *dax_region)
resource_size_t size = resource_size(&dax_region->res);
struct resource *res;
 
-   device_lock_assert(dax_region->dev);
+   WARN_ON_ONCE(!rwsem_is_locked(&dax_region_rwsem));
 
for_each_dax_region_resource(dax_region, res)
size -= resource_size(res);
@@ -295,10 +314,13 @@ static ssize_t available_size_show(struct device *dev,
 {
struct dax_region *dax_region = dev_get_drvdata(dev);
unsigned long long size;
+   int rc;
 
-   device_lock(dev);
+   rc = down_read_interruptible(&dax_region_rwsem);
+   if (rc)
+   return rc;
size = dax_region_avail_size(dax_region);
-   device_unlock(dev);
+   up_read(&dax_region_rwsem);
 
return sprintf(buf, "%llu\n", size);
 }
@@ -314,10 +336,12 @@ static ssize_t seed_show(struct device *dev,
if (is_static(dax_region))
return -EINVAL;
 
-   device_lock(dev);
+   rc = down_read_interruptible(&dax_region_rwsem);
+   if (rc)
+   return rc;
seed = dax_region->seed;
rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : "");
-   device_unlock(dev);
+   up_read(&dax_region_rwsem);
 
return rc;
 }
@@ -333,14 +357,18 @@ static ssize_t create_show(struct device *dev,
if (is_static(dax_region))
return -EINVAL;
 
-   device_lock(dev);
+   rc = down_read_interruptible(&dax_region_rwsem);
+   if (rc)
+   return rc;
youngest = dax_region->youngest;
rc = sprintf(buf, "%s\n", youngest ? dev_name(youngest) : "");
-   device_unlock(dev);
+   up_read(&dax_region_rwsem);
 
return rc;
 }
 
+static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data);
+
 static ssize_t create_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
 {
@@ -358,7 +386,9 @@ static ssize_t create_store(struct device *dev, struct 
device_attribute *attr,
if (val != 1)
return -EINVAL;
 
-   device_lock(dev);
+   rc = down_write_killable(&dax_region_rwsem);
+   if (rc)
+   return rc;
avail = dax_region_avail_size(dax_region);
if (avail == 0)
rc = -ENOSPC;
@@ -369,7 +399,7 @@ static ssize_t create_store(struct device *dev, struct 
device_attribute *attr,
.id = -1,
.memmap_on_memory = false,
};
-   struct dev_dax *dev_dax = devm_create_dev_dax(&data);
+   struct dev_dax *dev_dax = __devm_create_dev_dax(&data);
 
if (IS_ERR(dev_dax))
rc = PTR_ERR(dev_dax);
@@ -387,7 +417,7 @@ static ssize_t create_store(struct device *dev, struct 
device_attribute *attr,
rc = len;
}
}
-   device_unlock(dev);
+   up_write(&dax_region_rwsem);
 
   

[PATCH v7 3/5] Documentatiion/ABI: Add ABI documentation for sys-bus-dax

2024-01-24 Thread Vishal Verma
Add the missing sysfs ABI documentation for the device DAX subsystem.
Various ABI attributes under this have been present since v5.1, and more
have been added over time. In preparation for adding a new attribute,
add this file with the historical details.

Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 Documentation/ABI/testing/sysfs-bus-dax | 136 
 1 file changed, 136 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-dax 
b/Documentation/ABI/testing/sysfs-bus-dax
new file mode 100644
index ..6359f7bc9bf4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -0,0 +1,136 @@
+What:  /sys/bus/dax/devices/daxX.Y/align
+Date:  October, 2020
+KernelVersion: v5.10
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RW) Provides a way to specify an alignment for a dax device.
+   Values allowed are constrained by the physical address ranges
+   that back the dax device, and also by arch requirements.
+
+What:  /sys/bus/dax/devices/daxX.Y/mapping
+Date:  October, 2020
+KernelVersion: v5.10
+Contact:   nvd...@lists.linux.dev
+Description:
+   (WO) Provides a way to allocate a mapping range under a dax
+   device. Specified in the format -.
+
+What:  /sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
+What:  /sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
+What:  /sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
+Date:  October, 2020
+KernelVersion: v5.10
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RO) A dax device may have multiple constituent discontiguous
+   address ranges. These are represented by the different
+   'mappingX' subdirectories. The 'start' attribute indicates the
+   start physical address for the given range. The 'end' attribute
+   indicates the end physical address for the given range. The
+   'page_offset' attribute indicates the offset of the current
+   range in the dax device.
+
+What:  /sys/bus/dax/devices/daxX.Y/resource
+Date:  June, 2019
+KernelVersion: v5.3
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RO) The resource attribute indicates the starting physical
+   address of a dax device. In case of a device with multiple
+   constituent ranges, it indicates the starting address of the
+   first range.
+
+What:  /sys/bus/dax/devices/daxX.Y/size
+Date:  October, 2020
+KernelVersion: v5.10
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RW) The size attribute indicates the total size of a dax
+   device. For creating subdivided dax devices, or for resizing
+   an existing device, the new size can be written to this as
+   part of the reconfiguration process.
+
+What:  /sys/bus/dax/devices/daxX.Y/numa_node
+Date:  November, 2019
+KernelVersion: v5.5
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RO) If NUMA is enabled and the platform has affinitized the
+   backing device for this dax device, emit the CPU node
+   affinity for this device.
+
+What:  /sys/bus/dax/devices/daxX.Y/target_node
+Date:  February, 2019
+KernelVersion: v5.1
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RO) The target-node attribute is the Linux numa-node that a
+   device-dax instance may create when it is online. Prior to
+   being online the device's 'numa_node' property reflects the
+   closest online cpu node which is the typical expectation of a
+   device 'numa_node'. Once it is online it becomes its own
+   distinct numa node.
+
+What:  $(readlink -f 
/sys/bus/dax/devices/daxX.Y)/../dax_region/available_size
+Date:  October, 2020
+KernelVersion: v5.10
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RO) The available_size attribute tracks available dax region
+   capacity. This only applies to volatile hmem devices, not pmem
+   devices, since pmem devices are defined by nvdimm namespace
+   boundaries.
+
+What:  $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/size
+Date:  July, 2017
+KernelVersion: v5.1
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RO) The size attribute indicates the size of a given dax region
+   in bytes.
+
+What:  $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/align
+Date:  October, 2020
+KernelVersion: v5.10
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RO) The align attribute indicates alignment of the dax region.
+   Changes on align may not always be valid, when say certain
+  

[PATCH v7 4/5] mm/memory_hotplug: export mhp_supports_memmap_on_memory()

2024-01-24 Thread Vishal Verma
In preparation for adding sysfs ABI to toggle memmap_on_memory semantics
for drivers adding memory, export the mhp_supports_memmap_on_memory()
helper. This allows drivers to check if memmap_on_memory support is
available before trying to request it, and display an appropriate
message if it isn't available. As part of this, remove the size argument
to this - with recent updates to allow memmap_on_memory for larger
ranges, and the internal splitting of altmaps into respective memory
blocks, the size argument is meaningless.

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Suggested-by: David Hildenbrand 
Acked-by: David Hildenbrand 
Signed-off-by: Vishal Verma 
---
 include/linux/memory_hotplug.h |  6 ++
 mm/memory_hotplug.c| 17 ++---
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7d2076583494..ebc9d528f00c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -121,6 +121,7 @@ struct mhp_params {
 
 bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
 struct range mhp_get_pluggable_range(bool need_mapping);
+bool mhp_supports_memmap_on_memory(void);
 
 /*
  * Zone resizing functions
@@ -262,6 +263,11 @@ static inline bool movable_node_is_enabled(void)
return false;
 }
 
+static bool mhp_supports_memmap_on_memory(void)
+{
+   return false;
+}
+
 static inline void pgdat_kswapd_lock(pg_data_t *pgdat) {}
 static inline void pgdat_kswapd_unlock(pg_data_t *pgdat) {}
 static inline void pgdat_kswapd_lock_init(pg_data_t *pgdat) {}
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 21890994c1d3..065fb4804f1b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1328,7 +1328,7 @@ static inline bool 
arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 }
 #endif
 
-static bool mhp_supports_memmap_on_memory(unsigned long size)
+bool mhp_supports_memmap_on_memory(void)
 {
unsigned long vmemmap_size = memory_block_memmap_size();
unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
@@ -1337,17 +1337,11 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
 * Besides having arch support and the feature enabled at runtime, we
 * need a few more assumptions to hold true:
 *
-* a) We span a single memory block: memory onlining/offlinin;g happens
-*in memory block granularity. We don't want the vmemmap of online
-*memory blocks to reside on offline memory blocks. In the future,
-*we might want to support variable-sized memory blocks to make the
-*feature more versatile.
-*
-* b) The vmemmap pages span complete PMDs: We don't want vmemmap code
+* a) The vmemmap pages span complete PMDs: We don't want vmemmap code
 *to populate memory from the altmap for unrelated parts (i.e.,
 *other memory blocks)
 *
-* c) The vmemmap pages (and thereby the pages that will be exposed to
+* b) The vmemmap pages (and thereby the pages that will be exposed to
 *the buddy) have to cover full pageblocks: memory 
onlining/offlining
 *code requires applicable ranges to be page-aligned, for example, 
to
 *set the migratetypes properly.
@@ -1359,7 +1353,7 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
 *   altmap as an alternative source of memory, and we do not 
exactly
 *   populate a single PMD.
 */
-   if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
+   if (!mhp_memmap_on_memory())
return false;
 
/*
@@ -1382,6 +1376,7 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
 
return arch_supports_memmap_on_memory(vmemmap_size);
 }
+EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory);
 
 static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
 {
@@ -1515,7 +1510,7 @@ int __ref add_memory_resource(int nid, struct resource 
*res, mhp_t mhp_flags)
 * Self hosted memmap array
 */
if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
-   mhp_supports_memmap_on_memory(memory_block_size_bytes())) {
+   mhp_supports_memmap_on_memory()) {
ret = create_altmaps_and_memory_blocks(nid, group, start, size);
if (ret)
goto error;

-- 
2.43.0




[PATCH v7 2/5] dax/bus.c: replace several sprintf() with sysfs_emit()

2024-01-24 Thread Vishal Verma
There were several places where drivers/dax/bus.c uses 'sprintf' to
print sysfs data. Since a sysfs_emit() helper is available specifically
for this purpose, replace all the sprintf() usage for sysfs with
sysfs_emit() in this file.

Cc: Dan Williams 
Reported-by: Greg Kroah-Hartman 
Signed-off-by: Vishal Verma 
---
 drivers/dax/bus.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index cb148f74ceda..0fd948a4443e 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -269,7 +269,7 @@ static ssize_t id_show(struct device *dev,
 {
struct dax_region *dax_region = dev_get_drvdata(dev);
 
-   return sprintf(buf, "%d\n", dax_region->id);
+   return sysfs_emit(buf, "%d\n", dax_region->id);
 }
 static DEVICE_ATTR_RO(id);
 
@@ -278,8 +278,8 @@ static ssize_t region_size_show(struct device *dev,
 {
struct dax_region *dax_region = dev_get_drvdata(dev);
 
-   return sprintf(buf, "%llu\n", (unsigned long long)
-   resource_size(&dax_region->res));
+   return sysfs_emit(buf, "%llu\n",
+ (unsigned long long)resource_size(&dax_region->res));
 }
 static struct device_attribute dev_attr_region_size = __ATTR(size, 0444,
region_size_show, NULL);
@@ -289,7 +289,7 @@ static ssize_t region_align_show(struct device *dev,
 {
struct dax_region *dax_region = dev_get_drvdata(dev);
 
-   return sprintf(buf, "%u\n", dax_region->align);
+   return sysfs_emit(buf, "%u\n", dax_region->align);
 }
 static struct device_attribute dev_attr_region_align =
__ATTR(align, 0400, region_align_show, NULL);
@@ -322,7 +322,7 @@ static ssize_t available_size_show(struct device *dev,
size = dax_region_avail_size(dax_region);
up_read(&dax_region_rwsem);
 
-   return sprintf(buf, "%llu\n", size);
+   return sysfs_emit(buf, "%llu\n", size);
 }
 static DEVICE_ATTR_RO(available_size);
 
@@ -340,7 +340,7 @@ static ssize_t seed_show(struct device *dev,
if (rc)
return rc;
seed = dax_region->seed;
-   rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : "");
+   rc = sysfs_emit(buf, "%s\n", seed ? dev_name(seed) : "");
up_read(&dax_region_rwsem);
 
return rc;
@@ -361,7 +361,7 @@ static ssize_t create_show(struct device *dev,
if (rc)
return rc;
youngest = dax_region->youngest;
-   rc = sprintf(buf, "%s\n", youngest ? dev_name(youngest) : "");
+   rc = sysfs_emit(buf, "%s\n", youngest ? dev_name(youngest) : "");
up_read(&dax_region_rwsem);
 
return rc;
@@ -763,7 +763,7 @@ static ssize_t start_show(struct device *dev,
dax_range = get_dax_range(dev);
if (!dax_range)
return -ENXIO;
-   rc = sprintf(buf, "%#llx\n", dax_range->range.start);
+   rc = sysfs_emit(buf, "%#llx\n", dax_range->range.start);
put_dax_range();
 
return rc;
@@ -779,7 +779,7 @@ static ssize_t end_show(struct device *dev,
dax_range = get_dax_range(dev);
if (!dax_range)
return -ENXIO;
-   rc = sprintf(buf, "%#llx\n", dax_range->range.end);
+   rc = sysfs_emit(buf, "%#llx\n", dax_range->range.end);
put_dax_range();
 
return rc;
@@ -795,7 +795,7 @@ static ssize_t pgoff_show(struct device *dev,
dax_range = get_dax_range(dev);
if (!dax_range)
return -ENXIO;
-   rc = sprintf(buf, "%#lx\n", dax_range->pgoff);
+   rc = sysfs_emit(buf, "%#lx\n", dax_range->pgoff);
put_dax_range();
 
return rc;
@@ -969,7 +969,7 @@ static ssize_t size_show(struct device *dev,
size = dev_dax_size(dev_dax);
up_write(&dax_dev_rwsem);
 
-   return sprintf(buf, "%llu\n", size);
+   return sysfs_emit(buf, "%llu\n", size);
 }
 
 static bool alloc_is_aligned(struct dev_dax *dev_dax, resource_size_t size)
@@ -1233,7 +1233,7 @@ static ssize_t align_show(struct device *dev,
 {
struct dev_dax *dev_dax = to_dev_dax(dev);
 
-   return sprintf(buf, "%d\n", dev_dax->align);
+   return sysfs_emit(buf, "%d\n", dev_dax->align);
 }
 
 static ssize_t dev_dax_validate_align(struct dev_dax *dev_dax)
@@ -1311,7 +1311,7 @@ static ssize_t target_node_show(struct device *dev,
 {
struct dev_dax *dev_dax = to_dev_dax(dev);
 
-   return sprintf(buf, "%d\n", dev_dax_target_node(dev_dax));
+   return sysfs_emit(buf, "%d\n", dev_dax_target_node(dev_dax));
 }
 static DEVICE_ATTR_RO(target_node);
 
@@ -1327,7 +1327,7 @@ static ssize_t resource_show(struct device *dev,
else
start = dev_dax->ranges[0].range.start;
 
-   return sprintf(buf, "%#llx\n", start);
+   return sysfs_emit(buf, "%#llx\n", start);
 }
 static DEVICE_ATTR(resource, 0400, resource_show, NULL);
 
@@ -1338,14 +1338,14 @@ static ssize_t modalias_show(struct device *dev, struct 
device_attribute *at

[PATCH v7 5/5] dax: add a sysfs knob to control memmap_on_memory behavior

2024-01-24 Thread Vishal Verma
Add a sysfs knob for dax devices to control the memmap_on_memory setting
if the dax device were to be hotplugged as system memory.

The default memmap_on_memory setting for dax devices originating via
pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to
preserve legacy behavior. For dax devices via CXL, the default is on.
The sysfs control allows the administrator to override the above
defaults if needed.

Cc: David Hildenbrand 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Tested-by: Li Zhijian 
Reviewed-by: Jonathan Cameron 
Reviewed-by: David Hildenbrand 
Reviewed-by: Huang, Ying 
Signed-off-by: Vishal Verma 
---
 drivers/dax/bus.c   | 43 +
 Documentation/ABI/testing/sysfs-bus-dax | 17 +
 2 files changed, 60 insertions(+)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 0fd948a4443e..27c86d0ca711 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1349,6 +1349,48 @@ static ssize_t numa_node_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(numa_node);
 
+static ssize_t memmap_on_memory_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   struct dev_dax *dev_dax = to_dev_dax(dev);
+
+   return sysfs_emit(buf, "%d\n", dev_dax->memmap_on_memory);
+}
+
+static ssize_t memmap_on_memory_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+   struct dev_dax *dev_dax = to_dev_dax(dev);
+   bool val;
+   int rc;
+
+   rc = kstrtobool(buf, &val);
+   if (rc)
+   return rc;
+
+   if (val == true && !mhp_supports_memmap_on_memory()) {
+   dev_dbg(dev, "memmap_on_memory is not available\n");
+   return -EOPNOTSUPP;
+   }
+
+   rc = down_write_killable(&dax_dev_rwsem);
+   if (rc)
+   return rc;
+
+   if (dev_dax->memmap_on_memory != val && dev->driver &&
+   to_dax_drv(dev->driver)->type == DAXDRV_KMEM_TYPE) {
+   up_write(&dax_dev_rwsem);
+   return -EBUSY;
+   }
+
+   dev_dax->memmap_on_memory = val;
+   up_write(&dax_dev_rwsem);
+
+   return len;
+}
+static DEVICE_ATTR_RW(memmap_on_memory);
+
 static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int 
n)
 {
struct device *dev = container_of(kobj, struct device, kobj);
@@ -1375,6 +1417,7 @@ static struct attribute *dev_dax_attributes[] = {
&dev_attr_align.attr,
&dev_attr_resource.attr,
&dev_attr_numa_node.attr,
+   &dev_attr_memmap_on_memory.attr,
NULL,
 };
 
diff --git a/Documentation/ABI/testing/sysfs-bus-dax 
b/Documentation/ABI/testing/sysfs-bus-dax
index 6359f7bc9bf4..b34266bfae49 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -134,3 +134,20 @@ KernelVersion: v5.1
 Contact:   nvd...@lists.linux.dev
 Description:
(RO) The id attribute indicates the region id of a dax region.
+
+What:  /sys/bus/dax/devices/daxX.Y/memmap_on_memory
+Date:  January, 2024
+KernelVersion: v6.8
+Contact:   nvd...@lists.linux.dev
+Description:
+   (RW) Control the memmap_on_memory setting if the dax device
+   were to be hotplugged as system memory. This determines whether
+   the 'altmap' for the hotplugged memory will be placed on the
+   device being hotplugged (memmap_on_memory=1) or if it will be
+   placed on regular memory (memmap_on_memory=0). This attribute
+   must be set before the device is handed over to the 'kmem'
+   driver (i.e.  hotplugged into system-ram). Additionally, this
+   depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled
+   memmap_on_memory parameter for memory_hotplug. This is
+   typically set on the kernel command line -
+   memory_hotplug.memmap_on_memory set to 'true' or 'force'."

-- 
2.43.0




Re: [syzbot] [virtualization?] KMSAN: uninit-value in virtqueue_add (4)

2024-01-24 Thread Stefan Hajnoczi
On Wed, Jan 24, 2024 at 11:47:32AM +0100, Alexander Potapenko wrote:
> On Thu, Jan 4, 2024 at 9:45 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Jan 02, 2024 at 08:03:46AM -0500, Michael S. Tsirkin wrote:
> > > On Mon, Jan 01, 2024 at 05:38:24AM -0800, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:fbafc3e621c3 Merge tag 'for_linus' of 
> > > > git://git.kernel.org..
> > > > git tree:   upstream
> > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=173df3e9e8
> > > > kernel config:  
> > > > https://syzkaller.appspot.com/x/.config?x=e0c7078a6b901aa3
> > > > dashboard link: 
> > > > https://syzkaller.appspot.com/bug?extid=d7521c1e3841ed075a42
> > > > compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for 
> > > > Debian) 2.40
> > > > syz repro:  
> > > > https://syzkaller.appspot.com/x/repro.syz?x=1300b4a1e8
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=130b0379e8
> > > >
> > > > Downloadable assets:
> > > > disk image: 
> > > > https://storage.googleapis.com/syzbot-assets/1520f7b6daa4/disk-fbafc3e6.raw.xz
> > > > vmlinux: 
> > > > https://storage.googleapis.com/syzbot-assets/8b490af009d5/vmlinux-fbafc3e6.xz
> > > > kernel image: 
> > > > https://storage.googleapis.com/syzbot-assets/202ca200f4a4/bzImage-fbafc3e6.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the 
> > > > commit:
> > > > Reported-by: syzbot+d7521c1e3841ed075...@syzkaller.appspotmail.com
> > > >
> > > > =
> >
> > Hi Alexander,
> > Please take a look at this KMSAN failure. The uninitialized memory was
> > created for the purpose of writing a coredump. vring_map_one_sg() should
> > have direction=DMA_TO_DEVICE.
> >
> Hi Stefan,
> 
> I took a closer look, and am pretty confident this is a false positive.
> I tried adding memset(..., 0xab, PAGE_SIZE << order) to alloc_pages()
> and never saw
> the 0xab pattern in the buffers for which KMSAN reported an error.
> 
> This probably isn't an error in 88938359e2df ("virtio: kmsan:
> check/unpoison scatterlist in
> vring_map_one_sg()"), which by itself should be doing a sane thing:
> report an error if an
> uninitialized buffer is passed to it. It is more likely that we're
> missing some initialization that
> happens in coredump.c
> 
> Does anyone have an idea where coredump.c is supposed to be
> initializing these pages?
> Maybe there are some inline assembly functions involved in copying the data?

Thanks for your time looking into this!

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 1/9] remoteproc: imx_dsp_rproc: Use devm_rproc_alloc() helper

2024-01-24 Thread Iuliana Prodan

On 1/23/2024 8:46 PM, Andrew Davis wrote:

Use the device lifecycle managed allocation function. This helps prevent
mistakes like freeing out of order in cleanup functions and forgetting to
free on error paths.

Signed-off-by: Andrew Davis 


Reviewed-by: Iuliana Prodan 

Thanks,
Iulia


---
  drivers/remoteproc/imx_dsp_rproc.c | 11 ---
  1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/remoteproc/imx_dsp_rproc.c 
b/drivers/remoteproc/imx_dsp_rproc.c
index a1c62d15f16c6..56a799cb8b363 100644
--- a/drivers/remoteproc/imx_dsp_rproc.c
+++ b/drivers/remoteproc/imx_dsp_rproc.c
@@ -1104,8 +1104,8 @@ static int imx_dsp_rproc_probe(struct platform_device 
*pdev)
return ret;
}
  
-	rproc = rproc_alloc(dev, "imx-dsp-rproc", &imx_dsp_rproc_ops, fw_name,

-   sizeof(*priv));
+   rproc = devm_rproc_alloc(dev, "imx-dsp-rproc", &imx_dsp_rproc_ops,
+fw_name, sizeof(*priv));
if (!rproc)
return -ENOMEM;
  
@@ -1125,14 +1125,14 @@ static int imx_dsp_rproc_probe(struct platform_device *pdev)

ret = imx_dsp_rproc_detect_mode(priv);
if (ret) {
dev_err(dev, "failed on imx_dsp_rproc_detect_mode\n");
-   goto err_put_rproc;
+   return ret;
}
  
  	/* There are multiple power domains required by DSP on some platform */

ret = imx_dsp_attach_pm_domains(priv);
if (ret) {
dev_err(dev, "failed on imx_dsp_attach_pm_domains\n");
-   goto err_put_rproc;
+   return ret;
}
/* Get clocks */
ret = imx_dsp_rproc_clk_get(priv);
@@ -1155,8 +1155,6 @@ static int imx_dsp_rproc_probe(struct platform_device 
*pdev)
  
  err_detach_domains:

imx_dsp_detach_pm_domains(priv);
-err_put_rproc:
-   rproc_free(rproc);
  
  	return ret;

  }
@@ -1169,7 +1167,6 @@ static void imx_dsp_rproc_remove(struct platform_device 
*pdev)
pm_runtime_disable(&pdev->dev);
rproc_del(rproc);
imx_dsp_detach_pm_domains(priv);
-   rproc_free(rproc);
  }
  
  /* pm runtime functions */




Re: [PATCH 2/9] remoteproc: imx_rproc: Use devm_rproc_alloc() helper

2024-01-24 Thread Iuliana Prodan

On 1/23/2024 8:46 PM, Andrew Davis wrote:

Use the device lifecycle managed allocation function. This helps prevent
mistakes like freeing out of order in cleanup functions and forgetting to
free on error paths.

Signed-off-by: Andrew Davis 


Reviewed-by: Iuliana Prodan 

Thanks,
Iulia


---
  drivers/remoteproc/imx_rproc.c | 16 +---
  1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
index 8bb293b9f327c..55ecce3ab5f75 100644
--- a/drivers/remoteproc/imx_rproc.c
+++ b/drivers/remoteproc/imx_rproc.c
@@ -1104,16 +1104,14 @@ static int imx_rproc_probe(struct platform_device *pdev)
int ret;
  
  	/* set some other name then imx */

-   rproc = rproc_alloc(dev, "imx-rproc", &imx_rproc_ops,
-   NULL, sizeof(*priv));
+   rproc = devm_rproc_alloc(dev, "imx-rproc", &imx_rproc_ops,
+NULL, sizeof(*priv));
if (!rproc)
return -ENOMEM;
  
  	dcfg = of_device_get_match_data(dev);

-   if (!dcfg) {
-   ret = -EINVAL;
-   goto err_put_rproc;
-   }
+   if (!dcfg)
+   return -EINVAL;
  
  	priv = rproc->priv;

priv->rproc = rproc;
@@ -1124,8 +1122,7 @@ static int imx_rproc_probe(struct platform_device *pdev)
priv->workqueue = create_workqueue(dev_name(dev));
if (!priv->workqueue) {
dev_err(dev, "cannot create workqueue\n");
-   ret = -ENOMEM;
-   goto err_put_rproc;
+   return -ENOMEM;
}
  
  	ret = imx_rproc_xtr_mbox_init(rproc);

@@ -1167,8 +1164,6 @@ static int imx_rproc_probe(struct platform_device *pdev)
imx_rproc_free_mbox(rproc);
  err_put_wkq:
destroy_workqueue(priv->workqueue);
-err_put_rproc:
-   rproc_free(rproc);
  
  	return ret;

  }
@@ -1183,7 +1178,6 @@ static void imx_rproc_remove(struct platform_device *pdev)
imx_rproc_put_scu(rproc);
imx_rproc_free_mbox(rproc);
destroy_workqueue(priv->workqueue);
-   rproc_free(rproc);
  }
  
  static const struct of_device_id imx_rproc_of_match[] = {




Re: [PATCH v4 7/7] selftests/ftrace: add test cases for VFS type "%pd" and "%pD"

2024-01-24 Thread Google
On Wed, 24 Jan 2024 11:21:45 +0800
"yebin (H)"  wrote:

> 
> 
> On 2024/1/24 9:32, Masami Hiramatsu (Google) wrote:
> > On Tue, 23 Jan 2024 17:21:39 +0800
> > Ye Bin  wrote:
> >
> >> This patch adds test cases for new print format type "%pd/%pD".The test 
> >> cases
> >> test the following items:
> >> 1. Test README if add "%pd/%pD" type;
> >> 2. Test "%pd" type for dput();
> >> 3. Test "%pD" type for vfs_read();
> >>
> >> Signed-off-by: Ye Bin 
> >> ---
> >>   .../ftrace/test.d/kprobe/kprobe_args_vfs.tc   | 79 +++
> >>   1 file changed, 79 insertions(+)
> >>   create mode 100644 
> >> tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc
> >>
> >> diff --git 
> >> a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc 
> >> b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc
> >> new file mode 100644
> >> index ..1d8edd294dd6
> >> --- /dev/null
> >> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc
> >> @@ -0,0 +1,79 @@
> >> +#!/bin/sh
> >> +# SPDX-License-Identifier: GPL-2.0
> >> +# description: Kprobe event VFS type argument
> >> +# requires: kprobe_events
> >> +
> >> +case `uname -m` in
> >> +x86_64)
> >> +  ARG1=%di
> >> +;;
> >> +i[3456]86)
> >> +  ARG1=%ax
> >> +;;
> >> +aarch64)
> >> +  ARG1=%x0
> >> +;;
> >> +arm*)
> >> +  ARG1=%r0
> >> +;;
> >> +ppc64*)
> >> +  ARG1=%r3
> >> +;;
> >> +ppc*)
> >> +  ARG1=%r3
> > You can merge this ppc* and ppc64* cases :)
> >
> >> +;;
> >> +s390*)
> >> +  ARG1=%r2
> >> +;;
> >> +mips*)
> >> +  ARG1=%r4
> >> +;;
> >> +loongarch*)
> >> +  ARG1=%r4
> >> +;;
> >> +riscv*)
> >> +  ARG1=%a0
> > Anyway, I wonder why don't you use '$arg1' instead of these registers.
> > Is there any reason?
> >
> > Thank you,
> I looked at the parameter parsing code again, and using "$arg1" requires 
> the kernel to
> enable the CONFIG_HAVE_FUNCTION_ARG_ACCESS_API configuration.

Yes, and it is recommended (required) for supporting kprobe event
via ftrace. So, if you see any error on this test, that machine should
implement it.

Thank you,

> >> +;;
> >> +*)
> >> +  echo "Please implement other architecture here"
> >> +  exit_untested
> >> +esac
> >> +
> >> +: "Test argument %pd/%pD in README"
> >> +grep -q "%pd/%pD" README
> >> +
> >> +: "Test argument %pd with name"
> >> +echo "p:testprobe dput name=${ARG1}:%pd" > kprobe_events
> >> +echo 1 > events/kprobes/testprobe/enable
> >> +grep -q "1" events/kprobes/testprobe/enable
> >> +echo 0 > events/kprobes/testprobe/enable
> >> +grep "dput" trace | grep -q "enable"
> >> +echo "" > kprobe_events
> >> +echo "" > trace
> >> +
> >> +: "Test argument %pd without name"
> >> +echo "p:testprobe dput ${ARG1}:%pd" > kprobe_events
> >> +echo 1 > events/kprobes/testprobe/enable
> >> +grep -q "1" events/kprobes/testprobe/enable
> >> +echo 0 > events/kprobes/testprobe/enable
> >> +grep "dput" trace | grep -q "enable"
> >> +echo "" > kprobe_events
> >> +echo "" > trace
> >> +
> >> +: "Test argument %pD with name"
> >> +echo "p:testprobe vfs_read name=${ARG1}:%pD" > kprobe_events
> >> +echo 1 > events/kprobes/testprobe/enable
> >> +grep -q "1" events/kprobes/testprobe/enable
> >> +echo 0 > events/kprobes/testprobe/enable
> >> +grep "vfs_read" trace | grep -q "enable"
> >> +echo "" > kprobe_events
> >> +echo "" > trace
> >> +
> >> +: "Test argument %pD without name"
> >> +echo "p:testprobe vfs_read ${ARG1}:%pD" > kprobe_events
> >> +echo 1 > events/kprobes/testprobe/enable
> >> +grep -q "1"  events/kprobes/testprobe/enable
> >> +echo 0 > events/kprobes/testprobe/enable
> >> +grep "vfs_read" trace | grep -q "enable"
> >> +echo "" > kprobe_events
> >> +echo "" > trace
> >> -- 
> >> 2.31.1
> >>
> >
> 


-- 
Masami Hiramatsu (Google) 



Re: [PATCH 1/4] tracing/user_events: Prepare find/delete for same name events

2024-01-24 Thread Google
On Tue, 23 Jan 2024 22:08:41 +
Beau Belgrave  wrote:

> The current code for finding and deleting events assumes that there will
> never be cases when user_events are registered with the same name, but
> different formats. In the future this scenario will exist to ensure
> user programs can be updated or modify their events and run different
> versions of their programs side-by-side without being blocked.

Ah, this is a very important point. Kernel always has only one instance
but user program doesn't. Thus it can define the same event name.
For the similar problem, uprobe event assumes that the user (here
admin) will define different group name to avoid it. But for the user
event, it is embedded, hmm.

> 
> This change does not yet allow for multi-format events. If user_events
> are registered with the same name but different arguments the programs
> see the same return values as before. This change simply makes it
> possible to easily accomodate for this in future changes.
> 
> Update find_user_event() to take in argument parameters and register
> flags to accomodate future multi-format event scenarios. Have find
> validate argument matching and return error pointers to cover address
> in use cases, or allocation errors. Update callers to handle error
> pointer logic.

Understand, that is similar to what probe events do.

> 
> Move delete_user_event() to use hash walking directly now that find has
> changed. Delete all events found that match the register name, stop
> if an error occurs and report back to the user.

What happen if we run 2 different version of the applications and terminate
one of them? The event which is used by others will be kept?

Thank you,

> 
> Update user_fields_match() to cover list_empty() scenarios instead of
> each callsite doing it now that find_user_event() uses it directly.
> 
> Signed-off-by: Beau Belgrave 
> ---
>  kernel/trace/trace_events_user.c | 106 +--
>  1 file changed, 58 insertions(+), 48 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_user.c 
> b/kernel/trace/trace_events_user.c
> index 9365ce407426..0480579ba563 100644
> --- a/kernel/trace/trace_events_user.c
> +++ b/kernel/trace/trace_events_user.c
> @@ -202,6 +202,8 @@ static struct user_event_mm *user_event_mm_get(struct 
> user_event_mm *mm);
>  static struct user_event_mm *user_event_mm_get_all(struct user_event *user);
>  static void user_event_mm_put(struct user_event_mm *mm);
>  static int destroy_user_event(struct user_event *user);
> +static bool user_fields_match(struct user_event *user, int argc,
> +   const char **argv);
>  
>  static u32 user_event_key(char *name)
>  {
> @@ -1493,17 +1495,24 @@ static int destroy_user_event(struct user_event *user)
>  }
>  
>  static struct user_event *find_user_event(struct user_event_group *group,
> -   char *name, u32 *outkey)
> +   char *name, int argc, const char 
> **argv,
> +   u32 flags, u32 *outkey)
>  {
>   struct user_event *user;
>   u32 key = user_event_key(name);
>  
>   *outkey = key;
>  
> - hash_for_each_possible(group->register_table, user, node, key)
> - if (!strcmp(EVENT_NAME(user), name))
> + hash_for_each_possible(group->register_table, user, node, key) {
> + if (strcmp(EVENT_NAME(user), name))
> + continue;
> +
> + if (user_fields_match(user, argc, argv))
>   return user_event_get(user);
>  
> + return ERR_PTR(-EADDRINUSE);
> + }
> +
>   return NULL;
>  }
>  
> @@ -1860,6 +1869,9 @@ static bool user_fields_match(struct user_event *user, 
> int argc,
>   struct list_head *head = &user->fields;
>   int i = 0;
>  
> + if (argc == 0)
> + return list_empty(head);
> +
>   list_for_each_entry_reverse(field, head, link) {
>   if (!user_field_match(field, argc, argv, &i))
>   return false;
> @@ -1880,10 +1892,8 @@ static bool user_event_match(const char *system, const 
> char *event,
>   match = strcmp(EVENT_NAME(user), event) == 0 &&
>   (!system || strcmp(system, USER_EVENTS_SYSTEM) == 0);
>  
> - if (match && argc > 0)
> + if (match)
>   match = user_fields_match(user, argc, argv);
> - else if (match && argc == 0)
> - match = list_empty(&user->fields);
>  
>   return match;
>  }
> @@ -1922,11 +1932,11 @@ static int user_event_parse(struct user_event_group 
> *group, char *name,
>   char *args, char *flags,
>   struct user_event **newuser, int reg_flags)
>  {
> - int ret;
> - u32 key;
>   struct user_event *user;
> + char **argv = NULL;
>   int argc = 0;
> - char **argv;
> + int ret;
> + u32 key;
>  
>   /* Currently don't support any text based flags */
>   if (fl

Re: [PATCH v9] bus: mhi: host: Add tracing support

2024-01-24 Thread Krishna Chaitanya Chundru




On 1/8/2024 6:52 PM, Krishna Chaitanya Chundru wrote:

Hi Steven,

Even though I added your reviewed-by tag, I incorporated changes 
mentioned in the previous patch.


Can you please review it once.

Thanks & Regards,

Krishna Chaitanya.


Hi Steven,

Can you please review it once.

Thanks & Regards,
Krishna Chaitanya.


On 1/5/2024 5:53 PM, Krishna chaitanya chundru wrote:

This change adds ftrace support for following functions which
helps in debugging the issues when there is Channel state & MHI
state change and also when we receive data and control events:
1. mhi_intvec_mhi_states
2. mhi_process_data_event_ring
3. mhi_process_ctrl_ev_ring
4. mhi_gen_tre
5. mhi_update_channel_state
6. mhi_tryset_pm_state
7. mhi_pm_st_worker

Change the implementation of the arrays which has enum to strings mapping
to make it consistent in both trace header file and other files.

Where ever the trace events are added, debug messages are removed.

Signed-off-by: Krishna chaitanya chundru 
Reviewed-by: "Steven Rostedt (Google)" 
---
Changes in v9:
- Change the implementations of some array so that the strings to enum 
mapping

- is same in both trace header and other files as suggested by steve.
- Link to v8: 
https://lore.kernel.org/r/20231207-ftrace_support-v8-1-7f62d4558...@quicinc.com 



Changes in v8:
- Pass the structure and derefernce the variables in TP_fast_assign as 
suggested by steve
- Link to v7: 
https://lore.kernel.org/r/20231206-ftrace_support-v7-1-aca49a042...@quicinc.com 



Changes in v7:
- change log format as pointed by mani.
- Link to v6: 
https://lore.kernel.org/r/20231204-ftrace_support-v6-1-9b206546d...@quicinc.com 



Changes in v6:
- use 'rp' directly as suggested by jeffrey.
- Link to v5: 
https://lore.kernel.org/r/20231127-ftrace_support-v5-1-eb67daead...@quicinc.com 



Changes in v5:
- Use DECLARE_EVENT_CLASS for multiple events as suggested by steve.
- Instead of converting to u64 to print address, use %px to print the 
address to avoid

- warnings in some platforms.
- Link to v4: 
https://lore.kernel.org/r/2023-ftrace_support-v4-1-c83602399...@quicinc.com 



Changes in v4:
- Fix compilation issues in previous patch which happended due to 
rebasing.
- In the defconfig FTRACE config is not enabled due to that the 
compilation issue is not

- seen in my workspace.
- Link to v3: 
https://lore.kernel.org/r/2023-ftrace_support-v3-1-f358d2911...@quicinc.com 



Changes in v3:
- move trace header file from include/trace/events to 
drivers/bus/mhi/host/ so that

- we can include driver header files.
- Use macros directly in the trace events as suggested Jeffrey Hugo.
- Reorder the structure in the events as suggested by steve to avoid 
holes in the buffer.

- removed the mhi_to_physical function as this can give security issues.
- removed macros to define strings as we can get those from driver 
headers.
- Link to v2: 
https://lore.kernel.org/r/20231013-ftrace_support-v2-1-6e893ce01...@quicinc.com 



Changes in v2:
- Passing the raw state into the trace event and using  
__print_symbolic() as suggested by bjorn.

- Change mhi_pm_st_worker to mhi_pm_st_transition as suggested by bjorn.
- Fixed the kernel test rebot issues.
- Link to v1: 
https://lore.kernel.org/r/20231005-ftrace_support-v1-1-23a2f394f...@quicinc.com 


---
  drivers/bus/mhi/common.h    |  38 +++---
  drivers/bus/mhi/host/init.c |  63 +
  drivers/bus/mhi/host/internal.h |  40 ++
  drivers/bus/mhi/host/main.c |  19 ++-
  drivers/bus/mhi/host/pm.c   |   7 +-
  drivers/bus/mhi/host/trace.h    | 275 


  6 files changed, 378 insertions(+), 64 deletions(-)

diff --git a/drivers/bus/mhi/common.h b/drivers/bus/mhi/common.h
index f794b9c8049e..dda340aaed95 100644
--- a/drivers/bus/mhi/common.h
+++ b/drivers/bus/mhi/common.h
@@ -297,30 +297,30 @@ struct mhi_ring_element {
  __le32 dword[2];
  };
+#define MHI_STATE_LIST    \
+    mhi_state(RESET,    "RESET")    \
+    mhi_state(READY,    "READY")    \
+    mhi_state(M0,    "M0")    \
+    mhi_state(M1,    "M1")    \
+    mhi_state(M2,    "M2")    \
+    mhi_state(M3,    "M3")    \
+    mhi_state(M3_FAST,    "M3_FAST")    \
+    mhi_state(BHI,    "BHI")    \
+    mhi_state_end(SYS_ERR,    "SYS ERROR")
+
+#undef mhi_state
+#undef mhi_state_end
+
+#define mhi_state(a, b)    case MHI_STATE_##a: return b;
+#define mhi_state_end(a, b)    case MHI_STATE_##a: return b;
+
  static inline const char *mhi_state_str(enum mhi_state state)
  {
  switch (state) {
-    case MHI_STATE_RESET:
-    return "RESET";
-    case MHI_STATE_READY:
-    return "READY";
-    case MHI_STATE_M0:
-    return "M0";
-    case MHI_STATE_M1:
-    return "M1";
-    case MHI_STATE_M2:
-    return "M2";
-    case MHI_STATE_M3:
-    return "M3";
-    case MHI_STATE_M3_FAST:
-    return "M3 FAST";
-    case MHI_STATE_BHI:
-    return "BHI";
-    case MHI_STA

Re: [PATCH net-next v2] vsock/test: add '--peer-port' input argument

2024-01-24 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Tue, 23 Jan 2024 10:27:50 +0300 you wrote:
> Implement port for given CID as input argument instead of using
> hardcoded value '1234'. This allows to run different test instances
> on a single CID. Port argument is not required parameter and if it is
> not set, then default value will be '1234' - thus we preserve previous
> behaviour.
> 
> Signed-off-by: Arseniy Krasnov 
> 
> [...]

Here is the summary with links:
  - [net-next,v2] vsock/test: add '--peer-port' input argument
https://git.kernel.org/netdev/net-next/c/e18c709230cb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-24 Thread Jason Wang
On Thu, Jan 25, 2024 at 3:05 AM Willem de Bruijn
 wrote:
>
> Yunjian Wang wrote:
> > Now the zero-copy feature of AF_XDP socket is supported by some
> > drivers, which can reduce CPU utilization on the xdp program.
> > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> >
> > This patch tries to address this by:
> > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> > So add a check for empty vq's array in vhost_net_buf_produce().
> > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > - add tun_put_user_desc function to copy the Rx data to VM
> >
> > Signed-off-by: Yunjian Wang 
>
> I don't fully understand the higher level design of this feature yet.
>
> But some initial comments at the code level.
>
> > ---
> >  drivers/net/tun.c   | 165 +++-
> >  drivers/vhost/net.c |  18 +++--
> >  2 files changed, 176 insertions(+), 7 deletions(-)
> >

[...]

> >  struct tun_page {
> > @@ -208,6 +21
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index f2ed7167c848..a1f143ad2341 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
>
> For virtio maintainer: is it okay to have tun and vhost/net changes in
> the same patch, or is it better to split them?

It's better to split, but as you comment below, if it must be done in
one patch we need to explain why.

>
> > @@ -169,9 +169,10 @@ static int vhost_net_buf_is_empty(struct vhost_net_buf 
> > *rxq)
> >
> >  static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
> >  {
> > - void *ret = vhost_net_buf_get_ptr(rxq);
> > - ++rxq->head;
> > - return ret;
> > + if (rxq->tail == rxq->head)
> > + return NULL;
> > +
> > + return rxq->queue[rxq->head++];
>
> Why this change?

Thanks




Re: [PATCH v2 2/3] virtio_net: Add missing virtio header in skb for XDP_PASS

2024-01-24 Thread Jason Wang
On Wed, Jan 24, 2024 at 5:16 PM Xuan Zhuo  wrote:
>
> On Wed, 24 Jan 2024 16:57:20 +0800, Liang Chen  
> wrote:
> > For the XDP_PASS scenario of the XDP path, the skb constructed with
> > xdp_buff does not include the virtio header. Adding the virtio header
> > information back when creating the skb.
> >
> > Signed-off-by: Liang Chen 
> > ---
> >  drivers/net/virtio_net.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index b56828804e5f..2de46eb4c661 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1270,6 +1270,9 @@ static struct sk_buff *receive_small_xdp(struct 
> > net_device *dev,
> >   if (unlikely(!skb))
> >   goto err;
> >
> > + /* Store the original virtio header for subsequent use by the driver. 
> > */
> > + memcpy(skb_vnet_common_hdr(skb), &virtnet_xdp.hdr, vi->hdr_len);
>
> About this, a spec is waiting for voting.
>

A pointer?

> This may change the logic of the csum offset and so on.

 Btw, doesn't it need a new feature bit?

Thanks

>
> Please not do this.
>
> Thanks.
>
>
> > +
> >   if (metasize)
> >   skb_metadata_set(skb, metasize);
> >
> > @@ -1635,6 +1638,9 @@ static struct sk_buff *receive_mergeable_xdp(struct 
> > net_device *dev,
> >   head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
> > xdp_frags_truesz);
> >   if (unlikely(!head_skb))
> >   break;
> > + /* Store the original virtio header for subsequent use by the 
> > driver. */
> > + memcpy(skb_vnet_common_hdr(head_skb), &virtnet_xdp.hdr, 
> > vi->hdr_len);
> > +
> >   return head_skb;
> >
> >   case XDP_TX:
> > --
> > 2.40.1
> >
>




Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-24 Thread Jason Wang
On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang  wrote:
>
> Now the zero-copy feature of AF_XDP socket is supported by some
> drivers, which can reduce CPU utilization on the xdp program.
> This patch set allows tun to support AF_XDP Rx zero-copy feature.
>
> This patch tries to address this by:
> - Use peek_len to consume a xsk->desc and get xsk->desc length.
> - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> So add a check for empty vq's array in vhost_net_buf_produce().
> - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> - add tun_put_user_desc function to copy the Rx data to VM

Code explains themselves, let's explain why you need to do this.

1) why you want to use peek_len
2) for "vq's array", what does it mean?
3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so I
guess you meant TX zerocopy instead of RX (as I don't see codes for
RX?)

A big question is how could you handle GSO packets from userspace/guests?

>
> Signed-off-by: Yunjian Wang 
> ---
>  drivers/net/tun.c   | 165 +++-
>  drivers/vhost/net.c |  18 +++--
>  2 files changed, 176 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index afa5497f7c35..248b0f8e07d1 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -77,6 +77,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -145,6 +146,10 @@ struct tun_file {
> struct tun_struct *detached;
> struct ptr_ring tx_ring;
> struct xdp_rxq_info xdp_rxq;
> +   struct xdp_desc desc;
> +   /* protects xsk pool */
> +   spinlock_t pool_lock;
> +   struct xsk_buff_pool *pool;
>  };
>
>  struct tun_page {
> @@ -208,6 +213,8 @@ struct tun_struct {
> struct bpf_prog __rcu *xdp_prog;
> struct tun_prog __rcu *steering_prog;
> struct tun_prog __rcu *filter_prog;
> +   /* tracks AF_XDP ZC enabled queues */
> +   unsigned long *af_xdp_zc_qps;
> struct ethtool_link_ksettings link_ksettings;
> /* init args */
> struct file *file;
> @@ -795,6 +802,8 @@ static int tun_attach(struct tun_struct *tun, struct file 
> *file,
>
> tfile->queue_index = tun->numqueues;
> tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
> +   tfile->desc.len = 0;
> +   tfile->pool = NULL;
>
> if (tfile->detached) {
> /* Re-attach detached tfile, updating XDP queue_index */
> @@ -989,6 +998,13 @@ static int tun_net_init(struct net_device *dev)
> return err;
> }
>
> +   tun->af_xdp_zc_qps = bitmap_zalloc(MAX_TAP_QUEUES, GFP_KERNEL);
> +   if (!tun->af_xdp_zc_qps) {
> +   security_tun_dev_free_security(tun->security);
> +   free_percpu(dev->tstats);
> +   return -ENOMEM;
> +   }
> +
> tun_flow_init(tun);
>
> dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> @@ -1009,6 +1025,7 @@ static int tun_net_init(struct net_device *dev)
> tun_flow_uninit(tun);
> security_tun_dev_free_security(tun->security);
> free_percpu(dev->tstats);
> +   bitmap_free(tun->af_xdp_zc_qps);
> return err;
> }
> return 0;
> @@ -1222,11 +1239,77 @@ static int tun_xdp_set(struct net_device *dev, struct 
> bpf_prog *prog,
> return 0;
>  }
>
> +static int tun_xsk_pool_enable(struct net_device *netdev,
> +  struct xsk_buff_pool *pool,
> +  u16 qid)
> +{
> +   struct tun_struct *tun = netdev_priv(netdev);
> +   struct tun_file *tfile;
> +   unsigned long flags;
> +
> +   rcu_read_lock();
> +   tfile = rtnl_dereference(tun->tfiles[qid]);
> +   if (!tfile) {
> +   rcu_read_unlock();
> +   return -ENODEV;
> +   }
> +
> +   spin_lock_irqsave(&tfile->pool_lock, flags);
> +   xsk_pool_set_rxq_info(pool, &tfile->xdp_rxq);
> +   tfile->pool = pool;
> +   spin_unlock_irqrestore(&tfile->pool_lock, flags);
> +
> +   rcu_read_unlock();
> +   set_bit(qid, tun->af_xdp_zc_qps);
> +
> +   return 0;
> +}
> +
> +static int tun_xsk_pool_disable(struct net_device *netdev, u16 qid)
> +{
> +   struct tun_struct *tun = netdev_priv(netdev);
> +   struct tun_file *tfile;
> +   unsigned long flags;
> +
> +   if (!test_bit(qid, tun->af_xdp_zc_qps))
> +   return 0;
> +
> +   clear_bit(qid, tun->af_xdp_zc_qps);
> +
> +   rcu_read_lock();
> +   tfile = rtnl_dereference(tun->tfiles[qid]);
> +   if (!tfile) {
> +   rcu_read_unlock();
> +   return 0;
> +   }
> +
> +   spin_lock_irqsave(&tfile->pool_lock, flags);
> +   if (tfile->desc.len) {
> +   xsk_tx_completed(tfile->pool, 1);
> +   tfile->desc.len = 0;
> +   }
> +   tfile->pool = NULL;
> +   spin_unlock_irqre

Re: [PATCH v2 2/3] virtio_net: Add missing virtio header in skb for XDP_PASS

2024-01-24 Thread Xuan Zhuo
On Thu, 25 Jan 2024 11:48:18 +0800, Jason Wang  wrote:
> On Wed, Jan 24, 2024 at 5:16 PM Xuan Zhuo  wrote:
> >
> > On Wed, 24 Jan 2024 16:57:20 +0800, Liang Chen  
> > wrote:
> > > For the XDP_PASS scenario of the XDP path, the skb constructed with
> > > xdp_buff does not include the virtio header. Adding the virtio header
> > > information back when creating the skb.
> > >
> > > Signed-off-by: Liang Chen 
> > > ---
> > >  drivers/net/virtio_net.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index b56828804e5f..2de46eb4c661 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1270,6 +1270,9 @@ static struct sk_buff *receive_small_xdp(struct 
> > > net_device *dev,
> > >   if (unlikely(!skb))
> > >   goto err;
> > >
> > > + /* Store the original virtio header for subsequent use by the 
> > > driver. */
> > > + memcpy(skb_vnet_common_hdr(skb), &virtnet_xdp.hdr, vi->hdr_len);
> >
> > About this, a spec is waiting for voting.
> >
>
> A pointer?

Sorry.

http://lore.kernel.org/all/1704263702-50528-1-git-send-email-hen...@linux.alibaba.com


>
> > This may change the logic of the csum offset and so on.
>
>  Btw, doesn't it need a new feature bit?

Yes.

But this patch set should not include this.
We may do the similar thing, but the commit log will
include more info about the spec.

Thanks.


>
> Thanks
>
> >
> > Please not do this.
> >
> > Thanks.
> >
> >
> > > +
> > >   if (metasize)
> > >   skb_metadata_set(skb, metasize);
> > >
> > > @@ -1635,6 +1638,9 @@ static struct sk_buff *receive_mergeable_xdp(struct 
> > > net_device *dev,
> > >   head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
> > > xdp_frags_truesz);
> > >   if (unlikely(!head_skb))
> > >   break;
> > > + /* Store the original virtio header for subsequent use by 
> > > the driver. */
> > > + memcpy(skb_vnet_common_hdr(head_skb), &virtnet_xdp.hdr, 
> > > vi->hdr_len);
> > > +
> > >   return head_skb;
> > >
> > >   case XDP_TX:
> > > --
> > > 2.40.1
> > >
> >
>



Re: [PATCH v2 2/3] virtio_net: Add missing virtio header in skb for XDP_PASS

2024-01-24 Thread Liang Chen
On Wed, Jan 24, 2024 at 5:16 PM Xuan Zhuo  wrote:
>
> On Wed, 24 Jan 2024 16:57:20 +0800, Liang Chen  
> wrote:
> > For the XDP_PASS scenario of the XDP path, the skb constructed with
> > xdp_buff does not include the virtio header. Adding the virtio header
> > information back when creating the skb.
> >
> > Signed-off-by: Liang Chen 
> > ---
> >  drivers/net/virtio_net.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index b56828804e5f..2de46eb4c661 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1270,6 +1270,9 @@ static struct sk_buff *receive_small_xdp(struct 
> > net_device *dev,
> >   if (unlikely(!skb))
> >   goto err;
> >
> > + /* Store the original virtio header for subsequent use by the driver. 
> > */
> > + memcpy(skb_vnet_common_hdr(skb), &virtnet_xdp.hdr, vi->hdr_len);
>
> About this, a spec is waiting for voting.
>
> This may change the logic of the csum offset and so on.
>
> Please not do this.
>

Sure. This will be dropped in v3. Thanks.

> Thanks.
>
>
> > +
> >   if (metasize)
> >   skb_metadata_set(skb, metasize);
> >
> > @@ -1635,6 +1638,9 @@ static struct sk_buff *receive_mergeable_xdp(struct 
> > net_device *dev,
> >   head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
> > xdp_frags_truesz);
> >   if (unlikely(!head_skb))
> >   break;
> > + /* Store the original virtio header for subsequent use by the 
> > driver. */
> > + memcpy(skb_vnet_common_hdr(head_skb), &virtnet_xdp.hdr, 
> > vi->hdr_len);
> > +
> >   return head_skb;
> >
> >   case XDP_TX:
> > --
> > 2.40.1
> >



Re: [PATCH v2 2/3] virtio_net: Add missing virtio header in skb for XDP_PASS

2024-01-24 Thread Liang Chen
On Wed, Jan 24, 2024 at 7:04 PM Heng Qi  wrote:
>
>
>
> 在 2024/1/24 下午4:57, Liang Chen 写道:
> > For the XDP_PASS scenario of the XDP path, the skb constructed with
> > xdp_buff does not include the virtio header. Adding the virtio header
> > information back when creating the skb.
> >
> > Signed-off-by: Liang Chen 
> > ---
> >   drivers/net/virtio_net.c | 6 ++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index b56828804e5f..2de46eb4c661 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1270,6 +1270,9 @@ static struct sk_buff *receive_small_xdp(struct 
> > net_device *dev,
> >   if (unlikely(!skb))
> >   goto err;
> >
> > + /* Store the original virtio header for subsequent use by the driver. 
> > */
> > + memcpy(skb_vnet_common_hdr(skb), &virtnet_xdp.hdr, vi->hdr_len);
>
> If xdp push or xdp pull modifies xdp_buff, will the original header
> still apply to the modified data?
>

No, it would be an issue then. Anyway, this patch will be dropped in v3. Thanks.

> Thanks,
> Heng
>
> > +
> >   if (metasize)
> >   skb_metadata_set(skb, metasize);
> >
> > @@ -1635,6 +1638,9 @@ static struct sk_buff *receive_mergeable_xdp(struct 
> > net_device *dev,
> >   head_skb = build_skb_from_xdp_buff(dev, vi, xdp, 
> > xdp_frags_truesz);
> >   if (unlikely(!head_skb))
> >   break;
> > + /* Store the original virtio header for subsequent use by the 
> > driver. */
> > + memcpy(skb_vnet_common_hdr(head_skb), &virtnet_xdp.hdr, 
> > vi->hdr_len);
> > +
> >   return head_skb;
> >
> >   case XDP_TX:
>



[PATCH v5 0/8] support '%pd' and '%pD' for print file name

2024-01-24 Thread Ye Bin
During fault locating, the file name needs to be printed based on the
dentry/file address. The offset needs to be calculated each time, which
is troublesome. Similar to printk, kprobe supports printing file names
for dentry/file addresses.

Diff v5 vs v4:
1. Use glob_match() instead of str_has_suffix(), so remove the first patch;
2. Allocate buffer from heap for expand dentry;
3. Support "%pd/%pD" print type for fprobe;
4. Use $arg1 instead of origin register in test case;
5. Add test case for fprobe;

Diff v4 vs v3:
1. Use "argv[i][idx + 3] == 'd'" instead of "argv[i][strlen(argv[i]) - 1] == 
'd'"
to judge print format in PATCH[4/7];

Diff v3 vs v2:
1. Return the index of where the suffix was found in str_has_suffix();

Diff v2 vs v1:
1. Use "%pd/%pD" print format instead of "pd/pD" print format;
2. Add "%pd/%pD" in README;
3. Expand "%pd/%pD" argument before parameter parsing;
4. Add more detail information in ftrace documentation;
5. Add test cases for new print format in selftests/ftrace;

Ye Bin (8):
  tracing/probes: add traceprobe_expand_dentry_args() helper
  tracing/probes: support '%pd' type for print struct dentry's name
  tracing/probes: support '%pD' type for print struct file's name
  tracing/probes: support '%pd/%pD' type for fprobe
  tracing: add new type "%pd/%pD" in readme_msg[]
  Documentation: tracing: add new type '%pd' and '%pD' for kprobe
  selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
  selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"

 Documentation/trace/kprobetrace.rst   |  8 ++-
 kernel/trace/trace.c  |  2 +-
 kernel/trace/trace_fprobe.c   |  6 ++
 kernel/trace/trace_kprobe.c   |  6 ++
 kernel/trace/trace_probe.c| 63 +++
 kernel/trace/trace_probe.h|  2 +
 .../ftrace/test.d/dynevent/fprobe_args_vfs.tc | 40 
 .../ftrace/test.d/kprobe/kprobe_args_vfs.tc   | 43 +
 8 files changed, 167 insertions(+), 3 deletions(-)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/fprobe_args_vfs.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc

-- 
2.31.1




[PATCH v5 1/8] tracing/probes: add traceprobe_expand_dentry_args() helper

2024-01-24 Thread Ye Bin
Add traceprobe_expand_dentry_args() to expand dentry args. this API is
prepare to support "%pd" print format for kprobe.

Signed-off-by: Ye Bin 
---
 kernel/trace/trace_probe.c | 50 ++
 kernel/trace/trace_probe.h |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 4dc74d73fc1d..0c1caf0f474a 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1565,6 +1565,56 @@ const char **traceprobe_expand_meta_args(int argc, const 
char *argv[],
return ERR_PTR(ret);
 }
 
+/* @buf: *buf must be equal to NULL. Caller must to free *buf */
+int traceprobe_expand_dentry_args(int argc, const char *argv[], char **buf)
+{
+   int i, used, ret;
+   int bufsize = MAX_DENTRY_ARGS_LEN;
+   char *tmpbuf = NULL;
+
+   if (*buf)
+   return -EINVAL;
+
+   used = 0;
+   for (i = 0; i < argc; i++) {
+   if (glob_match("*:%pd", argv[i])) {
+   char *tmp;
+   char *equal;
+
+   if (!tmpbuf) {
+   tmpbuf = kmalloc(bufsize, GFP_KERNEL);
+   if (!tmpbuf)
+   return -ENOMEM;
+   }
+
+   tmp = kstrdup(argv[i], GFP_KERNEL);
+   if (!tmp)
+   goto nomem;
+
+   equal = strchr(tmp, '=');
+   if (equal)
+   *equal = '\0';
+   tmp[strlen(argv[i]) - 4] = '\0';
+   ret = snprintf(tmpbuf + used, bufsize - used,
+  "%s%s+0x0(+0x%zx(%s)):string",
+  equal ? tmp : "", equal ? "=" : "",
+  offsetof(struct dentry, d_name.name),
+  equal ? equal + 1 : tmp);
+   kfree(tmp);
+   if (ret >= bufsize - used)
+   goto nomem;
+   argv[i] = tmpbuf + used;
+   used += ret + 1;
+   }
+   }
+
+   *buf = tmpbuf;
+   return 0;
+nomem:
+   kfree(tmpbuf);
+   return -ENOMEM;
+}
+
 void traceprobe_finish_parse(struct traceprobe_parse_context *ctx)
 {
clear_btf_context(ctx);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 850d9ecb6765..5800513439f3 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -34,6 +34,7 @@
 #define MAX_ARRAY_LEN  64
 #define MAX_ARG_NAME_LEN   32
 #define MAX_BTF_ARGS_LEN   128
+#define MAX_DENTRY_ARGS_LEN256
 #define MAX_STRING_SIZEPATH_MAX
 #define MAX_ARG_BUF_LEN(MAX_TRACE_ARGS * MAX_ARG_NAME_LEN)
 
@@ -402,6 +403,7 @@ extern int traceprobe_parse_probe_arg(struct trace_probe 
*tp, int i,
 const char **traceprobe_expand_meta_args(int argc, const char *argv[],
 int *new_argc, char *buf, int bufsize,
 struct traceprobe_parse_context *ctx);
+extern int traceprobe_expand_dentry_args(int argc, const char *argv[], char 
**buf);
 
 extern int traceprobe_update_arg(struct probe_arg *arg);
 extern void traceprobe_free_probe_arg(struct probe_arg *arg);
-- 
2.31.1




[PATCH v5 3/8] tracing/probes: support '%pD' type for print struct file's name

2024-01-24 Thread Ye Bin
Similar to '%pD' for printk, use '%pD' for print struct file's name.

Signed-off-by: Ye Bin 
---
 kernel/trace/trace_probe.c | 57 +++---
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 0c1caf0f474a..5ac7b0193dd8 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -12,6 +12,7 @@
 #define pr_fmt(fmt)"trace_probe: " fmt
 
 #include 
+#include 
 #include "trace_btf.h"
 
 #include "trace_probe.h"
@@ -1577,35 +1578,47 @@ int traceprobe_expand_dentry_args(int argc, const char 
*argv[], char **buf)
 
used = 0;
for (i = 0; i < argc; i++) {
-   if (glob_match("*:%pd", argv[i])) {
-   char *tmp;
-   char *equal;
-
-   if (!tmpbuf) {
-   tmpbuf = kmalloc(bufsize, GFP_KERNEL);
-   if (!tmpbuf)
-   return -ENOMEM;
-   }
+   char *tmp;
+   char *equal;
+   size_t arg_len;
 
-   tmp = kstrdup(argv[i], GFP_KERNEL);
-   if (!tmp)
-   goto nomem;
+   if (!glob_match("*:%p[dD]", argv[i]))
+   continue;
 
-   equal = strchr(tmp, '=');
-   if (equal)
-   *equal = '\0';
-   tmp[strlen(argv[i]) - 4] = '\0';
+   if (!tmpbuf) {
+   tmpbuf = kmalloc(bufsize, GFP_KERNEL);
+   if (!tmpbuf)
+   return -ENOMEM;
+   }
+
+   tmp = kstrdup(argv[i], GFP_KERNEL);
+   if (!tmp)
+   goto nomem;
+
+   equal = strchr(tmp, '=');
+   if (equal)
+   *equal = '\0';
+   arg_len = strlen(argv[i]);
+   tmp[arg_len - 4] = '\0';
+   if (argv[i][arg_len - 1] == 'd')
ret = snprintf(tmpbuf + used, bufsize - used,
   "%s%s+0x0(+0x%zx(%s)):string",
   equal ? tmp : "", equal ? "=" : "",
   offsetof(struct dentry, d_name.name),
   equal ? equal + 1 : tmp);
-   kfree(tmp);
-   if (ret >= bufsize - used)
-   goto nomem;
-   argv[i] = tmpbuf + used;
-   used += ret + 1;
-   }
+   else
+   ret = snprintf(tmpbuf + used, bufsize - used,
+  "%s%s+0x0(+0x%zx(+0x%zx(%s))):string",
+  equal ? tmp : "", equal ? "=" : "",
+  offsetof(struct dentry, d_name.name),
+  offsetof(struct file, f_path.dentry),
+  equal ? equal + 1 : tmp);
+
+   kfree(tmp);
+   if (ret >= bufsize - used)
+   goto nomem;
+   argv[i] = tmpbuf + used;
+   used += ret + 1;
}
 
*buf = tmpbuf;
-- 
2.31.1




[PATCH v5 2/8] tracing/probes: support '%pd' type for print struct dentry's name

2024-01-24 Thread Ye Bin
Similar to '%pd' for printk, use '%pd' for print struct dentry's name.

Signed-off-by: Ye Bin 
---
 kernel/trace/trace_kprobe.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index c4c6e0e0068b..7cbb43740b4f 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -779,6 +779,7 @@ static int __trace_kprobe_create(int argc, const char 
*argv[])
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
char abuf[MAX_BTF_ARGS_LEN];
+   char *dbuf = NULL;
struct traceprobe_parse_context ctx = { .flags = TPARG_FL_KERNEL };
 
switch (argv[0][0]) {
@@ -930,6 +931,10 @@ static int __trace_kprobe_create(int argc, const char 
*argv[])
argv = new_argv;
}
 
+   ret = traceprobe_expand_dentry_args(argc, argv, &dbuf);
+   if (ret)
+   goto out;
+
/* setup a probe */
tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
argc, is_return);
@@ -971,6 +976,7 @@ static int __trace_kprobe_create(int argc, const char 
*argv[])
trace_probe_log_clear();
kfree(new_argv);
kfree(symbol);
+   kfree(dbuf);
return ret;
 
 parse_error:
-- 
2.31.1




[PATCH v5 4/8] tracing/probes: support '%pd/%pD' type for fprobe

2024-01-24 Thread Ye Bin
Support print type '%pd/%pD' for print dentry's or file's name.

Signed-off-by: Ye Bin 
---
 kernel/trace/trace_fprobe.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 7d2ddbcfa377..988d68e906ad 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -976,6 +976,7 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
char gbuf[MAX_EVENT_NAME_LEN];
char sbuf[KSYM_NAME_LEN];
char abuf[MAX_BTF_ARGS_LEN];
+   char *dbuf = NULL;
bool is_tracepoint = false;
struct tracepoint *tpoint = NULL;
struct traceprobe_parse_context ctx = {
@@ -1086,6 +1087,10 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
argv = new_argv;
}
 
+   ret = traceprobe_expand_dentry_args(argc, argv, &dbuf);
+   if (ret)
+   goto out;
+
/* setup a probe */
tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
argc, is_return);
@@ -1131,6 +1136,7 @@ static int __trace_fprobe_create(int argc, const char 
*argv[])
trace_probe_log_clear();
kfree(new_argv);
kfree(symbol);
+   kfree(dbuf);
return ret;
 
 parse_error:
-- 
2.31.1




[PATCH v5 5/8] tracing: add new type "%pd/%pD" in readme_msg[]

2024-01-24 Thread Ye Bin
Signed-off-by: Ye Bin 
---
 kernel/trace/trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2a7c6fd934e9..13197d3b86bd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5745,7 +5745,7 @@ static const char readme_msg[] =
"\t   +|-[u](), \\imm-value, 
\\\"imm-string\"\n"
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, char, string, 
symbol,\n"
"\t   b@/, ustring,\n"
-   "\t   symstr, \\[\\]\n"
+   "\t   symstr, %pd/%pD \\[\\]\n"
 #ifdef CONFIG_HIST_TRIGGERS
"\tfield:  ;\n"
"\tstype: u8/u16/u32/u64, s8/s16/s32/s64, pid_t,\n"
-- 
2.31.1




[PATCH v5 6/8] Documentation: tracing: add new type '%pd' and '%pD' for kprobe

2024-01-24 Thread Ye Bin
Similar to printk() '%pd' is for fetch dentry's name from struct dentry's
pointer, and '%pD' is for fetch file's name from struct file's pointer.

Signed-off-by: Ye Bin 
---
 Documentation/trace/kprobetrace.rst | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst 
b/Documentation/trace/kprobetrace.rst
index bf9cecb69fc9..f13f0fc11251 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -58,8 +58,9 @@ Synopsis of kprobe_events
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
- (x8/x16/x32/x64), "char", "string", "ustring", "symbol", 
"symstr"
-  and bitfield are supported.
+ (x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
+  "string", "ustring", "symbol", "symstr" and bitfield are
+  supported.
 
   (\*1) only for the probe on function entry (offs == 0). Note, this argument 
access
 is best effort, because depending on the argument type, it may be 
passed on
@@ -113,6 +114,9 @@ With 'symstr' type, you can filter the event with wildcard 
pattern of the
 symbols, and you don't need to solve symbol name by yourself.
 For $comm, the default type is "string"; any other type is invalid.
 
+VFS layer common type(%pd/%pD) is a special type, which fetches dentry's or
+file's name from struct dentry's address or struct file's address.
+
 .. _user_mem_access:
 
 User Memory Access
-- 
2.31.1




[PATCH v5 7/8] selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"

2024-01-24 Thread Ye Bin
This patch adds test cases for new print format type "%pd/%pD".The test cases
test the following items:
1. Test README if add "%pd/%pD" type;
2. Test "%pd" type for dput();
3. Test "%pD" type for vfs_read();

This test case require enable CONFIG_HAVE_FUNCTION_ARG_ACCESS_API configuration.

Signed-off-by: Ye Bin 
---
 .../ftrace/test.d/kprobe/kprobe_args_vfs.tc   | 43 +++
 1 file changed, 43 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc
new file mode 100644
index ..cf0599b90f1a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_vfs.tc
@@ -0,0 +1,43 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event VFS type argument
+# requires: kprobe_events
+
+: "Test argument %pd/%pD in README"
+grep -q "%pd/%pD" README
+
+: "Test argument %pd with name"
+echo 'p:testprobe dput name=$arg1:%pd' > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+grep -q "1" events/kprobes/testprobe/enable
+echo 0 > events/kprobes/testprobe/enable
+grep "dput" trace | grep -q "enable"
+echo "" > kprobe_events
+echo "" > trace
+
+: "Test argument %pd without name"
+echo 'p:testprobe dput $arg1:%pd' > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+grep -q "1" events/kprobes/testprobe/enable
+echo 0 > events/kprobes/testprobe/enable
+grep "dput" trace | grep -q "enable"
+echo "" > kprobe_events
+echo "" > trace
+
+: "Test argument %pD with name"
+echo 'p:testprobe vfs_read name=$arg1:%pD' > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+grep -q "1" events/kprobes/testprobe/enable
+echo 0 > events/kprobes/testprobe/enable
+grep "vfs_read" trace | grep -q "enable"
+echo "" > kprobe_events
+echo "" > trace
+
+: "Test argument %pD without name"
+echo 'p:testprobe vfs_read $arg1:%pD' > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+grep -q "1"  events/kprobes/testprobe/enable
+echo 0 > events/kprobes/testprobe/enable
+grep "vfs_read" trace | grep -q "enable"
+echo "" > kprobe_events
+echo "" > trace
-- 
2.31.1




[PATCH v5 8/8] selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"

2024-01-24 Thread Ye Bin
This patch adds fprobe test cases for new print format type "%pd/%pD".The
test cases test the following items:
1. Test "%pd" type for dput();
2. Test "%pD" type for vfs_read();

This test case require enable CONFIG_HAVE_FUNCTION_ARG_ACCESS_API configuration.

Signed-off-by: Ye Bin 
---
 .../ftrace/test.d/dynevent/fprobe_args_vfs.tc | 40 +++
 1 file changed, 40 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/fprobe_args_vfs.tc

diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_args_vfs.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_args_vfs.tc
new file mode 100644
index ..0a5baaffc137
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_args_vfs.tc
@@ -0,0 +1,40 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Fprobe event VFS type argument
+# requires: kprobe_events
+
+: "Test argument %pd with name for fprobe"
+echo 'f:testprobe dput name=$arg1:%pd' > dynamic_events
+echo 1 > events/fprobes/testprobe/enable
+grep -q "1" events/fprobes/testprobe/enable
+echo 0 > events/fprobes/testprobe/enable
+grep "dput" trace | grep -q "enable"
+echo "" > dynamic_events
+echo "" > trace
+
+: "Test argument %pd without name for fprobe"
+echo 'f:testprobe dput $arg1:%pd' > dynamic_events
+echo 1 > events/fprobes/testprobe/enable
+grep -q "1" events/fprobes/testprobe/enable
+echo 0 > events/fprobes/testprobe/enable
+grep "dput" trace | grep -q "enable"
+echo "" > dynamic_events
+echo "" > trace
+
+: "Test argument %pD with name for fprobe"
+echo 'f:testprobe vfs_read name=$arg1:%pD' > dynamic_events
+echo 1 > events/fprobes/testprobe/enable
+grep -q "1" events/fprobes/testprobe/enable
+echo 0 > events/fprobes/testprobe/enable
+grep "vfs_read" trace | grep -q "enable"
+echo "" > dynamic_events
+echo "" > trace
+
+: "Test argument %pD without name for fprobe"
+echo 'f:testprobe vfs_read $arg1:%pD' > dynamic_events
+echo 1 > events/fprobes/testprobe/enable
+grep -q "1"  events/fprobes/testprobe/enable
+echo 0 > events/fprobes/testprobe/enable
+grep "vfs_read" trace | grep -q "enable"
+echo "" > dynamic_events
+echo "" > trace
-- 
2.31.1