Re: [Xen-devel] [PATCH RFC 4/4] xen-block: introduce a new request type to unmap grants
On Mon, Jul 08, 2013 at 03:41:52PM -0400, Konrad Rzeszutek Wilk wrote: > On Mon, Jul 08, 2013 at 03:03:27PM +0200, Roger Pau Monne wrote: > > Right now blkfront has no way to unmap grant refs, if using persistent > > grants once a grant is used blkfront cannot assure if blkback will > > have this grant mapped or not. To solve this problem, a new request > > type (BLKIF_OP_UNMAP) that allows requesting blkback to unmap certain > > grants is introduced. > > I don't think this is the right way of doing it. It is a new operation > (BLKIF_OP_UNMAP) that has nothing to do with READ/WRITE. All it is > is just some way for the frontend to say: unmap this grant if you can. > > As such I would think a better mechanism would be to have a new > grant mechanism that can say: 'I am done with this grant you can > remove it' - that is called to the hypervisor. The hypervisor > can then figure out whether it is free or not and lazily delete it. > (And the guest would be notified when it is freed). > > I would presume that this problem would also exist with netback/netfront > if it started using persisten grants, right? > I think so. This issue is generic enough for any backend that uses persistent grant so I don't think we should limit it to blk only. Wei. > > > > This request can only be used with the indirect operation, since if > > not using indirect segments the maximum number of grants used will be > > 352, which is very far from the default maximum number of grants. > > > > The requested grefs to unmap are placed inside the indirect pages, so > > they can be parsed as an array of grant_ref_t once the indirect pages > > are mapped. This prevents us from introducing a new request struct, > > since we re-use the same struct from the indirect operation. > > > > Signed-off-by: Roger Pau Monné > > Cc: Konrad Rzeszutek Wilk > > --- > > drivers/block/xen-blkback/blkback.c | 108 +- > > drivers/block/xen-blkback/common.h | 12 ++- > > drivers/block/xen-blkback/xenbus.c | 10 ++ > > drivers/block/xen-blkfront.c| 223 > > +++ > > include/xen/interface/io/blkif.h| 13 ++ > > 5 files changed, 337 insertions(+), 29 deletions(-) > > > > diff --git a/drivers/block/xen-blkback/blkback.c > > b/drivers/block/xen-blkback/blkback.c > > index bf4b9d2..1def5d2 100644 > > --- a/drivers/block/xen-blkback/blkback.c > > +++ b/drivers/block/xen-blkback/blkback.c > > @@ -167,6 +167,9 @@ static int do_block_io_op(struct xen_blkif *blkif); > > static int dispatch_rw_block_io(struct xen_blkif *blkif, > > struct blkif_request *req, > > struct pending_req *pending_req); > > +static int dispatch_unmap(struct xen_blkif *blkif, > > + struct blkif_request *req, > > + struct pending_req *pending_req); > > static void make_response(struct xen_blkif *blkif, u64 id, > > unsigned short op, int st); > > > > @@ -841,7 +844,7 @@ static int xen_blkbk_parse_indirect(struct > > blkif_request *req, > > struct blkif_request_segment_aligned *segments = NULL; > > > > nseg = pending_req->nr_pages; > > - indirect_grefs = INDIRECT_PAGES(nseg); > > + indirect_grefs = INDIRECT_PAGES(nseg, SEGS_PER_INDIRECT_FRAME); > > BUG_ON(indirect_grefs > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST); > > > > for (i = 0; i < indirect_grefs; i++) > > @@ -1064,10 +1067,18 @@ __do_block_io_op(struct xen_blkif *blkif) > > case BLKIF_OP_WRITE: > > case BLKIF_OP_WRITE_BARRIER: > > case BLKIF_OP_FLUSH_DISKCACHE: > > - case BLKIF_OP_INDIRECT: > > if (dispatch_rw_block_io(blkif, &req, pending_req)) > > goto done; > > break; > > + case BLKIF_OP_INDIRECT: > > + if (req.u.indirect.indirect_op == BLKIF_OP_UNMAP) { > > + if (dispatch_unmap(blkif, &req, pending_req)) > > + goto done; > > + } else { > > + if (dispatch_rw_block_io(blkif, &req, > > pending_req)) > > + goto done; > > + } > > + break; > > case BLKIF_OP_DISCARD: > > free_req(blkif, pending_req); > > if (dispatch_discard_io(blkif, &req)) > > @@ -1311,7 +1322,100 @@ static int dispatch_rw_block_io(struct xen_blkif > > *blkif, > > return -EIO; > > } > > > > +/* > > + * Unmap grefs on request from blkfront. This allows blkfront to securely > > + * free the grefs by making sure blkback no longer has them mapped. > > + */ > > +static int dispatch_unmap(struct xen_blkif *blkif, > > + struct blkif_request *req, > > + struct pending_req *pending_req) > > +{ > > + unsigned int n_grants, n_pages; > >
Re: [Xen-devel] [PATCH RFC 4/4] xen-block: introduce a new request type to unmap grants
On Tue, Jul 09, 2013 at 02:32:07PM -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Jul 09, 2013 at 06:37:58PM +0200, Roger Pau Monné wrote: > > On 08/07/13 21:41, Konrad Rzeszutek Wilk wrote: > > > On Mon, Jul 08, 2013 at 03:03:27PM +0200, Roger Pau Monne wrote: > > >> Right now blkfront has no way to unmap grant refs, if using persistent > > >> grants once a grant is used blkfront cannot assure if blkback will > > >> have this grant mapped or not. To solve this problem, a new request > > >> type (BLKIF_OP_UNMAP) that allows requesting blkback to unmap certain > > >> grants is introduced. > > > > > > I don't think this is the right way of doing it. It is a new operation > > > (BLKIF_OP_UNMAP) that has nothing to do with READ/WRITE. All it is > > > is just some way for the frontend to say: unmap this grant if you can. > > > > > > As such I would think a better mechanism would be to have a new > > > grant mechanism that can say: 'I am done with this grant you can > > > remove it' - that is called to the hypervisor. The hypervisor > > > can then figure out whether it is free or not and lazily delete it. > > > (And the guest would be notified when it is freed). > > > > I would prefer not to involve the hypervisor in persistent grants, this > > is something between the frontends and the backends. The hypervisor > > already provides the basic operations (map/unmap), IMHO there's no need > > to add more logic to the hypervisor itself. > > > > I agree that it would be better to have a generic way to request a > > backend to unmap certain grants, but so far this seems like the best > > solution. > > Lets concentrate on a generic way that any frontend/backend can use. > > Please keep in mind that the indirect descriptors could be implemented by > using mapped grants if a backend or frontend wanted to do it. > > This all is tied in the 'feature-persistent-grant' and as that could be > implemented in a similar fashion on netfront (perhaps by only doing it > for one of the rings - the TX ring, or is it RX?). > > > > > > > > > I would presume that this problem would also exist with netback/netfront > > > if it started using persisten grants, right? > > > > I'm not sure of that, it depends on the number of persistent grants > > netfront/netback use, in the block case we need this operation because > > of indirect descriptors, but netfront/netback might not suffer from this > > problem if the maximum number of grants they use is relatively small. > > 256 is the default amount of grants one ring can have. Since there is > a RX and TX ring that means we can have around 512 for one VIF. > > I presume that with the multi-queue (not yet implemented) this can expand > to be 512 * vCPU. > Yes. We need to allow for some space for multiqueue as well as multi page ring. Wei. > > > > > ___ > Xen-devel mailing list > xen-de...@lists.xen.org > http://lists.xen.org/xen-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH 08/14] xen: netback: Remove redundant check on unsigned variable
On Fri, 2012-12-28 at 05:15 +, Tushar Behera wrote: > On 11/16/2012 02:46 PM, Ian Campbell wrote: > > On Fri, 2012-11-16 at 06:50 +, Tushar Behera wrote: > >> No need to check whether unsigned variable is less than 0. > >> > >> CC: Ian Campbell > >> CC: xen-de...@lists.xensource.com > >> CC: net...@vger.kernel.org > >> Signed-off-by: Tushar Behera > > > > Acked-by: Ian Campbell > > > > Thanks. > > > > This patch was not picked up for 3.8-rc1. Any idea, who should pick this up? CC'ing Konrad. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH next] xen: Use more current logging styles
On Thu, Jun 27, 2013 at 09:57:49PM -0700, Joe Perches wrote: > Instead of mixing printk and pr_ forms, > just use pr_ > > Miscellaneous changes around these conversions: > > Add a missing newline to avoid message interleaving, > coalesce formats, reflow modified lines to 80 columns. > Do you also need to replace other printk occurences in xen-netback directory, say, interface.c and xenbus.c? Wei. > Signed-off-by: Joe Perches > --- > drivers/net/xen-netback/netback.c | 7 +++ > drivers/net/xen-netfront.c| 28 +--- > 2 files changed, 16 insertions(+), 19 deletions(-) > > diff --git a/drivers/net/xen-netback/netback.c > b/drivers/net/xen-netback/netback.c > index 130bcb2..64828de 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -1890,9 +1890,8 @@ static int __init netback_init(void) > return -ENODEV; > > if (fatal_skb_slots < XEN_NETBK_LEGACY_SLOTS_MAX) { > - printk(KERN_INFO > -"xen-netback: fatal_skb_slots too small (%d), bump it to > XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n", > -fatal_skb_slots, XEN_NETBK_LEGACY_SLOTS_MAX); > + pr_info("fatal_skb_slots too small (%d), bump it to > XEN_NETBK_LEGACY_SLOTS_MAX (%d)\n", > + fatal_skb_slots, XEN_NETBK_LEGACY_SLOTS_MAX); > fatal_skb_slots = XEN_NETBK_LEGACY_SLOTS_MAX; > } > > @@ -1921,7 +1920,7 @@ static int __init netback_init(void) >"netback/%u", group); > > if (IS_ERR(netbk->task)) { > - printk(KERN_ALERT "kthread_create() fails at > netback\n"); > + pr_alert("kthread_create() fails at netback\n"); > del_timer(&netbk->net_timer); > rc = PTR_ERR(netbk->task); > goto failed_init; > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c > index 76a2236..ff7f111 100644 > --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -29,6 +29,8 @@ > * IN THE SOFTWARE. > */ > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > #include > #include > #include > @@ -385,9 +387,8 @@ static void xennet_tx_buf_gc(struct net_device *dev) > skb = np->tx_skbs[id].skb; > if (unlikely(gnttab_query_foreign_access( > np->grant_tx_ref[id]) != 0)) { > - printk(KERN_ALERT "xennet_tx_buf_gc: warning " > -"-- grant still in use by backend " > -"domain.\n"); > + pr_alert("%s: warning -- grant still in use by > backend domain\n", > + __func__); > BUG(); > } > gnttab_end_foreign_access_ref( > @@ -804,14 +805,14 @@ static int xennet_set_skb_gso(struct sk_buff *skb, > { > if (!gso->u.gso.size) { > if (net_ratelimit()) > - printk(KERN_WARNING "GSO size must not be zero.\n"); > + pr_warn("GSO size must not be zero\n"); > return -EINVAL; > } > > /* Currently only TCPv4 S.O. is supported. */ > if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) { > if (net_ratelimit()) > - printk(KERN_WARNING "Bad GSO type %d.\n", > gso->u.gso.type); > + pr_warn("Bad GSO type %d\n", gso->u.gso.type); > return -EINVAL; > } > > @@ -910,9 +911,8 @@ static int checksum_setup(struct net_device *dev, struct > sk_buff *skb) > break; > default: > if (net_ratelimit()) > - printk(KERN_ERR "Attempting to checksum a non-" > -"TCP/UDP packet, dropping a protocol" > -" %d packet", iph->protocol); > + pr_err("Attempting to checksum a non-TCP/UDP packet, > dropping a protocol %d packet\n", > +iph->protocol); > goto out; > } > > @@ -1359,14 +1359,14 @@ static struct net_device *xennet_create_dev(struct > xenbus_device *dev) > /* A grant for every tx ring slot */ > if (gnttab_alloc_grant_references(TX_MAX_TARGET, > &np->gref_tx_head) < 0) { > - printk(KERN_ALERT " netfront can't alloc tx grant refs\n"); > + pr_alert("can't alloc tx grant refs\n"); > err = -ENOMEM; > goto exit_free_stats; > } > /* A grant for every rx ring slot */ > if (gnttab_alloc_grant_references(RX_MAX_TARGET, > &np->gref_rx_head) < 0) { > - printk(KERN_ALERT " netfront can't alloc rx grant refs\n"); > +
Re: [PATCH next] xen: Convert printks to pr_
On Fri, Jun 28, 2013 at 03:21:41AM -0700, Joe Perches wrote: > Convert printks to pr_ (excludes printk(KERN_DEBUG...) > to be more consistent throughout the xen subsystem. > > Add pr_fmt with KBUILD_MODNAME or "xen:" KBUILD_MODNAME > Coalesce formats and add missing word spaces > Add missing newlines > Align arguments and reflow to 80 columns > Remove DRV_NAME from formats as pr_fmt adds the same content > > This does change some of the prefixes of these messages > but it also does make them more consistent. > > Signed-off-by: Joe Perches > --- > > On Fri, 2013-06-28 at 09:02 +0100, Wei Liu wrote: > > Do you also need to replace other printk occurences in xen-netback > > directory, say, interface.c and xenbus.c? > > Well, I don't _need_ to but if you want it > this is what I suggest. > Just wanted to make sure the style is consistent within xen-netback. The change below is good but touches too many other files. I can clean up the rest in xen-netback after your previous patch goes in. Wei. > drivers/xen/balloon.c | 6 +++-- > drivers/xen/cpu_hotplug.c | 6 +++-- > drivers/xen/events.c| 23 +- > drivers/xen/evtchn.c| 6 +++-- > drivers/xen/gntalloc.c | 6 +++-- > drivers/xen/gntdev.c| 8 --- > drivers/xen/grant-table.c | 17 +++--- > drivers/xen/manage.c| 23 +- > drivers/xen/mcelog.c| 36 > +++-- > drivers/xen/pcpu.c | 12 +- > drivers/xen/privcmd.c | 4 +++- > drivers/xen/swiotlb-xen.c | 12 ++ > drivers/xen/tmem.c | 10 > drivers/xen/xen-acpi-cpuhotplug.c | 2 ++ > drivers/xen/xen-acpi-memhotplug.c | 2 ++ > drivers/xen/xen-acpi-pad.c | 2 ++ > drivers/xen/xen-acpi-processor.c| 25 ++-- > drivers/xen/xen-balloon.c | 6 +++-- > drivers/xen/xen-pciback/conf_space_header.c | 16 ++--- > drivers/xen/xen-pciback/pci_stub.c | 25 +--- > drivers/xen/xen-pciback/pciback_ops.c | 9 +--- > drivers/xen/xen-pciback/vpci.c | 10 > drivers/xen/xen-pciback/xenbus.c| 8 --- > drivers/xen/xen-selfballoon.c | 11 - > drivers/xen/xenbus/xenbus_comms.c | 13 ++- > drivers/xen/xenbus/xenbus_dev_backend.c | 4 +++- > drivers/xen/xenbus/xenbus_dev_frontend.c| 4 +++- > drivers/xen/xenbus/xenbus_probe.c | 30 +++- > drivers/xen/xenbus/xenbus_probe_backend.c | 8 --- > drivers/xen/xenbus/xenbus_probe_frontend.c | 35 ++-- > drivers/xen/xenbus/xenbus_xs.c | 22 -- > drivers/xen/xencomm.c | 2 ++ > drivers/xen/xenfs/super.c | 4 +++- > include/xen/hvm.h | 4 ++-- > 34 files changed, 215 insertions(+), 196 deletions(-) > > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > index 08e9715..4e43c0e 100644 > --- a/drivers/xen/balloon.c > +++ b/drivers/xen/balloon.c > @@ -36,6 +36,8 @@ > * IN THE SOFTWARE. > */ > > +#define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt > + > #include > #include > #include > @@ -231,7 +233,7 @@ static enum bp_state reserve_additional_memory(long > credit) > rc = add_memory(nid, hotplug_start_paddr, balloon_hotplug << > PAGE_SHIFT); > > if (rc) { > - pr_info("xen_balloon: %s: add_memory() failed: %i\n", __func__, > rc); > + pr_info("%s: add_memory() failed: %i\n", __func__, rc); > return BP_EAGAIN; > } > > @@ -582,7 +584,7 @@ static int __init balloon_init(void) > if (!xen_domain()) > return -ENODEV; > > - pr_info("xen/balloon: Initialising balloon driver.\n"); > + pr_info("Initialising balloon driver\n"); > > balloon_stats.current_pages = xen_pv_domain() > ? min(xen_start_info->nr_pages - xen_released_pages, max_pfn) > diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c > index b0b87dd..cbb02af 100644 > --- a/drivers/xen/cpu_hotplug.c > +++ b/drivers/xen/cpu_hotplug.c > @@ -1,3 +1,5 @@ > +#define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt > + > #include > > #include > @@
Re: [Xen-devel] [PATCH] xen: switch to threaded irq in netback
I think you'd better resubmit this patch to netdev@ because netfront/back patches will go via David Miller's tree. And perhaps uses the subject line: xen-netback: switch to threaded irq for control ring On Tue, Sep 20, 2016 at 04:21:37PM +0200, Juergen Gross wrote: > Instead of open coding it use the threaded irq mechanism in > xen-netback. > > Signed-off-by: Juergen Gross The code looks OK: Acked-by: Wei Liu Wei.
Re: [PATCH resend 2] xen-netback: switch to threaded irq for control ring
On Thu, Sep 22, 2016 at 11:06:25AM +0200, Juergen Gross wrote: > Instead of open coding it use the threaded irq mechanism in > xen-netback. > > Signed-off-by: Juergen Gross Acked-by: Wei Liu
Re: [PATCH] xen-netfront, xen-netback: Use correct minimum MTU values
On Mon, Oct 16, 2017 at 03:20:32PM +0200, Mohammed Gamal wrote: > RFC791 specifies the minimum MTU to be 68, while xen-net{front|back} > drivers use a minimum value of 0. > > When set MTU to 0~67 with xen_net{front|back} driver, the network > will become unreachable immediately, the guest can no longer be pinged. > > xen_net{front|back} should not allow the user to set this value which causes > network problems. > > Reported-by: Chen Shi > Signed-off-by: Mohammed Gamal Acked-by: Wei Liu CC netfront maintainers
Re: [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
You might need to rebase you patch. A patch to netback went it recently. On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote: > The PV network protocol is using 4KB page granularity. The goal of this > patch is to allow a Linux using 64KB page granularity working as a > network backend on a non-modified Xen. > > It's only necessary to adapt the ring size and break skb data in small > chunk of 4KB. The rest of the code is relying on the grant table code. > > Signed-off-by: Julien Grall > Reviewed-by: Wei Liu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity
On Tue, Sep 08, 2015 at 12:07:31PM +0100, Julien Grall wrote: > Hi Wei, > > On 07/09/15 17:57, Wei Liu wrote: > > You might need to rebase you patch. A patch to netback went it recently. > > Do you mean 210c34dcd8d912dcc740f1f17625a7293af5cb56 "xen-netback: add > support for multicast control"? > Yes, that one. > If so I didn't see any specific issue while rebasing on the latest > linus' master. > Good. > > On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote: > >> The PV network protocol is using 4KB page granularity. The goal of this > >> patch is to allow a Linux using 64KB page granularity working as a > >> network backend on a non-modified Xen. > >> > >> It's only necessary to adapt the ring size and break skb data in small > >> chunk of 4KB. The rest of the code is relying on the grant table code. > >> > >> Signed-off-by: Julien Grall > >> > > > > Reviewed-by: Wei Liu > > Thank you! > > Regards, > > -- > Julien Grall -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen-netback: corretly check failed allocation
On Thu, Oct 15, 2015 at 12:26:16PM -0400, Insu Yun wrote: > Since vzalloc can be failed in memory pressure, > return value should be checked and return ENOMEM. This function doesn't return ENOMEM, instead it writes to xenstore to indicate error. The commit log needs to be updated. > > Signed-off-by: Insu Yun > --- > drivers/net/xen-netback/xenbus.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/net/xen-netback/xenbus.c > b/drivers/net/xen-netback/xenbus.c > index 929a6e7..e288246 100644 > --- a/drivers/net/xen-netback/xenbus.c > +++ b/drivers/net/xen-netback/xenbus.c > @@ -788,6 +788,11 @@ static void connect(struct backend_info *be) > /* Use the number of queues requested by the frontend */ > be->vif->queues = vzalloc(requested_num_queues * > sizeof(struct xenvif_queue)); > + if (!be->vif->queues) { > +xenbus_dev_fatal(dev, -ENOMEM, "allocating queues"); > +return; > + } > + The indentation is wrong. Please configure your email client properly. And please use "goto err" for error handling -- yes, I understand there is existing code that returns directly but IMHO that should be fixed too. We. > be->vif->num_queues = requested_num_queues; > be->vif->stalled_queues = requested_num_queues; > > -- > 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen-netback: correctly check failed allocation
On Thu, Oct 15, 2015 at 02:02:47PM -0400, Insu Yun wrote: > I changed patch with valid format. > > On Thu, Oct 15, 2015 at 2:02 PM, Insu Yun wrote: > > > Since vzalloc can be failed in memory pressure, > > writes -ENOMEM to xenstore to indicate error. > > > > Signed-off-by: Insu Yun > > --- > > drivers/net/xen-netback/xenbus.c | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/drivers/net/xen-netback/xenbus.c > > b/drivers/net/xen-netback/xenbus.c > > index 929a6e7..56ebd82 100644 > > --- a/drivers/net/xen-netback/xenbus.c > > +++ b/drivers/net/xen-netback/xenbus.c > > @@ -788,6 +788,12 @@ static void connect(struct backend_info *be) > > /* Use the number of queues requested by the frontend */ > > be->vif->queues = vzalloc(requested_num_queues * > > sizeof(struct xenvif_queue)); > > + if (!be->vif->queues) { > > + xenbus_dev_fatal(dev, -ENOMEM, > > +"allocating queues"); > > + return; > > > > I didn't use goto err, because another error handling is not required > It's recommended in kernel coding style to use "goto" style error handling. I personally prefer that to arbitrary return in function body, too. It's not a matter of whether another error handling is required or not, it's about cleaner code that is easy to reason about and consistent coding style. The existing code is not perfect, but that doesn't mean we should follow bad example. Wei. > > > + } > > + > > be->vif->num_queues = requested_num_queues; > > be->vif->stalled_queues = requested_num_queues; > > > > -- > > 1.9.1 > > > > > > > -- > Regards > Insu Yun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen-netback: correctly check failed allocation
On Fri, Oct 16, 2015 at 10:05:21AM +0100, Wei Liu wrote: > On Thu, Oct 15, 2015 at 02:02:47PM -0400, Insu Yun wrote: > > I changed patch with valid format. > > > > On Thu, Oct 15, 2015 at 2:02 PM, Insu Yun wrote: > > > > > Since vzalloc can be failed in memory pressure, > > > writes -ENOMEM to xenstore to indicate error. > > > > > > Signed-off-by: Insu Yun > > > --- > > > drivers/net/xen-netback/xenbus.c | 6 ++ > > > 1 file changed, 6 insertions(+) > > > > > > diff --git a/drivers/net/xen-netback/xenbus.c > > > b/drivers/net/xen-netback/xenbus.c > > > index 929a6e7..56ebd82 100644 > > > --- a/drivers/net/xen-netback/xenbus.c > > > +++ b/drivers/net/xen-netback/xenbus.c > > > @@ -788,6 +788,12 @@ static void connect(struct backend_info *be) > > > /* Use the number of queues requested by the frontend */ > > > be->vif->queues = vzalloc(requested_num_queues * > > > sizeof(struct xenvif_queue)); > > > + if (!be->vif->queues) { > > > + xenbus_dev_fatal(dev, -ENOMEM, > > > +"allocating queues"); > > > + return; > > > > > > > I didn't use goto err, because another error handling is not required > > > > It's recommended in kernel coding style to use "goto" style error > handling. I personally prefer that to arbitrary return in function body, > too. > > It's not a matter of whether another error handling is required or not, > it's about cleaner code that is easy to reason about and consistent > coding style. > > The existing code is not perfect, but that doesn't mean we should follow > bad example. And to be clear, I don't want to block this patch just because of this coding style thing. It's still an improvement and fix a real problem. So: Acked-by: Wei Liu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH v3 2/9] xen-block: add document for mutli hardware queues/rings
On Fri, Oct 02, 2015 at 06:04:35PM +0200, Roger Pau Monné wrote: > El 05/09/15 a les 14.39, Bob Liu ha escrit: > > Document multi queues/rings of xen-block. > > > > Signed-off-by: Bob Liu > > As said by Konrad, you should send this against the Xen public headers > also (or even before). I have a comment below. > > > --- > > include/xen/interface/io/blkif.h | 32 > > 1 file changed, 32 insertions(+) > > > > diff --git a/include/xen/interface/io/blkif.h > > b/include/xen/interface/io/blkif.h > > index c33e1c4..b453b70 100644 > > --- a/include/xen/interface/io/blkif.h > > +++ b/include/xen/interface/io/blkif.h > > @@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t; > > typedef uint64_t blkif_sector_t; > > > > /* > > + * Multiple hardware queues/rings: > > + * If supported, the backend will write the key "multi-queue-max-queues" to > > + * the directory for that vbd, and set its value to the maximum supported > > + * number of queues. > > + * Frontends that are aware of this feature and wish to use it can write > > the > > + * key "multi-queue-num-queues", set to the number they wish to use, which > > + * must be greater than zero, and no more than the value reported by the > > backend > > + * in "multi-queue-max-queues". > > + * > > + * For frontends requesting just one queue, the usual event-channel and > > + * ring-ref keys are written as before, simplifying the backend processing > > + * to avoid distinguishing between a frontend that doesn't understand the > > + * multi-queue feature, and one that does, but requested only one queue. > > + * > > + * Frontends requesting two or more queues must not write the toplevel > > + * event-channeland ring-ref keys, instead writing those keys under > > sub-keys > > + * having the name "queue-N" where N is the integer ID of the queue/ring > > for > > + * which those keys belong. Queues are indexed from zero. > > + * For example, a frontend with two queues must write the following set of > > + * queue-related keys: > > + * > > + * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2" > > + * /local/domain/1/device/vbd/0/queue-0 = "" > > + * /local/domain/1/device/vbd/0/queue-0/ring-ref = "" > > + * /local/domain/1/device/vbd/0/queue-0/event-channel = "" > > + * /local/domain/1/device/vbd/0/queue-1 = "" > > + * /local/domain/1/device/vbd/0/queue-1/ring-ref = "" > > + * /local/domain/1/device/vbd/0/queue-1/event-channel = "" > > AFAICT, it's impossible by design to use multiple queues together with > multipage rings, is that right? > As far as I can tell, these two features are not inherently coupled. Whether you want to make (by design) them coupled together or not is another matter. :-) > Roger. > > > ___ > Xen-devel mailing list > xen-de...@lists.xen.org > http://lists.xen.org/xen-devel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH] xen/netfront: don't cache skb_shinfo()
On Thu, Aug 09, 2018 at 04:42:16PM +0200, Juergen Gross wrote: > skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache > its return value. > > Cc: sta...@vger.kernel.org > Signed-off-by: Juergen Gross Reviewed-by: Wei Liu
Re: [PATCH 4/4] drivers/net: Use octal not symbolic permissions
On Fri, Mar 23, 2018 at 03:54:39PM -0700, Joe Perches wrote: > Prefer the direct use of octal for permissions. > > Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace > and some typing. > > Miscellanea: > > o Whitespace neatening around these conversions. > > Signed-off-by: Joe Perches > --- > drivers/net/xen-netback/xenbus.c | 4 +- > drivers/net/xen-netfront.c | 6 +-- Reviewed-by: Wei Liu
Re: [Xen-devel] [PATCH] xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen
On Mon, Apr 09, 2018 at 02:51:44PM -0400, Boris Ostrovsky wrote: > Pre-4.17 kernels ignored start_info's rsdp_paddr pointer and instead > relied on finding RSDP in standard location in BIOS RO memory. This > has worked since that's where Xen used to place it. > > However, with recent Xen change (commit 4a5733771e6f ("libxl: put RSDP > for PVH guest near 4GB")) it prefers to keep RSDP at a "non-standard" > address. Even though as of commit b17d9d1df3c3 ("x86/xen: Add pvh > specific rsdp address retrieval function") Linux is able to find RSDP, > for back-compatibility reasons we need to indicate to Xen that we can > handle this, an we do so by setting XENFEAT_linux_rsdp_unrestricted > flag in ELF notes. > > (Also take this opportunity and sync features.h header file with Xen) > > Signed-off-by: Boris Ostrovsky Reviewed-by: Wei Liu
Re: [PATCH] net: xenbus: convert to DEFINE_SHOW_ATTRIBUTE
On Mon, Dec 10, 2018 at 10:53:29AM -0500, Yangtao Li wrote: > Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. > > Signed-off-by: Yangtao Li Acked-by: Wei Liu Thanks for the patch.
Re: [PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically
On Tue, Apr 20, 2021 at 04:50:56AM +, Michael Kelley wrote: > From: Andrea Parri (Microsoft) Sent: Monday, April > 19, 2021 6:44 PM > > > > If a malicious or compromised Hyper-V sends a spurious message of type > > CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will > > call complete() on an uninitialized event, and cause an oops. > > > > Reported-by: Michael Kelley > > Signed-off-by: Andrea Parri (Microsoft) > > --- > > Changes since v1[1]: > > - add inline comment in vmbus_unload_response() > > > > [1] > > https://lore.kernel.org/linux-hyperv/20210416143932.16512-1-parri.and...@gmail.com/ > > > > drivers/hv/channel_mgmt.c | 7 ++- > > drivers/hv/connection.c | 2 ++ > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > Reviewed-by: Michael Kelley Applied to hyperv-next. Thanks.
Re: ** POTENTIAL FRAUD ALERT - RED HAT ** [PATCH v2 1/1] Drivers: hv: vmbus: Increase wait time for VMbus unload
On Tue, Apr 20, 2021 at 11:31:54AM +0200, Vitaly Kuznetsov wrote: > Michael Kelley writes: > > > When running in Azure, disks may be connected to a Linux VM with > > read/write caching enabled. If a VM panics and issues a VMbus > > UNLOAD request to Hyper-V, the response is delayed until all dirty > > data in the disk cache is flushed. In extreme cases, this flushing > > can take 10's of seconds, depending on the disk speed and the amount > > of dirty data. If kdump is configured for the VM, the current 10 second > > timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD > > complete message may arrive well after the kdump kernel is already > > running, causing problems. Note that no problem occurs if kdump is > > not enabled because Hyper-V waits for the cache flush before doing > > a reboot through the BIOS/UEFI code. > > > > Fix this problem by increasing the timeout in vmbus_wait_for_unload() > > to 100 seconds. Also output periodic messages so that if anyone is > > watching the serial console, they won't think the VM is completely > > hung. > > > > Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to > > vmbus_wait_for_unload") > > Signed-off-by: Michael Kelley Applied to hyperv-next. Thanks. > > --- [...] > > > > +#define UNLOAD_DELAY_UNIT_MS 10 /* 10 milliseconds */ > > +#define UNLOAD_WAIT_MS (100*1000) /* 100 seconds */ > > +#define UNLOAD_WAIT_LOOPS (UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS) > > +#define UNLOAD_MSG_MS (5*1000)/* Every 5 seconds */ > > +#define UNLOAD_MSG_LOOPS (UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS) > > + > > static void vmbus_wait_for_unload(void) > > { > > int cpu; > > @@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void) > > * vmbus_connection.unload_event. If not, the last thing we can do is > > * read message pages for all CPUs directly. > > * > > -* Wait no more than 10 seconds so that the panic path can't get > > -* hung forever in case the response message isn't seen. > > +* Wait up to 100 seconds since an Azure host must writeback any dirty > > +* data in its disk cache before the VMbus UNLOAD request will > > +* complete. This flushing has been empirically observed to take up > > +* to 50 seconds in cases with a lot of dirty data, so allow additional > > +* leeway and for inaccuracies in mdelay(). But eventually time out so > > +* that the panic path can't get hung forever in case the response > > +* message isn't seen. > > I vaguely remember debugging cases when CHANNELMSG_UNLOAD_RESPONSE never > arrives, it was kind of pointless to proceed to kexec as attempts to > reconnect Vmbus devices were failing (no devices were offered after > CHANNELMSG_REQUESTOFFERS AFAIR). Would it maybe make sense to just do > emergency reboot instead of proceeding to kexec when this happens? Just > wondering. > Please submit a follow-up patch if necessary. Wei.
Re: [PATCH 1/1] video: hyperv_fb: Add ratelimit on error message
On Tue, Apr 20, 2021 at 08:44:19AM -0700, Michael Kelley wrote: > Due to a full ring buffer, the driver may be unable to send updates to > the Hyper-V host. But outputing the error message can make the problem > worse because console output is also typically written to the frame > buffer. As a result, in some circumstances the error message is output > continuously. > > Break the cycle by rate limiting the error message. Also output > the error code for additional diagnosability. > > Signed-off-by: Michael Kelley Applied to hyperv-next. Thanks. > --- > drivers/video/fbdev/hyperv_fb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c > index 4dc9077..a7e6eea 100644 > --- a/drivers/video/fbdev/hyperv_fb.c > +++ b/drivers/video/fbdev/hyperv_fb.c > @@ -308,7 +308,7 @@ static inline int synthvid_send(struct hv_device *hdev, > VM_PKT_DATA_INBAND, 0); > > if (ret) > - pr_err("Unable to send packet via vmbus\n"); > + pr_err_ratelimited("Unable to send packet via vmbus; error > %d\n", ret); > > return ret; > } > -- > 1.8.3.1 >
Re: Regressions with VMBus/VSCs hardening changes
On Fri, Feb 12, 2021 at 05:50:50PM +0100, Andrea Parri wrote: > Hi all, [...] > 2) IIUC a8c3209998afb5 could be dropped (after rebase) without further modi- >fications to hyperv-next. I've reverted the said patch from hyperv-next. Wei.
[GIT PULL] Hyper-V commits for 5.12
Hi Linus, The following changes since commit 6ee1d745b7c9fd573fba142a2efdad76a9f1cb04: Linux 5.11-rc5 (2021-01-24 16:47:14 -0800) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-next-signed-20210216 for you to fetch changes up to 3019270282a175defc02c8331786c73e082cd2a8: Revert "Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer" (2021-02-15 10:49:11 +) hyperv-next for 5.12 - VMBus hardening patches from Andrea Parri and Andres Beltran. - Patches to make Linux boot as the root partition on Microsoft Hypervisor from Wei Liu. - One patch to add a new sysfs interface to support hibernation on Hyper-V from Dexuan Cui. - Two miscellaneous clean-up patches from Colin and Gustavo. Andrea Parri (Microsoft) (9): Drivers: hv: vmbus: Initialize memory to be sent to the host Drivers: hv: vmbus: Reduce number of references to message in vmbus_on_msg_dpc() Drivers: hv: vmbus: Copy the hv_message in vmbus_on_msg_dpc() Drivers: hv: vmbus: Avoid use-after-free in vmbus_onoffer_rescind() Drivers: hv: vmbus: Resolve race condition in vmbus_onoffer_rescind() x86/hyperv: Load/save the Isolation Configuration leaf Drivers: hv: vmbus: Restrict vmbus_devices on isolated guests Drivers: hv: vmbus: Enforce 'VMBus version >= 5.2' on isolated guests hv_netvsc: Restrict configurations on isolated guests Andres Beltran (2): Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer hv_utils: Add validation for untrusted Hyper-V values Colin Ian King (1): hv_utils: Fix spelling mistake "Hearbeat" -> "Heartbeat" Dexuan Cui (1): Drivers: hv: vmbus: Add /sys/bus/vmbus/hibernation Gustavo A. R. Silva (1): hv: hyperv.h: Replace one-element array with flexible-array in struct icmsg_negotiate Wei Liu (17): asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT x86/hyperv: detect if Linux is the root partition Drivers: hv: vmbus: skip VMBus initialization if Linux is root clocksource/hyperv: use MSR-based access if running as root x86/hyperv: allocate output arg pages if required x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary x86/hyperv: handling hypercall page setup for root ACPI / NUMA: add a stub function for node_to_pxm() x86/hyperv: provide a bunch of helper functions x86/hyperv: implement and use hv_smp_prepare_cpus asm-generic/hyperv: update hv_msi_entry asm-generic/hyperv: update hv_interrupt_entry asm-generic/hyperv: introduce hv_device_id and auxiliary structures asm-generic/hyperv: import data structures for mapping device interrupts x86/hyperv: implement an MSI domain for root partition iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Revert "Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer" Documentation/ABI/stable/sysfs-bus-vmbus | 7 + arch/x86/hyperv/Makefile | 4 +- arch/x86/hyperv/hv_init.c| 122 +- arch/x86/hyperv/hv_proc.c| 219 ++ arch/x86/hyperv/irqdomain.c | 385 +++ arch/x86/include/asm/hyperv-tlfs.h | 38 +++ arch/x86/include/asm/mshyperv.h | 19 +- arch/x86/kernel/cpu/mshyperv.c | 58 + drivers/clocksource/hyperv_timer.c | 3 + drivers/hv/channel.c | 4 +- drivers/hv/channel_mgmt.c| 77 ++- drivers/hv/connection.c | 7 + drivers/hv/hv_fcopy.c| 36 ++- drivers/hv/hv_kvp.c | 122 +- drivers/hv/hv_snapshot.c | 89 --- drivers/hv/hv_util.c | 222 +++--- drivers/hv/vmbus_drv.c | 64 +++-- drivers/iommu/hyperv-iommu.c | 177 +- drivers/net/hyperv/netvsc.c | 18 +- drivers/pci/controller/pci-hyperv.c | 2 +- include/acpi/acpi_numa.h | 4 + include/asm-generic/hyperv-tlfs.h| 255 +++- include/asm-generic/mshyperv.h | 5 + include/linux/hyperv.h | 13 +- 24 files changed, 1717 insertions(+), 233 deletions(-) create mode 100644 arch/x86/hyperv/hv_proc.c create mode 100644 arch/x86/hyperv/irqdomain.c
Re: [PATCH 0/4] Add support for XMM fast hypercalls
On Wed, Apr 07, 2021 at 11:29:26PM +0200, Siddharth Chandrasekaran wrote: > Hyper-V supports the use of XMM registers to perform fast hypercalls. > This allows guests to take advantage of the improved performance of the > fast hypercall interface even though a hypercall may require more than > (the current maximum of) two general purpose registers. > > The XMM fast hypercall interface uses an additional six XMM registers > (XMM0 to XMM5) to allow the caller to pass an input parameter block of > up to 112 bytes. Hyper-V can also return data back to the guest in the > remaining XMM registers that are not used by the current hypercall. > > Although the Hyper-v TLFS mentions that a guest cannot use this feature > unless the hypervisor advertises support for it, some hypercalls which > we plan on upstreaming in future uses them anyway. No, please don't do this. Check the feature bit(s) before you issue hypercalls which rely on the extended interface. Wei.
Re: [PATCH 0/4] Add support for XMM fast hypercalls
On Thu, Apr 08, 2021 at 05:30:26PM +0200, Paolo Bonzini wrote: > On 08/04/21 17:28, Wei Liu wrote: > > > Although the Hyper-v TLFS mentions that a guest cannot use this feature > > > unless the hypervisor advertises support for it, some hypercalls which > > > we plan on upstreaming in future uses them anyway. > > > > No, please don't do this. Check the feature bit(s) before you issue > > hypercalls which rely on the extended interface. > > Perhaps Siddharth should clarify this, but I read it as Hyper-V being buggy > and using XMM arguments unconditionally. > There is no code in upstream Linux that uses the XMM fast hypercall interface at the moment. If there is such code, it has bugs in it and should be fixed. :-) Wei. > Paolo >
Re: [PATCH 4/4] KVM: hyper-v: Advertise support for fast XMM hypercalls
On Thu, Apr 08, 2021 at 04:20:54PM +0200, Siddharth Chandrasekaran wrote: > On Thu, Apr 08, 2021 at 02:05:53PM +0200, Vitaly Kuznetsov wrote: > > Siddharth Chandrasekaran writes: > > > > > Now that all extant hypercalls that can use XMM registers (based on > > > spec) for input/outputs are patched to support them, we can start > > > advertising this feature to guests. > > > > > > Cc: Alexander Graf > > > Cc: Evgeny Iakovlev > > > Signed-off-by: Siddharth Chandrasekaran > > > --- > > > arch/x86/include/asm/hyperv-tlfs.h | 4 ++-- > > > arch/x86/kvm/hyperv.c | 1 + > > > 2 files changed, 3 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/x86/include/asm/hyperv-tlfs.h > > > b/arch/x86/include/asm/hyperv-tlfs.h > > > index e6cd3fee562b..1f160ef60509 100644 > > > --- a/arch/x86/include/asm/hyperv-tlfs.h > > > +++ b/arch/x86/include/asm/hyperv-tlfs.h > > > @@ -49,10 +49,10 @@ > > > /* Support for physical CPU dynamic partitioning events is available*/ > > > #define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLEBIT(3) > > > /* > > > - * Support for passing hypercall input parameter block via XMM > > > + * Support for passing hypercall input and output parameter block via XMM > > > * registers is available > > > */ > > > -#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLEBIT(4) > > > +#define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLEBIT(4) | > > > BIT(15) > > > > TLFS 6.0b states that there are two distinct bits for input and output: > > > > CPUID Leaf 0x4003.EDX: > > Bit 4: support for passing hypercall input via XMM registers is available. > > Bit 15: support for returning hypercall output via XMM registers is > > available. > > > > and HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE is not currently used > > anywhere, I'd suggest we just rename > > > > HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE to > > HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE > > and add HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE (bit 15). > > That is how I had it initially; but then noticed that we would never > need to use either of them separately. So it seemed like a reasonable > abstraction to put them together. > They are two separate things in TLFS. Please use two macros here. Wei.
Re: [PATCH 0/4] Add support for XMM fast hypercalls
On Thu, Apr 08, 2021 at 05:54:43PM +0200, Siddharth Chandrasekaran wrote: > On Thu, Apr 08, 2021 at 05:48:19PM +0200, Paolo Bonzini wrote: > > On 08/04/21 17:40, Siddharth Chandrasekaran wrote: > > > > > > Although the Hyper-v TLFS mentions that a guest cannot use this > > > > > > feature > > > > > > unless the hypervisor advertises support for it, some hypercalls > > > > > > which > > > > > > we plan on upstreaming in future uses them anyway. > > > > > No, please don't do this. Check the feature bit(s) before you issue > > > > > hypercalls which rely on the extended interface. > > > > Perhaps Siddharth should clarify this, but I read it as Hyper-V being > > > > buggy and using XMM arguments unconditionally. > > > The guest is at fault here as it expects Hyper-V to consume arguments > > > from XMM registers for certain hypercalls (that we are working) even if > > > we didn't expose the feature via CPUID bits. > > > > What guest is that? > > It is a Windows Server 2016. Can you be more specific? Are you implementing some hypercalls from TLFS? If so, which ones? Wei.
Re: [PATCH v2] xen-netback: making the bandwidth limiter runtime settable
On Thu, Mar 19, 2015 at 11:05:42AM +0100, Imre Palik wrote: > From: "Palik, Imre" > > With the current netback, the bandwidth limiter's parameters are only > settable during vif setup time. This patch register a watch on them, and > thus makes them runtime changeable. > > When the watch fires, the timer is reset. The timer's mutex is used for > fencing the change. > > Cc: Anthony Liguori > Signed-off-by: Imre Palik Acked-by: Wei Liu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND 1/2] xenbus_client: Extend interface to support multi-page ring
On Tue, Mar 31, 2015 at 02:36:43PM +0200, Juergen Gross wrote: > On 03/31/2015 02:15 PM, Bob Liu wrote: > >From: Wei Liu > > > >Originally Xen PV drivers only use single-page ring to pass along > >information. This might limit the throughput between frontend and > >backend. > > > >The patch extends Xenbus driver to support multi-page ring, which in > >general should improve throughput if ring is the bottleneck. Changes to > >various frontend / backend to adapt to the new interface are also > >included. > > > >Affected Xen drivers: > >* blkfront/back > >* netfront/back > >* pcifront/back > > What about pvscsi drivers? > They are affected, too! > When I wrote this patch pvscsi didn't exist upstream. It would be fairly easy to change that driver too. Wei. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen-netback:Make the function xenvif_schedulable have a return type of bool
On Tue, Jun 16, 2015 at 11:03:30PM -0400, Nicholas Krause wrote: > This makes the function xenvif_sechedulable have a return type of > bool now due to this particular function's return statement only > ever evaluating to have a value of one or zero. > > Signed-off-by: Nicholas Krause Acked-by: Wei Liu > --- > drivers/net/xen-netback/common.h| 2 +- > drivers/net/xen-netback/interface.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/xen-netback/common.h > b/drivers/net/xen-netback/common.h > index 8a495b3..c02cafb 100644 > --- a/drivers/net/xen-netback/common.h > +++ b/drivers/net/xen-netback/common.h > @@ -280,7 +280,7 @@ void xenvif_free(struct xenvif *vif); > int xenvif_xenbus_init(void); > void xenvif_xenbus_fini(void); > > -int xenvif_schedulable(struct xenvif *vif); > +bool xenvif_schedulable(struct xenvif *vif); > > int xenvif_queue_stopped(struct xenvif_queue *queue); > void xenvif_wake_queue(struct xenvif_queue *queue); > diff --git a/drivers/net/xen-netback/interface.c > b/drivers/net/xen-netback/interface.c > index 1a83e19..b5fcb52 100644 > --- a/drivers/net/xen-netback/interface.c > +++ b/drivers/net/xen-netback/interface.c > @@ -63,7 +63,7 @@ void xenvif_skb_zerocopy_complete(struct xenvif_queue > *queue) > atomic_dec(&queue->inflight_packets); > } > > -int xenvif_schedulable(struct xenvif *vif) > +bool xenvif_schedulable(struct xenvif *vif) > { > return netif_running(vif->dev) && > test_bit(VIF_STATUS_CONNECTED, &vif->status) && > -- > 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen
Hi Eric Sorry for coming late to the discussion. On Thu, Apr 16, 2015 at 05:42:16AM -0700, Eric Dumazet wrote: > On Thu, 2015-04-16 at 11:01 +0100, George Dunlap wrote: > > > He suggested that after he'd been prodded by 4 more e-mails in which two > > of us guessed what he was trying to get at. That's what I was > > complaining about. > > My big complain is that I suggested to test to double the sysctl, which > gave good results. > Do I understand correctly that it's acceptable to you to double the size of the buffer? If so I will send a patch to do that. Wei. > Then you provided a patch using a 8x factor. How does that sound ? > > Next time I ask a raise, I should try a 8x factor as well, who knows, > it might be accepted. > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen-netback: fix a BUG() during initialization
On Fri, Jun 19, 2015 at 02:21:51PM +0200, Imre Palik wrote: > From: "Palik, Imre" > > Commit edafc132baac ("xen-netback: making the bandwidth limiter runtime > settable") > introduced the capability to change the bandwidth rate limit at runtime. > But it also introduced a possible crashing bug. > > If netback receives two XenbusStateConnected without getting the > hotplug-status watch firing in between, then it will try to register the > watches for the rate limiter again. But this triggers a BUG() in the watch > registration code. > > The fix modifies connect() to remove the possibly existing packet-rate > watches before trying to install those watches. This behaviour is in line > with how connect() deals with the hotplug-status watch. > > Signed-off-by: Imre Palik > Cc: Matt Wilson Acked-by: Wei Liu > --- > drivers/net/xen-netback/xenbus.c |4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/net/xen-netback/xenbus.c > b/drivers/net/xen-netback/xenbus.c > index 968787a..ec383b0 100644 > --- a/drivers/net/xen-netback/xenbus.c > +++ b/drivers/net/xen-netback/xenbus.c > @@ -681,6 +681,9 @@ static int xen_register_watchers(struct xenbus_device > *dev, struct xenvif *vif) > char *node; > unsigned maxlen = strlen(dev->nodename) + sizeof("/rate"); > > + if (vif->credit_watch.node) > + return -EADDRINUSE; > + > node = kmalloc(maxlen, GFP_KERNEL); > if (!node) > return -ENOMEM; > @@ -770,6 +773,7 @@ static void connect(struct backend_info *be) > } > > xen_net_read_rate(dev, &credit_bytes, &credit_usec); > + xen_unregister_watchers(be->vif); > xen_register_watchers(dev, be->vif); > read_xenbus_vif_flags(be); > > -- > 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in Please read the FAQ at http://www.tux.org/lkml/
Re: [Xen-devel] [PATCH 4/6] xen-netfront: add range check for Tx response id
On Mon, Apr 30, 2018 at 11:01:48PM +0200, Marek Marczykowski-Górecki wrote: > Tx response ID is fetched from shared page, so make sure it is sane > before using it as an array index. > > CC: sta...@vger.kernel.org > Signed-off-by: Marek Marczykowski-Górecki > --- > drivers/net/xen-netfront.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c > index 934b8a4..55c9b25 100644 > --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -394,6 +394,7 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue) > continue; > > id = txrsp.id; > + BUG_ON(id >= NET_TX_RING_SIZE); It is better to use ARRAY_SIZE here. Wei.
Re: [Xen-devel] [RFC][PATCH] KVM: APPLES can improve the performance of applications and virtualized systems by up to 49%
On Sat, May 12, 2018 at 04:27:04PM +0800, Weiwei Jia wrote: > Dear all, > > Recently, we made a few improvements on effectively utilizing Pause > Loop Exiting (PLE) support for higher throughput on virtualized > systems. Basically, it solves two problems: 1) how to adjust > PLE_Window; 2) how to select virtual CPUs to schedule on VM_EXITs > caused by PLE. Our tests with standard benchmarks show that the > approach can improve performance by up to 49%. The approach shows > promising performance and is easy to implement. We think that it would > be wonderful if Linux/KVM and XEN can consider the approach. > > We already have a prototype implementation based on KVM (Linux Kernel > 3.19.8). Our patch for Linux Kernel 3.19.8 and the paper describing > our idea are available in Github repository [1][2][3]. We are pleased > to revise our patch in order to merge it into Linux/KVM and XEN. We > hope that you can test and adopt our approach/techniques. We are > pleased to get some comments/suggestions on the approach and on how > the idea can be adopted/tested by Linux/KVM and XEN. Thank you. > > [1] APPLES paper: https://github.com/sysmen/apples/tree/master/paper > [2] APPLES patch: > https://github.com/sysmen/apples/blob/master/patches/3.19.8-APPLES.patch > [3] APPLES patch README: > https://github.com/sysmen/apples/blob/master/patches/README.txt > Is PV spinlock involved in your test? There is no mention of it in your paper. Wei. > Best Regards, > Sysmen Research Group > > ___ > Xen-devel mailing list > xen-de...@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
On Thu, Feb 28, 2019 at 09:46:57AM +, Paul Durrant wrote: > > -Original Message- > > From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com] > > Sent: 28 February 2019 02:03 > > To: xen-de...@lists.xenproject.org; net...@vger.kernel.org; > > linux-kernel@vger.kernel.org > > Cc: Wei Liu ; Paul Durrant ; > > da...@davemloft.net; Igor > > Druzhinin > > Subject: [PATCH] xen-netback: fix occasional leak of grant ref mappings > > under memory pressure > > > > Zero-copy callback flag is not yet set on frag list skb at the moment > > xenvif_handle_frag_list() returns -ENOMEM. This eventually results in > > leaking grant ref mappings since xenvif_zerocopy_callback() is never > > called for these fragments. Those eventually build up and cause Xen > > to kill Dom0 as the slots get reused for new mappings. > > > > That behavior is observed under certain workloads where sudden spikes > > of page cache usage for writes coexist with active atomic skb allocations. > > > > Signed-off-by: Igor Druzhinin > > --- > > drivers/net/xen-netback/netback.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/net/xen-netback/netback.c > > b/drivers/net/xen-netback/netback.c > > index 80aae3a..2023317 100644 > > --- a/drivers/net/xen-netback/netback.c > > +++ b/drivers/net/xen-netback/netback.c > > @@ -1146,9 +1146,12 @@ static int xenvif_tx_submit(struct xenvif_queue > > *queue) > > > > if (unlikely(skb_has_frag_list(skb))) { > > if (xenvif_handle_frag_list(queue, skb)) { > > + struct sk_buff *nskb = > > + skb_shinfo(skb)->frag_list; > > if (net_ratelimit()) > > netdev_err(queue->vif->dev, > >"Not enough memory to > > consolidate frag_list!\n"); > > + xenvif_skb_zerocopy_prepare(queue, nskb); > > xenvif_skb_zerocopy_prepare(queue, skb); > > kfree_skb(skb); > > continue; > > Whilst this fix will do the job, I think it would be better to get rid of the > kfree_skb() from inside xenvif_handle_frag_list() and always deal with it > here rather than having it happen in two different places. Something like the > following... +1 for having only one place. > > ---8<--- > diff --git a/drivers/net/xen-netback/netback.c > b/drivers/net/xen-netback/netback.c > index 80aae3a32c2a..093c7b860772 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue > *queue, > /* Consolidate skb with a frag_list into a brand new one with local pages on > * frags. Returns 0 or -ENOMEM if can't allocate new pages. > */ > -static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct > sk_buff *skb) > +static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct > sk_buff *diff --git a/drivers/net/xen-netback/netback.c > b/drivers/net/xen-netback/netback.c > index 80aae3a32c2a..093c7b860772 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue > *qu > eue, > /* Consolidate skb with a frag_list into a brand new one with local pages on > * frags. Returns 0 or -ENOMEM if can't allocate new pages. > */ > -static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct > sk_buff * > skb) > +static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct > sk_buff * > skb, > + struct sk_buff *nskb) > { > unsigned int offset = skb_headlen(skb); > skb_frag_t frags[MAX_SKB_FRAGS]; > int i, f; > struct ubuf_info *uarg; > - struct sk_buff *nskb = skb_shinfo(skb)->frag_list; > > queue->stats.tx_zerocopy_sent += 2; > queue->stats.tx_frag_overflow++; > @@ -1072,11 +1072,6 @@ static int xenvif_handle_frag_list(struct xenvif_queue > *q > ueue, struct sk_buff *s > skb_frag_size_set(&frags[i], len); > } > > - /* Copied all the bits from the frag list -- free it. */ > - skb_frag_list_init(skb); > - xenvif_skb_zerocopy_prepare(queue, nskb); > - kfree_skb(nskb); > - > /* Release all the original (foreign) frags. */ &g
Re: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
On Thu, Feb 28, 2019 at 12:07:07PM +, Paul Durrant wrote: > Yes, I meant kfree_skb(nskb). > In that case I think your patch looks fine. Wei.
Re: [PATCH v2] xen-netback: fix occasional leak of grant ref mappings under memory pressure
On Thu, Feb 28, 2019 at 12:48:03PM +, Igor Druzhinin wrote: > Zero-copy callback flag is not yet set on frag list skb at the moment > xenvif_handle_frag_list() returns -ENOMEM. This eventually results in > leaking grant ref mappings since xenvif_zerocopy_callback() is never > called for these fragments. Those eventually build up and cause Xen > to kill Dom0 as the slots get reused for new mappings: > > "d0v0 Attempt to implicitly unmap a granted PTE c01329fce005" > > That behavior is observed under certain workloads where sudden spikes > of page cache writes coexist with active atomic skb allocations from > network traffic. Additionally, rework the logic to deal with frag_list > deallocation in a single place. > > Signed-off-by: Paul Durrant > Signed-off-by: Igor Druzhinin Acked-by: Wei Liu > --- > drivers/net/xen-netback/netback.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/xen-netback/netback.c > b/drivers/net/xen-netback/netback.c > index 80aae3a..f09948b 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -1072,11 +1072,6 @@ static int xenvif_handle_frag_list(struct xenvif_queue > *queue, struct sk_buff *s > skb_frag_size_set(&frags[i], len); > } > > - /* Copied all the bits from the frag list -- free it. */ > - skb_frag_list_init(skb); > - xenvif_skb_zerocopy_prepare(queue, nskb); > - kfree_skb(nskb); > - > /* Release all the original (foreign) frags. */ > for (f = 0; f < skb_shinfo(skb)->nr_frags; f++) > skb_frag_unref(skb, f); > @@ -1145,6 +1140,8 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) > xenvif_fill_frags(queue, skb); > > if (unlikely(skb_has_frag_list(skb))) { > + struct sk_buff *nskb = skb_shinfo(skb)->frag_list; > + xenvif_skb_zerocopy_prepare(queue, nskb); > if (xenvif_handle_frag_list(queue, skb)) { > if (net_ratelimit()) > netdev_err(queue->vif->dev, > @@ -1153,6 +1150,9 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) > kfree_skb(skb); > continue; > } > + /* Copied all the bits from the frag list -- free it. */ > + skb_frag_list_init(skb); > + kfree_skb(nskb); > } > > skb->dev = queue->vif->dev; > -- > 2.7.4 >
Re: [PATCH v2 4/8] xen/netback: fix spurious event detection for common event case
On Thu, Feb 11, 2021 at 11:16:12AM +0100, Juergen Gross wrote: > In case of a common event for rx and tx queue the event should be > regarded to be spurious if no rx and no tx requests are pending. > > Unfortunately the condition for testing that is wrong causing to > decide a event being spurious if no rx OR no tx requests are > pending. > > Fix that plus using local variables for rx/tx pending indicators in > order to split function calls and if condition. > > Fixes: 23025393dbeb3b ("xen/netback: use lateeoi irq binding") > Signed-off-by: Juergen Gross Reviewed-by: Wei Liu
Re: [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions
On Tue, Jan 26, 2021 at 01:20:36AM +, Michael Kelley wrote: > From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM [...] > > +#include > > + > > +#define HV_DEPOSIT_MAX_ORDER (8) > > +#define HV_DEPOSIT_MAX (1 << HV_DEPOSIT_MAX_ORDER) > > Is there any reason to not let the maximum be 511, which is > how many entries will fit on the hypercall input page? The > max could be define in terms of HY_HYP_PAGE_SIZE so that > the logical dependency is fully expressed. Let me try changing this. This file is largely authored by Lilian and Nuno. I don't see a particular reason why the value can't be larger. I've updated the value to the following. /* * See struct hv_deposit_memory. The first u64 is partition ID, the rest * are GPAs. */ #define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1) Let's see how that goes. I will test it once I fix other places. > > > + > > +/* > > + * Deposits exact number of pages > > + * Must be called with interrupts enabled > > + * Max 256 pages > > + */ > > +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages) > > +{ > > + struct page **pages; > > + int *counts; > > + int num_allocations; > > + int i, j, page_count; > > + int order; > > + int desired_order; > > + u16 status; > > + int ret; > > + u64 base_pfn; > > + struct hv_deposit_memory *input_page; > > + unsigned long flags; > > + > > + if (num_pages > HV_DEPOSIT_MAX) > > + return -E2BIG; > > + if (!num_pages) > > + return 0; > > + > > + /* One buffer for page pointers and counts */ > > + pages = page_address(alloc_page(GFP_KERNEL)); > > + if (!pages) > > Does the above check work? If alloc_pages() returns NULL, it looks like > page_address() might fault. > Good catch. Fixed. > > + return -ENOMEM; > > + > > + counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL); > > + if (!counts) { > > + free_page((unsigned long)pages); > > + return -ENOMEM; > > + } > > + > > + /* Allocate all the pages before disabling interrupts */ > > + num_allocations = 0; > > + i = 0; > > + order = HV_DEPOSIT_MAX_ORDER; > > + > > + while (num_pages) { > > + /* Find highest order we can actually allocate */ > > + desired_order = 31 - __builtin_clz(num_pages); > > + order = min(desired_order, order); > > The above seems redundant since request sizes larger than the > max have already been rejected. > min(...) can be dropped. > > + do { > > + pages[i] = alloc_pages_node(node, GFP_KERNEL, order); > > + if (!pages[i]) { > > + if (!order) { > > + ret = -ENOMEM; > > + goto err_free_allocations; > > + } > > + --order; > > + } > > + } while (!pages[i]); > > The duplicative test of !pages[i] is somewhat annoying. How about > this: > > while{!pages[i] = alloc_pages_node(node, GFP_KERNEL, order) { > if (!order) { > ret = -ENOMEM; > goto err_free_allocations; > } > --order; > } > > or if you don't like doing an assignment in the while test: > > while(1) { > pages[i] = alloc_pages_node(node, GFP_KERNEL, order); > if (page[i]) > break; > if (!order) { > ret = -ENOMEM; > goto err_free_allocations; > } > --order; > } > I will use this variant. > > + > > + split_page(pages[i], order); > > + counts[i] = 1 << order; > > + num_pages -= counts[i]; > > + i++; > > + num_allocations++; > > Incrementing both I and num_allocations in the loop seems > redundant, especially since num_allocations isn't used in the loop. > Could num_allocations be assigned the value of i once the loop > is exited? (and num_allocations would not need to be initialized to 0.) > Would also have to do the assignment in the error case. > Yes. That can be done. > > + } > > + > > + l
Re: [PATCH] xen/netback: avoid race in xenvif_rx_ring_slots_available()
On Tue, Feb 02, 2021 at 08:09:38AM +0100, Juergen Gross wrote: > Since commit 23025393dbeb3b8b3 ("xen/netback: use lateeoi irq binding") > xenvif_rx_ring_slots_available() is no longer called only from the rx > queue kernel thread, so it needs to access the rx queue with the > associated queue held. > > Reported-by: Igor Druzhinin > Fixes: 23025393dbeb3b8b3 ("xen/netback: use lateeoi irq binding") > Cc: sta...@vger.kernel.org > Signed-off-by: Juergen Gross Acked-by: Wei Liu
Re: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
On Tue, Jan 26, 2021 at 01:26:52AM +, Michael Kelley wrote: > From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM > > > > We will need to identify the device we want Microsoft Hypervisor to > > manipulate. Introduce the data structures for that purpose. > > > > They will be used in a later patch. > > > > Signed-off-by: Sunil Muthuswamy > > Co-Developed-by: Sunil Muthuswamy > > Signed-off-by: Wei Liu > > --- > > include/asm-generic/hyperv-tlfs.h | 79 +++ > > 1 file changed, 79 insertions(+) > > > > diff --git a/include/asm-generic/hyperv-tlfs.h > > b/include/asm-generic/hyperv-tlfs.h > > index 8423bf53c237..42ff1326c6bd 100644 > > --- a/include/asm-generic/hyperv-tlfs.h > > +++ b/include/asm-generic/hyperv-tlfs.h > > @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input { > > } element[]; > > } __packed; > > > > +enum hv_device_type { > > + HV_DEVICE_TYPE_LOGICAL = 0, > > + HV_DEVICE_TYPE_PCI = 1, > > + HV_DEVICE_TYPE_IOAPIC = 2, > > + HV_DEVICE_TYPE_ACPI = 3, > > +}; > > + > > +typedef u16 hv_pci_rid; > > +typedef u16 hv_pci_segment; > > +typedef u64 hv_logical_device_id; > > +union hv_pci_bdf { > > + u16 as_uint16; > > + > > + struct { > > + u8 function:3; > > + u8 device:5; > > + u8 bus; > > + }; > > +} __packed; > > + > > +union hv_pci_bus_range { > > + u16 as_uint16; > > + > > + struct { > > + u8 subordinate_bus; > > + u8 secondary_bus; > > + }; > > +} __packed; > > + > > +union hv_device_id { > > + u64 as_uint64; > > + > > + struct { > > + u64 :62; > > + u64 device_type:2; > > + }; > > Are the above 4 lines extraneous junk? > If not, a comment would be helpful. And we > would normally label the 62 bit field as > "reserved0" or something similar. > No. It is not junk. I got this from a header in tree. I am inclined to just drop this hunk. If that breaks things, I will use "reserved0". Wei.
Re: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition
On Wed, Jan 27, 2021 at 05:47:04AM +, Michael Kelley wrote: > From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM > > > > When Linux runs as the root partition on Microsoft Hypervisor, its > > interrupts are remapped. Linux will need to explicitly map and unmap > > interrupts for hardware. > > > > Implement an MSI domain to issue the correct hypercalls. And initialize > > this irqdomain as the default MSI irq domain. > > > > Signed-off-by: Sunil Muthuswamy > > Co-Developed-by: Sunil Muthuswamy > > Signed-off-by: Wei Liu > > --- > > v4: Fix compilation issue when CONFIG_PCI_MSI is not set. > > v3: build irqdomain.o for 32bit as well. > > I'm not clear on the intent for 32-bit builds. Given that hv_proc.c is built > only for 64-bit, I'm assuming running Linux in the root partition > is only functional for 64-bit builds. So is the goal simply that 32-bit > builds will compile correctly? Seems like maybe there should be > a CONFIG option for running Linux in the root partition, and that > option would force 64-bit. To ensure 32 bit kernel builds and 32 bit guests still work. The config option ROOT_API is to be introduced by Nuno's /dev/mshv series. We can use that option to gate some objects when that's available. > [...] > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > > new file mode 100644 > > index ..19637cd60231 > > --- /dev/null > > +++ b/arch/x86/hyperv/irqdomain.c > > @@ -0,0 +1,332 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +// > > +// Irqdomain for Linux to run as the root partition on Microsoft > > Hypervisor. > > +// > > +// Authors: > > +// Sunil Muthuswamy > > +// Wei Liu > > I think the // comment style should only be used for the SPDX line. Fixed. > > > + > > +#include > > +#include > > +#include > > + [...] > > +static int hv_map_msi_interrupt(struct pci_dev *dev, int vcpu, int vector, > > + struct hv_interrupt_entry *entry) > > +{ > > + struct hv_input_map_device_interrupt *input; > > + struct hv_output_map_device_interrupt *output; > > + struct hv_device_interrupt_descriptor *intr_desc; > > + unsigned long flags; > > + u16 status; > > + > > + local_irq_save(flags); > > + > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg); > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg); > > + > > + intr_desc = &input->interrupt_descriptor; > > + memset(input, 0, sizeof(*input)); > > + input->partition_id = hv_current_partition_id; > > + input->device_id = hv_build_pci_dev_id(dev).as_uint64; > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED; > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; > > + intr_desc->vector_count = 1; > > + intr_desc->target.vector = vector; > > + __set_bit(vcpu, (unsigned long*)&intr_desc->target.vp_mask); > > This is using the CPU bitmap format that supports up to 64 vCPUs. Any reason > not > to use the format that supports a larger number of CPUs? In either case, > perhaps > a check for the value of vcpu against the max of 64 (or the larger number if > you > change the bitmap format) would be appropriate. > This is mostly due to we didn't have a suitably large machine during development. I will see if this can use vpset instead. > > + > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, > > output) & > > +HV_HYPERCALL_RESULT_MASK; > > + *entry = output->interrupt_entry; > > + > > + local_irq_restore(flags); > > + > > + if (status != HV_STATUS_SUCCESS) > > + pr_err("%s: hypercall failed, status %d\n", __func__, status); > > + > > + return status; > > +} > > + [...] > > +static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct > > hv_interrupt_entry > > *old_entry) > > +{ > > + return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry) > > + & HV_HYPERCALL_RESULT_MASK; > > The masking with HV_HYPERCALL_RESULT_MASK is already done in > hv_unmap_interrupt(). > Fixed. Wei.
Re: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition
On Tue, Feb 02, 2021 at 06:15:23PM +, Michael Kelley wrote: > From: Wei Liu Sent: Tuesday, February 2, 2021 9:32 AM > > > > On Wed, Jan 27, 2021 at 05:47:04AM +, Michael Kelley wrote: > > > From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 > > > AM > > > > > > > > When Linux runs as the root partition on Microsoft Hypervisor, its > > > > interrupts are remapped. Linux will need to explicitly map and unmap > > > > interrupts for hardware. > > > > > > > > Implement an MSI domain to issue the correct hypercalls. And initialize > > > > this irqdomain as the default MSI irq domain. > > > > > > > > Signed-off-by: Sunil Muthuswamy > > > > Co-Developed-by: Sunil Muthuswamy > > > > Signed-off-by: Wei Liu > > > > --- > > > > v4: Fix compilation issue when CONFIG_PCI_MSI is not set. > > > > v3: build irqdomain.o for 32bit as well. > > > > > > I'm not clear on the intent for 32-bit builds. Given that hv_proc.c is > > > built > > > only for 64-bit, I'm assuming running Linux in the root partition > > > is only functional for 64-bit builds. So is the goal simply that 32-bit > > > builds will compile correctly? Seems like maybe there should be > > > a CONFIG option for running Linux in the root partition, and that > > > option would force 64-bit. > > > > To ensure 32 bit kernel builds and 32 bit guests still work. > > > > The config option ROOT_API is to be introduced by Nuno's /dev/mshv > > series. We can use that option to gate some objects when that's > > available. > > > > But just so I'm 100% clear, is there intent to run 32-bit Linux in the root > partition? I'm assuming not. That's correct. There is no intent to run 32-bit Linux as the root partition. Wei. > > Michael
Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
On Wed, Jan 27, 2021 at 05:47:08AM +, Michael Kelley wrote: > From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM > > > > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft > > Hypervisor when Linux runs as the root partition. Implement an IRQ > > domain to handle mapping and unmapping of IO-APIC interrupts. > > > > Signed-off-by: Wei Liu > > --- > > arch/x86/hyperv/irqdomain.c | 54 ++ > > arch/x86/include/asm/mshyperv.h | 4 + > > drivers/iommu/hyperv-iommu.c| 179 +++- > > 3 files changed, 233 insertions(+), 4 deletions(-) > > > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > > index 19637cd60231..8e2b4e478b70 100644 > > --- a/arch/x86/hyperv/irqdomain.c > > +++ b/arch/x86/hyperv/irqdomain.c > > @@ -330,3 +330,57 @@ struct irq_domain * __init > > hv_create_pci_msi_domain(void) > > } > > > > #endif /* CONFIG_PCI_MSI */ > > + > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry > > *entry) > > +{ > > + union hv_device_id device_id; > > + > > + device_id.as_uint64 = 0; > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; > > + device_id.ioapic.ioapic_id = (u8)ioapic_id; > > + > > + return hv_unmap_interrupt(device_id.as_uint64, entry) & > > HV_HYPERCALL_RESULT_MASK; > > The masking is already done in hv_unmap_interrupt. Fixed. > > > +} > > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt); > > + > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int > > vector, > > + struct hv_interrupt_entry *entry) > > +{ > > + unsigned long flags; > > + struct hv_input_map_device_interrupt *input; > > + struct hv_output_map_device_interrupt *output; > > + union hv_device_id device_id; > > + struct hv_device_interrupt_descriptor *intr_desc; > > + u16 status; > > + > > + device_id.as_uint64 = 0; > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; > > + device_id.ioapic.ioapic_id = (u8)ioapic_id; > > + > > + local_irq_save(flags); > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg); > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg); > > + memset(input, 0, sizeof(*input)); > > + intr_desc = &input->interrupt_descriptor; > > + input->partition_id = hv_current_partition_id; > > + input->device_id = device_id.as_uint64; > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED; > > + intr_desc->target.vector = vector; > > + intr_desc->vector_count = 1; > > + > > + if (level) > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL; > > + else > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; > > + > > + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask); > > + > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, > > output) & > > +HV_HYPERCALL_RESULT_MASK; > > + local_irq_restore(flags); > > + > > + *entry = output->interrupt_entry; > > + > > + return status; > > As a cross-check, I was comparing this code against hv_map_msi_interrupt(). > They are > mostly parallel, though some of the assignments are done in a different > order. It's a nit, > but making them as parallel as possible would be nice. :-) > Indeed. I will see about factoring out a function. > Same 64 vCPU comment applies here as well. > This is changed to use vpset instead. Took me a bit of time to get it working because document is a bit lacking. > > > +} > > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt); > > diff --git a/arch/x86/include/asm/mshyperv.h > > b/arch/x86/include/asm/mshyperv.h > > index ccc849e25d5e..345d7c6f8c37 100644 > > --- a/arch/x86/include/asm/mshyperv.h > > +++ b/arch/x86/include/asm/mshyperv.h > > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union > > hv_msi_entry *msi_entry, > > > > struct irq_domain *hv_create_pci_msi_domain(void); > > > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int > > vector, > > + struct hv_interrupt_entry *entry); > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry > > *entry); > > + > > #else /* CONFIG_HYPERV */ > > static inline void hyperv_init(void) {} > > static inline void hyperv_setup_mmu_ops(void) {} > > diff --
Re: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
On Tue, Feb 02, 2021 at 05:02:48PM +, Wei Liu wrote: > On Tue, Jan 26, 2021 at 01:26:52AM +, Michael Kelley wrote: > > From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM > > > > > > We will need to identify the device we want Microsoft Hypervisor to > > > manipulate. Introduce the data structures for that purpose. > > > > > > They will be used in a later patch. > > > > > > Signed-off-by: Sunil Muthuswamy > > > Co-Developed-by: Sunil Muthuswamy > > > Signed-off-by: Wei Liu > > > --- > > > include/asm-generic/hyperv-tlfs.h | 79 +++ > > > 1 file changed, 79 insertions(+) > > > > > > diff --git a/include/asm-generic/hyperv-tlfs.h > > > b/include/asm-generic/hyperv-tlfs.h > > > index 8423bf53c237..42ff1326c6bd 100644 > > > --- a/include/asm-generic/hyperv-tlfs.h > > > +++ b/include/asm-generic/hyperv-tlfs.h > > > @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input { > > > } element[]; > > > } __packed; > > > > > > +enum hv_device_type { > > > + HV_DEVICE_TYPE_LOGICAL = 0, > > > + HV_DEVICE_TYPE_PCI = 1, > > > + HV_DEVICE_TYPE_IOAPIC = 2, > > > + HV_DEVICE_TYPE_ACPI = 3, > > > +}; > > > + > > > +typedef u16 hv_pci_rid; > > > +typedef u16 hv_pci_segment; > > > +typedef u64 hv_logical_device_id; > > > +union hv_pci_bdf { > > > + u16 as_uint16; > > > + > > > + struct { > > > + u8 function:3; > > > + u8 device:5; > > > + u8 bus; > > > + }; > > > +} __packed; > > > + > > > +union hv_pci_bus_range { > > > + u16 as_uint16; > > > + > > > + struct { > > > + u8 subordinate_bus; > > > + u8 secondary_bus; > > > + }; > > > +} __packed; > > > + > > > +union hv_device_id { > > > + u64 as_uint64; > > > + > > > + struct { > > > + u64 :62; > > > + u64 device_type:2; > > > + }; > > > > Are the above 4 lines extraneous junk? > > If not, a comment would be helpful. And we > > would normally label the 62 bit field as > > "reserved0" or something similar. > > > > No. It is not junk. I got this from a header in tree. > > I am inclined to just drop this hunk. If that breaks things, I will use > "reserved0". > It turns out adding reserved0 is required. Dropping this hunk does not work. Wei.
Re: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
On Wed, Feb 03, 2021 at 02:49:53PM +0100, Arnd Bergmann wrote: > On Wed, Feb 3, 2021 at 2:26 PM Wei Liu wrote: > > On Tue, Feb 02, 2021 at 05:02:48PM +0000, Wei Liu wrote: > > > On Tue, Jan 26, 2021 at 01:26:52AM +, Michael Kelley wrote: > > > > From: Wei Liu Sent: Wednesday, January 20, 2021 > > > > 4:01 AM > > > > > +union hv_device_id { > > > > > + u64 as_uint64; > > > > > + > > > > > + struct { > > > > > + u64 :62; > > > > > + u64 device_type:2; > > > > > + }; > > > > > > > > Are the above 4 lines extraneous junk? > > > > If not, a comment would be helpful. And we > > > > would normally label the 62 bit field as > > > > "reserved0" or something similar. > > > > > > > > > > No. It is not junk. I got this from a header in tree. > > > > > > I am inclined to just drop this hunk. If that breaks things, I will use > > > "reserved0". > > > > > > > It turns out adding reserved0 is required. Dropping this hunk does not > > work. > > Generally speaking, bitfields are not great for specifying binary interfaces, > as the actual bit order can differ by architecture. The normal way we get > around it in the kernel is to use basic integer types and define macros > for bit masks. Ideally, each such field should also be marked with a > particular endianess as __le64 or __be64, in case this is ever used with > an Arm guest running a big-endian kernel. Thanks for the information. I think we will need to wait until Microsoft Hypervisor clearly defines the endianess in its header(s) before we can make changes to the copy in Linux. > > That said, if you do not care about the specific order of the bits, having > anonymous bitfields for the reserved members is fine, I don't see a > reason to name it as reserved. Michael, let me know what you think. I'm not too fussed either way. Wei. > > Arnd
[PATCH v6 00/16] Introducing Linux root partition support for Microsoft Hypervisor
Hi all Here we propose this patch series to make Linux run as the root partition [0] on Microsoft Hypervisor [1]. There will be a subsequent patch series to provide a device node (/dev/mshv) such that userspace programs can create and run virtual machines. We've also ported Cloud Hypervisor [3] over and have been able to boot a Linux guest with Virtio devices since late July 2020. This series implements only the absolutely necessary components to get things running. A large portion of this series consists of patches that augment hyperv-tlfs.h. They should be rather uncontroversial and can be applied right away. A few key things other than the changes to hyperv-tlfs.h: 1. Linux needs to setup existing Hyper-V facilities differently. 2. Linux needs to make a few hypercalls to bring up APs. 3. Interrupts are remapped by IOMMU, which is controlled by the hypervisor. Linux needs to make hypercalls to map and unmap interrupts. This is done by introducing a new MSI irqdomain and extending the remapping domain in hyperv-iommu. This series is now based on 5.11-rc2. Comments and suggestions are welcome. Thanks, Wei. [0] Just think of it like Xen's Dom0. [1] Hyper-V is more well-known, but it really refers to the whole stack including the hypervisor and other components that run in Windows kernel and userspace. [3] https://github.com/cloud-hypervisor/ Cc: sa...@linux.intel.com Cc: robert.bradf...@intel.com Cc: sebastien.bo...@intel.com Changes since v5: 1. Address Michael's comments. 2. Further improve and simplify code. 3. Drop a redundant patch and add one new patch for ACPI / NUMA code. Changes since v4: 1. Rework IO-APIC handling. Changes since v3: 1. Fix compilation errors. 2. Adapt to upstream changes. Changes since v2: 1. Address more comments from Vitaly. 2. Fix and test 32bit build. Changes since v1: 1. Simplify MSI IRQ domain implementation. 2. Address Vitaly's comments. Wei Liu (16): asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT x86/hyperv: detect if Linux is the root partition Drivers: hv: vmbus: skip VMBus initialization if Linux is root clocksource/hyperv: use MSR-based access if running as root x86/hyperv: allocate output arg pages if required x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary x86/hyperv: handling hypercall page setup for root ACPI / NUMA: add a stub function for node_to_pxm() x86/hyperv: provide a bunch of helper functions x86/hyperv: implement and use hv_smp_prepare_cpus asm-generic/hyperv: update hv_msi_entry asm-generic/hyperv: update hv_interrupt_entry asm-generic/hyperv: introduce hv_device_id and auxiliary structures asm-generic/hyperv: import data structures for mapping device interrupts x86/hyperv: implement an MSI domain for root partition iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition arch/x86/hyperv/Makefile| 4 +- arch/x86/hyperv/hv_init.c | 107 +++- arch/x86/hyperv/hv_proc.c | 219 arch/x86/hyperv/irqdomain.c | 387 arch/x86/include/asm/hyperv-tlfs.h | 23 ++ arch/x86/include/asm/mshyperv.h | 19 +- arch/x86/kernel/cpu/mshyperv.c | 49 drivers/clocksource/hyperv_timer.c | 3 + drivers/hv/vmbus_drv.c | 3 + drivers/iommu/hyperv-iommu.c| 177 - drivers/pci/controller/pci-hyperv.c | 2 +- include/acpi/acpi_numa.h| 4 + include/asm-generic/hyperv-tlfs.h | 254 +- 13 files changed, 1230 insertions(+), 21 deletions(-) create mode 100644 arch/x86/hyperv/hv_proc.c create mode 100644 arch/x86/hyperv/irqdomain.c base-commit: e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62 -- 2.20.1
[PATCH v6 06/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
We will need the partition ID for executing some hypercalls later. Signed-off-by: Lillian Grassin-Drake Co-Developed-by: Sunil Muthuswamy Signed-off-by: Wei Liu --- v6: 1. Use u64 status. v3: 1. Make hv_get_partition_id static. 2. Change code structure a bit. --- arch/x86/hyperv/hv_init.c | 26 ++ arch/x86/include/asm/mshyperv.h | 2 ++ include/asm-generic/hyperv-tlfs.h | 6 ++ 3 files changed, 34 insertions(+) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 6f4cb40e53fe..5b90a7290177 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -26,6 +26,9 @@ #include #include +u64 hv_current_partition_id = ~0ull; +EXPORT_SYMBOL_GPL(hv_current_partition_id); + void *hv_hypercall_pg; EXPORT_SYMBOL_GPL(hv_hypercall_pg); @@ -331,6 +334,24 @@ static struct syscore_ops hv_syscore_ops = { .resume = hv_resume, }; +static void __init hv_get_partition_id(void) +{ + struct hv_get_partition_id *output_page; + u64 status; + unsigned long flags; + + local_irq_save(flags); + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg); + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page); + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) { + /* No point in proceeding if this failed */ + pr_err("Failed to get partition ID: %lld\n", status); + BUG(); + } + hv_current_partition_id = output_page->partition_id; + local_irq_restore(flags); +} + /* * This function is to be invoked early in the boot sequence after the * hypervisor has been detected. @@ -426,6 +447,11 @@ void __init hyperv_init(void) register_syscore_ops(&hv_syscore_ops); + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID) + hv_get_partition_id(); + + BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull); + return; remove_cpuhp_state: diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index 62d9390f1ddf..67f5d35a73d3 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -78,6 +78,8 @@ extern void *hv_hypercall_pg; extern void __percpu **hyperv_pcpu_input_arg; extern void __percpu **hyperv_pcpu_output_arg; +extern u64 hv_current_partition_id; + static inline u64 hv_do_hypercall(u64 control, void *input, void *output) { u64 input_address = input ? virt_to_phys(input) : 0; diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h index e6903589a82a..87b1a79b19eb 100644 --- a/include/asm-generic/hyperv-tlfs.h +++ b/include/asm-generic/hyperv-tlfs.h @@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page { #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014 #define HVCALL_SEND_IPI_EX 0x0015 +#define HVCALL_GET_PARTITION_ID0x0046 #define HVCALL_GET_VP_REGISTERS0x0050 #define HVCALL_SET_VP_REGISTERS0x0051 #define HVCALL_POST_MESSAGE0x005c @@ -407,6 +408,11 @@ struct hv_tlb_flush_ex { u64 gva_list[]; } __packed; +/* HvGetPartitionId hypercall (output only) */ +struct hv_get_partition_id { + u64 partition_id; +} __packed; + /* HvRetargetDeviceInterrupt hypercall */ union hv_msi_entry { u64 as_uint64; -- 2.20.1
[PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft Hypervisor when Linux runs as the root partition. Implement an IRQ domain to handle mapping and unmapping of IO-APIC interrupts. Signed-off-by: Wei Liu --- v6: 1. Simplify code due to changes in a previous patch. --- arch/x86/hyperv/irqdomain.c | 25 + arch/x86/include/asm/mshyperv.h | 4 + drivers/iommu/hyperv-iommu.c| 177 +++- 3 files changed, 203 insertions(+), 3 deletions(-) diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c index 117f17e8c88a..0cabc9aece38 100644 --- a/arch/x86/hyperv/irqdomain.c +++ b/arch/x86/hyperv/irqdomain.c @@ -360,3 +360,28 @@ struct irq_domain * __init hv_create_pci_msi_domain(void) } #endif /* CONFIG_PCI_MSI */ + +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry) +{ + union hv_device_id device_id; + + device_id.as_uint64 = 0; + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; + device_id.ioapic.ioapic_id = (u8)ioapic_id; + + return hv_unmap_interrupt(device_id.as_uint64, entry); +} +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt); + +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector, + struct hv_interrupt_entry *entry) +{ + union hv_device_id device_id; + + device_id.as_uint64 = 0; + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; + device_id.ioapic.ioapic_id = (u8)ioapic_id; + + return hv_map_interrupt(device_id, level, cpu, vector, entry); +} +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt); diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index ccc849e25d5e..345d7c6f8c37 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry, struct irq_domain *hv_create_pci_msi_domain(void); +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector, + struct hv_interrupt_entry *entry); +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry); + #else /* CONFIG_HYPERV */ static inline void hyperv_init(void) {} static inline void hyperv_setup_mmu_ops(void) {} diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index 1d21a0b5f724..e285a220c913 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -20,6 +20,7 @@ #include #include #include +#include #include "irq_remapping.h" @@ -115,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = { .free = hyperv_irq_remapping_free, }; +static const struct irq_domain_ops hyperv_root_ir_domain_ops; static int __init hyperv_prepare_irq_remapping(void) { struct fwnode_handle *fn; int i; + const char *name; + const struct irq_domain_ops *ops; if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) || x86_init.hyper.msi_ext_dest_id() || !x2apic_supported()) return -ENODEV; - fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0); + if (hv_root_partition) { + name = "HYPERV-ROOT-IR"; + ops = &hyperv_root_ir_domain_ops; + } else { + name = "HYPERV-IR"; + ops = &hyperv_ir_domain_ops; + } + + fn = irq_domain_alloc_named_id_fwnode(name, 0); if (!fn) return -ENOMEM; ioapic_ir_domain = irq_domain_create_hierarchy(arch_get_ir_parent_domain(), - 0, IOAPIC_REMAPPING_ENTRY, fn, - &hyperv_ir_domain_ops, NULL); + 0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL); if (!ioapic_ir_domain) { irq_domain_free_fwnode(fn); return -ENOMEM; } + if (hv_root_partition) + return 0; /* The rest is only relevant to guests */ + /* * Hyper-V doesn't provide irq remapping function for * IO-APIC and so IO-APIC only accepts 8-bit APIC ID. @@ -166,4 +180,161 @@ struct irq_remap_ops hyperv_irq_remap_ops = { .enable = hyperv_enable_irq_remapping, }; +/* IRQ remapping domain when Linux runs as the root partition */ +struct hyperv_root_ir_data { + u8 ioapic_id; + bool is_level; + struct hv_interrupt_entry entry; +}; + +static void +hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg) +{ + u64 status; + u32 vector; + struct irq_cfg *cfg; + int ioapic_id; + struct cpumask *affinity; + int cpu; + struct hv_interrupt_entry entry; + struct hyperv_root_ir_data *data = irq_data->chip_data; + struct IO_APIC_route_entry e; + + cfg = irqd_cfg(irq_data); + affinity =
[PATCH v6 07/16] x86/hyperv: handling hypercall page setup for root
When Linux is running as the root partition, the hypercall page will have already been setup by Hyper-V. Copy the content over to the allocated page. Add checks to hv_suspend & co to bail early because they are not supported in this setup yet. Signed-off-by: Lillian Grassin-Drake Signed-off-by: Sunil Muthuswamy Signed-off-by: Nuno Das Neves Co-Developed-by: Lillian Grassin-Drake Co-Developed-by: Sunil Muthuswamy Co-Developed-by: Nuno Das Neves Signed-off-by: Wei Liu Reviewed-by: Michael Kelley --- v3: 1. Use HV_HYP_PAGE_SIZE. 2. Add checks to hv_suspend & co. --- arch/x86/hyperv/hv_init.c | 37 ++--- 1 file changed, 34 insertions(+), 3 deletions(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 5b90a7290177..11c5997691f4 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -25,6 +25,7 @@ #include #include #include +#include u64 hv_current_partition_id = ~0ull; EXPORT_SYMBOL_GPL(hv_current_partition_id); @@ -283,6 +284,9 @@ static int hv_suspend(void) union hv_x64_msr_hypercall_contents hypercall_msr; int ret; + if (hv_root_partition) + return -EPERM; + /* * Reset the hypercall page as it is going to be invalidated * accross hibernation. Setting hv_hypercall_pg to NULL ensures @@ -432,8 +436,35 @@ void __init hyperv_init(void) rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); hypercall_msr.enable = 1; - hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg); - wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); + + if (hv_root_partition) { + struct page *pg; + void *src, *dst; + + /* +* For the root partition, the hypervisor will set up its +* hypercall page. The hypervisor guarantees it will not show +* up in the root's address space. The root can't change the +* location of the hypercall page. +* +* Order is important here. We must enable the hypercall page +* so it is populated with code, then copy the code to an +* executable page. +*/ + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); + + pg = vmalloc_to_page(hv_hypercall_pg); + dst = kmap(pg); + src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT, PAGE_SIZE, + MEMREMAP_WB); + BUG_ON(!(src && dst)); + memcpy(dst, src, HV_HYP_PAGE_SIZE); + memunmap(src); + kunmap(pg); + } else { + hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg); + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); + } /* * Ignore any errors in setting up stimer clockevents @@ -576,6 +607,6 @@ EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized); bool hv_is_hibernation_supported(void) { - return acpi_sleep_state_supported(ACPI_STATE_S4); + return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4); } EXPORT_SYMBOL_GPL(hv_is_hibernation_supported); -- 2.20.1
[PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition
When Linux runs as the root partition on Microsoft Hypervisor, its interrupts are remapped. Linux will need to explicitly map and unmap interrupts for hardware. Implement an MSI domain to issue the correct hypercalls. And initialize this irqdomain as the default MSI irq domain. Signed-off-by: Sunil Muthuswamy Co-Developed-by: Sunil Muthuswamy Signed-off-by: Wei Liu --- v6: 1. Use u64 status. 2. Use vpset instead of bitmap. 3. Factor out hv_map_interrupt 4. Address other misc comments. v4: Fix compilation issue when CONFIG_PCI_MSI is not set. v3: build irqdomain.o for 32bit as well. v2: This patch is simplified due to upstream changes. --- arch/x86/hyperv/Makefile| 2 +- arch/x86/hyperv/hv_init.c | 9 + arch/x86/hyperv/irqdomain.c | 362 arch/x86/include/asm/mshyperv.h | 2 + 4 files changed, 374 insertions(+), 1 deletion(-) create mode 100644 arch/x86/hyperv/irqdomain.c diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile index 565358020921..48e2c51464e8 100644 --- a/arch/x86/hyperv/Makefile +++ b/arch/x86/hyperv/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-y := hv_init.o mmu.o nested.o +obj-y := hv_init.o mmu.o nested.o irqdomain.o obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o ifdef CONFIG_X86_64 diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 11c5997691f4..894ce899f0cb 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -483,6 +483,15 @@ void __init hyperv_init(void) BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull); +#ifdef CONFIG_PCI_MSI + /* +* If we're running as root, we want to create our own PCI MSI domain. +* We can't set this in hv_pci_init because that would be too late. +*/ + if (hv_root_partition) + x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain; +#endif + return; remove_cpuhp_state: diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c new file mode 100644 index ..117f17e8c88a --- /dev/null +++ b/arch/x86/hyperv/irqdomain.c @@ -0,0 +1,362 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * for Linux to run as the root partition on Microsoft Hypervisor. + * + * Authors: + * Sunil Muthuswamy + * Wei Liu + */ + +#include +#include +#include + +static int hv_map_interrupt(union hv_device_id device_id, bool level, + int cpu, int vector, struct hv_interrupt_entry *entry) +{ + struct hv_input_map_device_interrupt *input; + struct hv_output_map_device_interrupt *output; + struct hv_device_interrupt_descriptor *intr_desc; + unsigned long flags; + u64 status; + cpumask_t mask = CPU_MASK_NONE; + int nr_bank, var_size; + + local_irq_save(flags); + + input = *this_cpu_ptr(hyperv_pcpu_input_arg); + output = *this_cpu_ptr(hyperv_pcpu_output_arg); + + intr_desc = &input->interrupt_descriptor; + memset(input, 0, sizeof(*input)); + input->partition_id = hv_current_partition_id; + input->device_id = device_id.as_uint64; + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED; + intr_desc->vector_count = 1; + intr_desc->target.vector = vector; + + if (level) + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL; + else + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; + + cpumask_set_cpu(cpu, &mask); + intr_desc->target.vp_set.valid_bank_mask = 0; + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K; + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask); + if (nr_bank < 0) { + local_irq_restore(flags); + pr_err("%s: unable to generate VP set\n", __func__); + return EINVAL; + } + intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET; + + /* +* var-sized hypercall, var-size starts after vp_mask (thus +* vp_set.format does not count, but vp_set.valid_bank_mask +* does). +*/ + var_size = nr_bank + 1; + + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, var_size, + input, output); + *entry = output->interrupt_entry; + + local_irq_restore(flags); + + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) + pr_err("%s: hypercall failed, status %lld\n", __func__, status); + + return status & HV_HYPERCALL_RESULT_MASK; +} + +static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry) +{ + unsigned long flags; + struct hv_input_unmap_device_interrupt *input; + struct hv_interrupt_entry *intr_entry; + u64 status; + + local_irq_sav
[PATCH v6 14/16] asm-generic/hyperv: import data structures for mapping device interrupts
Signed-off-by: Sunil Muthuswamy Co-Developed-by: Sunil Muthuswamy Signed-off-by: Wei Liu Reviewed-by: Michael Kelley --- arch/x86/include/asm/hyperv-tlfs.h | 13 +++ include/asm-generic/hyperv-tlfs.h | 36 ++ 2 files changed, 49 insertions(+) diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h index 204010350604..ab7d6cde548d 100644 --- a/arch/x86/include/asm/hyperv-tlfs.h +++ b/arch/x86/include/asm/hyperv-tlfs.h @@ -533,6 +533,19 @@ struct hv_partition_assist_pg { u32 tlb_lock_count; }; +enum hv_interrupt_type { + HV_X64_INTERRUPT_TYPE_FIXED = 0x, + HV_X64_INTERRUPT_TYPE_LOWESTPRIORITY= 0x0001, + HV_X64_INTERRUPT_TYPE_SMI = 0x0002, + HV_X64_INTERRUPT_TYPE_REMOTEREAD= 0x0003, + HV_X64_INTERRUPT_TYPE_NMI = 0x0004, + HV_X64_INTERRUPT_TYPE_INIT = 0x0005, + HV_X64_INTERRUPT_TYPE_SIPI = 0x0006, + HV_X64_INTERRUPT_TYPE_EXTINT= 0x0007, + HV_X64_INTERRUPT_TYPE_LOCALINT0 = 0x0008, + HV_X64_INTERRUPT_TYPE_LOCALINT1 = 0x0009, + HV_X64_INTERRUPT_TYPE_MAXIMUM = 0x000A, +}; #include diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h index ce53c0db28ae..a2eaed1b79e5 100644 --- a/include/asm-generic/hyperv-tlfs.h +++ b/include/asm-generic/hyperv-tlfs.h @@ -152,6 +152,8 @@ struct ms_hyperv_tsc_page { #define HVCALL_RETRIEVE_DEBUG_DATA 0x006a #define HVCALL_RESET_DEBUG_SESSION 0x006b #define HVCALL_ADD_LOGICAL_PROCESSOR 0x0076 +#define HVCALL_MAP_DEVICE_INTERRUPT0x007c +#define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d #define HVCALL_RETARGET_INTERRUPT 0x007e #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0 @@ -702,4 +704,38 @@ union hv_device_id { } acpi; } __packed; +enum hv_interrupt_trigger_mode { + HV_INTERRUPT_TRIGGER_MODE_EDGE = 0, + HV_INTERRUPT_TRIGGER_MODE_LEVEL = 1, +}; + +struct hv_device_interrupt_descriptor { + u32 interrupt_type; + u32 trigger_mode; + u32 vector_count; + u32 reserved; + struct hv_device_interrupt_target target; +} __packed; + +struct hv_input_map_device_interrupt { + u64 partition_id; + u64 device_id; + u64 flags; + struct hv_interrupt_entry logical_interrupt_entry; + struct hv_device_interrupt_descriptor interrupt_descriptor; +} __packed; + +struct hv_output_map_device_interrupt { + struct hv_interrupt_entry interrupt_entry; +} __packed; + +struct hv_input_unmap_device_interrupt { + u64 partition_id; + u64 device_id; + struct hv_interrupt_entry interrupt_entry; +} __packed; + +#define HV_SOURCE_SHADOW_NONE 0x0 +#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE 0x1 + #endif -- 2.20.1
[PATCH v6 11/16] asm-generic/hyperv: update hv_msi_entry
We will soon need to access fields inside the MSI address and MSI data fields. Introduce hv_msi_address_register and hv_msi_data_register. Fix up one user of hv_msi_entry in mshyperv.h. No functional change expected. Signed-off-by: Wei Liu Reviewed-by: Michael Kelley --- arch/x86/include/asm/mshyperv.h | 4 ++-- include/asm-generic/hyperv-tlfs.h | 28 ++-- 2 files changed, 28 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index 4e590a167160..cbee72550a12 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -257,8 +257,8 @@ static inline void hv_apic_init(void) {} static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry, struct msi_desc *msi_desc) { - msi_entry->address = msi_desc->msg.address_lo; - msi_entry->data = msi_desc->msg.data; + msi_entry->address.as_uint32 = msi_desc->msg.address_lo; + msi_entry->data.as_uint32 = msi_desc->msg.data; } #else /* CONFIG_HYPERV */ diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h index 69de4e3d89d3..4669f9a4e1f1 100644 --- a/include/asm-generic/hyperv-tlfs.h +++ b/include/asm-generic/hyperv-tlfs.h @@ -480,12 +480,36 @@ struct hv_create_vp { u64 flags; } __packed; +union hv_msi_address_register { + u32 as_uint32; + struct { + u32 reserved1:2; + u32 destination_mode:1; + u32 redirection_hint:1; + u32 reserved2:8; + u32 destination_id:8; + u32 msi_base:12; + }; +} __packed; + +union hv_msi_data_register { + u32 as_uint32; + struct { + u32 vector:8; + u32 delivery_mode:3; + u32 reserved1:3; + u32 level_assert:1; + u32 trigger_mode:1; + u32 reserved2:16; + }; +} __packed; + /* HvRetargetDeviceInterrupt hypercall */ union hv_msi_entry { u64 as_uint64; struct { - u32 address; - u32 data; + union hv_msi_address_register address; + union hv_msi_data_register data; } __packed; }; -- 2.20.1
[PATCH v6 09/16] x86/hyperv: provide a bunch of helper functions
They are used to deposit pages into Microsoft Hypervisor and bring up logical and virtual processors. Signed-off-by: Lillian Grassin-Drake Signed-off-by: Sunil Muthuswamy Signed-off-by: Nuno Das Neves Co-Developed-by: Lillian Grassin-Drake Co-Developed-by: Sunil Muthuswamy Co-Developed-by: Nuno Das Neves Signed-off-by: Wei Liu --- v6: 1. Address Michael's comments. v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set. v3: 1. Add __packed to structures. 2. Drop unnecessary exports. v2: 1. Adapt to hypervisor side changes 2. Address Vitaly's comments use u64 status pages major comments minor comments rely on acpi code --- arch/x86/hyperv/Makefile | 2 +- arch/x86/hyperv/hv_proc.c | 219 ++ arch/x86/include/asm/mshyperv.h | 4 + include/asm-generic/hyperv-tlfs.h | 67 + 4 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 arch/x86/hyperv/hv_proc.c diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile index 89b1f74d3225..565358020921 100644 --- a/arch/x86/hyperv/Makefile +++ b/arch/x86/hyperv/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only obj-y := hv_init.o mmu.o nested.o -obj-$(CONFIG_X86_64) += hv_apic.o +obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o ifdef CONFIG_X86_64 obj-$(CONFIG_PARAVIRT_SPINLOCKS) += hv_spinlock.o diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c new file mode 100644 index ..60461e598239 --- /dev/null +++ b/arch/x86/hyperv/hv_proc.c @@ -0,0 +1,219 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +/* + * See struct hv_deposit_memory. The first u64 is partition ID, the rest + * are GPAs. + */ +#define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1) + +/* Deposits exact number of pages. Must be called with interrupts enabled. */ +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages) +{ + struct page **pages, *page; + int *counts; + int num_allocations; + int i, j, page_count; + int order; + u64 status; + int ret; + u64 base_pfn; + struct hv_deposit_memory *input_page; + unsigned long flags; + + if (num_pages > HV_DEPOSIT_MAX) + return -E2BIG; + if (!num_pages) + return 0; + + /* One buffer for page pointers and counts */ + page = alloc_page(GFP_KERNEL); + if (!page) + return -ENOMEM; + pages = page_address(page); + + counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL); + if (!counts) { + free_page((unsigned long)pages); + return -ENOMEM; + } + + /* Allocate all the pages before disabling interrupts */ + i = 0; + + while (num_pages) { + /* Find highest order we can actually allocate */ + order = 31 - __builtin_clz(num_pages); + + while (1) { + pages[i] = alloc_pages_node(node, GFP_KERNEL, order); + if (pages[i]) + break; + if (!order) { + ret = -ENOMEM; + num_allocations = i; + goto err_free_allocations; + } + --order; + } + + split_page(pages[i], order); + counts[i] = 1 << order; + num_pages -= counts[i]; + i++; + } + num_allocations = i; + + local_irq_save(flags); + + input_page = *this_cpu_ptr(hyperv_pcpu_input_arg); + + input_page->partition_id = partition_id; + + /* Populate gpa_page_list - these will fit on the input page */ + for (i = 0, page_count = 0; i < num_allocations; ++i) { + base_pfn = page_to_pfn(pages[i]); + for (j = 0; j < counts[i]; ++j, ++page_count) + input_page->gpa_page_list[page_count] = base_pfn + j; + } + status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY, +page_count, 0, input_page, NULL); + local_irq_restore(flags); + + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) { + pr_err("Failed to deposit pages: %lld\n", status); + ret = status; + goto err_free_allocations; + } + + ret = 0; + goto free_buf; + +err_free_allocations: + for (i = 0; i < num_allocations; ++i) { + base_pfn = page_to_pfn(pages[i]); + for (j = 0; j < counts[i]; ++j) + __free_page(pfn_to_page(base_pfn + j)); + } + +free_buf: + free_page((unsigned long)pages); +
[PATCH v6 12/16] asm-generic/hyperv: update hv_interrupt_entry
We will soon use the same structure to handle IO-APIC interrupts as well. Introduce an enum to identify the source and a data structure for IO-APIC RTE. While at it, update pci-hyperv.c to use the enum. No functional change. Signed-off-by: Wei Liu Acked-by: Rob Herring Reviewed-by: Michael Kelley --- drivers/pci/controller/pci-hyperv.c | 2 +- include/asm-generic/hyperv-tlfs.h | 36 +++-- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index 6db8d96a78eb..87aa62ee0368 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1216,7 +1216,7 @@ static void hv_irq_unmask(struct irq_data *data) params = &hbus->retarget_msi_interrupt_params; memset(params, 0, sizeof(*params)); params->partition_id = HV_PARTITION_ID_SELF; - params->int_entry.source = 1; /* MSI(-X) */ + params->int_entry.source = HV_INTERRUPT_SOURCE_MSI; hv_set_msi_entry_from_desc(¶ms->int_entry.msi_entry, msi_desc); params->device_id = (hbus->hdev->dev_instance.b[5] << 24) | (hbus->hdev->dev_instance.b[4] << 16) | diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h index 4669f9a4e1f1..94c7d77bbf68 100644 --- a/include/asm-generic/hyperv-tlfs.h +++ b/include/asm-generic/hyperv-tlfs.h @@ -480,6 +480,11 @@ struct hv_create_vp { u64 flags; } __packed; +enum hv_interrupt_source { + HV_INTERRUPT_SOURCE_MSI = 1, /* MSI and MSI-X */ + HV_INTERRUPT_SOURCE_IOAPIC, +}; + union hv_msi_address_register { u32 as_uint32; struct { @@ -513,10 +518,37 @@ union hv_msi_entry { } __packed; }; +union hv_ioapic_rte { + u64 as_uint64; + + struct { + u32 vector:8; + u32 delivery_mode:3; + u32 destination_mode:1; + u32 delivery_status:1; + u32 interrupt_polarity:1; + u32 remote_irr:1; + u32 trigger_mode:1; + u32 interrupt_mask:1; + u32 reserved1:15; + + u32 reserved2:24; + u32 destination_id:8; + }; + + struct { + u32 low_uint32; + u32 high_uint32; + }; +} __packed; + struct hv_interrupt_entry { - u32 source; /* 1 for MSI(-X) */ + u32 source; u32 reserved1; - union hv_msi_entry msi_entry; + union { + union hv_msi_entry msi_entry; + union hv_ioapic_rte ioapic_rte; + }; } __packed; /* -- 2.20.1
[PATCH v6 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures
We will need to identify the device we want Microsoft Hypervisor to manipulate. Introduce the data structures for that purpose. They will be used in a later patch. Signed-off-by: Sunil Muthuswamy Co-Developed-by: Sunil Muthuswamy Signed-off-by: Wei Liu --- v6: 1. Add reserved0 as field name. --- include/asm-generic/hyperv-tlfs.h | 79 +++ 1 file changed, 79 insertions(+) diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h index 94c7d77bbf68..ce53c0db28ae 100644 --- a/include/asm-generic/hyperv-tlfs.h +++ b/include/asm-generic/hyperv-tlfs.h @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input { } element[]; } __packed; +enum hv_device_type { + HV_DEVICE_TYPE_LOGICAL = 0, + HV_DEVICE_TYPE_PCI = 1, + HV_DEVICE_TYPE_IOAPIC = 2, + HV_DEVICE_TYPE_ACPI = 3, +}; + +typedef u16 hv_pci_rid; +typedef u16 hv_pci_segment; +typedef u64 hv_logical_device_id; +union hv_pci_bdf { + u16 as_uint16; + + struct { + u8 function:3; + u8 device:5; + u8 bus; + }; +} __packed; + +union hv_pci_bus_range { + u16 as_uint16; + + struct { + u8 subordinate_bus; + u8 secondary_bus; + }; +} __packed; + +union hv_device_id { + u64 as_uint64; + + struct { + u64 reserved0:62; + u64 device_type:2; + }; + + /* HV_DEVICE_TYPE_LOGICAL */ + struct { + u64 id:62; + u64 device_type:2; + } logical; + + /* HV_DEVICE_TYPE_PCI */ + struct { + union { + hv_pci_rid rid; + union hv_pci_bdf bdf; + }; + + hv_pci_segment segment; + union hv_pci_bus_range shadow_bus_range; + + u16 phantom_function_bits:2; + u16 source_shadow:1; + + u16 rsvdz0:11; + u16 device_type:2; + } pci; + + /* HV_DEVICE_TYPE_IOAPIC */ + struct { + u8 ioapic_id; + u8 rsvdz0; + u16 rsvdz1; + u16 rsvdz2; + + u16 rsvdz3:14; + u16 device_type:2; + } ioapic; + + /* HV_DEVICE_TYPE_ACPI */ + struct { + u32 input_mapping_base; + u32 input_mapping_count:30; + u32 device_type:2; + } acpi; +} __packed; + #endif -- 2.20.1
[PATCH v6 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus
Microsoft Hypervisor requires the root partition to make a few hypercalls to setup application processors before they can be used. Signed-off-by: Lillian Grassin-Drake Signed-off-by: Sunil Muthuswamy Co-Developed-by: Lillian Grassin-Drake Co-Developed-by: Sunil Muthuswamy Signed-off-by: Wei Liu Reviewed-by: Michael Kelley --- CPU hotplug and unplug is not yet supported in this setup, so those paths remain untouched. v3: Always call native SMP preparation function. --- arch/x86/kernel/cpu/mshyperv.c | 29 + 1 file changed, 29 insertions(+) diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index c376d191a260..13d3b6dd21a3 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -31,6 +31,7 @@ #include #include #include +#include /* Is Linux running as the root partition? */ bool hv_root_partition; @@ -212,6 +213,32 @@ static void __init hv_smp_prepare_boot_cpu(void) hv_init_spinlocks(); #endif } + +static void __init hv_smp_prepare_cpus(unsigned int max_cpus) +{ +#ifdef CONFIG_X86_64 + int i; + int ret; +#endif + + native_smp_prepare_cpus(max_cpus); + +#ifdef CONFIG_X86_64 + for_each_present_cpu(i) { + if (i == 0) + continue; + ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i)); + BUG_ON(ret); + } + + for_each_present_cpu(i) { + if (i == 0) + continue; + ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, i, i); + BUG_ON(ret); + } +#endif +} #endif static void __init ms_hyperv_init_platform(void) @@ -368,6 +395,8 @@ static void __init ms_hyperv_init_platform(void) # ifdef CONFIG_SMP smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu; + if (hv_root_partition) + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus; # endif /* -- 2.20.1
[PATCH v6 05/16] x86/hyperv: allocate output arg pages if required
When Linux runs as the root partition, it will need to make hypercalls which return data from the hypervisor. Allocate pages for storing results when Linux runs as the root partition. Signed-off-by: Lillian Grassin-Drake Co-Developed-by: Lillian Grassin-Drake Signed-off-by: Wei Liu --- v3: Fix hv_cpu_die to use free_pages. v2: Address Vitaly's comments --- arch/x86/hyperv/hv_init.c | 35 - arch/x86/include/asm/mshyperv.h | 1 + 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index e04d90af4c27..6f4cb40e53fe 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page); void __percpu **hyperv_pcpu_input_arg; EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg); +void __percpu **hyperv_pcpu_output_arg; +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg); + u32 hv_max_vp_index; EXPORT_SYMBOL_GPL(hv_max_vp_index); @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu) void **input_arg; struct page *pg; - input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg); /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */ - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL); + pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0); if (unlikely(!pg)) return -ENOMEM; + + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg); *input_arg = page_address(pg); + if (hv_root_partition) { + void **output_arg; + + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg); + *output_arg = page_address(pg + 1); + } hv_get_vp_index(msr_vp_index); @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu) unsigned int new_cpu; unsigned long flags; void **input_arg; - void *input_pg = NULL; + void *pg; local_irq_save(flags); input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg); - input_pg = *input_arg; + pg = *input_arg; *input_arg = NULL; + + if (hv_root_partition) { + void **output_arg; + + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg); + *output_arg = NULL; + } + local_irq_restore(flags); - free_page((unsigned long)input_pg); + + free_pages((unsigned long)pg, hv_root_partition ? 1 : 0); if (hv_vp_assist_page && hv_vp_assist_page[cpu]) wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0); @@ -346,6 +365,12 @@ void __init hyperv_init(void) BUG_ON(hyperv_pcpu_input_arg == NULL); + /* Allocate the per-CPU state for output arg for root */ + if (hv_root_partition) { + hyperv_pcpu_output_arg = alloc_percpu(void *); + BUG_ON(hyperv_pcpu_output_arg == NULL); + } + /* Allocate percpu VP index */ hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index), GFP_KERNEL); diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index ac2b0d110f03..62d9390f1ddf 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {} #if IS_ENABLED(CONFIG_HYPERV) extern void *hv_hypercall_pg; extern void __percpu **hyperv_pcpu_input_arg; +extern void __percpu **hyperv_pcpu_output_arg; static inline u64 hv_do_hypercall(u64 control, void *input, void *output) { -- 2.20.1
[PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()
There is already a stub function for pxm_to_node but conversion to the other direction is missing. It will be used by Microsoft Hypervisor code later. Signed-off-by: Wei Liu --- v6: new --- include/acpi/acpi_numa.h | 4 1 file changed, 4 insertions(+) diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h index a4c6ef809e27..40a91ce87e04 100644 --- a/include/acpi/acpi_numa.h +++ b/include/acpi/acpi_numa.h @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm) { return 0; } +static inline int node_to_pxm(int node) +{ + return 0; +} #endif /* CONFIG_ACPI_NUMA */ #ifdef CONFIG_ACPI_HMAT -- 2.20.1
[PATCH v6 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root
There is no VMBus and the other infrastructures initialized in hv_acpi_init when Linux is running as the root partition. Signed-off-by: Wei Liu Reviewed-by: Pavel Tatashin Reviewed-by: Michael Kelley --- v3: Return 0 instead of -ENODEV. --- drivers/hv/vmbus_drv.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 502f8cd95f6d..ee27b3670a51 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void) if (!hv_is_hyperv_initialized()) return -ENODEV; + if (hv_root_partition) + return 0; + init_completion(&probe_event); /* -- 2.20.1
[PATCH v6 04/16] clocksource/hyperv: use MSR-based access if running as root
When Linux runs as the root partition, the setup required for TSC page is different. Luckily Linux also has access to the MSR based clocksource. We can just disable the TSC page clocksource if Linux is the root partition. Signed-off-by: Wei Liu Acked-by: Daniel Lezcano Reviewed-by: Pavel Tatashin Reviewed-by: Michael Kelley --- drivers/clocksource/hyperv_timer.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c index ba04cb381cd3..269a691bd2c4 100644 --- a/drivers/clocksource/hyperv_timer.c +++ b/drivers/clocksource/hyperv_timer.c @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void) if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE)) return false; + if (hv_root_partition) + return false; + hv_read_reference_counter = read_hv_clock_tsc; phys_addr = virt_to_phys(hv_get_tsc_page()); -- 2.20.1
[PATCH v6 02/16] x86/hyperv: detect if Linux is the root partition
For now we can use the privilege flag to check. Stash the value to be used later. Put in a bunch of defines for future use when we want to have more fine-grained detection. Signed-off-by: Wei Liu Reviewed-by: Pavel Tatashin --- v3: move hv_root_partition to mshyperv.c --- arch/x86/include/asm/hyperv-tlfs.h | 10 ++ arch/x86/include/asm/mshyperv.h| 2 ++ arch/x86/kernel/cpu/mshyperv.c | 20 3 files changed, 32 insertions(+) diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h index 6bf42aed387e..204010350604 100644 --- a/arch/x86/include/asm/hyperv-tlfs.h +++ b/arch/x86/include/asm/hyperv-tlfs.h @@ -21,6 +21,7 @@ #define HYPERV_CPUID_FEATURES 0x4003 #define HYPERV_CPUID_ENLIGHTMENT_INFO 0x4004 #define HYPERV_CPUID_IMPLEMENT_LIMITS 0x4005 +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES 0x4007 #define HYPERV_CPUID_NESTED_FEATURES 0x400A #define HYPERV_CPUID_VIRT_STACK_INTERFACE 0x4081 @@ -110,6 +111,15 @@ /* Recommend using enlightened VMCS */ #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDEDBIT(14) +/* + * CPU management features identification. + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits. + */ +#define HV_X64_START_LOGICAL_PROCESSOR BIT(0) +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR BIT(1) +#define HV_X64_PERFORMANCE_COUNTER_SYNCBIT(2) +#define HV_X64_RESERVED_IDENTITY_BIT BIT(31) + /* * Virtual processor will never share a physical core with another virtual * processor, except for virtual processors that are reported as sibling SMT diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index ffc289992d1b..ac2b0d110f03 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list( struct hv_guest_mapping_flush_list *flush, u64 start_gfn, u64 end_gfn); +extern bool hv_root_partition; + #ifdef CONFIG_X86_64 void hv_apic_init(void); void __init hv_init_spinlocks(void); diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index f628e3dc150f..c376d191a260 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -32,6 +32,10 @@ #include #include +/* Is Linux running as the root partition? */ +bool hv_root_partition; +EXPORT_SYMBOL_GPL(hv_root_partition); + struct ms_hyperv_info ms_hyperv; EXPORT_SYMBOL_GPL(ms_hyperv); @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void) pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n", ms_hyperv.max_vp_index, ms_hyperv.max_lp_index); + /* +* Check CPU management privilege. +* +* To mirror what Windows does we should extract CPU management +* features and use the ReservedIdentityBit to detect if Linux is the +* root partition. But that requires negotiating CPU management +* interface (a process to be finalized). +* +* For now, use the privilege flag as the indicator for running as +* root. +*/ + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) { + hv_root_partition = true; + pr_info("Hyper-V: running as root partition\n"); + } + /* * Extract host information. */ -- 2.20.1
[PATCH v6 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT
This makes the name match Hyper-V TLFS. Signed-off-by: Wei Liu Reviewed-by: Vitaly Kuznetsov Reviewed-by: Pavel Tatashin Reviewed-by: Michael Kelley --- include/asm-generic/hyperv-tlfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h index e73a11850055..e6903589a82a 100644 --- a/include/asm-generic/hyperv-tlfs.h +++ b/include/asm-generic/hyperv-tlfs.h @@ -88,7 +88,7 @@ #define HV_CONNECT_PORTBIT(7) #define HV_ACCESS_STATSBIT(8) #define HV_DEBUGGING BIT(11) -#define HV_CPU_POWER_MANAGEMENTBIT(12) +#define HV_CPU_MANAGEMENT BIT(12) /* -- 2.20.1
Re: [PATCH v3 hyperv-next 0/4] Drivers: hv: vmbus: Restrict devices and configurations on 'isolated' guests
On Mon, Feb 01, 2021 at 03:48:10PM +0100, Andrea Parri (Microsoft) wrote: > Andrea Parri (Microsoft) (4): > x86/hyperv: Load/save the Isolation Configuration leaf > Drivers: hv: vmbus: Restrict vmbus_devices on isolated guests > Drivers: hv: vmbus: Enforce 'VMBus version >= 5.2' on isolated guests > hv_netvsc: Restrict configurations on isolated guests Applied to hyperv-next. Thanks. Wei.
Re: [PATCH][next] hv: hyperv.h: Replace one-element array with flexible-array in struct icmsg_negotiate
On Mon, Feb 01, 2021 at 05:56:06PM +, Michael Kelley wrote: > From: Gustavo A. R. Silva Sent: Monday, February 1, > 2021 9:44 AM [...] > > struct shutdown_msg_data { > > -- > > 2.27.0 > > Reviewed-by: Michael Kelley Applied to hyperv-next. Thanks. Wei.
Re: [PATCH 08/10] clocksource/drivers/hyper-v: Handle sched_clock differences inline
On Wed, Jan 27, 2021 at 12:23:43PM -0800, Michael Kelley wrote: [...] > +/* > + * Reference to pv_ops must be inline so objtool > + * detection of noinstr violations can work correctly. > + */ > +static __always_inline void hv_setup_sched_clock(void *sched_clock) sched_clock_register is not trivial. Having __always_inline here is going to make the compiled object bloated. Given this is a static function, I don't think we need to specify any inline keyword. The compiler should be able to determine whether this function should be inlined all by itself. Wei. > +{ > +#ifdef CONFIG_GENERIC_SCHED_CLOCK > + /* > + * We're on an architecture with generic sched clock (not x86/x64). > + * The Hyper-V sched clock read function returns nanoseconds, not > + * the normal 100ns units of the Hyper-V synthetic clock. > + */ > + sched_clock_register(sched_clock, 64, NSEC_PER_SEC); > +#else > +#ifdef CONFIG_PARAVIRT > + /* We're on x86/x64 *and* using PV ops */ > + pv_ops.time.sched_clock = sched_clock; > +#endif > +#endif > +} > + > static bool __init hv_init_tsc_clocksource(void) > { > u64 tsc_msr; > -- > 1.8.3.1 >
Re: [PATCH 10/10] clocksource/drivers/hyper-v: Move handling of STIMER0 interrupts
On Wed, Jan 27, 2021 at 12:23:45PM -0800, Michael Kelley wrote: [...] > +static int hv_setup_stimer0_irq(void) > +{ > + int ret; > + > + ret = acpi_register_gsi(NULL, HYPERV_STIMER0_VECTOR, > + ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_HIGH); When IO-APIC is enabled on x86, acpi_register_gsi calls mp_map_gsi_to_irq. That function then calls mp_find_ioapic. Is HYPERV_STIMER0_VECTOR, when used as a GSI, associated with an IO-APIC? If not, wouldn't mp_find_ioapic return an error causing acpi_register_gsi to fail? Ah, it appears that this function is not called on x86. I haven't checked how ARM handles GSI, but presumably this works for you. It would be good if a comment can be added to clarify things. Wei.
Re: [PATCH v3 00/17] Introducing Linux root partition support for Microsoft Hypervisor
On Tue, Feb 02, 2021 at 10:40:43AM +, David Woodhouse wrote: > On Tue, 2020-12-15 at 16:42 +0000, Wei Liu wrote: > > On Tue, Dec 15, 2020 at 04:25:03PM +0100, Enrico Weigelt, metux IT consult > > wrote: > > > On 03.12.20 00:22, Wei Liu wrote: > > > > > > Hi, > > > > > > > I don't follow. Do you mean reusing /dev/kvm but with a different set of > > > > APIs underneath? I don't think that will work. > > > > > > My idea was using the same uapi for both hypervisors, so that we can use > > > the same userlands for both. > > > > > > Are the semantis so different that we can't provide the same API ? > > > > We can provide some similar APIs for ease of porting, but can't provide > > 1:1 mappings. By definition KVM and MSHV are two different things. There > > is no goal to make one ABI / API compatible with the other. > > I'm not sure I understand. > > KVM is the Linux userspace API for virtualisation. It is designed to be > versatile enough that it can support multiple implementations across > multiple architectures, including both AMD SVM and Intel VMX on x86. > > Are you saying that KVM has *failed* to be versatile enough that this > can be "just another implementation"? What are the problems? Is it > unfixable? The KVM APIs are good enough to cover guest life cycle management. To make MSHV another implementation of the KVM APIs, we essentially need to massage the data structures both way. They are There is also an aspect for controlling the hypervisor that affect the whole virtualization system. KVM APIs don't handle those. We would need /dev/mshv for that purpose alone. There is another aspect for Microsoft Hypervisor specific features and enhancements, which aren't applicable to KVM. Features make sense for a specific type-1 hypervisor may not make sense for KVM (a type-2 hypervisor). We have no intention to pollute KVM APIs with those. All in all the latter two points make /dev/mshv is a more viable route in the long run. Wei.
Re: [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary
On Tue, Jan 26, 2021 at 12:48:37AM +, Michael Kelley wrote: > From: Wei Liu Sent: Wednesday, January 20, 2021 4:01 AM > > > > We will need the partition ID for executing some hypercalls later. > > > > Signed-off-by: Lillian Grassin-Drake > > Co-Developed-by: Sunil Muthuswamy > > Signed-off-by: Wei Liu > > --- > > v3: > > 1. Make hv_get_partition_id static. > > 2. Change code structure a bit. > > --- > > arch/x86/hyperv/hv_init.c | 27 +++ > > arch/x86/include/asm/mshyperv.h | 2 ++ > > include/asm-generic/hyperv-tlfs.h | 6 ++ > > 3 files changed, 35 insertions(+) > > > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > > index 6f4cb40e53fe..fc9941bd8653 100644 > > --- a/arch/x86/hyperv/hv_init.c > > +++ b/arch/x86/hyperv/hv_init.c > > @@ -26,6 +26,9 @@ > > #include > > #include > > > > +u64 hv_current_partition_id = ~0ull; > > +EXPORT_SYMBOL_GPL(hv_current_partition_id); > > + > > void *hv_hypercall_pg; > > EXPORT_SYMBOL_GPL(hv_hypercall_pg); > > > > @@ -331,6 +334,25 @@ static struct syscore_ops hv_syscore_ops = { > > .resume = hv_resume, > > }; > > > > +static void __init hv_get_partition_id(void) > > +{ > > + struct hv_get_partition_id *output_page; > > + u16 status; > > + unsigned long flags; > > + > > + local_irq_save(flags); > > + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg); > > + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) & > > + HV_HYPERCALL_RESULT_MASK; > > + if (status != HV_STATUS_SUCCESS) { > > Across the Hyper-V code in Linux, the way we check the hypercall result > is very inconsistent. IMHO, the and'ing of hv_do_hypercall() with > HV_HYPERCALL_RESULT_MASK so that status can be a u16 is stylistically > a bit unusual. > > I'd like to see the hypercall result being stored into a u64 local variable. > Then the subsequent test for the status should 'and' the u64 with > HV_HYPERCALL_RESULT_MASK to determine the result code. > I've made a note to go fix the places that aren't doing it that way. > I will fold in the following diff in the next version. I will also check if there are other instances in this patch series that need fixing. Pretty sure there are a few. diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index fc9941bd8653..6064f64a1295 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -337,14 +337,13 @@ static struct syscore_ops hv_syscore_ops = { static void __init hv_get_partition_id(void) { struct hv_get_partition_id *output_page; - u16 status; + u64 status; unsigned long flags; local_irq_save(flags); output_page = *this_cpu_ptr(hyperv_pcpu_output_arg); - status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) & - HV_HYPERCALL_RESULT_MASK; - if (status != HV_STATUS_SUCCESS) { + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page); + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) { /* No point in proceeding if this failed */ pr_err("Failed to get partition ID: %d\n", status); BUG(); > > + /* No point in proceeding if this failed */ > > + pr_err("Failed to get partition ID: %d\n", status); > > + BUG(); > > + } > > + hv_current_partition_id = output_page->partition_id; > > + local_irq_restore(flags); > > +} > > + > > /* > > * This function is to be invoked early in the boot sequence after the > > * hypervisor has been detected. > > @@ -426,6 +448,11 @@ void __init hyperv_init(void) > > > > register_syscore_ops(&hv_syscore_ops); > > > > + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID) > > + hv_get_partition_id(); > > Another place where the EBX value saved into the ms_hyperv structure > could be used. If you're okay with my response earlier, this will be handled later in another patch (series). Wei.
Re: [PATCH 08/10] clocksource/drivers/hyper-v: Handle sched_clock differences inline
On Thu, Feb 04, 2021 at 04:28:38PM +, Michael Kelley wrote: > From: Wei Liu Sent: Monday, February 1, 2021 10:55 AM > > > > On Wed, Jan 27, 2021 at 12:23:43PM -0800, Michael Kelley wrote: > > [...] > > > +/* > > > + * Reference to pv_ops must be inline so objtool > > > + * detection of noinstr violations can work correctly. > > > + */ > > > +static __always_inline void hv_setup_sched_clock(void *sched_clock) > > > > sched_clock_register is not trivial. Having __always_inline here is > > going to make the compiled object bloated. > > > > Given this is a static function, I don't think we need to specify any > > inline keyword. The compiler should be able to determine whether this > > function should be inlined all by itself. > > > > Wei. > > There was an explicit request from Peter Zijlstra and Thomas Gleixner > to force it inline. See https://lore.kernel.org/patchwork/patch/1283635/ and > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/include/asm/mshyperv.h?id=b9d8cf2eb3ceecdee3434b87763492aee9e28845 Oh. noinstr. I failed to notice the comment. Sorry for the noise. Wei.
Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition
On Thu, Feb 04, 2021 at 04:41:47PM +, Michael Kelley wrote: > From: Wei Liu Sent: Wednesday, February 3, 2021 4:47 AM [...] > > > > + > > > > + if (level) > > > > + intr_desc->trigger_mode = > > > > HV_INTERRUPT_TRIGGER_MODE_LEVEL; > > > > + else > > > > + intr_desc->trigger_mode = > > > > HV_INTERRUPT_TRIGGER_MODE_EDGE; > > > > + > > > > + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask); > > > > + > > > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, > > > > input, > > output) & > > > > +HV_HYPERCALL_RESULT_MASK; > > > > + local_irq_restore(flags); > > > > + > > > > + *entry = output->interrupt_entry; > > > > + > > > > + return status; > > > > > > As a cross-check, I was comparing this code against > > > hv_map_msi_interrupt(). They are > > > mostly parallel, though some of the assignments are done in a different > > > order. It's a nit, > > > but making them as parallel as possible would be nice. :-) > > > > > > > Indeed. I will see about factoring out a function. > > If factoring out a separate helper function is clumsy, just having the > parallel code > in the two functions be as similar as possible makes it easier to see what's > the > same and what's different. > No. It is not clumsy at all. I've done it in the newly posted v6. I was baffled why I wrote hv_unmap_interrupt helper to be used by both hv_unmap_ioapic_interrupt and hv_unmap_msi_interrupt in the previous patch, but didn't write a hv_map_interrupt. Maybe I didn't have enough coffee that day. :-/ Thanks for pointing out that issue. It definitely helped improve the quality of this series. Wei.
Re: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition
On Thu, Feb 04, 2021 at 05:43:16PM +, Michael Kelley wrote: [...] > > remove_cpuhp_state: > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > > new file mode 100644 > > index ..117f17e8c88a > > --- /dev/null > > +++ b/arch/x86/hyperv/irqdomain.c > > @@ -0,0 +1,362 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +/* > > + * for Linux to run as the root partition on Microsoft Hypervisor. > > Nit: Looks like the initial word "Irqdomain" got dropped from the above > comment line. But don't respin just for this. > I've added it back. Thanks. > > +static int hv_map_interrupt(union hv_device_id device_id, bool level, > > + int cpu, int vector, struct hv_interrupt_entry *entry) > > +{ > > + struct hv_input_map_device_interrupt *input; > > + struct hv_output_map_device_interrupt *output; > > + struct hv_device_interrupt_descriptor *intr_desc; > > + unsigned long flags; > > + u64 status; > > + cpumask_t mask = CPU_MASK_NONE; > > + int nr_bank, var_size; > > + > > + local_irq_save(flags); > > + > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg); > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg); > > + > > + intr_desc = &input->interrupt_descriptor; > > + memset(input, 0, sizeof(*input)); > > + input->partition_id = hv_current_partition_id; > > + input->device_id = device_id.as_uint64; > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED; > > + intr_desc->vector_count = 1; > > + intr_desc->target.vector = vector; > > + > > + if (level) > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL; > > + else > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; > > + > > + cpumask_set_cpu(cpu, &mask); > > + intr_desc->target.vp_set.valid_bank_mask = 0; > > + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K; > > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask); > > There's a function get_cpu_mask() that returns a pointer to a cpumask with > only > the specified cpu set in the mask. It returns a const pointer to the correct > entry > in a pre-allocated array of all such cpumasks, so it's a lot more efficient > than > allocating and initializing a local cpumask instance on the stack. > That's nice. I've got the following diff to fix both issues. If you're happy with the changes, can you give your Reviewed-by? That saves a round of posting. diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c index 0cabc9aece38..fa71db798465 100644 --- a/arch/x86/hyperv/irqdomain.c +++ b/arch/x86/hyperv/irqdomain.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* - * for Linux to run as the root partition on Microsoft Hypervisor. + * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor. * * Authors: * Sunil Muthuswamy @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level, struct hv_device_interrupt_descriptor *intr_desc; unsigned long flags; u64 status; - cpumask_t mask = CPU_MASK_NONE; + const cpumask_t *mask; int nr_bank, var_size; local_irq_save(flags); @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level, else intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; - cpumask_set_cpu(cpu, &mask); + mask = cpumask_of(cpu); intr_desc->target.vp_set.valid_bank_mask = 0; intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K; - nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask); + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask); if (nr_bank < 0) { local_irq_restore(flags); pr_err("%s: unable to generate VP set\n", __func__);
Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()
On Wed, Feb 03, 2021 at 03:04:27PM +, Wei Liu wrote: > There is already a stub function for pxm_to_node but conversion to the > other direction is missing. > > It will be used by Microsoft Hypervisor code later. > > Signed-off-by: Wei Liu Hi ACPI maintainers, if you're happy with this patch I can take it via the hyperv-next tree, given the issue is discovered when pxm_to_node is called in our code. > --- > v6: new > --- > include/acpi/acpi_numa.h | 4 > 1 file changed, 4 insertions(+) > > diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h > index a4c6ef809e27..40a91ce87e04 100644 > --- a/include/acpi/acpi_numa.h > +++ b/include/acpi/acpi_numa.h > @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm) > { > return 0; > } > +static inline int node_to_pxm(int node) > +{ > + return 0; > +} > #endif /* CONFIG_ACPI_NUMA */ > > #ifdef CONFIG_ACPI_HMAT > -- > 2.20.1 >
Re: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition
On Thu, Feb 04, 2021 at 06:40:55PM +, Michael Kelley wrote: > From: Wei Liu Sent: Thursday, February 4, 2021 9:57 AM [...] > > I've got the following diff to fix both issues. If you're happy with the > > changes, can you give your Reviewed-by? That saves a round of posting. > > > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c > > index 0cabc9aece38..fa71db798465 100644 > > --- a/arch/x86/hyperv/irqdomain.c > > +++ b/arch/x86/hyperv/irqdomain.c > > @@ -1,7 +1,7 @@ > > // SPDX-License-Identifier: GPL-2.0 > > > > /* > > - * for Linux to run as the root partition on Microsoft Hypervisor. > > + * Irqdomain for Linux to run as the root partition on Microsoft > > Hypervisor. > > * > > * Authors: > > * Sunil Muthuswamy > > @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, > > bool level, > > struct hv_device_interrupt_descriptor *intr_desc; > > unsigned long flags; > > u64 status; > > - cpumask_t mask = CPU_MASK_NONE; > > + const cpumask_t *mask; > > int nr_bank, var_size; > > > > local_irq_save(flags); > > @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id > > device_id, bool > > level, > > else > > intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; > > > > - cpumask_set_cpu(cpu, &mask); > > + mask = cpumask_of(cpu); > > intr_desc->target.vp_set.valid_bank_mask = 0; > > intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K; > > - nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask); > > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask); > > Can you just do the following and get rid of the 'mask' local entirely? > > nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu)); Sure. That can be done. > > Either way, > > Reviewed-by: Michael Kelley Thank you. Wei.
Re: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()
On Thu, Feb 04, 2021 at 07:45:25PM +0100, Rafael J. Wysocki wrote: > On Thu, Feb 4, 2021 at 7:41 PM Wei Liu wrote: > > > > On Wed, Feb 03, 2021 at 03:04:27PM +, Wei Liu wrote: > > > There is already a stub function for pxm_to_node but conversion to the > > > other direction is missing. > > > > > > It will be used by Microsoft Hypervisor code later. > > > > > > Signed-off-by: Wei Liu > > > > Hi ACPI maintainers, if you're happy with this patch I can take it via > > the hyperv-next tree, given the issue is discovered when pxm_to_node is > > called in our code. > > Yes, you can. Thanks Rafael. I will add your ack to the patch as well. Wei.
Re: [PATCH v6 00/16] Introducing Linux root partition support for Microsoft Hypervisor
On Wed, Feb 03, 2021 at 03:04:19PM +, Wei Liu wrote: > Wei Liu (16): > asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to > HV_CPU_MANAGEMENT > x86/hyperv: detect if Linux is the root partition > Drivers: hv: vmbus: skip VMBus initialization if Linux is root > clocksource/hyperv: use MSR-based access if running as root > x86/hyperv: allocate output arg pages if required > x86/hyperv: extract partition ID from Microsoft Hypervisor if > necessary > x86/hyperv: handling hypercall page setup for root > ACPI / NUMA: add a stub function for node_to_pxm() > x86/hyperv: provide a bunch of helper functions > x86/hyperv: implement and use hv_smp_prepare_cpus > asm-generic/hyperv: update hv_msi_entry > asm-generic/hyperv: update hv_interrupt_entry > asm-generic/hyperv: introduce hv_device_id and auxiliary structures > asm-generic/hyperv: import data structures for mapping device > interrupts > x86/hyperv: implement an MSI domain for root partition > iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition This series is now rebased and pushed to hyperv-next. Many thanks to all the people that provided reviews and comments. Wei.
Re: linux-next: manual merge of the hyperv tree with Linus' tree
On Fri, Feb 05, 2021 at 07:02:02PM +1100, Stephen Rothwell wrote: > Hi all, > > Today's linux-next merge of the hyperv tree got a conflict in: > > arch/x86/hyperv/hv_init.c > > between commit: > > fff7b5e6ee63 ("x86/hyperv: Initialize clockevents after LAPIC is > initialized") > > from Linus' tree and commits: > > a06c2e7df586 ("x86/hyperv: extract partition ID from Microsoft Hypervisor > if necessary") > fa2c411b58fe ("x86/hyperv: implement an MSI domain for root partition") > > from the hyperv tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > Thanks Stephen. I've fixed up the conflicts locally in hyperv-next. Wei.
Re: [PATCH 4/7] xen/events: link interdomain events to associated xenbus device
On Sat, Feb 06, 2021 at 11:49:29AM +0100, Juergen Gross wrote: > In order to support the possibility of per-device event channel > settings (e.g. lateeoi spurious event thresholds) add a xenbus device > pointer to struct irq_info() and modify the related event channel > binding interfaces to take the pointer to the xenbus device as a > parameter instead of the domain id of the other side. > > While at it remove the stale prototype of bind_evtchn_to_irq_lateeoi(). > > Signed-off-by: Juergen Gross Reviewed-by: Wei Liu
[GIT PULL] Hyper-V fixes for v5.7-rc7
Hi Linus Please pull the following changes since commit f081bbb3fd03f949bcdc5aed95a827d7c65e0f30: hyper-v: Remove internal types from UAPI header (2020-04-22 21:10:05 +0100) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-fixes-signed for you to fetch changes up to 38dce4195f0daefb566279fd9fd51e1fbd62ae1b: x86/hyperv: Properly suspend/resume reenlightenment notifications (2020-05-13 15:02:03 +) hyperv-fixes for 5.7-rc6 - One patch from Vitaly to fix reenlightenment notifications Vitaly Kuznetsov (1): x86/hyperv: Properly suspend/resume reenlightenment notifications arch/x86/hyperv/hv_init.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-)
Re: [PATCH] x86/Hyper-V: Support for free page reporting
On Tue, May 19, 2020 at 06:37:57PM +, Sunil Muthuswamy wrote: > Linux has support for free page reporting now (36e66c554b5c) for > virtualized environment. On Hyper-V when virtually backed VMs are > configured, Hyper-V will advertise cold memory discard capability, > when supported. This patch adds the support to hook into the free > page reporting infrastructure and leverage the Hyper-V cold memory > discard hint hypercall to report/free these pages back to the host. > > Signed-off-by: Sunil Muthuswamy > --- > First patch mail bounced backed. Sending it again with the email > addresses fixed. > --- > arch/x86/hyperv/hv_init.c | 24 > arch/x86/kernel/cpu/mshyperv.c| 6 +- > drivers/hv/hv_balloon.c | 93 +++ > include/asm-generic/hyperv-tlfs.h | 29 ++ > include/asm-generic/mshyperv.h| 2 + > 5 files changed, 152 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > index 624f5d9b0f79..925e2f7eb82c 100644 > --- a/arch/x86/hyperv/hv_init.c > +++ b/arch/x86/hyperv/hv_init.c > @@ -506,3 +506,27 @@ bool hv_is_hibernation_supported(void) > return acpi_sleep_state_supported(ACPI_STATE_S4); > } > EXPORT_SYMBOL_GPL(hv_is_hibernation_supported); > + > +u64 hv_query_ext_cap(void) > +{ > + u64 *cap; > + unsigned long flags; > + u64 ext_cap = 0; > + > + /* > + * Querying extended capabilities is an extended hypercall. Check if the > + * partition supports extended hypercall, first. > + */ > + if (!(ms_hyperv.b_features & HV_ENABLE_EXTENDED_HYPERCALLS)) > + return 0; > + > + local_irq_save(flags); > + cap = *(u64 **)this_cpu_ptr(hyperv_pcpu_input_arg); The cast here is not strictly needed. > + if (hv_do_hypercall(HV_EXT_CALL_QUERY_CAPABILITIES, NULL, cap) == > + HV_STATUS_SUCCESS) You're using the input page as the output parameter. Ideally we should introduce hyperv_pcpu_output_arg page, but that would waste one page per cpu just for this one call. So for now I think this setup is fine, but I would like to add the following comment. /* * Repurpose the input_arg page to accept output from Hyper-V for * now because this is the only call that needs output from the * hypervisor. It should be fixed properly by introducing an * output_arg page once we have more places that require output. */ > + ext_cap = *cap; > + > + local_irq_restore(flags); > + return ext_cap; > +} > +EXPORT_SYMBOL_GPL(hv_query_ext_cap); > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c > index ebf34c7bc8bc..2de3f692c8bf 100644 > --- a/arch/x86/kernel/cpu/mshyperv.c > +++ b/arch/x86/kernel/cpu/mshyperv.c > @@ -224,11 +224,13 @@ static void __init ms_hyperv_init_platform(void) >* Extract the features and hints >*/ > ms_hyperv.features = cpuid_eax(HYPERV_CPUID_FEATURES); > + ms_hyperv.b_features = cpuid_ebx(HYPERV_CPUID_FEATURES); > ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES); > ms_hyperv.hints= cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO); > > - pr_info("Hyper-V: features 0x%x, hints 0x%x, misc 0x%x\n", > - ms_hyperv.features, ms_hyperv.hints, ms_hyperv.misc_features); > + pr_info("Hyper-V: features 0x%x, additional features: 0x%x, hints 0x%x, > misc 0x%x\n", > + ms_hyperv.features, ms_hyperv.b_features, ms_hyperv.hints, > + ms_hyperv.misc_features); > > ms_hyperv.max_vp_index = cpuid_eax(HYPERV_CPUID_IMPLEMENT_LIMITS); > ms_hyperv.max_lp_index = cpuid_ebx(HYPERV_CPUID_IMPLEMENT_LIMITS); > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c > index 32e3bc0aa665..77be31094556 100644 > --- a/drivers/hv/hv_balloon.c > +++ b/drivers/hv/hv_balloon.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -563,6 +564,10 @@ struct hv_dynmem_device { >* The negotiated version agreed by host. >*/ > __u32 version; > + > +#ifdef CONFIG_PAGE_REPORTING > + struct page_reporting_dev_info pr_dev_info; > +#endif > }; > > static struct hv_dynmem_device dm_device; > @@ -1565,6 +1570,83 @@ static void balloon_onchannelcallback(void *context) > > } > > +#ifdef CONFIG_PAGE_REPORTING > +static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info, > + struct scatterlist *sgl, unsigned int nents) > +{ > + unsigned long flags; > + struct hv_memory_hint *hint; > + int i; > + u64 status; > + struct scatterlist *sg; > + > + WARN_ON(nents > HV_MAX_GPA_PAGE_RANGES); Should we return -ENOSPC here? > + local_irq_save(flags); > + hint = *(struct hv_memory_hint **)this_cpu_ptr(hyperv_pcpu_input_arg); > + if (!hint) { > + local_irq_restore(flags); > + return -ENOSPC; > + } > + > + hint->type = HV_EXT
Re: [PATCH] x86/Hyper-V: Support for free page reporting
On Wed, May 20, 2020 at 10:59:22AM +0200, Vitaly Kuznetsov wrote: > Sunil Muthuswamy writes: [...] > > +EXPORT_SYMBOL_GPL(hv_query_ext_cap); > > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c > > index ebf34c7bc8bc..2de3f692c8bf 100644 > > --- a/arch/x86/kernel/cpu/mshyperv.c > > +++ b/arch/x86/kernel/cpu/mshyperv.c > > @@ -224,11 +224,13 @@ static void __init ms_hyperv_init_platform(void) > > * Extract the features and hints > > */ > > ms_hyperv.features = cpuid_eax(HYPERV_CPUID_FEATURES); > > + ms_hyperv.b_features = cpuid_ebx(HYPERV_CPUID_FEATURES); > > ms_hyperv.misc_features = cpuid_edx(HYPERV_CPUID_FEATURES); > > ms_hyperv.hints= cpuid_eax(HYPERV_CPUID_ENLIGHTMENT_INFO); > > > > - pr_info("Hyper-V: features 0x%x, hints 0x%x, misc 0x%x\n", > > - ms_hyperv.features, ms_hyperv.hints, ms_hyperv.misc_features); > > + pr_info("Hyper-V: features 0x%x, additional features: 0x%x, hints 0x%x, > > misc 0x%x\n", > > + ms_hyperv.features, ms_hyperv.b_features, ms_hyperv.hints, > > + ms_hyperv.misc_features); > > HYPERV_CPUID_FEATURES(0x4003) EAX and EBX correspond to Partition > Privilege Flags (TLFS), I'd suggest to take the opportunity and rename > this to something like 'privilege flags low=0x%x high=0x%x'. > > Also, I don't quite like 'ms_hyperv.b_features' as I'll always have to > look at what it's being assigned to understand what it holds. I'd even > suggest to rename ms_hyperv.features to ms_hyperv.priv_low and > ms_hyperv.b_features tp ms_hyperv.priv_high. Or maybe even better, pack > them to the same 'u64 ms_hyperv.privileges'. +1 for this. :-) Wei.
Re: [PATCH 1/1] x86/hyperv: Make hv_setup_sched_clock inline
On Mon, Aug 10, 2020 at 10:08:41PM +0200, Thomas Gleixner wrote: > Michael Kelley writes: > > Make hv_setup_sched_clock inline so the reference to pv_ops works > > correctly with objtool updates to detect noinstr violations. > > See https://lore.kernel.org/patchwork/patch/1283635/ > > > > Signed-off-by: Michael Kelley > > Acked-by: Thomas Gleixner Thomas and Peter, thank you for your acks. I have applied to hyperv-fixes and plan to submit it to Linus in a few days' time. Thomas, let me know if you want this patch to go through the tip tree. I will revert it from hyperv-fixes if that's the case. Wei.
Re: [PATCH 1/1] Drivers: hv: vmbus: Only notify Hyper-V for die events that are oops
On Fri, Aug 07, 2020 at 11:06:47AM +0200, Vitaly Kuznetsov wrote: > Michael Kelley writes: > > > Hyper-V currently may be notified of a panic for any die event. But > > this results in false panic notifications for various user space traps > > that are die events. Fix this by ignoring die events that aren't oops. > > > > Fixes: 510f7aef65bb ("Drivers: hv: vmbus: prefer 'die' notification chain > > to 'panic'") > > Signed-off-by: Michael Kelley > > --- > > drivers/hv/vmbus_drv.c | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > > index b50081c..910b6e9 100644 > > --- a/drivers/hv/vmbus_drv.c > > +++ b/drivers/hv/vmbus_drv.c > > @@ -86,6 +86,10 @@ static int hyperv_die_event(struct notifier_block *nb, > > unsigned long val, > > struct die_args *die = (struct die_args *)args; > > struct pt_regs *regs = die->regs; > > > > + /* Don't notify Hyper-V if the die event is other than oops */ > > + if (val != DIE_OOPS) > > + return NOTIFY_DONE; > > + > > Looking at die_val enum, DIE_PANIC also sounds like something we would > want to report but it doesn't get emitted anywhere and honestly I don't > quite understand how is was supposed to be different from DIE_OOPS. > > Reviewed-by: Vitaly Kuznetsov Applied to hyperv-fixes. Wei. > > > /* > > * Hyper-V should be notified only once about a panic. If we will be > > * doing hyperv_report_panic_msg() later with kmsg data, don't do > > -- > Vitaly >
Re: [PATCH 1/1] x86/hyperv: Make hv_setup_sched_clock inline
On Sun, Aug 09, 2020 at 06:29:51PM -0700, Michael Kelley wrote: > Make hv_setup_sched_clock inline so the reference to pv_ops works > correctly with objtool updates to detect noinstr violations. > See https://lore.kernel.org/patchwork/patch/1283635/ > I read the reference link. This change looks sensible. I will wait a few days before queueing this up in case other people have an opinion on this. Wei.
Re: [PATCH v1] tools: hv: remove cast from hyperv_die_event
On Wed, Aug 19, 2020 at 11:05:09AM +0200, Olaf Hering wrote: > No need to cast a void pointer. > > Signed-off-by: Olaf Hering Applied to hyperv-next. I also changed "tools" to "drivers". > --- > drivers/hv/vmbus_drv.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > index 910b6e90866c..187809977360 100644 > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -83,7 +83,7 @@ static int hyperv_panic_event(struct notifier_block *nb, > unsigned long val, > static int hyperv_die_event(struct notifier_block *nb, unsigned long val, > void *args) > { > - struct die_args *die = (struct die_args *)args; > + struct die_args *die = args; > struct pt_regs *regs = die->regs; > > /* Don't notify Hyper-V if the die event is other than oops */
Re: [PATCH v3] Drivers: hv: vmbus: Add /sys/bus/vmbus/hibernation
On Thu, Jan 07, 2021 at 06:08:51PM +, Michael Kelley wrote: > From: Dexuan Cui Sent: Wednesday, January 6, 2021 5:46 > PM > > > > When a Linux VM runs on Hyper-V, if the host toolstack doesn't support > > hibernation for the VM (this happens on old Hyper-V hosts like Windows > > Server 2016, or new Hyper-V hosts if the admin or user doesn't declare > > the hibernation intent for the VM), the VM is discouraged from trying > > hibernation (because the host doesn't guarantee that the VM's virtual > > hardware configuration will remain exactly the same across hibernation), > > i.e. the VM should not try to set up the swap partition/file for > > hibernation, etc. > > > > x86 Hyper-V uses the presence of the virtual ACPI S4 state as the > > indication of the host toolstack support for a VM. Currently there is > > no easy and reliable way for the userspace to detect the presence of > > the state (see https://lkml.org/lkml/2020/12/11/1097). Add > > /sys/bus/vmbus/hibernation for this purpose. > > > > Signed-off-by: Dexuan Cui [...] > > Reviewed-by: Michael Kelley Queued for hyperv-next. Thanks. Wei. >
[GIT PULL] Hyper-V fixes for 5.11-rc4
Hi Linus, The following changes since commit e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62: Linux 5.11-rc2 (2021-01-03 15:55:30 -0800) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-fixes-signed-20210111 for you to fetch changes up to ad0a6bad44758afa3b440c254a24999a0c7e35d5: x86/hyperv: check cpu mask after interrupt has been disabled (2021-01-06 11:03:16 +) hyperv-fixes for 5.11-rc4 - One patch from Dexuan Cui to fix kexec panic/hang - One patch from Wei Liu to fix occasional crashes when flushing TLB Dexuan Cui (1): x86/hyperv: Fix kexec panic/hang issues Wei Liu (1): x86/hyperv: check cpu mask after interrupt has been disabled arch/x86/hyperv/hv_init.c | 4 arch/x86/hyperv/mmu.c | 12 +--- arch/x86/include/asm/mshyperv.h | 2 ++ arch/x86/kernel/cpu/mshyperv.c | 18 ++ drivers/hv/vmbus_drv.c | 2 -- 5 files changed, 33 insertions(+), 5 deletions(-)
Re: [PATCH] Revert "scsi: storvsc: Validate length of incoming packet in storvsc_on_channel_callback()"
On Fri, Dec 11, 2020 at 02:14:04PM +0100, Andrea Parri (Microsoft) wrote: > This reverts commit 3b8c72d076c42bf27284cda7b2b2b522810686f8. > > Dexuan reported a regression where StorVSC fails to probe a device (and > where, consequently, the VM may fail to boot). The root-cause analysis > led to a long-standing race condition that is exposed by the validation > /commit in question. Let's put the new validation aside until a proper > solution for that race condition is in place. > > Signed-off-by: Andrea Parri (Microsoft) > Cc: Dexuan Cui > Cc: "James E.J. Bottomley" > Cc: "Martin K. Petersen" > Cc: linux-s...@vger.kernel.org Hi Martin Sorry for the last minute patch. We would very like this goes into 5.10 if possible; otherwise Linux 5.10 is going to be broken on Hyper-V. :-( Wei.
Re: [PATCH] Revert "scsi: storvsc: Validate length of incoming packet in storvsc_on_channel_callback()"
On Fri, Dec 11, 2020 at 09:59:34AM -0500, Martin K. Petersen wrote: > > Wei, > > > Sorry for the last minute patch. We would very like this goes into > > 5.10 if possible; otherwise Linux 5.10 is going to be broken on > > Hyper-V. :-( > > Applied to 5.10/scsi-fixes. Thanks Martin.
Re: [PATCH] drivers/hv: remove obsolete TODO and fix misleading typo in comment
On Sun, Dec 06, 2020 at 11:48:50AM +0100, Stefan Eschenbacher wrote: > Removes an obsolete TODO in the VMBus module and fixes a misleading typo > in the comment for the macro MAX_NUM_CHANNELS, where two digits have been > twisted. > > Signed-off-by: Stefan Eschenbacher > Co-developed-by: Max Stolze > Signed-off-by: Max Stolze Applied to hyperv-next. Thanks. Wei.
[GIT PULL] Hyper-V fixes for 5.11-rc5
Hi Linus, The following changes since commit ad0a6bad44758afa3b440c254a24999a0c7e35d5: x86/hyperv: check cpu mask after interrupt has been disabled (2021-01-06 11:03:16 +) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-fixes-signed-20210119 for you to fetch changes up to fff7b5e6ee63c5d20406a131b260c619cdd24fd1: x86/hyperv: Initialize clockevents after LAPIC is initialized (2021-01-17 15:20:50 +) hyperv-fixes for 5.11-rc5 - One patch from Dexuan to fix clockevent initialization Dexuan Cui (1): x86/hyperv: Initialize clockevents after LAPIC is initialized arch/x86/hyperv/hv_init.c | 29 ++--- 1 file changed, 26 insertions(+), 3 deletions(-)
Re: [PATCH v4 15/17] x86/hyperv: implement an MSI domain for root partition
On Thu, Jan 07, 2021 at 06:21:24AM +0800, kernel test robot wrote: > Hi Wei, > > I love your patch! Perhaps something to improve: > > [auto build test WARNING on e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62] > > url: > https://github.com/0day-ci/linux/commits/Wei-Liu/Introducing-Linux-root-partition-support-for-Microsoft-Hypervisor/20210107-044149 > base:e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62 > config: i386-randconfig-m021-20210106 (attached as .config) > compiler: gcc-9 (Debian 9.3.0-15) 9.3.0 > reproduce (this is a W=1 build): > # > https://github.com/0day-ci/linux/commit/376aee69c6ab18dc23b0386590bee82d59555be8 > git remote add linux-review https://github.com/0day-ci/linux > git fetch --no-tags linux-review > Wei-Liu/Introducing-Linux-root-partition-support-for-Microsoft-Hypervisor/20210107-044149 > git checkout 376aee69c6ab18dc23b0386590bee82d59555be8 > # save the attached .config to linux build tree > make W=1 ARCH=i386 > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot > > All warnings (new ones prefixed by >>): > > >> arch/x86/hyperv/irqdomain.c:317:28: warning: no previous prototype for > >> 'hv_create_pci_msi_domain' [-Wmissing-prototypes] > 317 | struct irq_domain * __init hv_create_pci_msi_domain(void) > |^~~~ > > > vim +/hv_create_pci_msi_domain +317 arch/x86/hyperv/irqdomain.c I've fixed this locally by moving the prototype to a header file -- previously it was left in a C file. Wei.
Re: [PATCH v4 16/17] x86/ioapic: export a few functions and data structures via io_apic.h
On Wed, Jan 06, 2021 at 08:33:49PM +, Wei Liu wrote: > We are about to implement an irqchip for IO-APIC when Linux runs as root > on Microsoft Hypervisor. At the same time we would like to reuse > existing code as much as possible. > > Move mp_chip_data to io_apic.h and make a few helper functions > non-static. > > No functional change. > > Signed-off-by: Wei Liu This patch can be dropped. I've figured out a much cleaner way to handle IO-APIC due to recent changes. Wei.
Re: [PATCH v4 17/17] x86/hyperv: handle IO-APIC when running as root
On Wed, Jan 06, 2021 at 08:33:50PM +, Wei Liu wrote: > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft > Hypervisor when Linux runs as the root partition. Implement an IRQ chip > to handle mapping and unmapping of IO-APIC interrupts. > > Use custom functions for mapping and unmapping ACPI GSIs. They will > issue Microsoft Hypervisor specific hypercalls on top of the native > routines. > This patch is superseded by a new patch I recently wrote. The new approach is to implement a parent IRQ remapping domain for the IO-APIC domain in Linux. It fits cleanly into Linux's architecture. No more horrendous hooking required! :-) Please review the following patch instead. CC IOMMU maintainer. FYI. ---8<--- >From 53d2346c97efdfc6ac3bc435990767bdd29ef311 Mon Sep 17 00:00:00 2001 From: Wei Liu Date: Fri, 8 Jan 2021 21:29:24 + Subject: [PATCH] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft Hypervisor when Linux runs as the root partition. Implement an IRQ domain to handle mapping and unmapping of IO-APIC interrupts. Signed-off-by: Wei Liu --- arch/x86/hyperv/irqdomain.c | 54 ++ arch/x86/include/asm/mshyperv.h | 4 + drivers/iommu/hyperv-iommu.c| 179 +++- 3 files changed, 233 insertions(+), 4 deletions(-) diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c index 19637cd60231..8e2b4e478b70 100644 --- a/arch/x86/hyperv/irqdomain.c +++ b/arch/x86/hyperv/irqdomain.c @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void) } #endif /* CONFIG_PCI_MSI */ + +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry) +{ + union hv_device_id device_id; + + device_id.as_uint64 = 0; + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; + device_id.ioapic.ioapic_id = (u8)ioapic_id; + + return hv_unmap_interrupt(device_id.as_uint64, entry) & HV_HYPERCALL_RESULT_MASK; +} +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt); + +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector, + struct hv_interrupt_entry *entry) +{ + unsigned long flags; + struct hv_input_map_device_interrupt *input; + struct hv_output_map_device_interrupt *output; + union hv_device_id device_id; + struct hv_device_interrupt_descriptor *intr_desc; + u16 status; + + device_id.as_uint64 = 0; + device_id.device_type = HV_DEVICE_TYPE_IOAPIC; + device_id.ioapic.ioapic_id = (u8)ioapic_id; + + local_irq_save(flags); + input = *this_cpu_ptr(hyperv_pcpu_input_arg); + output = *this_cpu_ptr(hyperv_pcpu_output_arg); + memset(input, 0, sizeof(*input)); + intr_desc = &input->interrupt_descriptor; + input->partition_id = hv_current_partition_id; + input->device_id = device_id.as_uint64; + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED; + intr_desc->target.vector = vector; + intr_desc->vector_count = 1; + + if (level) + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL; + else + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE; + + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask); + + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, output) & +HV_HYPERCALL_RESULT_MASK; + local_irq_restore(flags); + + *entry = output->interrupt_entry; + + return status; +} +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt); diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index ccc849e25d5e..345d7c6f8c37 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry, struct irq_domain *hv_create_pci_msi_domain(void); +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector, + struct hv_interrupt_entry *entry); +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry); + #else /* CONFIG_HYPERV */ static inline void hyperv_init(void) {} static inline void hyperv_setup_mmu_ops(void) {} diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index b7db6024e65c..6d35e4c303c6 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = { .free = hyperv_irq_remapping_free, }; +static const struct irq_domain_ops hyperv_root_ir_domain_ops; static int __init hyperv_prepare_irq_remapping(void) { struct fwnode_handle *fn; int i; + const char *name; + const struct irq_domain_ops *ops; if (!hype
Re: [PATCH -next] x86: Fix unused variable 'msr_val' warning
On Tue, Mar 23, 2021 at 10:43:02AM +0800, Xu Yihang wrote: > Fixes the following W=1 kernel build warning(s): > arch/x86/hyperv/hv_spinlock.c:28:16: warning: variable ‘msr_val’ set but not > used [-Wunused-but-set-variable] > unsigned long msr_val; > > As Hypervisor Top-Level Functional Specification states in chapter 7.5 > Virtual Processor Idle Sleep State, "A partition which possesses the > AccessGuestIdleMsr privilege (refer to section 4.2.2) may trigger entry into > the virtual processor idle sleep state through a read to the > hypervisor-defined MSR HV_X64_MSR_GUEST_IDLE". That means only a read is > necessary, msr_val is not uesed, so potentially cast to void in order to > silent this warning. > > Reference: > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs > > Reported-by: Hulk Robot > Signed-off-by: Xu Yihang Applied to hyperv-next. Thanks.
Re: [PATCH -next] x86: Fix unused variable 'hi'
On Tue, Mar 23, 2021 at 11:32:50AM +, Wei Liu wrote: > On Tue, Mar 23, 2021 at 10:50:13AM +0800, Xu Yihang wrote: > > Fixes the following W=1 kernel build warning(s): > > arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used > > [-Wunused-but-set-variable] > > > > Compiled with CONFIG_HYPERV enabled: > > make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- > > make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 > > CROSS_COMPILE=x86_64-linux-gnu- > > > > HV_X64_MSR_EOI stores on bit 31:0 and HV_X64_MSR_TPR stores in bit 7:0, > > which means higher > > 32 bits are not really used, therefore potentially cast to void in order > > to silent this warning. > > > > Reported-by: Hulk Robot > > Signed-off-by: Xu Yihang > > --- > > arch/x86/hyperv/hv_apic.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c > > index 284e73661a18..a8b639498033 100644 > > --- a/arch/x86/hyperv/hv_apic.c > > +++ b/arch/x86/hyperv/hv_apic.c > > @@ -60,9 +60,11 @@ static u32 hv_apic_read(u32 reg) > > switch (reg) { > > case APIC_EOI: > > rdmsr(HV_X64_MSR_EOI, reg_val, hi); > > + (void) hi; > > return reg_val; > > case APIC_TASKPRI: > > rdmsr(HV_X64_MSR_TPR, reg_val, hi); > > + (void) hi; > > I would like to remove the space while committing this patch. There is > no need for you to do anything. Applied to hyperv-next.
Re: [PATCH v5] x86/Hyper-V: Support for free page reporting
On Tue, Mar 23, 2021 at 06:47:16PM +, Sunil Muthuswamy wrote: > Linux has support for free page reporting now (36e66c554b5c) for > virtualized environment. On Hyper-V when virtually backed VMs are > configured, Hyper-V will advertise cold memory discard capability, > when supported. This patch adds the support to hook into the free > page reporting infrastructure and leverage the Hyper-V cold memory > discard hint hypercall to report/free these pages back to the host. > > Signed-off-by: Sunil Muthuswamy > Tested-by: Matheus Castello Applied to hyperv-next. Thanks.
Re: [PATCH v2] video: hyperv_fb: Fix a double free in hvfb_probe
On Wed, Mar 24, 2021 at 01:46:39PM +, Michael Kelley wrote: > From: Lv Yunlong Sent: Wednesday, March 24, 2021 > 3:37 AM > > > > In function hvfb_probe in hyperv_fb.c, it calls hvfb_getmem(hdev, info) > > and return err when info->apertures is freed. > > > > In the error1 label of hvfb_probe, info->apertures will be freed for the > > second time in framebuffer_release(info). > > > > My patch removes all kfree(info->apertures) instead of set info->apertures > > to NULL. It is because that let framebuffer_release() handle freeing the > > memory flows the fbdev pattern, and less code overall. > > Let me suggest some clarifications in the commit message. It's probably > better not to reference the initial approach of setting info->apertures to > NULL, since there won't be any record of that approach in the commit > history. Here's what I would suggest: > > Function hvfb_probe() calls hvfb_getmem(), expecting upon return that > info->apertures is either NULL or points to memory that should be freed > by framebuffer_release(). But hvfb_getmem() is freeing the memory and > leaving the pointer non-NULL, resulting in a double free if an error > occurs or later if hvfb_remove() is called. > > Fix this by removing all kfree(info->apertures) calls in hvfb_getmem(). > This will allow framebuffer_release() to free the memory, which follows > the pattern of other fbdev drivers. > > Modulo this revision to the commit message, which Wei Liu can > probably incorporate, > Yes. I surely can incorporate the changes. I will also add the Fixes tag. > Reviewed-by: Michael Kelley
Re: [PATCH] video/fbdev: Fix a double free in hvfb_probe
Thanks for your patch. I would like to change the prefix to "video: hyperv_fb:" to be more specific. On Tue, Mar 23, 2021 at 12:33:50AM -0700, Lv Yunlong wrote: > In function hvfb_probe in hyperv_fb.c, it calls hvfb_getmem(hdev, info) > and return err when info->apertures is freed. > > In the error1 label of hvfb_probe, info->apertures will be freed twice > by framebuffer_release(info). > I would say "freed for the second time" here. What you wrote reads to me fraembuffer_release frees the buffer twice all by itself. > My patch sets info->apertures to NULL after it was freed to avoid > double free. > I think this approach works. I would like to give other people a chance to comment though. Fixes: 3a6fb6c4255c ("video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.") > Signed-off-by: Lv Yunlong > --- > drivers/video/fbdev/hyperv_fb.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c > index c8b0ae676809..2fc9b507e73a 100644 > --- a/drivers/video/fbdev/hyperv_fb.c > +++ b/drivers/video/fbdev/hyperv_fb.c > @@ -1032,6 +1032,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct > fb_info *info) > if (!pdev) { > pr_err("Unable to find PCI Hyper-V video\n"); > kfree(info->apertures); > + info->apertures = NULL; > return -ENODEV; > } > > @@ -1130,6 +1131,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct > fb_info *info) > pci_dev_put(pdev); > } > kfree(info->apertures); > + info->apertures = NULL; > > return 0; > > @@ -1142,6 +1144,7 @@ static int hvfb_getmem(struct hv_device *hdev, struct > fb_info *info) > if (!gen2vm) > pci_dev_put(pdev); > kfree(info->apertures); > + info->apertures = NULL; > > return -ENOMEM; > } > -- > 2.25.1 > >
Re: [PATCH -next] x86: Fix unused variable 'hi'
On Tue, Mar 23, 2021 at 10:50:13AM +0800, Xu Yihang wrote: > Fixes the following W=1 kernel build warning(s): > arch/x86/hyperv/hv_apic.c:58:15: warning: variable ‘hi’ set but not used > [-Wunused-but-set-variable] > > Compiled with CONFIG_HYPERV enabled: > make allmodconfig ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- > make W=1 arch/x86/hyperv/hv_apic.o ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- > > HV_X64_MSR_EOI stores on bit 31:0 and HV_X64_MSR_TPR stores in bit 7:0, which > means higher > 32 bits are not really used, therefore potentially cast to void in order to > silent this warning. > > Reported-by: Hulk Robot > Signed-off-by: Xu Yihang > --- > arch/x86/hyperv/hv_apic.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c > index 284e73661a18..a8b639498033 100644 > --- a/arch/x86/hyperv/hv_apic.c > +++ b/arch/x86/hyperv/hv_apic.c > @@ -60,9 +60,11 @@ static u32 hv_apic_read(u32 reg) > switch (reg) { > case APIC_EOI: > rdmsr(HV_X64_MSR_EOI, reg_val, hi); > + (void) hi; > return reg_val; > case APIC_TASKPRI: > rdmsr(HV_X64_MSR_TPR, reg_val, hi); > + (void) hi; I would like to remove the space while committing this patch. There is no need for you to do anything. Wei. > return reg_val; > > default: > -- > 2.17.1 >