Re: [kvm-devel] [PATCH 2/6] virtio_config
On Thu, 2007-09-20 at 14:55 +0200, Avi Kivity wrote:
> Avi Kivity wrote:
> > Rusty Russell wrote:
> >> Previous versions of virtio didn't commonalize probing. For every
> >> driver, every virtio implementation (KVM, lguest, etc) needed an
> >> in-kernel stub to join their bus to the probe code.
> >>
> >> To solve this, we introduce a "virtio_config" mechanism, which is
> >> simply a set of [u8 type][u8 len][...data...] fields for each
> >> device. Some convenient wrapper functions make this fairly easy
> >> for drivers to unpack their configuration data themselves.
> >>
> >> This configuration data could be PCI config space, or an unrelated
> >> bus (eg. lguest), or manufactured by the kernel itself. It's
> >> designed to be extensible: fields get marked as the device reads
> >> them, so a host supporting some future extension can tell if the
> >> guest driver understood it. This also applies to bitfields: the
> >> guest explicitly acks the bits it understands.
> >>
> >> There's also a simple status bitmask useful for low-level host
> >> analysis of what the guest is doing with the device.
> >
> > Lovely.
> >
> > A side effect of this is that Xen drivers can no longer use virtio.

I'm not so sure. We were always assuming that Xen could do state
management in its virtio layer. If this is not true, it implies we need
hooks in the virtio drivers, and I don't think we've made it worse by
making Xen translate xenbus config info into virtio_config.

Rusty.
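As a concrete illustration of the [u8 type][u8 len][...data...] layout
described above, a driver-side scan of such a config space could look like
the following minimal sketch. The function name and the flat-buffer
interface here are assumptions for illustration only; the actual patch
wraps this behind the virtio_config helpers.

#include <linux/types.h>

/* Sketch: find a field of the given type in a [type][len][data...] space. */
static const u8 *find_config_field(const u8 *config, unsigned int config_len,
				   u8 wanted_type, u8 *field_len)
{
	unsigned int i = 0;

	while (i + 2 <= config_len) {
		u8 type = config[i];
		u8 len = config[i + 1];

		if (i + 2 + len > config_len)
			break;			/* Malformed: field overruns the space. */
		if (type == wanted_type) {
			*field_len = len;
			return &config[i + 2];	/* Points at the data bytes. */
		}
		i += 2 + len;			/* Skip over this field. */
	}
	return NULL;				/* No such field. */
}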
Re: [PATCH 4/6] virtio block driver
On Thu, 2007-09-20 at 15:05 +0200, Jens Axboe wrote:
> On Thu, Sep 20 2007, Rusty Russell wrote:
> > +static void end_dequeued_request(struct request *req,
> > +				 struct request_queue *q, int uptodate)
> > +{
> > +	/* And so the insanity of the block layer infects us here. */
> > +	int nsectors = req->hard_nr_sectors;
> > +
> > +	if (blk_pc_request(req)) {
> > +		nsectors = (req->data_len + 511) >> 9;
> > +		if (!nsectors)
> > +			nsectors = 1;
> > +	}
> > +	if (end_that_request_first(req, uptodate, nsectors))
> > +		BUG();
> > +	add_disk_randomness(req->rq_disk);
> > +	end_that_request_last(req, uptodate);
> > +}
>
> We have end_queued_request(), let's add end_dequeued_request(). Below.

OK, thanks, I'll throw that in the mix and test...

> > +	vblk->sg[0].page = virt_to_page(&vbr->out_hdr);
> > +	vblk->sg[0].offset = offset_in_page(&vbr->out_hdr);
> > +	vblk->sg[0].length = sizeof(vbr->out_hdr);
> > +	num = blk_rq_map_sg(q, vbr->req, vblk->sg+1);
>
> This won't work for chained sglists, but I gather (I'm so funny) that
> it won't be an issue for you. How large are your sglists?

Hmm, potentially extremely large. What do I need to do for chained
sglists?

> > +	if (!do_req(q, vblk, req)) {
> > +		/* Queue full?  Wait. */
> > +		blk_stop_queue(q);
> > +		break;
> > +	}
>
> Personally I think this bool stuff is foul. You return false/true, but
> still use ! to test. That is just more confusing than the canonical
> 0/1 good/bad return imho.

Except "0/1" is not canonical in the kernel. Arguably, "-errno/0" is
canonical. OTOH, bool is clear.

If do_req() fails, we assume the queue is full. I shall change the
comment to that effect.

Thanks!
Rusty.
Re: [kvm-devel] [PATCH 1/6] virtio interface
On Thu, 2007-09-20 at 14:27 +0200, Avi Kivity wrote:
> Rusty Russell wrote:
> > +struct virtio_backend_ops virtio_backend_ops;
> > +EXPORT_SYMBOL_GPL(virtio_backend_ops);
>
> Suggest calling this virtio_transport_ops rather than the too-generic
> virtio_backend_ops. Especially since Xen uses backend for something
> completely different.

Hi Avi,

Agreed, fixed. I actually don't like this interface at all, but it is
simple until we determine something better.

> > +/**
> > + * virtqueue_ops - operations for virtqueue abstraction layer
> > + * @new_vq: create a new virtqueue
> > + *	config: the virtio_config_space field describing the queue
> > + *	off: the offset in the config space of the queue configuration
> > + *	len: the length of the virtio_config_space field
>
> 'off, len' are really a magic cookie. Why does the interface care
> about their meaning?

off is a cookie, len isn't. The driver does "get me the foo field" and
it returns off + len. If it wants to read or write the foo field, it
needs that length. In practice, drivers use the virtio_config_vq()
convenience interface, which does "find field, hand to ops->new_vq".

> > + *	callback: the driver callback when the queue is used.
>
> Missing callback return value description.

Indeed, that's always made me uncomfortable. Fixed. Although it's
possible that an explicit disable hook is better...

> > + * @kick: update after add_buf
> > + *	vq: the struct virtqueue
> > + *	After one or more add_buf calls, invoke this to kick the virtio
> > + *	layer.
>
> 'the other side'

Thanks, fixed.

> I'm not thrilled about reusing pci ids. Maybe the s390 folks can say
> whether this is a real issue.

I doubt it: it's not a problem for lguest. But I wonder if I've chosen
the right fields...

Rusty.
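For reference, the virtio_config_vq() convenience Rusty describes ("find
field, hand to ops->new_vq") would reduce to something like the sketch
below. The virtio_config_find() helper and the exact signatures are
assumptions made for illustration, not the posted interface.

/* Sketch only: locate the queue field, then hand its off/len to new_vq(). */
static struct virtqueue *virtio_config_vq(struct virtio_config_space *config,
					  struct virtqueue_ops *ops,
					  u8 queue_field_type,
					  bool (*callback)(struct virtqueue *vq))
{
	unsigned int off, len;

	/* Assumed helper: finds the field, returns its offset and length. */
	if (!virtio_config_find(config, queue_field_type, &off, &len))
		return NULL;

	return ops->new_vq(config, off, len, callback);
}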
Re: [PATCH 4/6] virtio block driver
On Fri, Sep 21 2007, Rusty Russell wrote:
> On Thu, 2007-09-20 at 15:05 +0200, Jens Axboe wrote:
> > On Thu, Sep 20 2007, Rusty Russell wrote:
> > > +static void end_dequeued_request(struct request *req,
> > > +				 struct request_queue *q, int uptodate)
> > > +{
> > > +	/* And so the insanity of the block layer infects us here. */
> > > +	int nsectors = req->hard_nr_sectors;
> > > +
> > > +	if (blk_pc_request(req)) {
> > > +		nsectors = (req->data_len + 511) >> 9;
> > > +		if (!nsectors)
> > > +			nsectors = 1;
> > > +	}
> > > +	if (end_that_request_first(req, uptodate, nsectors))
> > > +		BUG();
> > > +	add_disk_randomness(req->rq_disk);
> > > +	end_that_request_last(req, uptodate);
> > > +}
> >
> > We have end_queued_request(), let's add end_dequeued_request(). Below.
>
> OK, thanks, I'll throw that in the mix and test...

I've queued it for 2.6.24 as well, so it should be in mainline in that
time frame.

> > > +	vblk->sg[0].page = virt_to_page(&vbr->out_hdr);
> > > +	vblk->sg[0].offset = offset_in_page(&vbr->out_hdr);
> > > +	vblk->sg[0].length = sizeof(vbr->out_hdr);
> > > +	num = blk_rq_map_sg(q, vbr->req, vblk->sg+1);
> >
> > This won't work for chained sglists, but I gather (I'm so funny) that
> > it won't be an issue for you. How large are your sglists?
>
> Hmm, potentially extremely large. What do I need to do for chained
> sglists?

Not a lot, actually. You snipped the problematic part in your reply,
though:

	vblk->sg[num+1].page = virt_to_page(&vbr->in_hdr);

which assumes that sg is a contiguous piece of memory; for chained sg
lists that isn't true. sg chaining will be in 2.6.24, so if you really
do need large sglists, then that's the way to go. blk_rq_map_sg() maps
correctly for you, no changes needed there. But you want to use
sg_last() for adding to the end of the sglist, then use sg_next() to
retrieve the next sg segment instead of sg + 1, and for_each_sg() to
loop over all segments. Just something to keep in mind, if you plan on
merging this post 2.6.23.

> > > +	if (!do_req(q, vblk, req)) {
> > > +		/* Queue full?  Wait. */
> > > +		blk_stop_queue(q);
> > > +		break;
> > > +	}
> >
> > Personally I think this bool stuff is foul. You return false/true, but
> > still use ! to test. That is just more confusing than the canonical
> > 0/1 good/bad return imho.
>
> Except "0/1" is not canonical in the kernel. Arguably, "-errno/0" is
> canonical. OTOH, bool is clear.

-errno/0, then. 1 is typically used for 'failure without specific error
number' when -Exx doesn't apply. Like the above :-) But let's just agree
to disagree on the bool.

> If do_req() fails, we assume the queue is full. I shall change the
> comment to that effect. Thanks!

--
Jens Axboe
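To make the chained-sglist advice concrete, walking a (possibly chained)
scatterlist then looks roughly like the sketch below, against the 2.6.24
scatterlist API Jens describes; the function itself is illustrative, not
code from the driver.

#include <linux/scatterlist.h>

static void walk_sg(struct scatterlist *sgl, unsigned int nents)
{
	struct scatterlist *sg;
	unsigned int i;

	/*
	 * sgl + i only works for a flat array; chained lists splice
	 * separate arrays together, so step via the chain-aware iterator.
	 * Likewise, appending a trailing element (like the in_hdr above)
	 * would use sg_last(sgl, nents) rather than &sgl[nents].
	 */
	for_each_sg(sgl, sg, nents, i)
		pr_debug("segment %u: offset %u, length %u\n",
			 i, sg->offset, sg->length);
}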
Re: [PATCH 0/6] virtio with config abstraction and ring implementation
On Thu, 2007-09-20 at 15:43 +0200, Dor Laor wrote:
> Rusty Russell wrote:
> > Drivers now unpack their own configuration: their probe() methods
> > are uniform. The configuration mechanism is extensible and can be
> > backed by PCI, a string of bytes, or something else.
> I like the separation of the ring code, the improved descriptors and
> the notify too.
> Regarding the pci config space, I'd rather see config_ops type of
> operations to let the s390/xen/other implementations jump on our
> wagon.

It's possible to abstract at the find_config() level, but it's also not
too bad to linearize any existing configuration. I chose to change the
lguest device page to use a linearized format, but I could have adapted
the old device info struct in the kernel without too much hassle.

FYI, here's the lguest snippet, for example:

+struct lguest_config {
+	struct virtio_config_space v;
+
+	/* Status pointer: 4 bytes, then comes the config space itself. */
+	u8 *status;
+};
...
+static void lguest_writeb(struct virtio_config_space *v, unsigned off, u8 b)
+{
+	struct lguest_config *c = container_of(v, struct lguest_config, v);
+
+	c->status[4 + off] = b;
+}
+
+static u8 lguest_readb(struct virtio_config_space *v, unsigned off)
+{
+	struct lguest_config *c = container_of(v, struct lguest_config, v);
+
+	return c->status[4 + off];
+}
+
+static void lguest_set_status(struct virtio_config_space *v, u32 status)
+{
+	struct lguest_config *c = container_of(v, struct lguest_config, v);
+
+	memcpy(c->status, &status, sizeof(status));
+}
+
+static u32 lguest_get_status(struct virtio_config_space *v)
+{
+	struct lguest_config *c = container_of(v, struct lguest_config, v);
+	u32 status;
+
+	memcpy(&status, c->status, sizeof(status));
+	return status;
+}
Re: [kvm-devel] [PATCH 3/6] virtio net driver
On Fri, 2007-09-21 at 12:48 +0200, Christian Borntraeger wrote:
> On Thursday, 20 September 2007, Rusty Russell wrote:
> > The network driver uses *two* virtqueues: one for input packets and
> > one for output packets. This has nice locking properties (ie. we
> > don't do any locking for recv vs send).
> [...]
> > 3) Resolve freeing of old xmit skbs (someone sent patch IIRC?)
>
> Yes, that was me. I am quite busy at the moment, but I will send a
> refreshed patch soon. The most annoying fact of my current patch is
> that I have to add locking (because of the only-one-operation-per-
> virtqueue-at-a-time rule).

Sorry Christian, I thought it was you, but was in a hurry to send out
the patches so didn't go back and check.

Can't we just re-use the default xmit lock?

> [...]
> > +struct virtnet_info
> > +{
> > +	struct virtqueue_ops *vq_ops;
> > +	struct virtqueue *vq_recv;
> > +	struct virtqueue *vq_send;
> > +	struct net_device *ndev;
>
> This is only a matter of taste, but I like netdev or dev more than
> ndev.

Yeah, agreed. It was a moment of weakness: I've renamed it to "dev".

Thanks!
Rusty.
Re: [PATCH 4/6] virtio block driver
On Thu, 2007-09-20 at 14:27 +0200, Jens Axboe wrote:
> On Thu, Sep 20 2007, Rusty Russell wrote:
> > The block driver uses scatter-gather lists with sg[0] being the
> > request information (struct virtio_blk_outhdr) with the type, sector
> > and inbuf id. The next N sg entries are the bio itself, then the
> > last sg is the status byte. Whether the N entries are in or out
> > depends on whether it's a read or a write.
> >
> > We accept the normal (SCSI) ioctls: they get handed through to the
> > other side, which can then handle them or reply that they're
> > unsupported. It's not clear that this actually works in general,
> > since I don't know if blk_pc_request() requests have an accurate
> > rq_data_dir().
>
> They should, if they imply a data transfer.

OK, good. We rely on that to mark input vs output sg elements. I was
wondering if there were some weird requests which did both input and
output.

> > Although we try to reply -ENOTTY on unsupported commands, the block
> > layer in its infinite wisdom suppresses the error, so ioctl(fd,
> > CDROMEJECT) returns success to userspace.
>
> How about ever submitting a patch for that, instead of just repeatedly
> complaining about it?

To be fair, I've left the complaint in that same patch; you're just
reading it repeatedly 8)

I shall look through the code and see if I can figure out how to fix
it. I'm assuming from your response that there's not some strange
reason to preserve the current behaviour.

Thanks!
Rusty.
Re: [PATCH 4/6] virtio block driver
On Fri, Sep 21 2007, Rusty Russell wrote:
> On Thu, 2007-09-20 at 14:27 +0200, Jens Axboe wrote:
> > On Thu, Sep 20 2007, Rusty Russell wrote:
> > > The block driver uses scatter-gather lists with sg[0] being the
> > > request information (struct virtio_blk_outhdr) with the type,
> > > sector and inbuf id. The next N sg entries are the bio itself,
> > > then the last sg is the status byte. Whether the N entries are in
> > > or out depends on whether it's a read or a write.
> > >
> > > We accept the normal (SCSI) ioctls: they get handed through to the
> > > other side, which can then handle them or reply that they're
> > > unsupported. It's not clear that this actually works in general,
> > > since I don't know if blk_pc_request() requests have an accurate
> > > rq_data_dir().
> >
> > They should, if they imply a data transfer.
>
> OK, good. We rely on that to mark input vs output sg elements. I was
> wondering if there were some weird requests which did both input and
> output.

There very soon will be bidirectional requests with both in and out
elements, but they will be clearly marked as such (and the driver needs
to be marked capable). So nothing to worry about for now.

> > > Although we try to reply -ENOTTY on unsupported commands, the
> > > block layer in its infinite wisdom suppresses the error, so
> > > ioctl(fd, CDROMEJECT) returns success to userspace.
> >
> > How about ever submitting a patch for that, instead of just
> > repeatedly complaining about it?
>
> To be fair, I've left the complaint in that same patch; you're just
> reading it repeatedly 8)

That may be the case :-)

> I shall look through the code and see if I can figure out how to fix
> it. I'm assuming from your response that there's not some strange
> reason to preserve the current behaviour.

It surely sounds like a bug: if you issue ioctl(fd, CDROMEJECT), the
driver sees it and returns -ENOTTY, but userland sees a 0 retval. So if
you have time, please do poke at it a bit.

--
Jens Axboe
Re: [PATCH 1/6] virtio interface
On Thursday 20 September 2007, Rusty Russell wrote:
> + * virtio_driver - operations for a virtio I/O driver
> + * @name: the name of the driver (KBUILD_MODNAME).
> + * @owner: the module which contains these routines (ie. THIS_MODULE).
> + * @id_table: the ids (we re-use PCI ids) serviced by this driver.
> + * @probe: the function to call when a device is found. Returns a
> + *	token for remove, or PTR_ERR().
> + * @remove: the function when a device is removed.
> + */
> +struct virtio_driver {
> +	const char *name;
> +	struct module *owner;
> +	struct pci_device_id *id_table;
> +	void *(*probe)(struct device *device,
> +		       struct virtio_config_space *config,
> +		       struct virtqueue_ops *vqops);
> +	void (*remove)(void *dev);
> +};
> +
> +int register_virtio_driver(struct virtio_driver *drv);
> +void unregister_virtio_driver(struct virtio_driver *drv);
> +
> +/* The particular virtio backend supplies these. */
> +struct virtio_backend_ops {
> +	int (*register_driver)(struct virtio_driver *drv);
> +	void (*unregister_driver)(struct virtio_driver *drv);
> +};
> +extern struct virtio_backend_ops virtio_backend_ops;

This still seems a little awkward. From what I understand, you register
a virtio_driver, which leads to a pci_driver (or whatever you are based
on) being registered behind the covers, so that the pci_device can be
used directly as the virtio device.

I think there should instead be a pci_driver that automatically binds
to all PCI-based virtio implementations and creates a child device for
the actual virtio_device. Then you can have the virtio_driver itself be
based on a device_driver, and you can get rid of the global
virtio_backend_ops.

That will be useful when a virtual machine has two ways to get at the
virtio devices, e.g. a KVM guest that has both hcall-based probing for
virtio devices and some other virtio devices that are exported through
PCI.

	Arnd <><
Re: [kvm-devel] [PATCH 6/6] virtio ring helper
On Thu, 2007-09-20 at 14:43 +0200, Avi Kivity wrote:
> Rusty Russell wrote:
> > These helper routines supply most of the virtqueue_ops for
> > hypervisors which want to use a ring for virtio. Unlike the previous
> > lguest implementation:
> >
> > 3) The page numbers are always 64 bit (PAE anyone?)
>
> 32 bits of page numbers give 44 bits of physical address on x86.
> That's 16TB per guest. Admittedly it's smaller on a VAX.

I like to feel that I make these mistakes to ensure others are paying
attention.

However, it does mean that I can just put an address in there and
increase the length field to 32 bits. Much rejoicing. Will fix and
resend tomorrow (Friday is in-office-away-from-test-machine day).

> I don't like the chaining and would prefer the descriptor to refer to
> an array of page descriptors. However I can't complain since this
> isn't common code. The intent is for kvm to use it.

I'll certainly consider your patches, although I suspect that managing
descriptor pages will make things ugly enough to cause you to
reconsider.

Thanks!
Rusty.
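In other words, the fix Rusty describes amounts to a ring descriptor
along these lines. This is a sketch of the direction discussed, with
assumed field names, not the posted patch:

struct ring_desc {
	__u64 addr;	/* Full guest-physical address, not a page number. */
	__u32 len;	/* Length field widened to 32 bits. */
	__u16 flags;	/* E.g. buffer is write-only, or "next" is valid. */
	__u16 next;	/* Chaining: index of the following descriptor. */
};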
Re: [PATCH 1/6] virtio interface
On Fri, 2007-09-21 at 14:05 +0200, Arnd Bergmann wrote:
> On Thursday 20 September 2007, Rusty Russell wrote:
> > +int register_virtio_driver(struct virtio_driver *drv);
> > +void unregister_virtio_driver(struct virtio_driver *drv);
> > +
> > +/* The particular virtio backend supplies these. */
> > +struct virtio_backend_ops {
> > +	int (*register_driver)(struct virtio_driver *drv);
> > +	void (*unregister_driver)(struct virtio_driver *drv);
> > +};
> > +extern struct virtio_backend_ops virtio_backend_ops;
>
> This still seems a little awkward. From what I understand, you
> register a virtio_driver, which leads to a pci_driver (or whatever you
> are based on) being registered behind the covers, so that the
> pci_device can be used directly as the virtio device.

Hi Arnd,

Yes, and I dislike it too.

> I think there should instead be a pci_driver that automatically binds
> to all PCI-based virtio implementations and creates a child device for
> the actual virtio_device.

I'm not sure I understand. For PCI probing to work, you want to have
identified yourself as a driver for each PCI id claimed by a virtio
device.

Hmm, I guess we could have a PCI driver which claims all VIRTIO vendor
devices. Then it can call virtio_find_driver() (?) at the top of its
probe function to find if there's a matching virtio driver. This PCI
driver would have to be initialized after all the virtio drivers are
registered, but that's easy.

The virtio layer would simply maintain a linked list of drivers and
implement the virtio_find_driver() matching function. And since we've
suppressed normal PCI driver request_module (since it always finds "the
driver"), we can implement that in virtio_find_driver(), and not use a
PCI MODULE_DEVICE_TABLE. Then we don't need (full) PCI ids at all.

OK, I have some coding to do now...

Rusty.
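A minimal sketch of what that matching function might look like follows.
The list head, the lock, and a "list" member added to struct
virtio_driver are all assumptions for illustration, not the eventual
code:

static LIST_HEAD(virtio_drivers);	/* Assumed registry of drivers. */
static DEFINE_MUTEX(virtio_lock);

struct virtio_driver *virtio_find_driver(u16 device_id)
{
	struct virtio_driver *drv;
	const struct pci_device_id *id;

	mutex_lock(&virtio_lock);
	/* Assumes struct virtio_driver grows a struct list_head "list". */
	list_for_each_entry(drv, &virtio_drivers, list) {
		for (id = drv->id_table; id->device; id++) {
			if (id->device == device_id) {
				mutex_unlock(&virtio_lock);
				return drv;
			}
		}
	}
	mutex_unlock(&virtio_lock);
	return NULL;	/* Could request_module() here before giving up. */
}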
Re: [PATCH 4/6] virtio block driver
On Fri, 2007-09-21 at 13:47 +0200, Jens Axboe wrote:
> On Fri, Sep 21 2007, Rusty Russell wrote:
> > On Thu, 2007-09-20 at 15:05 +0200, Jens Axboe wrote:
> > > We have end_queued_request(), let's add end_dequeued_request().
> > > Below.
> >
> > OK, thanks, I'll throw that in the mix and test...
>
> I've queued it for 2.6.24 as well, so it should be in mainline in that
> time frame.

OK, I'll sit it in my queue so my patches work until then.

> > > > +	vblk->sg[0].page = virt_to_page(&vbr->out_hdr);
> > > > +	vblk->sg[0].offset = offset_in_page(&vbr->out_hdr);
> > > > +	vblk->sg[0].length = sizeof(vbr->out_hdr);
> > > > +	num = blk_rq_map_sg(q, vbr->req, vblk->sg+1);
> > >
> > > This won't work for chained sglists, but I gather (I'm so funny)
> > > that it won't be an issue for you. How large are your sglists?
> >
> > Hmm, potentially extremely large. What do I need to do for chained
> > sglists?
>
> Not a lot, actually. You snipped the problematic part in your reply,
> though:
>
>	vblk->sg[num+1].page = virt_to_page(&vbr->in_hdr);
>
> which assumes that sg is a contiguous piece of memory; for chained sg
> lists that isn't true. sg chaining will be in 2.6.24, so if you really
> do need large sglists, then that's the way to go.

I'm not sure if I need large sglists here. Obviously I want to handle
anything the block layer can give to me (I assume you're going to
increase MAX_PHYS_SEGMENTS soon?). This might make "vblk" too big to
kmalloc easily:

struct virtio_blk {
	...
	/* Scatterlist: can be too big for stack. */
	struct scatterlist sg[3+MAX_PHYS_SEGMENTS];
};

The sglist for virtio is really a scratch space to represent the
request buffers (with one element for metadata at the start and one at
the end) which we hand through to the virtio layer, which turns it into
its internal descriptors for the host to read.

Have you thought of putting an sglist inside the req itself? Then
instead of blk_rq_map_sg() it'd just be req->sg. It could just be a
chain of sg's in each bio, I guess. This would allow me to just borrow
that request sg chain, rather than doing the req -> sg -> virtio
double-conversion.

> > Except "0/1" is not canonical in the kernel. Arguably, "-errno/0" is
> > canonical. OTOH, bool is clear.
>
> -errno/0, then. 1 is typically used for 'failure without specific
> error number' when -Exx doesn't apply. Like the above :-)

See, I'd be absolutely sure that >= 0 is success (like all syscalls),
and I'd be appalled by code which used it as failure. So I think you're
right that we should agree to disagree :)

Thanks!
Rusty.
Re: [kvm-devel] [PATCH 3/6] virtio net driver
On Friday, 21 September 2007, Rusty Russell wrote:
> Can't we just re-use the default xmit lock?

Yes, we can. This patch is untested but is basically a refresh of an
earlier version. I currently have no code testable with the latest
virtio version from this mail thread, so could you apply this (it still
has the ndev name) and test the patch?

Thanks

Christian

---

Use the sent interrupt for xmit reclaim.

---
 drivers/net/virtio_net.c |   43 ++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

Index: virtio/drivers/net/virtio_net.c
===================================================================
--- virtio.orig/drivers/net/virtio_net.c
+++ virtio/drivers/net/virtio_net.c
@@ -52,10 +52,29 @@ static inline void vnet_hdr_to_sg(struct
 	sg_init_one(sg, skb_vnet_hdr(skb), sizeof(struct virtio_net_hdr));
 }
 
+static void free_old_xmit_skbs(struct net_device *dev)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct sk_buff *skb;
+	unsigned int len;
+
+	netif_tx_lock(dev);
+	while ((skb = vi->vq_ops->get_buf(vi->vq_send, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+		__skb_unlink(skb, &vi->send);
+		dev->stats.tx_bytes += len;
+		dev->stats.tx_packets++;
+		dev_kfree_skb_irq(skb);
+	}
+	netif_tx_unlock(dev);
+}
+
 static bool skb_xmit_done(void *ndev)
 {
 	/* In case we were waiting for output buffers. */
 	netif_wake_queue(ndev);
+	/* reclaim sent skbs */
+	free_old_xmit_skbs(ndev);
 	return true;
 }
 
@@ -209,20 +228,6 @@ again:
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct virtnet_info *vi)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-
-	while ((skb = vi->vq_ops->get_buf(vi->vq_send, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-		__skb_unlink(skb, &vi->send);
-		vi->ndev->stats.tx_bytes += len;
-		vi->ndev->stats.tx_packets++;
-		kfree_skb(skb);
-	}
-}
-
 static int start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -235,8 +240,6 @@ static int start_xmit(struct sk_buff *sk
 		dev->name, skb, dest[0], dest[1], dest[2],
 		dest[3], dest[4], dest[5]);
 
-	free_old_xmit_skbs(vi);
-
 	/* Encode metadata header at front. */
 	hdr = skb_vnet_hdr(skb);
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -267,15 +270,21 @@ static int start_xmit(struct sk_buff *sk
 	vnet_hdr_to_sg(sg, skb);
 	num = skb_to_sgvec(skb, sg+1, 0, skb->len) + 1;
 
+	local_irq_disable();
+	netif_tx_lock(dev);
 	__skb_queue_head(&vi->send, skb);
 	err = vi->vq_ops->add_buf(vi->vq_send, sg, num, 0, skb);
 	if (err) {
 		pr_debug("%s: virtio not prepared to send\n", dev->name);
 		skb_unlink(skb, &vi->send);
 		netif_stop_queue(dev);
+		netif_tx_unlock(dev);
+		local_irq_enable();
 		return NETDEV_TX_BUSY;
 	}
 	vi->vq_ops->kick(vi->vq_send);
+	netif_tx_unlock(dev);
+	local_irq_enable();
 
 	return 0;
 }
@@ -335,7 +344,7 @@ static void *virtnet_probe(struct device
 	dev->poll = virtnet_poll;
 	dev->hard_start_xmit = start_xmit;
 	dev->weight = 16;
-	dev->features = NETIF_F_HIGHDMA;
+	dev->features = NETIF_F_HIGHDMA | NETIF_F_LLTX;
 	SET_NETDEV_DEV(dev, device);
 
 	/* Do we support "hardware" checksums? */
Re: [kvm-devel] [PATCH 3/6] virtio net driver
On Friday, 21 September 2007, Herbert Xu wrote:
> Please don't use LLTX in new drivers. We're trying to get rid of it
> since it's
>
> 1) unnecessary;
> 2) causes problems with AF_PACKET seeing things twice.

Ok, but then I cannot reuse the xmit lock in an interrupt handler.
Otherwise deadlock can occur, because hard_start_xmit runs with
interrupts enabled.

Rusty, it seems we need an additional lock.

Christian
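One shape that additional lock could take is a driver-private spinlock,
taken with interrupts disabled on the xmit path and from the completion
callback. The sketch below illustrates the pattern; the send_lock field
and its placement are assumptions, not Christian's follow-up patch:

/* Assumed addition to struct virtnet_info: spinlock_t send_lock;
 * guarding vq_send and the vi->send queue on both paths. */

static bool skb_xmit_done(void *ndev)
{
	struct net_device *dev = ndev;
	struct virtnet_info *vi = netdev_priv(dev);
	unsigned long flags;

	spin_lock_irqsave(&vi->send_lock, flags);
	/* ... reclaim completed skbs from vi->vq_send here ... */
	spin_unlock_irqrestore(&vi->send_lock, flags);

	netif_wake_queue(dev);
	return true;
}

/* And in start_xmit(), the virtqueue operations take the same lock:
 *
 *	spin_lock_irqsave(&vi->send_lock, flags);
 *	err = vi->vq_ops->add_buf(vi->vq_send, sg, num, 0, skb);
 *	spin_unlock_irqrestore(&vi->send_lock, flags);
 */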
Re: [kvm-devel] [PATCH 3/6] virtio net driver
On Thursday, 20 September 2007, Rusty Russell wrote:
> The network driver uses *two* virtqueues: one for input packets and
> one for output packets. This has nice locking properties (ie. we
> don't do any locking for recv vs send).
[...]
> 3) Resolve freeing of old xmit skbs (someone sent patch IIRC?)

Yes, that was me. I am quite busy at the moment, but I will send a
refreshed patch soon. The most annoying fact of my current patch is
that I have to add locking (because of the only-one-operation-per-
virtqueue-at-a-time rule).

[...]
> +struct virtnet_info
> +{
> +	struct virtqueue_ops *vq_ops;
> +	struct virtqueue *vq_recv;
> +	struct virtqueue *vq_send;
> +	struct net_device *ndev;

This is only a matter of taste, but I like netdev or dev more than
ndev.

[...]

Everything else looks sane.

20-minutes-code-review-by: Christian Borntraeger <[EMAIL PROTECTED]>

Christian
Re: [PATCH 1/6] virtio interface
On Friday 21 September 2007, Rusty Russell wrote:
> Hmm, I guess we could have a PCI driver which claims all VIRTIO vendor
> devices.

Yes, that was the idea.

> Then it can call virtio_find_driver() (?) at the top of its probe
> function to find if there's a matching virtio driver. This PCI driver
> would have to be initialized after all the virtio drivers are
> registered, but that's easy.

No, just use the driver model, instead of working against it:

struct pci_virtio_device {
	struct pci_dev *pdev;
	char __iomem *mmio_space;
	struct virtio_device vdev;
};

static int __devinit pci_virtio_probe(struct pci_dev *pdev,
				      const struct pci_device_id *ent)
{
	struct pci_virtio_device *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

	dev->pdev = pdev;
	dev->mmio_space = pcim_iomap(pdev, 0, PCI_VIRTIO_BUFSIZE);
	dev->vdev.ops = &pci_virtqueue_ops;
	dev->vdev.config = &pci_virtio_config_ops;
	dev->vdev.type = ent->device;
	dev->vdev.class = ent->class;
	dev->vdev.dev.parent = &pdev->dev;

	return virtio_device_register(&dev->vdev);
}

> The virtio layer would simply maintain a linked list of drivers and
> implement the virtio_find_driver() matching function.

No no no, just a virtio_bus that all virtio drivers register to:

static int virtio_net_probe(struct device *dev)
{
	struct virtio_device *vdev = to_virtio_dev(dev);
	struct virtqueue_ops *vq_ops = vdev->ops;

	/* same as current code */
	...
	return 0;
}

static struct virtio_device_id virtio_net_ids[] = {
	{ .type = VIRTIO_ID_NET, .class = PCI_CLASS_NETWORK_OTHER },
	{ },
};

static struct virtio_driver virtio_net = {
	.id_table = virtio_net_ids,
	.driver = {
		.name = "virtionet",
		.probe = virtio_net_probe,
		.remove = virtio_net_remove,
		.bus = &virtio_bus,	/* <- look here */
	},
};

static int __init virtio_net_init(void)
{
	return driver_register(&virtio_net.driver);
}
module_init(virtio_net_init);

> And since we've suppressed normal PCI driver request_module (since it
> always finds "the driver") then we can implement that in
> virtio_find_driver(), and not use a PCI MODULE_DEVICE_TABLE. Then we
> don't need (full) PCI ids at all.

Right, as shown in my example above.

	Arnd <><
Re: [kvm-devel] [PATCH 3/6] virtio net driver
On Fri, Sep 21, 2007 at 02:36:43PM +0200, Christian Borntraeger wrote:
>
> @@ -335,7 +344,7 @@ static void *virtnet_probe(struct device
>  	dev->poll = virtnet_poll;
>  	dev->hard_start_xmit = start_xmit;
>  	dev->weight = 16;
> -	dev->features = NETIF_F_HIGHDMA;
> +	dev->features = NETIF_F_HIGHDMA | NETIF_F_LLTX;
>  	SET_NETDEV_DEV(dev, device);
>
>  	/* Do we support "hardware" checksums? */

Please don't use LLTX in new drivers. We're trying to get rid of it
since it's

1) unnecessary;
2) causes problems with AF_PACKET seeing things twice.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt