Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-15 Thread Rusty Russell
On Tuesday 15 July 2008 12:24:54 Max Krasnyansky wrote:
> Heiko Carstens wrote:
> > On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > This is asking for trouble... a config option to disable this would be
> > nice. But as I don't know which problem this patch originally addresses
> > it might be that this is needed anyway. So lets see why we need it first.
>
> How about this. We'll make this a sysctl, as Rusty already did, and set the
> default to 0 which means "never timeout". That way crazy people like me who
> care about this scenario can enable this feature.

Indeed, this was my thought too.  s390 can initialize it to zero somewhere in 
their boot code.

> btw Rusty, I just had this "why didn't I think of that" moments. This is
> actually another way of handling my workload. I mean it certainly does not
> fix the root case of the problems and we still need other things that we
> talked about (non-blocking module delete, lock-free module insertion, etc)
> but at least in the mean time it avoids wedging the machines for good.
> btw I'd like that timeout in milliseconds. I think 5 seconds is way to
> long :).

We can make it ms, sure.  200ms should be plenty of time: worst I ever saw was 
150ms, and that was some weird Power box doing crazy stuff.  I wouldn't be 
surprised if you'd never see 1ms on your hardware.

The ipi idea would handle your case a little more nicely, too, but that's 
probably not going to hit this merge window.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 4/5] virtio net: Allow receiving SG packets

2008-07-15 Thread Rusty Russell
On Tuesday 15 July 2008 13:52:09 Herbert Xu wrote:
> On Mon, Jul 14, 2008 at 10:41:38PM -0500, Rusty Russell wrote:
> > +   /* If we can receive ANY GSO packets, we must allocate large ones. */
> > +   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4)
> > +   || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)
> > +   || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
> > +   dev->features |= NETIF_F_LRO;
>
> We don't want to do this because LRO implies joining packets
> together in an irreversible way which is not what this is about.
>
> Case in point we're now disabling LRO for devices that are
> forwarding packets which is totally unnecessary for virtio.

Oops.  I grepped for LRO when I did this and found nothing.

How's this one?

Subject: virtio net: Allow receiving SG packets
Date: Fri, 18 Apr 2008 11:24:27 +0800
From: Herbert Xu <[EMAIL PROTECTED]>

Finally this patch lets virtio_net receive GSO packets in addition
to sending them.  This can definitely be optimised for the non-GSO
case.  For comparison the Xen approach stores one page in each skb
and uses subsequent skb's pages to construct an SG skb instead of
preallocating the maximum amount of pages per skb.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> (added feature bits)
---
 drivers/net/virtio_net.c |   41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -55,6 +55,9 @@ struct virtnet_info
struct tasklet_struct tasklet;
bool free_in_tasklet;
 
+   /* I like... big packets and I cannot lie! */
+   bool big_packets;
+
/* Receive & send queues. */
struct sk_buff_head recv;
struct sk_buff_head send;
@@ -89,6 +92,7 @@ static void receive_skb(struct net_devic
unsigned len)
 {
struct virtio_net_hdr *hdr = skb_vnet_hdr(skb);
+   int err;
 
if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
@@ -96,10 +100,14 @@ static void receive_skb(struct net_devic
goto drop;
}
len -= sizeof(struct virtio_net_hdr);
-   BUG_ON(len > MAX_PACKET_LEN);
 
-   skb_trim(skb, len);
-
+   err = pskb_trim(skb, len);
+   if (err) {
+   pr_debug("%s: pskb_trim failed %i %d\n", dev->name, len, err);
+   dev->stats.rx_dropped++;
+   goto drop;
+   }
+   skb->truesize += skb->data_len;
dev->stats.rx_bytes += skb->len;
dev->stats.rx_packets++;
 
@@ -161,7 +169,7 @@ static void try_fill_recv(struct virtnet
 {
struct sk_buff *skb;
struct scatterlist sg[2+MAX_SKB_FRAGS];
-   int num, err;
+   int num, err, i;
 
sg_init_table(sg, 2+MAX_SKB_FRAGS);
for (;;) {
@@ -171,6 +179,24 @@ static void try_fill_recv(struct virtnet
 
skb_put(skb, MAX_PACKET_LEN);
vnet_hdr_to_sg(sg, skb);
+
+   if (vi->big_packets) {
+   for (i = 0; i < MAX_SKB_FRAGS; i++) {
+   skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+   f->page = alloc_page(GFP_ATOMIC);
+   if (!f->page)
+   break;
+
+   f->page_offset = 0;
+   f->size = PAGE_SIZE;
+
+   skb->data_len += PAGE_SIZE;
+   skb->len += PAGE_SIZE;
+
+   skb_shinfo(skb)->nr_frags++;
+   }
+   }
+
num = skb_to_sgvec(skb, sg+1, 0, skb->len) + 1;
skb_queue_head(&vi->recv, skb);
 
@@ -485,6 +511,12 @@ static int virtnet_probe(struct virtio_d
 * the timer. */
vi->free_in_tasklet = virtio_has_feature(vdev,VIRTIO_F_NOTIFY_ON_EMPTY);
 
+   /* If we can receive ANY GSO packets, we must allocate large ones. */
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4)
+   || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)
+   || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
+   vi->big_packets = true;
+
/* We expect two virtqueues, receive then send. */
vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done);
if (IS_ERR(vi->rvq)) {
@@ -571,7 +603,9 @@ static unsigned int features[] = {
VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
-   VIRTIO_NET_F_HOST_ECN, VIRTIO_F_NOTIFY_ON_EMPTY,
+   VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
+   VIRTIO_NET_F_GUEST_ECN, /* We don't yet handle UFO input. */
+   VIRTIO_F_NOTIFY_ON_EMPTY,
 };
 
 stati

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-15 Thread Heiko Carstens
On Tue, Jul 15, 2008 at 06:09:59PM +1000, Rusty Russell wrote:
> On Tuesday 15 July 2008 12:24:54 Max Krasnyansky wrote:
> > Heiko Carstens wrote:
> > > On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > > This is asking for trouble... a config option to disable this would be
> > > nice. But as I don't know which problem this patch originally addresses
> > > it might be that this is needed anyway. So lets see why we need it first.
> >
> > How about this. We'll make this a sysctl, as Rusty already did, and set the
> > default to 0 which means "never timeout". That way crazy people like me who
> > care about this scenario can enable this feature.
> 
> Indeed, this was my thought too.  s390 can initialize it to zero somewhere in 
> their boot code.

Shouldn't we default to zero and let whowever wants something different
configure that via sysctl.conf?
Whatever default value > 0 you pick is likely wrong for whatever specific
usecase.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-15 Thread Max Krasnyansky
Rusty Russell wrote:
> On Tuesday 15 July 2008 12:24:54 Max Krasnyansky wrote:
>> Heiko Carstens wrote:
>>> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
>>> This is asking for trouble... a config option to disable this would be
>>> nice. But as I don't know which problem this patch originally addresses
>>> it might be that this is needed anyway. So lets see why we need it first.
>> How about this. We'll make this a sysctl, as Rusty already did, and set the
>> default to 0 which means "never timeout". That way crazy people like me who
>> care about this scenario can enable this feature.
> 
> Indeed, this was my thought too.  s390 can initialize it to zero somewhere in 
> their boot code.
> 
>> btw Rusty, I just had this "why didn't I think of that" moments. This is
>> actually another way of handling my workload. I mean it certainly does not
>> fix the root case of the problems and we still need other things that we
>> talked about (non-blocking module delete, lock-free module insertion, etc)
>> but at least in the mean time it avoids wedging the machines for good.
>> btw I'd like that timeout in milliseconds. I think 5 seconds is way to
>> long :).
> 
> We can make it ms, sure.  200ms should be plenty of time: worst I ever saw 
> was 
> 150ms, and that was some weird Power box doing crazy stuff.  I wouldn't be 
> surprised if you'd never see 1ms on your hardware.
Sounds good.

> The ipi idea would handle your case a little more nicely, too, but that's 
> probably not going to hit this merge window.
Which reminds me that I wanted to submit a bunch of kthread and workqueue
related things in this window :).

Max
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 4/5] virtio net: Allow receiving SG packets

2008-07-15 Thread Herbert Xu
On Tue, Jul 15, 2008 at 06:25:04PM +1000, Rusty Russell wrote:
>
> Oops.  I grepped for LRO when I did this and found nothing.
> 
> How's this one?

Looks good.  Thanks!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Patch from LKML

2008-07-15 Thread Zachary Amsden

> On Tue, Jul 15, 2008 at 10:33 AM, Suresh Siddha
> <[EMAIL PROTECTED]> wrote:
> > On Sun, Jul 13, 2008 at 10:19:35PM -0700, Yinghai Lu wrote:
> >>
> >> fix for pv.
> >>
> >> Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>
> >>
> >> ---
> >>  arch/x86/kernel/paravirt.c |5 
> >>  arch/x86/kernel/vmi_32.c   |   51 
> >> ++---
> >>  arch/x86/xen/enlighten.c   |   19 +++-
> >>  include/asm-x86/paravirt.h |   29 -
> >>  4 files changed, 57 insertions(+), 47 deletions(-)
> >>
> >> Index: linux-2.6/arch/x86/kernel/paravirt.c
> >> ===
> >> --- linux-2.6.orig/arch/x86/kernel/paravirt.c
> >> +++ linux-2.6/arch/x86/kernel/paravirt.c
> >> @@ -373,11 +373,6 @@ struct pv_cpu_ops pv_cpu_ops = {
> >>
> >>  struct pv_apic_ops pv_apic_ops = {
> >>  #ifdef CONFIG_X86_LOCAL_APIC
> >> -#ifndef CONFIG_X86_64
> >> -   .apic_write = native_apic_mem_write,
> >> -   .apic_write_atomic = native_apic_mem_write_atomic,
> >> -   .apic_read = native_apic_mem_read,
> >> -#endif
> >> .setup_boot_clock = setup_boot_APIC_clock,
> >> .setup_secondary_clock = setup_secondary_APIC_clock,
> >> .startup_ipi_hook = paravirt_nop,
> >> Index: linux-2.6/arch/x86/kernel/vmi_32.c
> >> ===
> >> --- linux-2.6.orig/arch/x86/kernel/vmi_32.c
> >> +++ linux-2.6/arch/x86/kernel/vmi_32.c
> >> @@ -676,6 +676,50 @@ static inline int __init probe_vmi_rom(v
> >> return 0;
> >>  }
> >>
> >> +#ifdef CONFIG_X86_LOCAL_APIC
> >> +static u32 vmi_apic_read(u32 reg)
> >> +{
> >> +   return 0;
> >> +}
> >> +
> >> +static void vmi_apic_write(u32 reg, u32 val)
> >> +{
> >> +   /* Warn to see if there's any stray references */
> >> +   WARN_ON(1);
> >> +}
> >> +
> >> +static u64 vmi_apic_icr_read(void)
> >> +{
> >> +   return 0;
> >> +}
> >> +
> >> +static void vmi_apic_icr_write(u32 low, u32 id)
> >> +{
> >> +   /* Warn to see if there's any stray references */
> >> +   WARN_ON(1);
> >> +}
> >> +
> >> +static void vmi_apic_wait_icr_idle(void)
> >> +{
> >> +   return;
> >> +}
> >> +
> >> +static u32 vmi_safe_apic_wait_icr_idle(void)
> >> +{
> >> +   return 0;
> >> +}
> >> +
> >> +static struct apic_ops vmi_basic_apic_ops = {
> >> +.read = vmi_apic_read,
> >> +.write = vmi_apic_write,
> >> +.write_atomic = vmi_apic_write,
> >> +.icr_read = vmi_apic_icr_read,
> >> +.icr_write = vmi_apic_icr_write,
> >> +.wait_icr_idle = vmi_apic_wait_icr_idle,
> >> +.safe_wait_icr_idle = vmi_safe_apic_wait_icr_idle,
> >> +};
> >> +#endif
> >> +
> >>  /*
> >>   * VMI setup common to all processors
> >>   */
> >> @@ -904,9 +948,10 @@ static inline int __init activate_vmi(vo
> >>  #endif
> >>
> >>  #ifdef CONFIG_X86_LOCAL_APIC
> >> -   para_fill(pv_apic_ops.apic_read, APICRead);
> >> -   para_fill(pv_apic_ops.apic_write, APICWrite);
> >> -   para_fill(pv_apic_ops.apic_write_atomic, APICWrite);
> >> +   para_fill(vmi_basic_apic_ops.read, APICRead);
> >> +   para_fill(vmi_basic_apic_ops.write, APICWrite);
> >> +   para_fill(vmi_basic_apic_ops.write_atomic, APICWrite);
> >> +   apic_ops = &vmi_basic_apic_ops;
> >
> > Yinghai, Looking more closely at this, based on my understanding this might 
> > be
> > wrong for VMI. Correct patch should be as follows. Any comments?
> 
> so you mean icr related will still use default native member?
> 
> YH

Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>

What are you doing here and why aren't you cc-ing the maintainers?

Thanks,

Zach

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: Patch from LKML

2008-07-15 Thread Yinghai Lu
On Tue, Jul 15, 2008 at 11:38 AM, Zachary Amsden <[EMAIL PROTECTED]> wrote:
>
>> On Tue, Jul 15, 2008 at 10:33 AM, Suresh Siddha
>> <[EMAIL PROTECTED]> wrote:
>> > On Sun, Jul 13, 2008 at 10:19:35PM -0700, Yinghai Lu wrote:
>> >>
>> >> fix for pv.
>> >>
>> >> Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>
>> >>
>> >> ---
>> >>  arch/x86/kernel/paravirt.c |5 
>> >>  arch/x86/kernel/vmi_32.c   |   51 
>> >> ++---
>> >>  arch/x86/xen/enlighten.c   |   19 +++-
>> >>  include/asm-x86/paravirt.h |   29 -
>> >>  4 files changed, 57 insertions(+), 47 deletions(-)
>> >>
>> >> Index: linux-2.6/arch/x86/kernel/paravirt.c
>> >> ===
>> >> --- linux-2.6.orig/arch/x86/kernel/paravirt.c
>> >> +++ linux-2.6/arch/x86/kernel/paravirt.c
>> >> @@ -373,11 +373,6 @@ struct pv_cpu_ops pv_cpu_ops = {
>> >>
>> >>  struct pv_apic_ops pv_apic_ops = {
>> >>  #ifdef CONFIG_X86_LOCAL_APIC
>> >> -#ifndef CONFIG_X86_64
>> >> -   .apic_write = native_apic_mem_write,
>> >> -   .apic_write_atomic = native_apic_mem_write_atomic,
>> >> -   .apic_read = native_apic_mem_read,
>> >> -#endif
>> >> .setup_boot_clock = setup_boot_APIC_clock,
>> >> .setup_secondary_clock = setup_secondary_APIC_clock,
>> >> .startup_ipi_hook = paravirt_nop,
>> >> Index: linux-2.6/arch/x86/kernel/vmi_32.c
>> >> ===
>> >> --- linux-2.6.orig/arch/x86/kernel/vmi_32.c
>> >> +++ linux-2.6/arch/x86/kernel/vmi_32.c
>> >> @@ -676,6 +676,50 @@ static inline int __init probe_vmi_rom(v
>> >> return 0;
>> >>  }
>> >>
>> >> +#ifdef CONFIG_X86_LOCAL_APIC
>> >> +static u32 vmi_apic_read(u32 reg)
>> >> +{
>> >> +   return 0;
>> >> +}
>> >> +
>> >> +static void vmi_apic_write(u32 reg, u32 val)
>> >> +{
>> >> +   /* Warn to see if there's any stray references */
>> >> +   WARN_ON(1);
>> >> +}
>> >> +
>> >> +static u64 vmi_apic_icr_read(void)
>> >> +{
>> >> +   return 0;
>> >> +}
>> >> +
>> >> +static void vmi_apic_icr_write(u32 low, u32 id)
>> >> +{
>> >> +   /* Warn to see if there's any stray references */
>> >> +   WARN_ON(1);
>> >> +}
>> >> +
>> >> +static void vmi_apic_wait_icr_idle(void)
>> >> +{
>> >> +   return;
>> >> +}
>> >> +
>> >> +static u32 vmi_safe_apic_wait_icr_idle(void)
>> >> +{
>> >> +   return 0;
>> >> +}
>> >> +
>> >> +static struct apic_ops vmi_basic_apic_ops = {
>> >> +.read = vmi_apic_read,
>> >> +.write = vmi_apic_write,
>> >> +.write_atomic = vmi_apic_write,
>> >> +.icr_read = vmi_apic_icr_read,
>> >> +.icr_write = vmi_apic_icr_write,
>> >> +.wait_icr_idle = vmi_apic_wait_icr_idle,
>> >> +.safe_wait_icr_idle = vmi_safe_apic_wait_icr_idle,
>> >> +};
>> >> +#endif
>> >> +
>> >>  /*
>> >>   * VMI setup common to all processors
>> >>   */
>> >> @@ -904,9 +948,10 @@ static inline int __init activate_vmi(vo
>> >>  #endif
>> >>
>> >>  #ifdef CONFIG_X86_LOCAL_APIC
>> >> -   para_fill(pv_apic_ops.apic_read, APICRead);
>> >> -   para_fill(pv_apic_ops.apic_write, APICWrite);
>> >> -   para_fill(pv_apic_ops.apic_write_atomic, APICWrite);
>> >> +   para_fill(vmi_basic_apic_ops.read, APICRead);
>> >> +   para_fill(vmi_basic_apic_ops.write, APICWrite);
>> >> +   para_fill(vmi_basic_apic_ops.write_atomic, APICWrite);
>> >> +   apic_ops = &vmi_basic_apic_ops;
>> >
>> > Yinghai, Looking more closely at this, based on my understanding this 
>> > might be
>> > wrong for VMI. Correct patch should be as follows. Any comments?
>>
>> so you mean icr related will still use default native member?
>>
>> YH
>
> Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>

because of not ccing you?

>
> What are you doing here and why aren't you cc-ing the maintainers?

did you checking tip tree for x86 changing?

YH
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: Patch from LKML

2008-07-15 Thread Zachary Amsden
On Tue, 2008-07-15 at 11:52 -0700, Yinghai Lu wrote:

> > Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>
> 
> because of not ccing you?

Because it's wrong.

> >
> > What are you doing here and why aren't you cc-ing the maintainers?
> 
> did you checking tip tree for x86 changing?

No, this was brought to my attention by someone who spotted it.

Zach

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86: let 32bit use apic_ops too - fix

2008-07-15 Thread Zachary Amsden
On Tue, 2008-07-15 at 11:51 -0700, Suresh Siddha wrote:
> On Tue, Jul 15, 2008 at 11:38:50AM -0700, Zachary Amsden wrote:
> >
> > Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>
> >
> > What are you doing here and why aren't you cc-ing the maintainers?
> 
> Sorry. I was about to bring you into the loop.
> 
> Yinghai posted 32bit native apic_ops(similar to my 64bit apic ops patch, which
> is different from pv_apic_ops) which is in tip/x86/x2apic and proposed a fix
> for VMI case aswell.
> 
> Based on my understanding, tip/x86/x2apic git commit
> 94a8c3c2437c8946f1b6c8e0b2c560a7db8ed3c6 is wrong and it should be fixed with
> something like
> http://marc.info/?l=linux-kernel&m=121614328831237&w=2

Looks better, but I need to read more context to find out where the
apic_ops variable comes from; I'll read the list for patches.

You are correct in that we will want to use the same wait_icr_idle
routine as native hardware; it's not clear from just this patch how that
happens.

Also, the VMI operations are sensitive to parameter order because they
interface with an ABI at the other end.  I need to check the parameter
order for apic read / write is still consistent with the ABI.

Zach

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86: let 32bit use apic_ops too - fix

2008-07-15 Thread Zachary Amsden
On Tue, 2008-07-15 at 12:10 -0700, Yinghai Lu wrote:
> On Tue, Jul 15, 2008 at 12:04 PM, Zachary Amsden <[EMAIL PROTECTED]> wrote:
> > On Tue, 2008-07-15 at 11:51 -0700, Suresh Siddha wrote:
> >> On Tue, Jul 15, 2008 at 11:38:50AM -0700, Zachary Amsden wrote:
> >> >
> >> > Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>
> >> >
> >> > What are you doing here and why aren't you cc-ing the maintainers?
> >>
> >> Sorry. I was about to bring you into the loop.
> >>
> >> Yinghai posted 32bit native apic_ops(similar to my 64bit apic ops patch, 
> >> which
> >> is different from pv_apic_ops) which is in tip/x86/x2apic and proposed a 
> >> fix
> >> for VMI case aswell.
> >>
> >> Based on my understanding, tip/x86/x2apic git commit
> >> 94a8c3c2437c8946f1b6c8e0b2c560a7db8ed3c6 is wrong and it should be fixed 
> >> with
> >> something like
> >> http://marc.info/?l=linux-kernel&m=121614328831237&w=2


Suresh's patch looks good.  Thanks Suresh!

Acked-by: Zachary Amsden <[EMAIL PROTECTED]>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86: let 32bit use apic_ops too - fix

2008-07-15 Thread Yinghai Lu
On Tue, Jul 15, 2008 at 12:04 PM, Zachary Amsden <[EMAIL PROTECTED]> wrote:
> On Tue, 2008-07-15 at 11:51 -0700, Suresh Siddha wrote:
>> On Tue, Jul 15, 2008 at 11:38:50AM -0700, Zachary Amsden wrote:
>> >
>> > Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>
>> >
>> > What are you doing here and why aren't you cc-ing the maintainers?
>>
>> Sorry. I was about to bring you into the loop.
>>
>> Yinghai posted 32bit native apic_ops(similar to my 64bit apic ops patch, 
>> which
>> is different from pv_apic_ops) which is in tip/x86/x2apic and proposed a fix
>> for VMI case aswell.
>>
>> Based on my understanding, tip/x86/x2apic git commit
>> 94a8c3c2437c8946f1b6c8e0b2c560a7db8ed3c6 is wrong and it should be fixed with
>> something like
>> http://marc.info/?l=linux-kernel&m=121614328831237&w=2
>
> Looks better, but I need to read more context to find out where the
> apic_ops variable comes from; I'll read the list for patches.
>
> You are correct in that we will want to use the same wait_icr_idle
> routine as native hardware; it's not clear from just this patch how that
> happens.
>
> Also, the VMI operations are sensitive to parameter order because they
> interface with an ABI at the other end.  I need to check the parameter
> order for apic read / write is still consistent with the ABI.

http://people.redhat.com/mingo/tip.git/readme.txt

mkdir linux-2.6
cd linux-2.6

git init

git remote add linus
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

git remote add tip
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git

git remote update

git checkout -b tip-latest tip/master

git merge tip/x86/x2apic

YH
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86: let 32bit use apic_ops too - fix

2008-07-15 Thread Yinghai Lu
On Tue, Jul 15, 2008 at 11:51 AM, Suresh Siddha
<[EMAIL PROTECTED]> wrote:
> On Tue, Jul 15, 2008 at 11:38:50AM -0700, Zachary Amsden wrote:
>>
>> Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>
>>
>> What are you doing here and why aren't you cc-ing the maintainers?
>
> Sorry. I was about to bring you into the loop.
>
> Yinghai posted 32bit native apic_ops(similar to my 64bit apic ops patch, which
> is different from pv_apic_ops) which is in tip/x86/x2apic and proposed a fix
> for VMI case aswell.
>
> Based on my understanding, tip/x86/x2apic git commit
> 94a8c3c2437c8946f1b6c8e0b2c560a7db8ed3c6 is wrong and it should be fixed with
> something like
> http://marc.info/?l=linux-kernel&m=121614328831237&w=2
>
> Can you please check and Ack/Nack this fix?

how about xen pv with icr related?

YH
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86: let 32bit use apic_ops too - fix

2008-07-15 Thread Zachary Amsden
On Tue, 2008-07-15 at 12:22 -0700, Yinghai Lu wrote:
> On Tue, Jul 15, 2008 at 11:51 AM, Suresh Siddha
> <[EMAIL PROTECTED]> wrote:
> > On Tue, Jul 15, 2008 at 11:38:50AM -0700, Zachary Amsden wrote:
> >>
> >> Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>
> >>
> >> What are you doing here and why aren't you cc-ing the maintainers?
> >
> > Sorry. I was about to bring you into the loop.
> >
> > Yinghai posted 32bit native apic_ops(similar to my 64bit apic ops patch, 
> > which
> > is different from pv_apic_ops) which is in tip/x86/x2apic and proposed a fix
> > for VMI case aswell.
> >
> > Based on my understanding, tip/x86/x2apic git commit
> > 94a8c3c2437c8946f1b6c8e0b2c560a7db8ed3c6 is wrong and it should be fixed 
> > with
> > something like
> > http://marc.info/?l=linux-kernel&m=121614328831237&w=2
> >
> > Can you please check and Ack/Nack this fix?
> 
> how about xen pv with icr related?

Xen doesn't have an APIC; they should be fine as is I think.

Zach

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-15 Thread Hidetoshi Seto
Heiko Carstens wrote:
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1
> prioritized task for 5 seconds?

The original problem (once I heard and easily reproduced) was there was an
another MAX_RT_PRIO-1 task and the task was spinning in itself by a bug.
(Now this would not be a problem since RLIMIT_RTTIME will work for it, but
 I cannot deny that there are some situations which cannot set the limit.)

However there would be more possible problem in the world, ex. assume that
a routine work with interrupt (and also preemption) disabled have an issue
of scalability so it takes long time on huge machine then stop_machine will
stop whole system such long time.  You can assume a driver's bug.  Now the
stop_machine is good tool to escalate a partial problem to global suddenly.

>> So I think monotonic wallclock time actually makes the most sense here.
> 
> This is asking for trouble... a config option to disable this would be
> nice. But as I don't know which problem this patch originally addresses
> it might be that this is needed anyway. So lets see why we need it first.

I'm not good at VM etc., but I think user doesn't care who holds a cpu,
whether other guest or actual buggy software or space alien or so.
The important thing here is return control to user if timeout.

Thanks,
H.Seto
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 2/5] virtio: fix virtio_net xmit of freed skb bug

2008-07-15 Thread Rusty Russell
[PATCH] virtio_net: Delay dropping tx skbs
Cc: Mark McLoughlin <[EMAIL PROTECTED]>,
 virtualization@lists.linux-foundation.org,
 [EMAIL PROTECTED]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <[EMAIL PROTECTED]>

From: Mark McLoughlin <[EMAIL PROTECTED]>

On Mon, 2008-05-26 at 17:42 +1000, Rusty Russell wrote:
> If we fail to transmit a packet, we assume the queue is full and put
> the skb into last_xmit_skb.  However, if more space frees up before we
> xmit it, we loop, and the result can be transmitting the same skb twice.
> 
> Fix is simple: set skb to NULL if we've used it in some way, and check
> before sending.
...
> diff -r 564237b31993 drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.cMon May 19 12:22:00 2008 +1000
> +++ b/drivers/net/virtio_net.cMon May 19 12:24:58 2008 +1000
> @@ -287,21 +287,25 @@ again:
>   free_old_xmit_skbs(vi);
>  
>   /* If we has a buffer left over from last time, send it now. */
> - if (vi->last_xmit_skb) {
> + if (unlikely(vi->last_xmit_skb)) {
>   if (xmit_skb(vi, vi->last_xmit_skb) != 0) {
>   /* Drop this skb: we only queue one. */
>   vi->dev->stats.tx_dropped++;
>   kfree_skb(skb);
> + skb = NULL;
>   goto stop_queue;
>   }
>   vi->last_xmit_skb = NULL;

With this, may drop an skb and then later in the function discover that
we could have sent it after all. Poor wee skb :)

How about the incremental patch below?

Cheers,
Mark.


Currently we drop the skb in start_xmit() if we have a
queued buffer and fail to transmit it.

However, if we delay dropping it until we've stopped the
queue and enabled the tx notification callback, then there
is a chance space might become available for it.

Signed-off-by: Mark McLoughlin <[EMAIL PROTECTED]>
Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/net/virtio_net.c |   20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d50f4fe..30af05c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -287,16 +287,11 @@ again:
free_old_xmit_skbs(vi);
 
/* If we has a buffer left over from last time, send it now. */
-   if (unlikely(vi->last_xmit_skb)) {
-   if (xmit_skb(vi, vi->last_xmit_skb) != 0) {
-   /* Drop this skb: we only queue one. */
-   vi->dev->stats.tx_dropped++;
-   kfree_skb(skb);
-   skb = NULL;
-   goto stop_queue;
-   }
-   vi->last_xmit_skb = NULL;
-   }
+   if (unlikely(vi->last_xmit_skb) &&
+   xmit_skb(vi, vi->last_xmit_skb) != 0)
+   goto stop_queue;
+
+   vi->last_xmit_skb = NULL;
 
/* Put new one in send queue and do transmit */
if (likely(skb)) {
@@ -322,6 +317,11 @@ stop_queue:
netif_start_queue(dev);
goto again;
}
+   if (skb) {
+   /* Drop this skb: we only queue one. */
+   vi->dev->stats.tx_dropped++;
+   kfree_skb(skb);
+   }
goto done;
 }
 



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86: let 32bit use apic_ops too - fix

2008-07-15 Thread Suresh Siddha
On Tue, Jul 15, 2008 at 11:38:50AM -0700, Zachary Amsden wrote:
> 
> Nacked-by: Zachary Amsden <[EMAIL PROTECTED]>
> 
> What are you doing here and why aren't you cc-ing the maintainers?

Sorry. I was about to bring you into the loop.

Yinghai posted 32bit native apic_ops(similar to my 64bit apic ops patch, which
is different from pv_apic_ops) which is in tip/x86/x2apic and proposed a fix 
for VMI case aswell.

Based on my understanding, tip/x86/x2apic git commit
94a8c3c2437c8946f1b6c8e0b2c560a7db8ed3c6 is wrong and it should be fixed with
something like
http://marc.info/?l=linux-kernel&m=121614328831237&w=2

Can you please check and Ack/Nack this fix?

thanks,
suresh
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH] stopmachine: add stopmachine_timeout v2

2008-07-15 Thread Hidetoshi Seto
Thank you for useful feedbacks!
Here is the updated version.
Could you put this on top of your patches, Rusty?

Thanks,
H.Seto


If stop_machine() invoked while one of onlined cpu is locked up
by some reason, stop_machine cannot finish its work because the
locked cpu cannot stop.  This means all other healthy cpus
will be blocked infinitely by one dead cpu.

This patch allows stop_machine to return -EBUSY with some printk
messages if any of stop_machine's threads cannot start running on
its target cpu.

v2:
 - remove fix for warning since it will be fixed upcoming typesafe
   patches
 - make stopmachine_timeout from secs to msecs, and set default to
   200 msec (since v1's arbitrary 5 sec is too long)
 - allow disabling timeout by setting the stopmachine_timeout to 0

Signed-off-by: Hidetoshi Seto <[EMAIL PROTECTED]>
---
 kernel/stop_machine.c |   54 ++--
 kernel/sysctl.c   |   15 +
 2 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 5b72c2b..2968b8a 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -35,15 +35,18 @@ struct stop_machine_data {
 };
 
 /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
-static unsigned int num_threads;
+static atomic_t num_threads;
 static atomic_t thread_ack;
+static cpumask_t prepared_cpus;
 static struct completion finished;
 static DEFINE_MUTEX(lock);
 
+unsigned long stopmachine_timeout = 200; /* msecs, arbitrary */
+
 static void set_state(enum stopmachine_state newstate)
 {
/* Reset ack counter. */
-   atomic_set(&thread_ack, num_threads);
+   atomic_set(&thread_ack, atomic_read(&num_threads));
smp_wmb();
state = newstate;
 }
@@ -67,6 +70,8 @@ static int stop_cpu(struct stop_machine_data *smdata)
enum stopmachine_state curstate = STOPMACHINE_NONE;
int uninitialized_var(ret);
 
+   cpu_set(smp_processor_id(), prepared_cpus);
+
/* Simple state machine */
do {
/* Chill out and ensure we re-read stopmachine_state. */
@@ -90,6 +95,7 @@ static int stop_cpu(struct stop_machine_data *smdata)
}
} while (curstate != STOPMACHINE_EXIT);
 
+   atomic_dec(&num_threads);
local_irq_enable();
do_exit(0);
 }
@@ -105,6 +111,15 @@ int __stop_machine_run(int (*fn)(void *), void *data, 
const cpumask_t *cpus)
int i, err;
struct stop_machine_data active, idle;
struct task_struct **threads;
+   unsigned long limit;
+
+   if (atomic_read(&num_threads)) {
+   /*
+* previous stop_machine was timeout, and still there are some
+* unfinished thread (dangling stucked CPU?).
+*/
+   return -EBUSY;
+   }
 
active.fn = fn;
active.data = data;
@@ -120,7 +135,7 @@ int __stop_machine_run(int (*fn)(void *), void *data, const 
cpumask_t *cpus)
/* Set up initial state. */
mutex_lock(&lock);
init_completion(&finished);
-   num_threads = num_online_cpus();
+   atomic_set(&num_threads, num_online_cpus());
set_state(STOPMACHINE_PREPARE);
 
for_each_online_cpu(i) {
@@ -152,10 +167,21 @@ int __stop_machine_run(int (*fn)(void *), void *data, 
const cpumask_t *cpus)
 
/* We've created all the threads.  Wake them all: hold this CPU so one
 * doesn't hit this CPU until we're ready. */
+   cpus_clear(prepared_cpus);
get_cpu();
for_each_online_cpu(i)
wake_up_process(threads[i]);
 
+   /* Wait all others come to life */
+   if (stopmachine_timeout) {
+   limit = jiffies + msecs_to_jiffies(stopmachine_timeout);
+   while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
+   if (time_is_before_jiffies(limit))
+   goto timeout;
+   cpu_relax();
+   }
+   }
+
/* This will release the thread on our CPU. */
put_cpu();
wait_for_completion(&finished);
@@ -169,10 +195,32 @@ kill_threads:
for_each_online_cpu(i)
if (threads[i])
kthread_stop(threads[i]);
+   atomic_set(&num_threads, 0);
mutex_unlock(&lock);
 
kfree(threads);
return err;
+
+timeout:
+   printk(KERN_CRIT "stopmachine: Failed to stop machine in time(%lds).\n",
+   stopmachine_timeout);
+   for_each_online_cpu(i) {
+   if (!cpu_isset(i, prepared_cpus) && i != smp_processor_id())
+   printk(KERN_CRIT "stopmachine: cpu#%d seems to be "
+   "stuck.\n", i);
+   /* Unbind threads */
+   set_cpus_allowed(threads[i], cpu_online_map);
+   }
+
+   /* Let threads go exit */
+   set_state(STOPMACHINE_EXIT);
+
+   put_cpu();
+

Re: [PATCH] stopmachine: add stopmachine_timeout v2

2008-07-15 Thread Max Krasnyansky


Hidetoshi Seto wrote:
> Thank you for useful feedbacks!
> Here is the updated version.
> Could you put this on top of your patches, Rusty?
> 
> Thanks,
> H.Seto
> 
> 
> If stop_machine() invoked while one of onlined cpu is locked up
> by some reason, stop_machine cannot finish its work because the
> locked cpu cannot stop.  This means all other healthy cpus
> will be blocked infinitely by one dead cpu.
> 
> This patch allows stop_machine to return -EBUSY with some printk
> messages if any of stop_machine's threads cannot start running on
> its target cpu.
> 
> v2:
>  - remove fix for warning since it will be fixed upcoming typesafe
>patches
>  - make stopmachine_timeout from secs to msecs, and set default to
>200 msec (since v1's arbitrary 5 sec is too long)
>  - allow disabling timeout by setting the stopmachine_timeout to 0
> 

I'd set the default to zero. I beleive that's what Heiko suggested too.

Max
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] stopmachine: add stopmachine_timeout v2

2008-07-15 Thread Hidetoshi Seto
Max Krasnyansky wrote:
> I'd set the default to zero. I beleive that's what Heiko suggested too.

Oh, yes, you are right.  I missed to catch the suggestion.
I'll post fixed version soon.  Wait a minutes...

Thanks,
H.Seto
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization