From: Eric Dumazet
Date: Thu, 04 Jul 2013 03:12:10 -0700
> It looks like a typical COW issue to me.
Generically speaking, if we have to mess with page protections this
eliminates the performance gain from bypass/zerocopy/whatever that
these virtualization layers are doing.
But there may be othe
--On 4 July 2013 03:12:10 -0700 Eric Dumazet wrote:
It looks like a typical COW issue to me.
If the page content is written while there is still a reference on this
page, we should allocate a new page and copy the previous content.
And this has little to do with networking.
I suspect this
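The copy-on-write rule described above can be sketched in userspace C. This is only an illustration under stated assumptions: struct toy_page and the toy_page_*() helpers are hypothetical names, not kernel APIs, and a plain integer stands in for the real page refcount.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    int refcount;                       /* number of holders of this page */
    unsigned char data[TOY_PAGE_SIZE];
};

/* Allocate a zeroed page held by one owner. Returns NULL on failure. */
struct toy_page *toy_page_alloc(void)
{
    struct toy_page *p = calloc(1, sizeof(*p));

    if (p)
        p->refcount = 1;
    return p;
}

/* Take an extra reference, e.g. for a zero-copy send still in flight. */
void toy_page_get(struct toy_page *p)
{
    assert(p && p->refcount > 0);
    p->refcount++;
}

/* Drop one reference; the page is freed when the last holder lets go. */
void toy_page_put(struct toy_page *p)
{
    assert(p && p->refcount > 0);
    if (--p->refcount == 0)
        free(p);
}

/*
 * Write len bytes at offset off through *pp. If anyone else still holds
 * a reference, allocate a new page, copy the previous content, and
 * redirect *pp to the copy so the other holder keeps seeing the old data.
 * Returns 0 on success, -1 on a bad range or allocation failure.
 */
int toy_page_write(struct toy_page **pp, size_t off,
                   const void *src, size_t len)
{
    struct toy_page *p = *pp;

    if (off > TOY_PAGE_SIZE || len > TOY_PAGE_SIZE - off)
        return -1;

    if (p->refcount > 1) {
        struct toy_page *copy = toy_page_alloc();

        if (!copy)
            return -1;
        memcpy(copy->data, p->data, TOY_PAGE_SIZE);
        toy_page_put(p);                /* writer leaves the shared page */
        *pp = copy;
        p = copy;
    }
    memcpy(p->data + off, src, len);
    return 0;
}
```

With one writer and one extra holder, a write leaves the holder's page untouched and hands the writer a private copy. The cost is the allocation and copy on every shared write, which is the trade-off against zero-copy raised earlier in the thread.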
On Thu, 2013-07-04 at 10:52 +0100, Ian Campbell wrote:
> Might just be that no one has observed it with vmsplice()+splice()? Most
> of the time this happens silently and you'll probably never notice, it's
> just the behaviour of Xen which escalates the issue into one you can
> see.
The point I wa
On Thu, 2013-07-04 at 02:34 -0700, Eric Dumazet wrote:
> On Thu, 2013-07-04 at 09:59 +0100, Ian Campbell wrote:
> > On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> > >
> > > Another way is to add a new page flag like PG_send: when sendpage() is called,
> > > set the bit; when the page is put, clear the
On Thu, 2013-07-04 at 09:59 +0100, Ian Campbell wrote:
> On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> >
> > Another way is to add a new page flag like PG_send: when sendpage() is called,
> > set the bit; when the page is put, clear the bit. Then xen-blkback can wait
> > on the pagequeue.
>
> These
On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> On 07/01/13 16:11, Ian Campbell wrote:
> > On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
> >>> A workaround is to turn off O_DIRECT use by Xen as that ensures
> >>> the pages are copied. Xen 4.3 does this by default.
> >>>
> >>> I believe fixe
On 07/01/13 16:11, Ian Campbell wrote:
> On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
>>> A workaround is to turn off O_DIRECT use by Xen as that ensures
>>> the pages are copied. Xen 4.3 does this by default.
>>>
>>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>>> qemu upstream
From: Eric Dumazet
Date: Fri, 28 Jun 2013 02:37:42 -0700
> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rw
On 07/01/13 16:11, Ian Campbell wrote:
> On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
>>> A workaround is to turn off O_DIRECT use by Xen as that ensures
>>> the pages are copied. Xen 4.3 does this by default.
>>>
>>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>>> qemu upstream
Joe,
Do you know if there is a fix for the above? So far we also suspect that the
grant page was unmapped earlier; we were using 4.1 stable during our test.
A true fix? No, but I posted a patch set (see later email message
for a link) that you could forward port. The workaround is:
A workaround is to turn of
On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
> > A workaround is to turn off O_DIRECT use by Xen as that ensures
> > the pages are copied. Xen 4.3 does this by default.
> >
> > I believe fixes for this are in 4.3 and 4.2.2 if using the
> > qemu upstream DM. Note these aren't real fixes, just
On 06/30/13 17:13, Alex Bligh wrote:
>
>
> --On 28 June 2013 12:17:43 +0800 Joe Jin wrote:
>
> >> Found a similar issue:
> >> http://www.gossamer-threads.com/lists/xen/devel/265611 so I have copied the Xen
> >> developers as well.
>
> I thought this sounded familiar. I haven't got the start of this
> thread, b
--On 30 June 2013 10:13:35 +0100 Alex Bligh wrote:
The nature of the bug
is extensively discussed in that thread - you'll also find
a reference to a thread on linux-nfs which concludes it
isn't an nfs problem, and even some patches to fix it in the
kernel adding reference counting.
Some mor
--On 28 June 2013 12:17:43 +0800 Joe Jin wrote:
Found a similar issue:
http://www.gossamer-threads.com/lists/xen/devel/265611 so I have copied the Xen
developers as well.
I thought this sounded familiar. I haven't got the start of this
thread, but what version of Xen are you running and what device
mo
On Sun, 2013-06-30 at 08:26 +0800, Joe Jin wrote:
> So far we suspect it is caused by iSCSI calling sendpage(), with the page later
> being unmapped while the skb copy is still attempted. We'll try disabling sg to
> see whether that helps.
sendpage() should increment page refcounts for every page frag of an
skb, therefore
On 06/29/13 15:20, Eric Dumazet wrote:
> On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
>> Hi Eric,
>>
>> The patch does not fix the issue; the panic is the same as the one I posted earlier:
>>> BUG: unable to handle kernel paging request at 88006d9e8d48
>>> IP: [] memcpy+0xb/0x120
>>> PGD 1798067 PUD 1fd2067 P
On 06/29/2013 09:26 AM, Eric Dumazet wrote:
> On Sat, 2013-06-29 at 09:11 -0700, Ben Greear wrote:
>> Do you know if your patch should go in 3.9?
> Yes it should.
Ok, I'll add that to my tree.
>> Your test case sounds a bit like what gives us the rare crash in tcp_collapse
>> (we have lots of bouncin
On Sat, 2013-06-29 at 09:11 -0700, Ben Greear wrote:
> Do you know if your patch should go in 3.9?
>
Yes it should.
> Your test case sounds a bit like what gives us the rare crash in tcp_collapse
> (we have lots of bouncing wifi interfaces running slow-speed TCP traffic).
> But,
> it takes day
On 06/29/2013 12:20 AM, Eric Dumazet wrote:
On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
Hi Eric,
The patch does not fix the issue; the panic is the same as the one I posted earlier:
BUG: unable to handle kernel paging request at 88006d9e8d48
IP: [] memcpy+0xb/0x120
PGD 1798067 PUD 1fd2067 PMD 213f067 PT
On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
> Hi Eric,
>
> The patch does not fix the issue; the panic is the same as the one I posted earlier:
> > BUG: unable to handle kernel paging request at 88006d9e8d48
> > IP: [] memcpy+0xb/0x120
> > PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
> > Oops: [#1] SMP
>
On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
> Hi Eric,
>
> The patch does not fix the issue; the panic is the same as the one I posted earlier:
At least it fixes my own panics ;)
My test bed was :
Launch 24 concurrent "netperf -t UDP_STREAM -H destination -- -m 128"
Then on "destination" disconnect the eth
Hi Eric,
The patch does not fix the issue; the panic is the same as the one I posted earlier:
> BUG: unable to handle kernel paging request at 88006d9e8d48
> IP: [] memcpy+0xb/0x120
> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
> Oops: [#1] SMP
> CPU 7
> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss
Hi Eric,
Thanks for your patch, I'll test it then get back to you.
Regards,
Joe
On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
>
>
> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purg
OK please try the following patch
[PATCH] neighbour: fix a race in neigh_destroy()
There is a race in neighbour code, because neigh_destroy() uses
skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
while other parts of the code assume neighbour rwlock is what
protects arp_queue
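The locking rule in this description (a queue may only be purged while holding the lock that its other users rely on) can be sketched in userspace C. This is not the kernel patch: struct toy_queue and the toy_queue_*() helpers are hypothetical, and a pthread mutex stands in for the neighbour rwlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical queued buffer; stands in for an skb on arp_queue. */
struct toy_buf {
    struct toy_buf *next;
};

/* Hypothetical queue whose head and len are protected by lock. */
struct toy_queue {
    pthread_mutex_t lock;
    struct toy_buf *head;
    size_t len;
};

/* Initialise an empty queue. Returns 0 on success. */
int toy_queue_init(struct toy_queue *q)
{
    assert(q != NULL);
    q->head = NULL;
    q->len = 0;
    return pthread_mutex_init(&q->lock, NULL);
}

/* Queue one new buffer under the lock. Returns 0 on success, -1 on OOM. */
int toy_queue_push(struct toy_queue *q)
{
    struct toy_buf *b = malloc(sizeof(*b));

    if (!b)
        return -1;
    pthread_mutex_lock(&q->lock);
    b->next = q->head;
    q->head = b;
    q->len++;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/*
 * Purge the queue. The list is detached while holding the same lock that
 * toy_queue_push() takes, so no other thread can observe a half-purged
 * queue; the buffers are freed afterwards, once they are private.
 * Returns the number of buffers that were dropped.
 */
size_t toy_queue_purge(struct toy_queue *q)
{
    struct toy_buf *list;
    size_t n;

    pthread_mutex_lock(&q->lock);
    list = q->head;
    n = q->len;
    q->head = NULL;
    q->len = 0;
    pthread_mutex_unlock(&q->lock);

    while (list) {
        struct toy_buf *next = list->next;

        free(list);
        list = next;
    }
    return n;
}
```

Detaching the list under the lock and freeing it afterwards keeps the critical section short. Purging without the lock, as the description says neigh_destroy() did, lets a concurrent user walk or modify the list while it is being torn down.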
On Fri, 2013-06-28 at 12:17 +0800, Joe Jin wrote:
> Found a similar issue: http://www.gossamer-threads.com/lists/xen/devel/265611
> so I have copied the Xen developers as well.
>
> On 06/27/13 13:31, Eric Dumazet wrote:
> > On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
> >> Hi,
> >>
> >> When we do fail
Found a similar issue: http://www.gossamer-threads.com/lists/xen/devel/265611
so I have copied the Xen developers as well.
On 06/27/13 13:31, Eric Dumazet wrote:
> On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
>> Hi,
>>
>> When we run a failover test with iSCSI + multipath by resetting the switches
>> on OVM(
Hi Eric,
Thanks for your response, will test it and get back to you.
Regards,
Joe
On 06/27/13 13:31, Eric Dumazet wrote:
> On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
>> Hi,
>>
>> When we run a failover test with iSCSI + multipath by resetting the switches
>> on OVM (2.6.39), we hit the panic:
>>
>>
On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
> Hi,
>
> When we run a failover test with iSCSI + multipath by resetting the switches
> on OVM (2.6.39), we hit the panic:
>
> BUG: unable to handle kernel paging request at 88006d9e8d48
> IP: [] memcpy+0xb/0x120
> PGD 1798067 PUD 1fd2067 PMD 213f067
Hi,
When we run a failover test with iSCSI + multipath by resetting the switches
on OVM (2.6.39), we hit the panic:
BUG: unable to handle kernel paging request at 88006d9e8d48
IP: [] memcpy+0xb/0x120
PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
Oops: [#1] SMP
CPU 7
Modules linked in: dm_nfs tun n