Re: [PATCH] forgetting to cancel request in interrupted zero-copy 9P RPC (was Re: [git pull] vfs part 2)

2015-07-03 Thread Andrey Ryabinin
2015-07-03 18:00 GMT+03:00 Al Viro : > Could you verify that the patch below deals with your setup as well? > If it does, I'm going to put it into tonight's pull request, after I get > some sleep... Right now I'm about to crawl in direction of bed - 25 hours > of uptime is a bit too much... ;-/ >

[PATCH] forgetting to cancel request in interrupted zero-copy 9P RPC (was Re: [git pull] vfs part 2)

2015-07-03 Thread Al Viro
On Fri, Jul 03, 2015 at 10:42:10AM +0100, Al Viro wrote: > AFAICS, we get occasional stray responses from somewhere. And no, it doesn't > seem to be related to flushes or to dropping chan->lock in req_done() (this > run had been with chan->lock taken on the outside of the loop). > > What I reall

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-03 Thread Al Viro
On Fri, Jul 03, 2015 at 11:19:31AM +0300, Andrey Ryabinin wrote: > On 07/02/2015 07:49 PM, Al Viro wrote: > > On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote: > >>req->tc->tag = tag-1; > >> + if (req->status != REQ_STATUS_IDLE) > >> + pr_err("using tag %d with odd status (%d)"

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-03 Thread Andrey Ryabinin
On 07/02/2015 07:49 PM, Al Viro wrote: > On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote: >> req->tc->tag = tag-1; >> +if (req->status != REQ_STATUS_IDLE) >> +pr_err("using tag %d with odd status (%d)", tag, req->status); > > Should be tag - 1 here, actually. So, with

Re: [git pull] vfs part 2

2015-07-02 Thread Dominique Martinet
Al Viro wrote on Thu, Jul 02, 2015: > On Thu, Jul 02, 2015 at 07:56:29PM +0200, Dominique Martinet wrote: > > Using cache=none here so behavious is likely different with cache, but > > basically you can't get more than one tag per user thread accessing the > > 9P mount... > > Yes, and...? You can

Re: [git pull] vfs part 2

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 12:16:14PM -0700, Linus Torvalds wrote: > On Thu, Jul 2, 2015 at 11:40 AM, Al Viro wrote: > > > > All they are used for is matching response to request. Basically, you > > can have up to 65535 pending requests. Reusing it right after getting > > the response is fine. > >

Re: [git pull] vfs part 2

2015-07-02 Thread Andrey Ryabinin
2015-07-02 20:56 GMT+03:00 Dominique Martinet : > > Still definitely needs fixing, but I think the issue is somewhere > else... If Andrey could share the workload he uses I can try with other > servers, would be nice if we can rule a qemu bug out completely :) > I simply run trinity from 9p rootfs

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Andrey Ryabinin
2015-07-02 19:43 GMT+03:00 Al Viro : > On Thu, Jul 02, 2015 at 03:19:57PM +0300, Andrey Ryabinin wrote: > >> Added: >> +if (total > count) >> + *(char *)0 = 0 >> >> and never hit this condition. >> > > OK, so it's definitely a mismatched response. > >> req->tc->tag = tag-1; >>

Re: [git pull] vfs part 2

2015-07-02 Thread Linus Torvalds
On Thu, Jul 2, 2015 at 11:40 AM, Al Viro wrote: > > All they are used for is matching response to request. Basically, you > can have up to 65535 pending requests. Reusing it right after getting > the response is fine. Reusing a tag right after getting the completion may be fine in theory, but i

Re: [git pull] vfs part 2

2015-07-02 Thread Jeff Layton
On Thu, 2 Jul 2015 19:56:29 +0200 Dominique Martinet wrote: > Jeff Layton wrote on Thu, Jul 02, 2015: > > So p9_idpool_create should take an argument for the "end" value, and > > then store that in a new field in p9_idpool. Then they can pass that in > > as the "end" parm in idr_alloc. Or, they c

Re: [git pull] vfs part 2

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 07:56:29PM +0200, Dominique Martinet wrote: > Using cache=none here so behavious is likely different with cache, but > basically you can't get more than one tag per user thread accessing the > 9P mount... Yes, and...? You can get a lot more than one user thread... Andrey

Re: [git pull] vfs part 2

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 01:01:39PM -0400, Jeff Layton wrote: > So p9_idpool_create should take an argument for the "end" value, and > then store that in a new field in p9_idpool. Then they can pass that in > as the "end" parm in idr_alloc. Or, they could give up using the same > function there and

Re: [git pull] vfs part 2

2015-07-02 Thread Dominique Martinet
Jeff Layton wrote on Thu, Jul 02, 2015: > So p9_idpool_create should take an argument for the "end" value, and > then store that in a new field in p9_idpool. Then they can pass that in > as the "end" parm in idr_alloc. Or, they could give up using the same > function there and use a different one f

Re: [git pull] vfs part 2

2015-07-02 Thread Jeff Layton
On Thu, 2 Jul 2015 17:45:35 +0100 Al Viro wrote: > On Thu, Jul 02, 2015 at 08:07:38AM -0400, Jeff Layton wrote: > > > Erm...and why is it passing in '0' to idr_alloc for the end value if it > > can't deal with more than 16 bits? That seems like a plain old bug... > > Because they are using the

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 05:43:32PM +0100, Al Viro wrote: > req->tc->tag = tag-1; > + if (req->status != REQ_STATUS_IDLE) > + pr_err("using tag %d with odd status (%d)", tag, req->status); Should be tag - 1 here, actually. diff --git a/net/9p/client.c b/net/9p/client.c index

Re: [git pull] vfs part 2

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 08:07:38AM -0400, Jeff Layton wrote: > Erm...and why is it passing in '0' to idr_alloc for the end value if it > can't deal with more than 16 bits? That seems like a plain old bug... Because they are using the same function (with different pool, obviously) for FID allocati

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 03:19:57PM +0300, Andrey Ryabinin wrote: > Added: > +if (total > count) > + *(char *)0 = 0 > > and never hit this condition. > OK, so it's definitely a mismatched response. > req->tc->tag = tag-1; > + if (WARN_ON(req->status != REQ_STATUS_IDL

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Andrey Ryabinin
On 07/02/2015 11:42 AM, Al Viro wrote: > On Thu, Jul 02, 2015 at 09:25:30AM +0100, Al Viro wrote: >> On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote: >>> Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing >>> this under kvmtool as well. It just takes a bit lo

Re: [git pull] vfs part 2

2015-07-02 Thread Jeff Layton
On Thu, 2 Jul 2015 04:20:42 +0100 Al Viro wrote: > On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote: > > Mismatched reply could also be a possibility, but only if we end up with > > sending more than one request with the same tag without waiting for response > > for the first one. > > ...

Re: [git pull] vfs part 2

2015-07-02 Thread Jeff Layton
On Thu, 2 Jul 2015 08:00:26 -0400 Jeff Layton wrote: > On Thu, 2 Jul 2015 04:20:42 +0100 > Al Viro wrote: > > > On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote: > > > Mismatched reply could also be a possibility, but only if we end up with > > > sending more than one request with the sa

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 09:25:30AM +0100, Al Viro wrote: > On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote: > > Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing > > this under kvmtool as well. It just takes a bit longer to reproduce > > this in kvmtool. > >

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote: > Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing > this under kvmtool as well. It just takes a bit longer to reproduce > this in kvmtool. > > > The bug I suspected to be the cause of that is in tag allocat

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Andrey Ryabinin
On 07/02/2015 10:59 AM, Al Viro wrote: > On Thu, Jul 02, 2015 at 10:50:05AM +0300, Andrey Ryabinin wrote: > and see if it triggers. I'm not sure if failing with ENOMEM is the right response (another variant is to sleep there until the pile gets cleaned or until we get killed), and W

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 10:50:05AM +0300, Andrey Ryabinin wrote: > >> and see if it triggers. I'm not sure if failing with ENOMEM is the > >> right response (another variant is to sleep there until the pile > >> gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely > >> not for t

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Al Viro
On Thu, Jul 02, 2015 at 10:19:07AM +0300, Andrey Ryabinin wrote: > On 07/02/2015 07:10 AM, Al Viro wrote: > >> > >> It should be easy to confirm - in p9_client_prepare_req() add > >>if (WARN_ON_ONCE(tag != (u16)tag)) { > >>p9_idpool_put(tag, c->tagpool); > >>

Re: running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-02 Thread Andrey Ryabinin
[repeating, since my previous email didn't reach mailing lists] 2015-07-02 7:10 GMT+03:00 Al Viro : >> It should be easy to confirm - in p9_client_prepare_req() add >> if (WARN_ON_ONCE(tag != (u16)tag)) { >> p9_idpool_put(tag, c->tagpool); >>

running out of tags in 9P (was Re: [git pull] vfs part 2)

2015-07-01 Thread Al Viro
[9p and sunrpc folks added to Cc] On Thu, Jul 02, 2015 at 04:20:42AM +0100, Al Viro wrote: > On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote: > > Mismatched reply could also be a possibility, but only if we end up with > > sending more than one request with the same tag without waiting for

Re: [git pull] vfs part 2

2015-07-01 Thread Al Viro
On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote: > Mismatched reply could also be a possibility, but only if we end up with > sending more than one request with the same tag without waiting for response > for the first one. ... and I think I see what's going on. Tags are 16bit. Suppose t

Re: [git pull] vfs part 2

2015-07-01 Thread Al Viro
On Wed, Jul 01, 2015 at 02:25:43PM +0300, Andrey Ryabinin wrote: > I've attached gdb instead. > So, after message "bogus RWRITE: 93 -> 4096" > I've got this: > > (gdb) p *req->rc > $11 = {size = 11, id = 119 'w', tag = 3, offset = 11, capacity = 8192, sdata > = 0x8802347b8020 "\v"} > (gdb) p

Re: [git pull] vfs part 2

2015-07-01 Thread Andrey Ryabinin
On 07/01/2015 11:55 AM, Al Viro wrote: > On Wed, Jul 01, 2015 at 11:41:04AM +0300, Andrey Ryabinin wrote: >> On 07/01/2015 11:27 AM, Al Viro wrote: >>> >>> Could you check if 3.19 was getting anything similar? I.e. in >>> p9_client_write() there add >>> if (count > rsize) >>> prin

Re: [git pull] vfs part 2

2015-07-01 Thread Al Viro
On Wed, Jul 01, 2015 at 11:41:04AM +0300, Andrey Ryabinin wrote: > On 07/01/2015 11:27 AM, Al Viro wrote: > > > > Could you check if 3.19 was getting anything similar? I.e. in > > p9_client_write() there add > > if (count > rsize) > > printk(KERN_ERR "bogus RWRITE: %d -> %d\n", r

Re: [git pull] vfs part 2

2015-07-01 Thread Andrey Ryabinin
On 07/01/2015 11:27 AM, Al Viro wrote: > > Could you check if 3.19 was getting anything similar? I.e. in > p9_client_write() there add > if (count > rsize) > printk(KERN_ERR "bogus RWRITE: %d -> %d\n", rsize, count); > just before > p9_debug(P9_DEBUG_9P, "<<< RWRITE cou

Re: [git pull] vfs part 2

2015-07-01 Thread Al Viro
On Wed, Jul 01, 2015 at 10:50:59AM +0300, Andrey Ryabinin wrote: > # dmesg | grep fucked > > [ 114.732166] fucked: sent 2037, server says it got 2047 (err = 0) > [ 124.937105] fucked: sent 27, server says it got 4096 (err = 0) > [ 154.075400] fucked: sent 19, server says it got 4096 (err = 0)

Re: [git pull] vfs part 2

2015-07-01 Thread Andrey Ryabinin
On 07/01/2015 09:27 AM, Al Viro wrote: > On Mon, Jun 22, 2015 at 03:02:11PM +0300, Andrey Ryabinin wrote: >> On 06/22/2015 12:12 AM, Al Viro wrote: >>> On Thu, Apr 23, 2015 at 01:16:15PM +0300, Andrey Ryabinin wrote: This change caused following: >>> This could happen when p9pdu_readf() c

Re: [git pull] vfs part 2

2015-06-30 Thread Al Viro
On Mon, Jun 22, 2015 at 03:02:11PM +0300, Andrey Ryabinin wrote: > On 06/22/2015 12:12 AM, Al Viro wrote: > > On Thu, Apr 23, 2015 at 01:16:15PM +0300, Andrey Ryabinin wrote: > >> This change caused following: > > > >> This could happen when p9pdu_readf() changes 'count' to some value > > >> iov_

Re: [git pull] vfs part 2

2015-06-22 Thread Andrey Ryabinin
On 06/22/2015 12:12 AM, Al Viro wrote: > On Thu, Apr 23, 2015 at 01:16:15PM +0300, Andrey Ryabinin wrote: >> This change caused following: > >> This could happen when p9pdu_readf() changes 'count' to some value > >> iov_iter_count(from): >> >> p9_client_write(): >> <...> >> int count

Re: [git pull] vfs part 2

2015-06-21 Thread Al Viro
On Sun, Jun 21, 2015 at 02:16:15PM -0700, Linus Torvalds wrote: > On Sun, Jun 21, 2015 at 2:12 PM, Al Viro wrote: > > + if (count > rsize) { > > + WARN_ON(1); > > + count = rsize; > > + } > > So if we'd actually want to merge

Re: [git pull] vfs part 2

2015-06-21 Thread Linus Torvalds
On Sun, Jun 21, 2015 at 2:12 PM, Al Viro wrote: > + if (count > rsize) { > + WARN_ON(1); > + count = rsize; > + } So if we'd actually want to merge it with the warning, I'd prefer writing it as if (WARN_ON_ONCE(count

Re: [git pull] vfs part 2

2015-06-21 Thread Al Viro
On Thu, Apr 23, 2015 at 01:16:15PM +0300, Andrey Ryabinin wrote: > This change caused following: > This could happen when p9pdu_readf() changes 'count' to some value > > iov_iter_count(from): > > p9_client_write(): > <...> > int count = iov_iter_count(from); > <...> >

Re: [git pull] vfs part 2

2015-05-25 Thread Andrey Ryabinin
On 04/23/2015 01:16 PM, Andrey Ryabinin wrote: > On 04/15/2015 09:14 PM, Al Viro wrote: >> 9p: switch p9_client_write() to passing it struct iov_iter * > > Hi Al, > > This change caused following: > > [ 91.637917] > == > [

Re: [git pull] vfs part 2

2015-04-23 Thread Andrey Ryabinin
On 04/15/2015 09:14 PM, Al Viro wrote: > 9p: switch p9_client_write() to passing it struct iov_iter * Hi Al, This change caused following: [ 91.637917] == [ 91.639252] BUG: KASan: out of bounds on stack in iov_iter_advan

[git pull] vfs part 2

2015-04-15 Thread Al Viro
Now that net-next went in... Here's the next big chunk - killing ->aio_read() and ->aio_write(). There'll be one more pile today (direct_IO changes and generic_write_checks() cleanups/fixes), but I'd prefer to keep that one separate. Please, pull from git://git.kernel.org/pub/scm/linux/k