On Thu, 7 Jul 2016 15:34:34 +0200
Greg Kurz <gr...@kaod.org> wrote:

> On Thu, 7 Jul 2016 14:35:40 +0200
> Dominique Martinet <dominique.marti...@cea.fr> wrote:
>
> > Hi Greg,
> >
>
> Hi Dominique,
>
> > Greg Kurz wrote on Mon, Jul 04, 2016 at 05:08:49PM +0200:
> > > On Mon, 4 Jul 2016 16:16:55 +0200
> > > Dominique Martinet <dominique.marti...@cea.fr> wrote:
> > >
> > > > I *think* this introduces a race somewhere, I'm getting errors like:
> > > > cat: f.05: No such file or directory
> > > > cat: f.14: No such file or directory
> > > > cat: f.13: No such file or directory
> > > > cat: f.39: No such file or directory
> > > > cat: f.05: No such file or directory
> > > >
> > > >
> > > > when doing:
> > > > for file in {01..50}; do touch f.${file}; done
> > > > seq 1 1000 | xargs -n 1 -P 25 -I{} cat f.* > /dev/null
> >
> > Ok so, tested with the first two patches and I can't seem to hit any
> > problem with the qemu server at least (I'd need more time to fix
> > ganesha's 9p tcp/rdma server before I could blame the client in any way)
> >
>
> I'm not surprised: patch 1 simply adds a "fallback" lookup to the existing
> code, and patch 2 changes this "fallback" lookup only.
>
> Bad things can come with patch 3 because it really changes the lookup logic.
>
> >
> > The last patch looks good to me, I think it only makes an existing race
> > more visible... What I think could happen is:
> > process 1 has file open
> > process 2 tries to open file, sees fid open
> > process 1 closes file/clunk fids
> > process 2 tries to clone now-clunked fid and gets ENOENT
> >
>
> I'll try to have a look with this scenario in mind.
>
The error indeed comes from v9fs_file_open()->v9fs_fid_clone(). I'll try to
find a fix next week.

Cheers.

--
Greg

> >
> > I'm afraid I just found out my hypervisor is no longer recent enough for
> > gdb kernel scripts (gdb 7.6 and python 2.7.5 in el7 compared to the
> > apparently required 7.7 and 2.7.6 respectively...), and I don't see
> > anything obvious with just debug messages/adding a few printks (wasn't
> > able to confirm where exactly that ENOENT comes from or if my theory is
> > even close to the truth)
> >
> > I'd like to spend more time on it but don't think I'll be able to for a
> > couple of weeks ; sorry about that.
> >
>
> No problem. My plate is full anyway until I go into a 1-month vacation,
> starting end of July. And I'm currently targeting QEMU 2.8 for the
> server side fixes: we have plenty of time to fix this.
>
> >
> > Were you able to reproduce the problem?
> >
>
> Yes ! I get it every time :)
>
> > Thanks,
>
> I really appreciate your assistance since v9fs-devel is really quiet these
> days.
>
> Cheers.
>
> --
> Greg