On Jun 3, 2016, at 8:56 PM, Al Viro wrote:
> On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:
>
>>> EOPENSTALE, that is... Oleg, could you check if the following works?
>>
>> Yes, this one lasted for an hour with no crashing, so it must be good.
>> Thanks.
>> (note, I am not equipp
On Sat, 2016-06-04 at 01:56 +0100, Al Viro wrote:
> On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:
>
> >
> > >
> > > EOPENSTALE, that is... Oleg, could you check if the following works?
> > Yes, this one lasted for an hour with no crashing, so it must be good.
> > Thanks.
> > (not
On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:
> > EOPENSTALE, that is... Oleg, could you check if the following works?
>
> Yes, this one lasted for an hour with no crashing, so it must be good.
> Thanks.
> (note, I am not equipped to verify correctness of NFS operations, though).
On Jun 3, 2016, at 6:37 PM, Al Viro wrote:
> On Fri, Jun 03, 2016 at 11:23:55PM +0100, Al Viro wrote:
>
>> It's not that. It's explicit put_link() in do_last(), followed by
>> ESTALEOPEN and subsequent misbegotten "retry the last step on ESTALEOPEN"
>> looking at now-freed nd->last.name. IOW,
On Fri, Jun 03, 2016 at 03:36:22PM -0700, Linus Torvalds wrote:
> Happy to hear that you seem to have figured it out.
>
> But why did it apparently only start happening now?
Oleg has started to use Lustre torture tests on NFS, that's all. Note, BTW,
that first they'd triggered an oopsable bug (
On Jun 3, 2016, at 6:36 PM, Linus Torvalds wrote:
> On Fri, Jun 3, 2016 at 3:23 PM, Al Viro wrote:
>> On Fri, Jun 03, 2016 at 03:00:02PM -0700, Linus Torvalds wrote:
>>> Normally it's done at terminate_walk() time. But I note that in
>>> walk_component(), we do put_link(nd) which does a do
On Fri, Jun 03, 2016 at 11:23:55PM +0100, Al Viro wrote:
> It's not that. It's explicit put_link() in do_last(), followed by
> ESTALEOPEN and subsequent misbegotten "retry the last step on ESTALEOPEN"
> looking at now-freed nd->last.name. IOW, the bug predates delayed_call
> stuff.
EOPENSTALE,
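
The hazard described in the quoted text (a link body is released, then a retry of the last step looks at a name that pointed into it) can be illustrated with a small userspace sketch. This is not the kernel code and not the fix posted in the thread; `struct walk`, `release_link()`, `open_last()`, `start_walk()` and `open_with_restart()` are hypothetical stand-ins invented for the illustration. It assumes the safe policy is to clear every pointer into the released buffer and to rebuild the whole walk on a stale result instead of retrying only the final step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the walk state; NOT the kernel's struct nameidata. */
struct walk {
    char *link_body;   /* heap copy of a symlink body, NULL once released */
    const char *last;  /* last path component; points into link_body */
};

enum { OPEN_OK = 0, OPEN_STALE = 1, OPEN_ERR = -1 };

/* Hypothetical analogue of put_link(): free the body and drop pointers into it. */
static void release_link(struct walk *w)
{
    free(w->link_body);
    w->link_body = NULL;
    w->last = NULL;    /* leave no dangling pointer for a later retry */
}

/* Copy the target into a fresh buffer and locate its last component. */
static bool start_walk(struct walk *w, const char *target)
{
    size_t n = strlen(target) + 1;
    w->link_body = malloc(n);
    if (!w->link_body)
        return false;
    memcpy(w->link_body, target, n);
    const char *slash = strrchr(w->link_body, '/');
    w->last = slash ? slash + 1 : w->link_body;
    return true;
}

/* Hypothetical final step: uses the name, releases the link, may report "stale". */
static int open_last(struct walk *w, int *stale_left)
{
    assert(w->last != NULL);        /* the name must still be valid here */
    size_t len = strlen(w->last);   /* last legitimate use of the name */
    release_link(w);
    if (*stale_left > 0) {
        (*stale_left)--;
        return OPEN_STALE;
    }
    return len > 0 ? OPEN_OK : OPEN_ERR;
}

/* Returns the number of full walks that were needed, or -1 on failure. */
static int open_with_restart(const char *target, int stale_count)
{
    struct walk w = { NULL, NULL };
    int stale_left = stale_count;

    for (int walks = 1; walks <= stale_count + 1; walks++) {
        if (!start_walk(&w, target))    /* rebuild the name from scratch */
            return -1;
        int rc = open_last(&w, &stale_left);
        if (rc == OPEN_OK)
            return walks;
        if (rc != OPEN_STALE)
            return -1;
        /* Stale: w.last is already NULL, so the next pass cannot reuse it. */
    }
    return -1;
}
```

In this sketch, retrying only `open_last()` after a stale result would trip the assertion, because the name was cleared when the link body was released; restarting from `start_walk()` is what keeps the retry safe.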
On Fri, Jun 3, 2016 at 3:23 PM, Al Viro wrote:
> On Fri, Jun 03, 2016 at 03:00:02PM -0700, Linus Torvalds wrote:
>>>
>> Normally it's done at terminate_walk() time. But I note that in
>> walk_component(), we do put_link(nd) which does a do_delayed_call(),
>> but does *not* do a clear_delayed_call(
On Fri, Jun 03, 2016 at 11:23:55PM +0100, Al Viro wrote:
> It's not that. It's explicit put_link() in do_last(), followed by
> ESTALEOPEN and subsequent misbegotten "retry the last step on ESTALEOPEN"
> looking at now-freed nd->last.name. IOW, the bug predates delayed_call
> stuff.
FWIW, I'd st
On Fri, Jun 03, 2016 at 03:00:02PM -0700, Linus Torvalds wrote:
> Is perhaps the "delayed_call" logic broken, and the symlink is free'd too
> early?
>
> That whole set_delayed_call/do_delayed_call thing came in 4.5. Maybe
> something broke that logic, and we've executed the delayed freeing
> bef
On Fri, Jun 03, 2016 at 10:46:31PM +0100, Al Viro wrote:
> On Fri, Jun 03, 2016 at 05:17:06PM -0400, Oleg Drokin wrote:
>
> > > Can the same thing be reproduced (with NFS fix) on v4.6, ede4090, 7f427d3,
> > > 4e8440b?
> >
> > Well, that was faster than I expected. 4e8440b triggers right away, so
On Fri, Jun 3, 2016 at 2:26 PM, Al Viro wrote:
>>
>> in the __d_lookup() disassembly. And %rdi contains 2, so there were
>> supposed to be two more characters at 'ct' (which is %rdx).
>
> ... and since r8 and rsi are 0, we couldn't have consumed anything.
Right you are. So it really started out p
On Fri, Jun 03, 2016 at 05:17:06PM -0400, Oleg Drokin wrote:
> > Can the same thing be reproduced (with NFS fix) on v4.6, ede4090, 7f427d3,
> > 4e8440b?
>
> Well, that was faster than I expected. 4e8440b triggers right away, so I guess
> there's no point in trying the later ones?
> BTW, just to c
On Fri, Jun 03, 2016 at 02:18:15PM -0700, Linus Torvalds wrote:
> So something must have corrupted the qstr.
>
> The remaining length *should* be in %edi, judging by the
>
>     0x81243b82 <+306>: cmp    $0x7,%edi
>
> in the __d_lookup() disassembly. And %rdi contains 2, so there were
> sup
On Fri, Jun 3, 2016 at 1:07 PM, Al Viro wrote:
>
> Aha... It's load_unaligned_zeropad() from dentry_string_cmp(), hitting
> a genuinely unmapped address. That sends it into fixup, where it tries to
> load an aligned word containing the address in question, in hope that
> fault was on attempt to
On Jun 3, 2016, at 4:07 PM, Al Viro wrote:
> On Fri, Jun 03, 2016 at 02:35:41PM -0400, Oleg Drokin wrote:
>
>>>> [ 2642.364383] BUG: unable to handle kernel paging request at
>>>> 880113f82000
>>>> [ 2642.365014] IP: [] bad_gs+0xd1d/0x1ba9
>>>
>>> *ow*
>>> Could you dump your vmlinux (and
On Fri, Jun 03, 2016 at 02:35:41PM -0400, Oleg Drokin wrote:
> >> [ 2642.364383] BUG: unable to handle kernel paging request at
> >> 880113f82000
> >> [ 2642.365014] IP: [] bad_gs+0xd1d/0x1ba9
> >
> > *ow*
> > Could you dump your vmlinux (and System.map) somewhere on anonftp?
> > This 'bad_g
On Jun 3, 2016, at 2:22 PM, Al Viro wrote:
> On Fri, Jun 03, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>> I am dropping NFS people since it seems to be converting into a generic
>> VFS/dcache bug even though you need NFS or the like to trigger it - the
>> lookup_open path.
>
> NFS bug is re
On Fri, Jun 03, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
> I am dropping NFS people since it seems to be converting into a generic
> VFS/dcache bug even though you need NFS or the like to trigger it - the
> lookup_open path.
NFS bug is real; there might very well be something else, but that