On Thu, Jul 14, 2022 at 02:44:16PM +0200, Alexander Bluhm wrote:
> On Thu, Jul 14, 2022 at 06:16:14AM +0300, Vitaliy Makkoveev wrote:
> > > On 14 Jul 2022, at 05:51, Visa Hankala <[email protected]> wrote:
> > >
> > > On Thu, Jul 14, 2022 at 04:39:33AM +0300, Vitaliy Makkoveev wrote:
> > >> It looks like sometimes the `unp_conn' doesn't reloaded in
> > >> the "unp->unp_conn != unp2" check, and READ_ONCE() should prevent this.
> > >
> > > Are you sure about the non-reloading of unp->unp_conn? I did a quick
> > > look in the compiled output on amd64, mips64 and riscv64, and did not
> > > spot any obvious change (there is no change on mips64 and riscv64).
> > >
> >
> > It seems to be. Otherwise I have no explanation of syzkaller reports.
> >
> > May be it is not compiler, but CPU cache specific. I see the
> > difference on amd64, compiler loads to register before compare.
> >
> > mov 0x20(%r13),%rax
> > cmp %r12,%rax
> > je 0xffff...
> >
> > instead of
> >
> > cmp %r12,0x20(%r13)
> > je 0xffff...
> >
> > I can???t say what is the real difference in cmp behaviour in both
> > cases. May be some one could explain.
>
> For me this chunk of assembly looks equivalent. If this is the
> only binary change, I don't expect READ_ONCE() to fix syzkaller.
> I am not against adding READ_ONCE(), it prevents unexpected compiler
> optimisations. But currently such a optimization does not happen.
>
> > > The loads are surrounded by lock/reference operations that imply
> > > memory clobbering. The compiler should not assume that unp->unp_conn
> > > is unchanged.
>
> Maybe I did not understand correctly why READ_ONCE() is needed.
>
> > >> if ((unp2 = unp->unp_conn) == NULL)
> > >> return NULL;
> > >>
> > >> so2 = unp2->unp_socket;
>
> Can the compiler modify this code to this?
>
> if ((unp2 = unp->unp_conn) == NULL)
> return NULL;
> so2 = unp->unp_conn->unp_socket;
>
> Then we can get a NULL deref if another CPU sets unp->unp_conn to
> NULL between our two reads. There READ_ONCE() makes sense.
>
Another CPU can't modify unp->unp_conn because unp is locked. We need to
lock both sockets to modify unp->unp_conn or unp->unp_refs. I added this
to be sure that `unp2' is reassigned when we returned here from
"if (unp->unp_conn != unp2)" block.
> > >> solock(so);
> > >>
> > >> /* Datagram socket could be reconnected due to re-lock. */
> > >> if (unp->unp_conn != unp2) {
> > >> sounlock(so2);
> > >> unp_rele(unp2);
> > >> goto again;
> > >> }
> > >>
> > >> unp_rele(unp2);
>
> Can the compiler assume unp->unp_conn == unp2 ? Then it would be
> valid to do unp_rele(unp->unp_conn). Not sure whether C standard
> allows this. And it would be stupid if the compiler would do that.
>
> Sould we use more READ_ONCE() to be safe?
>
The compiler could decide that `unp2' can't be changed and skip this
block. But it shouldn't because we have rwlock and atomic operations
between.
Also may be the problem lays in other layer, but triggered here...