On Sat, 10 Aug 2024, Michael Schmitz wrote:

> >> Anyway, if I run dump under strace I see no CLONE_INTO_CGROUP flag:
> 
> strace may not be aware of the CLONE_INTO_CGROUP flag yet? How old is 
> your strace binary?
> 

I don't think strace is the problem. Even if it failed to decode a flag, 
we would still see every flag in the disassembly, in the constant passed 
to the syscall.
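
As a sanity check on that constant, here is a minimal sketch that decodes 
the immediate from the quoted disassembly (movel #-2147483631). The EX_* 
names and the helper functions are made up for this example. The values 
are copied from the Linux UAPI headers, and SIGCHLD is assumed to be 17, 
as it is on m68k.

```c
#include <stdint.h>

/* Values copied from the Linux UAPI headers and redefined locally so the
 * sketch builds without <linux/sched.h>. The EX_ prefix is hypothetical. */
#define EX_CSIGNAL   0x000000ffU  /* low byte: signal sent to parent on exit */
#define EX_CLONE_IO  0x80000000U  /* bit 31 */
#define EX_SIGCHLD   17U          /* assumption: m68k signal numbering */

/* Reinterpret the signed immediate as the 32-bit pattern pushed on the stack. */
uint32_t flags_from_immediate(int32_t imm)
{
    return (uint32_t)imm;
}

/* The exit signal lives in the low byte of the flags word. */
uint32_t exit_signal_of(uint32_t flags)
{
    return flags & EX_CSIGNAL;
}

/* Everything above the low byte is the set of CLONE_* bits. */
uint32_t clone_bits_of(uint32_t flags)
{
    return flags & ~EX_CSIGNAL;
}
```

Feeding -2147483631 through flags_from_immediate() gives 0x80000011, which 
splits into CLONE_IO (0x80000000) and signal 17. That matches what strace 
printed, so userspace is passing exactly CLONE_IO|SIGCHLD and nothing else.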

> >> clone(child_stack=NULL, flags=CLONE_IO|SIGCHLD) = -1 EBADF (Bad file
> >> descriptor)
> >>
> >> The -EBADF result was introduced into cgroup_css_set_fork() by the
> >> commit above. That should not happen unless CLONE_INTO_CGROUP was set,
> >> but strace says its not. So I don't know what's going on here.
> >>
> >
> > Here's what gdb says, FWIW...
> >
> > # gdb
> > GNU gdb (Debian 13.1-3) 13.1
> > ...
> > (gdb) file /usr/sbin/dump
> > Reading symbols from /usr/sbin/dump...
> > Reading symbols from
> > /usr/lib/debug/.build-id/24/071a827207bee9c025d364137514447279302b.debug...
> > (gdb) run -0f /dev/null /dev/sda
> > Starting program: /usr/sbin/dump -0f /dev/null /dev/sda
> >   DUMP: Date of this level 0 dump: Fri Aug  9 23:37:15 2024
> >   DUMP: Dumping /dev/sda (an unlisted file system) to /dev/null
> >   DUMP: Label: none
> >   DUMP: Writing 10 Kilobyte records
> >   DUMP: mapping (Pass I) [regular files]
> >   DUMP: mapping (Pass II) [directories]
> >   DUMP: estimated 3595695 blocks.
> >   DUMP: Context save fork fails in parent 671
> > [Inferior 1 (process 671) exited with code 03]
> > (gdb) b fork_clone_io
> > Breakpoint 1 at 0x80009dbc: file tape.c, line 740.
> > (gdb) run -0f /dev/null /dev/sda
> > Starting program: /usr/sbin/dump -0f /dev/null /dev/sda
> >   DUMP: Date of this level 0 dump: Fri Aug  9 23:38:17 2024
> >   DUMP: Dumping /dev/sda (an unlisted file system) to /dev/null
> >   DUMP: Label: none
> >   DUMP: Writing 10 Kilobyte records
> >   DUMP: mapping (Pass I) [regular files]
> >   DUMP: mapping (Pass II) [directories]
> >   DUMP: estimated 3595695 blocks.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x00000001 in ?? ()
> > (gdb) l fork_clone_io
> > warning: Source file is more recent than executable.
> > 735
> > 736     #ifdef __linux__
> > 737     #if defined(SYS_clone) && defined(CLONE_IO)
> > 738     pid_t
> > 739     fork_clone_io(void)
> > 740     {
> > 741        pid_t res,parent;
> > 742        parent=getppid();            /* az hackety hack... */
> > 743
> > 744        res=syscall(SYS_clone, CLONE_ARGS);
> > 745        getppid();
> > 746        /* as per clone call manpage: caching! */
> > 747        getpid();
> > 748     #ifdef __alpha__
> > 749        syscall(SYS_getxpid);
> > 750     #else
> > 751         syscall(SYS_getpid);
> > 752     #endif
> > 753
> > 754        /* az: clone manpage doesn't say jack about what the
> > (gdb) disas fork_clone_io
> > Dump of assembler code for function fork_clone_io:
> >    0x80009dbc <+0>:     movel %d3,%sp@-
> >    0x80009dbe <+2>:     movel %d2,%sp@-
> >    0x80009dc0 <+4>:     bsrl 0x80004200 <getppid@plt>
> >    0x80009dc6 <+10>:    movel %d0,%d3
> >    0x80009dc8 <+12>:    clrl %sp@-
> >    0x80009dca <+14>:    clrl %sp@-
> >    0x80009dcc <+16>:    clrl %sp@-
> >    0x80009dce <+18>:    movel #-2147483631,%sp@-
> >    0x80009dd4 <+24>:    pea 0x78
> >    0x80009dd8 <+28>:    bsrl 0x80003fd0 <syscall@plt>
> >    0x80009dde <+34>:    movel %d0,%d2
> >    0x80009de0 <+36>:    bsrl 0x80004200 <getppid@plt>
> >    0x80009de6 <+42>:    bsrl 0x80003c9c <getpid@plt>
> >    0x80009dec <+48>:    pea 0x14
> >    0x80009df0 <+52>:    bsrl 0x80003fd0 <syscall@plt>
> >    0x80009df6 <+58>:    bsrl 0x80004200 <getppid@plt>
> >    0x80009dfc <+64>:    lea %sp@(24),%sp
> >    0x80009e00 <+68>:    cmpl %d0,%d3
> >    0x80009e02 <+70>:    beqs 0x80009e06 <fork_clone_io+74>
> >    0x80009e04 <+72>:    clrl %d2
> >    0x80009e06 <+74>:    movel %d2,%d0
> >    0x80009e08 <+76>:    movel %sp@+,%d2
> >    0x80009e0a <+78>:    movel %sp@+,%d3
> >    0x80009e0c <+80>:    rts
> > End of assembler dump.
> > (gdb)
> >
> > Is this clone syscall (0x78) really executing sys_clone3()? Also,
> 
> Nope, syscall no. 120 calls __sys_clone() which in turn calls 
> m68k_clone() which emulates sys_clone() (roundabout way due to different 
> calling conventions on m68k).
> 
> clone3 is syscall 435 (calling __sys_clone3() -> m68k_clone3() -> 
> sys_clone3()).
> 

What confused me was that 'git bisect' fingered what looked like a clone3 
patch, but it turns out that this patch affects anything that calls 
cgroup_can_fork(), that is, any syscall that ends up in copy_process().

> But as long as syscall() takes care of the calling convention, I see no 
> reason why that way of calling sys_clone() would fail.
> 

The interesting thing about the calling convention is that the flags make 
up a 32-bit quantity when passed to clone as an int, and a 64-bit quantity 
when passed to clone3 as struct clone_args.flags.

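So I've just added some printk() statements and found that m68k_clone() 
messes up the flags in the kernel_clone_args struct: I'm seeing 
0xFFFFFFFF80000000 there. The upper 32 bits are all set, which looks like 
bit 31 (CLONE_IO) being sign-extended into the 64-bit field, and that 
explains how CLONE_INTO_CGROUP got set.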
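
To illustrate how a value like that can arise, here is a minimal 
userspace sketch. It is not the kernel code: the helper names and the 
EX_* macros are hypothetical, the constants are copied from the Linux 
UAPI headers, and the assumption is that the 32-bit flags word passes 
through a signed type before being stored in the 64-bit field. The 
unsigned-to-signed cast relies on the usual two's complement behaviour 
of gcc and clang.

```c
#include <stdint.h>

/* Copied from the Linux UAPI headers; the EX_ prefix is hypothetical. */
#define EX_CSIGNAL            0x000000ffULL
#define EX_CLONE_IO           0x80000000ULL
#define EX_CLONE_INTO_CGROUP  0x200000000ULL  /* bit 33, only valid for clone3 */

/* Suspected failure mode: widening through a signed 32-bit type copies
 * bit 31 into every upper bit before the exit signal is masked off. */
uint64_t widen_signed(uint32_t raw)
{
    return (uint64_t)(int64_t)(int32_t)raw & ~EX_CSIGNAL;
}

/* Zero-extending widening: only the low 32 bits survive. */
uint64_t widen_lower32(uint32_t raw)
{
    return (uint64_t)raw & ~EX_CSIGNAL;
}

/* Returns 1 if the 64-bit flags word has CLONE_INTO_CGROUP set, else 0. */
int into_cgroup_set(uint64_t flags)
{
    return (flags & EX_CLONE_INTO_CGROUP) != 0;
}
```

With raw = 0x80000011 (CLONE_IO|SIGCHLD), widen_signed() returns 
0xFFFFFFFF80000000, the value printk() showed, and into_cgroup_set() 
reports 1. widen_lower32() returns 0x80000000 and the CLONE_INTO_CGROUP 
bit stays clear, so any fix along these lines needs to zero-extend or 
mask the flags before they are stored.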

I'll send a patch.
