On Mon, Dec 12, 2016 at 11:45 AM, Dmitry Vyukov <dvyu...@google.com> wrote: > On Mon, Dec 12, 2016 at 11:29 AM, Dmitry Vyukov <dvyu...@google.com> wrote: >> On Wed, Nov 23, 2016 at 3:59 PM, <alexander.le...@verizon.com> wrote: >>> On Mon, Nov 21, 2016 at 03:48:17PM +0100, Dmitry Vyukov wrote: >>>> Several observations based on my experience with syzkaller descriptions: >>>> - there are 2 levels: physical and logical; >>>> on physical level there are int, pointer, array, struct, union; >>>> and that's pretty much it. >>>> on logical level there are flags, bitmasks, file paths, sctp socket fds, >>>> unix socket names, etc. >>>> These levels are almost completely orthogonal. It would be useful to >>>> clearly separate them on description level. E.g. now you have TYPE_PTR >>>> and >>>> TYPE_INT which is physical level; and then TYPE_FD which is also an int. >>>> >>>> - logical types won't fit into 64 bits, there are more of them >>> >>> I agree with your two points above. >>> >>> As an example, let's look at: >>> >>> int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); >>> >>> epfd would be a physical int, logical "epoll_fd", and constrainted with >>> being >>> an open descriptor. >>> >>> fd on the other hand is tricky: it's a physical int, logical fd, and >>> constrainted with being a file descriptor that supports poll, and being >>> open. >>> >>> So while I think that logical types can be just a counter rather than a >>> bitmask >>> I suspect that our constraints won't fit into 64 bits. Make 2 64 bit fields? >> >> One observation is that there are just 5 physical types: >> - scalar >> - pointer >> - array >> - struct >> - union >> >> The rest deals with what exactly "scalar" is in a particular case. >> >> I don't yet have complete answer, as it somewhat intermixed with the >> rest of questions. >> >> >> >>>> - we need support for recursive types (yes, there are linked lists in >>>> kernel APIs) >>> >>> I imagine that this will be handled by specific logical type handlers we'll >>> need to implement. Can you give me an example and I'll try to code that? >> >> One example is te_oper_param here: >> https://android.googlesource.com/kernel/tegra/+/android-tegra-3.10/security/tlk_driver/ote_protocol.h >> next_ptr_user is a pointer to te_oper_param. Thus recursive definition. >> >> Another example is snd_seq_ev_quote: >> http://lxr.free-electrons.com/source/include/uapi/sound/asequencer.h#L194 >> it contains struct snd_seq_event *event and snd_seq_event recursively >> contains snd_seq_ev_quote. >> >> In all cases it is pointer recursion via structs. >> >> Sometimes it wish that developers have to write formal descriptions in >> a limited language upfront. That would probably eliminate lots of >> weird one-off "see what I invented here" cases :) >> >> >> >> >>>> - we need support for input/output data >>>> currently syzkaller does this only on pointer level, i.e. you >>>> attach direction to pointer target >>>> but that's not enough, frequently there is a struct where one field >>>> is input and another is output >>> >>> Assuming it's "data", for intput we'll just need to check that the given >>> length is readable and for output that the length is writable, no? >> >> It also can be an fd in a struct field. If it's output (e.g. pipe), >> then we must not check that it's valid on entry. But we may want to >> check that it's valid on successful exit, or fuzzer will use these >> output fd's as inputs to other calls. >> >> >>> We can do it with constraints right now. >>> >>>> - we may need support for reusing types in several arguments >>>> e.g. you may have a pretty complex type, and you don't want to >>>> write it out a dozen of times >>> >>> Yup, so if we go with the physical/logical split we can have handlers for >>> logical types. >>> >>>> - we need some support for discriminated syscalls >>>> if we want to support strace usecase, the support needs to be more >>>> extensive than what syzkaller has; >>>> i.e. syzkaller can't restore discrimination having actual argument >>>> values (it can do it only in the other direction) >>>> >>>> - I would not create a special support for arguments; >>>> rather I would create support for structs and struct fields, >>>> and then pretend that a syscalls effectively accepts a struct by value >>> >>> But that means I need a custom handler for every syscall to parse the >>> struct fields rather than a generic code that goes through the args and >>> calls >>> the right handler? >> >> No, you don't. We will need generic code that parses a piece of memory >> as a struct and splits it into fields anyway. >> We can just reuse this code to handle syscall arguments as follows. >> Describe syscall arguments as a pseudo struct (array of fields). Then >> syscall handling function accepts pointer to region of memory with >> arguments and description of the struct, and invokes common struct >> handling code. >> >> >> >>>> How would you like us to collaborate on this? >>>> If you share your git repo, I could form it into something that would >>>> be suitable for syzkaller and incorporate most of the above. >>> >>> I'd really like to have something that either generates these descriptions >>> from >>> your DSL (it really doesn't have to be perfect (at first)) or something that >>> generates DSL from these C structs. >> >> Do you mean generating C from my DSL of a one-off or as a permanent solution? > > > Main problem I am trying to resolve now is how to make types reference > other types. > Say we have "pointer to pointer to int" (the same applies to structs > and arrays). This means that struct type must be able to reference > another instance of struct type. But we can't put type into type by > value. Namely, the following is not possible: > > struct type { > int kind; > union { > ... > // for kind = KIND_PTR > struct { > struct type type; > } ptr; > }; > }; > > One obvious solution would be to always reference _pointers_ to types. E.g.: > > // for kind = KIND_PTR > struct { > struct type* type; > } ptr; > > This works but leads to super-verbose descriptions (if we want to use > static tables), because we need to describe type of each syscall > argument and inner types for all pointers/arrays/structs/unions as > separate global variables: > > > static struct type open_arg0_inner { > .kind = KIND_ARRAY, > ... > }; > > static struct type open_arg0 { > .kind = KIND_PTR, > .ptr.type = &open_arg0_inner > ... > }; > > static struct type open_arg1 { > .kind = KIND_SCALAR, > ... > }; > > static struct type open_arg2 { > .kind = KIND_SCALAR, > ... > }; > > static struct type open_ret { > .kind = KIND_SCALAR, > ... > }; > > struct syscall_spec syscall_spec_open = { > .name = "open", > .retval = { > .name = "retval", > .type = &open_ret, > } > .nargs = 3, > .args[0] = { > .name = "pathname", > .type = &open_arg0, > } > .args[1] = { > .name = "flags", > .type = &open_arg1, > } > .args[2] = { > .name = "mode", > .type = &open_arg2, > } > }; > > > This looks way too verbose provided that we write these descriptions > by hand (just coming up with consistent unique names for all these > global vars is a problem). > If we generate that from DSL, it can work though. > > Another option I am considering is to use helper construction > functions that return pointers to types, e.g. something along these > lines: > > struct syscall_spec syscall_spec_open = { > .name = "open", > .retval = { > .name = "retval", > .type = type_scalar(FD | ERRNO), > } > .nargs = 3, > .args[0] = { > .name = "pathname", > .type = type_ptr(DIR_IN, type_array(PATHNAME)), > } > .args[1] = { > .name = "flags", > .type = type_scalar_flags(O_RDONLY | O_WRONLY, ....), > } > .args[2] = { > .name = "mode", > .type = type_scalar_flags(S_IRWXU ....), > } > }; > > Now these table can only be initialized dynamically. And we will > probably need lots of these helper functions. So I don't like it > either... > > Any suggestions?
Here is my current prototype: https://github.com/dvyukov/linux/commit/6200a9333e78bef393f8ead41205813b94d340f3 For now it can trace arguments of 4 system calls: [ 4.055483] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6) [ 4.055991] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6) = 3 [ 4.056486] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000) [ 4.056977] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000) = 1780 [ 4.057485] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000) [ 4.057991] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000) = 0 [ 4.058488] [pid 1258] close(0x0) = 0 [ 4.058848] [pid 1258] write(0x1, &00007f316a732000=[], 0x5) [ 4.059304] [pid 1258] write(0x1, &00007f316a732000=[], 0x5) = 5 [ 4.059905] [pid 1234] close(0x0) = 0 [ 4.060239] [pid 1234] close(0x0) = 0 Main outstanding problems: - understanding length of arrays and buffers - understanding discriminators of unions and syscall variations