Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

Martin Sebor via Gcc-patches Mon, 06 Dec 2021 10:12:14 -0800

On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:

[Crossposting between gcc-patches@gcc.gnu.org and
linux-toolcha...@vger.kernel.org; sorry about my lack of kernel
knowledge, in case of the following seems bogus]


I've been trying to turn my prototype from the LPC2021 session on
"Adding kernel-specific test coverage to GCC's -fanalyzer option"
( https://linuxplumbersconf.org/event/11/contributions/1076/ ) into
something that can go into GCC upstream without adding kernel-specific
special cases, or requiring a GCC plugin.  The prototype simply
specialcased "copy_from_user" and "copy_to_user" in GCC, which is
clearly not OK.

This GCC patch kit implements detection of "trust boundaries", aimed at
detection of "infoleaks" and of use of unsanitized attacker-controlled
values ("taint") in the Linux kernel.

For example, here's an infoleak diagnostic (using notes to
express what fields and padding within a struct have not been
initialized):

infoleak-CVE-2011-1078-2.c: In function ‘test_1’:
infoleak-CVE-2011-1078-2.c:28:9: warning: potential exposure of sensitive
   information by copying uninitialized data from stack across trust
   boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
    28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ‘test_1’: events 1-3
     |
     |   21 |         struct sco_conninfo cinfo;
     |      |                             ^~~~~
     |      |                             |
     |      |                             (1) region created on stack here
     |      |                             (2) capacity: 6 bytes
     |......
     |   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
     |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     |      |         |
     |      |         (3) uninitialized data copied from stack here
     |
infoleak-CVE-2011-1078-2.c:28:9: note: 1 byte is uninitialized
    28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
infoleak-CVE-2011-1078-2.c:14:15: note: padding after field ‘dev_class’ is 
uninitialized (1 byte)
    14 |         __u8  dev_class[3];
       |               ^~~~~~~~~
infoleak-CVE-2011-1078-2.c:21:29: note: suggest forcing zero-initialization by 
providing a ‘{0}’ initializer
    21 |         struct sco_conninfo cinfo;
       |                             ^~~~~
       |                                   = {0}

I have to come up with a way of expressing trust boundaries in a way
that will be:
- acceptable to the GCC community (not be too kernel-specific), and
- useful to the Linux kernel community.

At LPC it was pointed out that the kernel already has various
annotations e.g. "__user" for different kinds of pointers, and that it
would be best to reuse those.


Approach 1: Custom Address Spaces
=================================

GCC's C frontend supports target-specific address spaces; see:
   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
Quoting the N1275 draft of ISO/IEC DTR 18037:
   "Address space names are ordinary identifiers, sharing the same name
   space as variables and typedef names.  Any such names follow the same
   rules for scope as other ordinary identifiers (such as typedef names).
   An implementation may provide an implementation-defined set of
   intrinsic address spaces that are, in effect, predefined at the start
   of every translation unit.  The names of intrinsic address spaces must
   be reserved identifiers (beginning with an underscore and an uppercase
   letter or with two underscores).  An implementation may also
   optionally support a means for new address space names to be defined
   within a translation unit."

Patch 1a in the following patch kit for GCC implements such a means to
define new address spaces names in a translation unit, via a pragma:
   #prgama GCC custom_address_space(NAME_OF_ADDRESS_SPACE)

For example, the Linux kernel could perhaps write:

   #define __kernel
   #pragma GCC custom_address_space(__user)
   #pragma GCC custom_address_space(__iomem)
   #pragma GCC custom_address_space(__percpu)
   #pragma GCC custom_address_space(__rcu)

and thus the C frontend can complain about code that mismatches __user
and kernel pointers, e.g.:

custom-address-space-1.c: In function ‘test_argpass_to_p’:
custom-address-space-1.c:29:14: error: passing argument 1 of ‘accepts_p’
from pointer to non-enclosed address space
    29 |   accepts_p (p_user);
       |              ^~~~~~
custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
of type ‘__user void *’
    21 | extern void accepts_p (void *);
       |                        ^~~~~~
custom-address-space-1.c: In function ‘test_cast_k_to_u’:
custom-address-space-1.c:135:12: warning: cast to ‘__user’ address space
pointer from disjoint generic address space pointer
   135 |   p_user = (void __user *)p_kernel;
       |            ^


This seems like an excellent use of named address spaces :)

I'm familiar with TR 18037 but I'm not an expert on this stuff
so I can't really say a whole lot more.

My only suggestion here is to follow the terminology from
there in the naming of the pragma, unless you have some reason
not to.  I'd also recommend to consider other implementations
of named address spaces, if there are any, especially those
that try to be compatible with GCC.  If there are none, rather
than custom_address_space I'd suggest either just address_space
or named_address_space.

I have not yet looked at the implementation so this is just
a high-level comment on the design.

The patch doesn't yet maintain a good distinction between implicit
target-specific address spaces and user-defined address spaces, has at
least one known major bug, and has only been lightly tested.  I can
fix these issues, but was hoping for feedback that this approach is the
right direction from both the GCC and Linux development communities.

Implementation status: doesn't yet bootstrap; am running into stage2
vs stage3 comparison issues.


Approach 2: An "untrusted" attribute
====================================

Alternatively, patch 1b in the kit implements:

   __attribute__((untrusted))

which can be applied to types as a qualifier (similarly to const,
volatile, etc) to mark a trust boundary, hence the kernel could have:

   #define __user __attribute__((untrusted))

where my patched GCC treats
   T *
vs
   T __attribute__((untrusted)) *
as being different types and thus the C frontend can complain (even without
-fanalyzer) about e.g.:

extern void accepts_p(void *);

void test_argpass_to_p(void __user *p_user)
{
   accepts_p(p_user);
}

untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
from pointer with different trust level
    22 |   accepts_p(p_user);
       |              ^~~~~~
untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
type ‘__attribute__((untrusted)) void *’
    14 | extern void accepts_p(void *);
       |                        ^~~~~~

So you'd get enforcement of __user vs non-__user pointers as part of
GCC's regular type-checking.  (You need an explicit cast to convert
between the untrusted vs trusted types).


As with the named address space idea, this approach also looks
reasonable to me.  If you anticipate using the attribute only
in the analyzer I would suggest to consider introducing it in
the analyzer's namespace (e.g., analyzer::untrusted, or even
gnu::analyzer::untrusted).

I'll try to loook at the patch itself sometime later this week
and comment on the implementation there.


This approach is much less expressive that the custom addres space
approach; it would only cover the trust boundary aspect; it wouldn't
cover any differences between generic pointers and __user, vs __iomem,
__percpu, and __rcu which I admit I only dimly understand.

Implementation status: bootstraps and passes regression testing.
Builds most of the kernel, but am running into various conversion
issues.  It would be good to have some clarity on what conversions
the compiler ought to warn about, and what conversions should be OK.


Approach 3: some kind of custom qualifier
=========================================

Approach 1 extends the existing "named address space" machinery to add
new values; approach 2 adds a new flag to cv-qualifiers.  Both of these
approaches work in terms of cv-qualifiers.  We have some spare bits
available for these; perhaps a third approach could be to add a new
kind of user-defined qualifier, like named address spaces, but othogonal
to them.   I haven't attempted to implement this.


I'm afraid I don't understand what this would be useful for
enough to comment.

Other attributes
================

Patch 2 in the kit adds:
   __attribute__((returns_zero_on_success))
and
   __attribute__((returns_nonzero_on_success))
as hints to the analyzer that it's worth bifurcating the analysis of
such functions (to explore failure vs success, and thus to better
explore error-handling paths).  It's also a hint to the human reader of
the source code.


I thing being able to express something along these lines would
be useful even outside the analyzer, both for warnings and, when
done right, perhaps also for optimization.  So I'm in favor of
something like this.  I'll just reiterate here the comment on
this attribute I sent you privately some time ago.

A more general attribute would also make it possible to specify
the value (or argument) on success and failure.  With those we
would be able to express the return values of the POSIX read and
write functions and others like it:

  ssize_t read (int fildes, void *buf, size_t nbyte);
  ssize_t write (int fildes, const void *buf, size_t nbyte);

I.e., it would be nice to express that the return value is
also the number of bytes (elements?) of the array the function
wrote into.  This, along with symbolic evaluation in the middle
end, would then let us detect uninitialized reads back in
the function's caller (after read) and similar.

This is just an idea, and there may be more general apoproaches
that would be even more expressive.  But it's probably too late
in the development cycle to design and add those to GCC 12.

As I promised, I'll try to look at the meat of each patch and
give you some comments, hopefully later this week.

Martin


Given the above, the kernel could then have:

extern int copy_from_user(void *to, const void __user *from, long n)
   __attribute__((access (write_only, 1, 3),
                 access (read_only, 2, 3),
                 returns_zero_on_success));

extern long copy_to_user(void __user *to, const void *from, unsigned long n)
   __attribute__((access (write_only, 1, 3),
                 access (read_only, 2, 3),
                 returns_zero_on_success));

with suitable macros in compiler.h or whatnot.

("access" is an existing GCC attribute; see
  https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html )

My patched GCC add a heuristic to -fanalyzer that a 3-argument function
with a read_only buffer, a write_only buffer and a shared size argument
is a "copy function", and treats it as a copy from *from to *to of up to
n bytes that succeeds, or, given one of the above attributes can succeed
or fail.  I'm wiring things up so that values read from *untrusted_ptr
are tracked as tainted, and values written to *untrusted_ptr are treated
as possible infoleaks (e.g. uninitialized values written to
*untrusted_ptr are specifically called out).  This gets the extra
checking for infoleaks and taint that my earlier prototype had, but is
thus expressed via attributes, without having to have kernel-specific
special cases.

Patch 3 of the kit adds infoleak detection to GCC's -fanalyzer (as
in the example above).

Possibly silly question: is it always a bug for the value of a kernel
pointer to leak into user space?  i.e. should I be complaining about an
infoleak if the value of a trusted_ptr itself is written to
*untrusted_ptr?  e.g.

   s.p = some_kernel_ptr;
   copy_to_user(user_p, &s, sizeof (s));
      /* value of some_kernel_ptr is written to user space;
         is this something we should warn for?  */

Patch 4a/4b wire up the different implementations of "untrusted" into
GCC's -fanalyzer, which is used by...

Patch 5 uses this so that "untrusted" values are used in taint detection
in the analyzer, so that it can complain about attacker-controlled
values being used without sanitization.

Patch 6 adds a new __attribute__ ((tainted)) allowing for further
taint detection (e.g. identifying syscalls), with minimal patching of
the kernel, and without requiring a lot of link-time interprocedural
analysis.  I believe that some of this could work independently of
the trust boundary marking from the rest of the patch kit.

The combined patch kit (using approach 2 i.e. the "b" patches)
successfully bootstraps and passes regression testing on
x86_64-pc-linux-gnu.


Which of the 3 approaches looks best to:
- the GCC community?
- the Linux kernel community?

Does clang/LLVM have anything similar?

There are many examples in the patches, some of which are taken from
historical kernel vulnerabilities, and others from my "antipatterns.ko"
project ( https://github.com/davidmalcolm/antipatterns.ko ).

Thoughts?

Dave


David Malcolm (6 or 8, depending how you count):
   1a: RFC: Implement "#pragma GCC custom_address_space"
   1b: Add __attribute__((untrusted))
   2: Add returns_zero_on_success/failure attributes
   3: analyzer: implement infoleak detection
   4a: analyzer: implemention of region::untrusted_p in terms of custom
     address spaces
   4b: analyzer: implement region::untrusted_p in terms of
     __attribute__((untrusted))
   5: analyzer: use region::untrusted_p in taint detection
   6: Add __attribute__ ((tainted))

  gcc/Makefile.in                               |   3 +-
  gcc/analyzer/analyzer.opt                     |  20 +
  gcc/analyzer/checker-path.cc                  | 104 +++
  gcc/analyzer/checker-path.h                   |  47 +
  gcc/analyzer/diagnostic-manager.cc            |  75 +-
  gcc/analyzer/diagnostic-manager.h             |   3 +-
  gcc/analyzer/engine.cc                        | 342 ++++++-
  gcc/analyzer/exploded-graph.h                 |   3 +
  gcc/analyzer/pending-diagnostic.cc            |  30 +
  gcc/analyzer/pending-diagnostic.h             |  24 +
  gcc/analyzer/program-state.cc                 |  26 +-
  gcc/analyzer/region-model-impl-calls.cc       |  26 +-
  gcc/analyzer/region-model.cc                  | 504 ++++++++++-
  gcc/analyzer/region-model.h                   |  46 +-
  gcc/analyzer/region.cc                        |  52 ++
  gcc/analyzer/region.h                         |   4 +
  gcc/analyzer/sm-taint.cc                      | 839 ++++++++++++++++--
  gcc/analyzer/sm.h                             |   9 +
  gcc/analyzer/store.h                          |   1 +
  gcc/analyzer/trust-boundaries.cc              | 615 +++++++++++++
  gcc/c-family/c-attribs.c                      | 132 +++
  gcc/c-family/c-pretty-print.c                 |   2 +
  gcc/c/c-typeck.c                              |  64 ++
  gcc/doc/extend.texi                           |  63 +-
  gcc/doc/invoke.texi                           |  80 +-
  gcc/print-tree.c                              |   3 +
  .../c-c++-common/attr-returns-zero-on-1.c     |  68 ++
  gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++++
  .../gcc.dg/analyzer/attr-tainted-1.c          |  88 ++
  .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
  .../gcc.dg/analyzer/copy-function-1.c         |  98 ++
  .../gcc.dg/analyzer/copy_from_user-1.c        |  45 +
  gcc/testsuite/gcc.dg/analyzer/infoleak-1.c    | 181 ++++
  gcc/testsuite/gcc.dg/analyzer/infoleak-2.c    |  29 +
  gcc/testsuite/gcc.dg/analyzer/infoleak-3.c    | 141 +++
  gcc/testsuite/gcc.dg/analyzer/infoleak-5.c    |  35 +
  .../analyzer/infoleak-CVE-2011-1078-1.c       | 134 +++
  .../analyzer/infoleak-CVE-2011-1078-2.c       |  42 +
  .../analyzer/infoleak-CVE-2014-1446-1.c       | 117 +++
  .../analyzer/infoleak-CVE-2017-18549-1.c      | 101 +++
  .../analyzer/infoleak-CVE-2017-18550-1.c      | 171 ++++
  .../gcc.dg/analyzer/infoleak-antipatterns-1.c | 162 ++++
  .../gcc.dg/analyzer/infoleak-fixit-1.c        |  22 +
  gcc/testsuite/gcc.dg/analyzer/pr93382.c       |   2 +-
  .../analyzer/taint-CVE-2011-0521-1-fixed.c    | 113 +++
  .../gcc.dg/analyzer/taint-CVE-2011-0521-1.c   | 113 +++
  .../analyzer/taint-CVE-2011-0521-2-fixed.c    |  93 ++
  .../gcc.dg/analyzer/taint-CVE-2011-0521-2.c   |  93 ++
  .../analyzer/taint-CVE-2011-0521-3-fixed.c    |  56 ++
  .../gcc.dg/analyzer/taint-CVE-2011-0521-3.c   |  57 ++
  .../gcc.dg/analyzer/taint-CVE-2011-0521-4.c   |  40 +
  .../gcc.dg/analyzer/taint-CVE-2011-0521-5.c   |  42 +
  .../gcc.dg/analyzer/taint-CVE-2011-0521-6.c   |  37 +
  .../gcc.dg/analyzer/taint-CVE-2011-0521.h     | 136 +++
  .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 ++
  .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +
  .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 +
  .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 ++
  gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c |  64 ++
  gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c |  27 +
  gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 +
  gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 +
  .../gcc.dg/analyzer/taint-antipatterns-1.c    | 137 +++
  .../gcc.dg/analyzer/taint-divisor-1.c         |  26 +
  .../{taint-1.c => taint-read-index-1.c}       |  19 +-
  .../gcc.dg/analyzer/taint-read-offset-1.c     | 128 +++
  .../taint-read-through-untrusted-ptr-1.c      |  37 +
  gcc/testsuite/gcc.dg/analyzer/taint-size-1.c  |  32 +
  .../gcc.dg/analyzer/taint-write-index-1.c     | 132 +++
  .../gcc.dg/analyzer/taint-write-offset-1.c    | 132 +++
  gcc/testsuite/gcc.dg/analyzer/test-uaccess.h  |  19 +
  .../torture/infoleak-net-ethtool-ioctl.c      |  78 ++
  .../torture/infoleak-vfio_iommu_type1.c       |  39 +
  gcc/tree-core.h                               |   6 +-
  gcc/tree.c                                    |   1 +
  gcc/tree.h                                    |  11 +-
  76 files changed, 6558 insertions(+), 140 deletions(-)
  create mode 100644 gcc/analyzer/trust-boundaries.cc
  create mode 100644 gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
  create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy-function-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy_from_user-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-2.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-3.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-5.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-2.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2014-1446-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18549-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18550-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-antipatterns-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-fixit-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c
  rename gcc/testsuite/gcc.dg/analyzer/{taint-1.c => taint-read-index-1.c} (72%)
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c
  create mode 100644 
gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
  create mode 100644 
gcc/testsuite/gcc.dg/analyzer/torture/infoleak-net-ethtool-ioctl.c
  create mode 100644 
gcc/testsuite/gcc.dg/analyzer/torture/infoleak-vfio_iommu_type1.c

Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

Reply via email to