On 12/6/21 12:40 PM, Segher Boessenkool wrote:
On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
Approach 1: Custom Address Spaces
=================================

GCC's C frontend supports target-specific address spaces; see:
   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
Quoting the N1275 draft of ISO/IEC DTR 18037:
   "Address space names are ordinary identifiers, sharing the same name
   space as variables and typedef names.  Any such names follow the same
   rules for scope as other ordinary identifiers (such as typedef names).
   An implementation may provide an implementation-defined set of
   intrinsic address spaces that are, in effect, predefined at the start
   of every translation unit.  The names of intrinsic address spaces must
   be reserved identifiers (beginning with an underscore and an uppercase
   letter or with two underscores).  An implementation may also
   optionally support a means for new address space names to be defined
   within a translation unit."

Patch 1a in the following patch kit for GCC implements such a means to
define new address space names in a translation unit, via a pragma:
   #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)

For example, the Linux kernel could perhaps write:

   #define __kernel
   #pragma GCC custom_address_space(__user)
   #pragma GCC custom_address_space(__iomem)
   #pragma GCC custom_address_space(__percpu)
   #pragma GCC custom_address_space(__rcu)

and thus the C frontend can complain about code that mismatches __user
and kernel pointers, e.g.:

custom-address-space-1.c: In function ‘test_argpass_to_p’:
custom-address-space-1.c:29:14: error: passing argument 1 of ‘accepts_p’ from pointer to non-enclosed address space
    29 |   accepts_p (p_user);
       |              ^~~~~~
custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is of type ‘__user void *’
    21 | extern void accepts_p (void *);
       |                        ^~~~~~
custom-address-space-1.c: In function ‘test_cast_k_to_u’:
custom-address-space-1.c:135:12: warning: cast to ‘__user’ address space pointer from disjoint generic address space pointer
   135 |   p_user = (void __user *)p_kernel;
       |            ^
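
A minimal sketch of the kind of code that could sit behind diagnostics like the ones above. The pragma exists only in the patched GCC, so it is guarded here by HAVE_CUSTOM_ADDRESS_SPACE, a hypothetical macro that is not part of the patch kit; on an unpatched compiler __user expands to nothing and no checking happens. The function names below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard (an assumption, not part of the patch kit): define
   HAVE_CUSTOM_ADDRESS_SPACE only when building with a GCC carrying patch 1a. */
#ifdef HAVE_CUSTOM_ADDRESS_SPACE
#pragma GCC custom_address_space(__user)
#else
#define __user /* unpatched compiler: no address-space checking */
#endif

/* Accepts only a generic (kernel) pointer; returns 1 if it is non-null. */
static int accepts_p(const void *p)
{
    return p != NULL;
}

/* Takes a user-space pointer.  With the patch, calling accepts_p(p_user)
   directly would be rejected; the explicit cast marks the crossing point.
   How the patched compiler treats this particular cast is an assumption. */
static int test_argpass_with_cast(const void __user *p_user)
{
    return accepts_p((const void *)p_user);
}
```

On an unpatched compiler this builds cleanly and test_argpass_with_cast("x") returns 1, so the annotation costs nothing where it is not understood.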

This seems like an excellent use of named address spaces :)

It has some big problems though.

Named address spaces are completely target-specific.

My understanding of these kernel/user address spaces that David
is adding for the benefit of the analyzer is that they correspond
to what TR 18037 calls nested namespaces.  They're nested within
the generic namespace that's a union of the two.  With that, I'd
expect them to be fully handled early on and be transparent
afterwards.  Is implementing this idea not feasible in the GCC
design?

Martin

Defining them with
a pragma like this does not allow you to set the pointer mode or
anything related to a custom LEGITIMATE_ADDRESS_P.  It does not allow
you to say zero pointers are invalid in some address spaces and not in
others.  You cannot provide any of the DWARF address space stuff this
way.  But most importantly, there are only four bits for the address
space field internally, and they are used by however a backend wants to
use them.

None of this cannot be solved, but all of it will have to be solved.

IMO it will be best to not mix this with address spaces in the user
interface (it is of course fine to *implement* it like that, or with
big overlap at least).

The patch doesn't yet maintain a good distinction between implicit
target-specific address spaces and user-defined address spaces,

And that will have to be fixed in the user code syntax at least.

has at
least one known major bug, and has only been lightly tested.  I can
fix these issues, but was hoping for feedback that this approach is the
right direction from both the GCC and Linux development communities.

Allowing the user to define new address spaces does not jibe well with
how targets do (validly!) use them.

Approach 2: An "untrusted" attribute
====================================

Alternatively, patch 1b in the kit implements:

   __attribute__((untrusted))

which can be applied to types as a qualifier (similarly to const,
volatile, etc) to mark a trust boundary, hence the kernel could have:

   #define __user __attribute__((untrusted))

where my patched GCC treats
   T *
vs
   T __attribute__((untrusted)) *
as being different types and thus the C frontend can complain (even without
-fanalyzer) about e.g.:

extern void accepts_p(void *);

void test_argpass_to_p(void __user *p_user)
{
   accepts_p(p_user);
}

untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’ from pointer with different trust level
    22 |   accepts_p(p_user);
       |              ^~~~~~
untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of type ‘__attribute__((untrusted)) void *’
    14 | extern void accepts_p(void *);
       |                        ^~~~~~

So you'd get enforcement of __user vs non-__user pointers as part of
GCC's regular type-checking.  (You need an explicit cast to convert
between the untrusted vs trusted types).
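
To illustrate the explicit-cast requirement, here is a rough sketch that degrades gracefully on compilers without the attribute. It assumes a patched GCC would report the attribute through __has_attribute(untrusted), which the patch description does not confirm; pass_user_pointer is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may lack __has_attribute; treat every attribute as absent. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumption: a GCC carrying patch 1b answers 1 here; released GCC answers 0. */
#if __has_attribute(untrusted)
#define __user __attribute__((untrusted))
#else
#define __user /* attribute unavailable: plain pointer, no checking */
#endif

/* Accepts only a trusted pointer; returns 1 if it is non-null. */
static int accepts_p(void *p)
{
    return p != NULL;
}

/* Crosses the trust boundary explicitly.  Writing accepts_p(p_user) without
   the cast is what the patched frontend is described as rejecting. */
static int pass_user_pointer(void __user *p_user)
{
    return accepts_p((void *)p_user);
}
```

The cast is the only place where an untrusted pointer becomes a trusted one, which makes such conversions easy to grep for and audit.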

As with the named address space idea, this approach also looks
reasonable to me.  If you anticipate using the attribute only
in the analyzer I would suggest to consider introducing it in
the analyzer's namespace (e.g., analyzer::untrusted, or even
gnu::analyzer::untrusted).

I don't see any fundamental problems with this approach.  It also is
very much in line with how Perl handles this (and some copycat languages
do as well), the "tainted" flag on data.

This approach is much less expressive than the custom address space
approach; it would only cover the trust boundary aspect; it wouldn't
cover any differences between generic pointers and __user, vs __iomem,
__percpu, and __rcu which I admit I only dimly understand.

Yes, it does not have any of the big problems that come with those
address spaces either!  :-)

Other attributes
================

Patch 2 in the kit adds:
   __attribute__((returns_zero_on_success))
and
   __attribute__((returns_nonzero_on_success))
as hints to the analyzer that it's worth bifurcating the analysis of
such functions (to explore failure vs success, and thus to better
explore error-handling paths).  It's also a hint to the human reader of
the source code.
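
As a sketch of how such an annotation might look in practice, the following wraps the proposed attribute in a macro that expands to nothing when the compiler does not know it. RETURNS_ZERO_ON_SUCCESS and copy_name are hypothetical names chosen for illustration, and detection through __has_attribute is an assumption about the patched compiler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumption: only a GCC carrying patch 2 knows this attribute. */
#if __has_attribute(returns_zero_on_success)
#define RETURNS_ZERO_ON_SUCCESS __attribute__((returns_zero_on_success))
#else
#define RETURNS_ZERO_ON_SUCCESS
#endif

/* Hypothetical helper: copies src into dst, including the terminator.
   Returns 0 on success, or -ERANGE if dst is too small. */
RETURNS_ZERO_ON_SUCCESS
static int copy_name(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (len >= dst_size)
        return -ERANGE;   /* failure path an analyzer could explore */
    memcpy(dst, src, len + 1);
    return 0;             /* success path */
}
```

A caller would then branch on the result, for example `if (copy_name(buf, sizeof buf, name) != 0) return -1;`, giving the analyzer two distinct outcomes to follow.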

I think being able to express something along these lines would
be useful even outside the analyzer, both for warnings and, when
done right, perhaps also for optimization.  So I'm in favor of
something like this.  I'll just reiterate here the comment on
this attribute I sent you privately some time ago.

What is "success" though?  You probably want it so some checker can make
sure you do handle failure some way, but how do you see what is handling
failure and what is handling the successful case?


Segher

