Hi! On Wed, Dec 08, 2021 at 07:06:30PM -0500, David Malcolm wrote: > On Mon, 2021-12-06 at 13:40 -0600, Segher Boessenkool wrote: > > Named address spaces are completely target-specific. Defining them > > with > > a pragma like this does not allow you to set the pointer mode or > > anything related to a custom LEGITIMATE_ADDRESS_P. > > My thinking was that each custom address space is based on an existing > address space, but is disjoint from it, where "based on" means "what it > looks like in terms of RTL generation" (clearly I'm handwaving here). > > In patch 1a, the custom address spaces are all based on the generic > address space (but disjoint from it); syntax could be added to base > them on one of the target-specific address spaces. > > > It does not allow > > you to sayy zero pointers are invalid in some address spaces and not in > > others. > > Syntax could be added for this, I suppose. > > > You cannot provide any of the DWARF address space stuff this > > way. > > True. I confess that I haven't thought about the debugging experience, > and I'd need to think what would happen at the DWARF level. > > > But most importantly, there are only four bits for the address > > space field internally, and they are used by however a backend wants to > > use them. > > One of the ideas of patch 1a is to divide up this 4-bit space between > the target-specific and the custom address spaces. The backend code > would need to be tweaked to decode the 4-bit value to get at the > underlying target-specific address space value. This is done by the > function ensure_builtin_addr_space in patch 1a, though I've likely > missed some places. > > IIRC, the target that's currently using the most address spaces is avr, > which I believe has 8 target-specific address spaces, in addition to > the generic one, i.e. 9 builtin address spaces, which would leave room > for up to 6 user-defined address spaces.
Except that a backend is free to use this bitfield any way it pleases. All of the above says that what you want is something completely orthogonal to and separate from named address spaces. Very similar in some ways, sure, but keeping it apart will work much better and be much less pain. > The Linux kernel's smatch > annotations currently effectively introduce 4 custom address spaces, > __user, __iomem, __percpu, and __rcu (assuming that __kernel is the > generic address space), so it's something of a tight squeeze, but it > does fit. This doesn't account for out-of-tree backends, of course. Or any future backends. > > IMO it will be best to not mix this with address spaces in the user > > interface (it is of course fine to *implement* it like that, or with > > big overlap at least). > > I was thinking the other way around, in that it should look like > address spaces in terms of the user's source code, but has some > implementation differences. That does not solve any of the problems I brought up though. That was just a list of all the basic features from address spaces btw, from gccint. > > Allowing the user to define new address spaces does not jibe well with > > how targets do (validly!) use them. > > I think from a user's perspective it's a nice approach - my feeling is > that it makes certain things easier for the user, whilst complicating > things from a backend implementation perspective. > > Plus you've raised various technical issues which I'd have to resolve > if we went in this direction. It is fine to have a (very) similar concept for the user, but it does not work well at all to equate this to the existing concept of named address spaces. > Indeed - I think this "untrusted attribute" approach is much simpler > implementation-wise than the "custom address space" approach, which is > also in its favor. > > I'm wondering if anyone from the kernel development community has > strong opinions here, since the custom address space approach is > potentially much more expressive. Anything that is more expressive than you have thought through what the consequences will be is not a feature but a danger. Anything that does not fit in with the rest structurally now, will never do that. > Otherwise I think we're both preferring the "untrusted attribute" > approach (patch 1b). That attribute does not interfere with anything else afaics, so that is much safer. > > > I thing being able to express something along these lines would > > > be useful even outside the analyzer, both for warnings and, when > > > done right, perhaps also for optimization. So I'm in favor of > > > something like this. I'll just reiterate here the comment on > > > this attribute I sent you privately some time ago. > > > > What is "success" though? You probably want it so some checker can > > make > > sure you do handle failure some way, but how do you see what is > > handling > > failure and what is handling the successful case? > > "success" and "failure" in this case are purely in terms of how we > label events for the user in the analyzer, such as in event (3) in the > following: > > infoleak-antipatterns-1.c: In function ‘infoleak_stack_unchecked_err’: > infoleak-antipatterns-1.c:118:10: warning: potential exposure of > sensitive information by copying uninitialized data from stack across > trust boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy] > 118 | err |= copy_to_user (dst, &st, sizeof(st)); > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ‘infoleak_stack_unchecked_err’: events 1-4 > | > | 110 | struct infoleak_buf st; > | | ^~ > | | | > | | (1) source region created on stack here > | | (2) capacity: 256 bytes > |...... > | 117 | int err = copy_from_user (&st, src, sizeof(st)); > | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > | | | > | | (3) when ‘copy_from_user’ fails, returning non-zero > | 118 | err |= copy_to_user (dst, &st, sizeof(st)); > | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > | | | > | | (4) uninitialized data copied from stack here > | > > i.e. it's allows the analyzer to provide a hint to the reader of the > analyzer output. The attribute is also a hint to the human reader of > the source code. But how do you tell the analyser what is success and what is failure? Do you always count non-zero return values as failure, like here? There are other conventions (negative means error, zero means error, etc.) Segher