> On Nov 5, 2020, at 4:26 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Nov 5, 2020 at 1:14 PM Alexander Monakov <amona...@ispras.ru> wrote:
>
>>> I was also thinking of introducing an operand modifier, but Richi
>>> advises the following:
>>>
>>> --cut here--
>>> typedef __UINTPTR_TYPE__ uintptr_t;
>>>
>>> __seg_fs int x;
>>>
>>> uintptr_t test (void)
>>> {
>>> uintptr_t *p = (uintptr_t *)(uintptr_t) &x;
>>> uintptr_t addr;
>>>
>>> asm volatile ("lea %1, %0" : "=r"(addr) : "m"(*p));
>>>
>>> return addr;
>>> }
>>
>> This is even worse undefined behavior compared to my solution above:
>> this code references memory in uintptr_t type, while mine preserves the
>> original type via __typeof. So this can visibly break with TBAA (though
>> the kernel uses -fno-strict-aliasing, so this particular concern wouldn't
>> apply there).
>
> Agreed, but I was trying to solve this lone use case in the kernel. It
> fits this particular usage, so I found it a bit of overkill to implement
> the otherwise useless operand modifier in gcc. As discussed
> previously, these hacks are needed exclusively in asm templates, they
> are not needed in "normal" C code.
>>
>> If you don't care about preserving sizeof and type you can use a cast to 
>> char:
>>
>> #define strip_as(mem) (*(char *)(intptr_t)&(mem))
>
> I hope that a kernel developer can chime in and express their
> opinion on the proposed approaches.
>

I haven’t looked all that closely at precisely what the kernel needs,
but I’ve had bad experiences with passing imprecise things into asm
“m” and “=m” operands. GCC seems to assume, quite reasonably, that if
I pass a value via “m” or “=m”, then I read or write *that value*.
So, if we use type hackery to produce an lvalue or rvalue that has the
address space stripped, then I would imagine I get UB — GCC will try
to understand what value I’m reading or writing, and this will only
match what I’m actually doing by luck.
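
For contrast, here is a minimal sketch of the well-behaved case, where the
"m" or "=m" operand names exactly the object the asm touches. The helper
names load_int and store_int are made up for illustration, the asm is
x86-only (with a plain C fallback elsewhere), and __asm__ is used instead
of asm only so that it also builds under strict -std= modes:

```c
/* Hypothetical helpers: each operand names exactly the int that is accessed. */
static inline int load_int(const int *ptr)
{
    int ret;
#if defined(__x86_64__) || defined(__i386__)
    /* AT&T syntax: source first, destination second. */
    __asm__ ("movl %[val], %[ret]"
             : [ret] "=r" (ret)
             : [val] "m" (*ptr));      /* the asm reads *ptr and nothing else */
#else
    ret = *ptr;                        /* portable fallback */
#endif
    return ret;
}

static inline void store_int(int *ptr, int value)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl %[val], %[dst]"
             : [dst] "=m" (*ptr)       /* the asm writes *ptr and nothing else */
             : [val] "r" (value));
#else
    *ptr = value;                      /* portable fallback */
#endif
}
```

Here the compiler's view of what is read or written matches what the
instruction really does, so there is nothing left to luck.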

It’s kind of like doing this:

int read_int(int *ptr)
{
        int ret;
        uintptr_t tmp;

        asm ("lea %[val], %[tmp]\n\t"
             "mov 4(%[tmp]), %[ret]"
             : [ret] "=r" (ret), [tmp] "=&r" (tmp)
             : [val] "m" (*(ptr - 1)));
        return ret;
}

That code is obviously rather contrived, but I think it's
fundamentally the same type of hack as all these typeofs.  I haven't
tested precisely what GCC does, but I suspect we have:

int foo;
read_int(&foo);  // UB

int foo[2];
read_int(&foo[1]);  // Maybe UB, but maybe non-UB that returns garbage

So I think a better constraint type would be an improvement.  Or maybe
a more general "pointer" constraint could be invented for this and
other use cases:

[name] "p" (ptr)

With this constraint, ptr must be uintptr_t or intptr_t.  %[name]
refers to ptr, formatted as a dereference operation.  So the generated
asm is identical to [name] "m" (*(char *)ptr), but the semantics are
different.  The problem is that I don't know how to specify the
semantics, but at least the instant UB of building and dereferencing a
garbage pointer would be avoided.
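
For comparison, a minimal sketch of the closest thing I know of with
existing constraints. This is not the constraint proposed above: it passes
the address as a plain integer in a register via "r" and uses a "memory"
clobber to tell the compiler that the asm may touch memory it was not told
about. The name load_int_via_reg is made up for illustration, and the asm
is x86-only with a plain C fallback elsewhere:

```c
#include <stdint.h>

/* Hypothetical helper: no lvalue is ever formed from addr on the C side. */
static inline int load_int_via_reg(uintptr_t addr)
{
    int ret;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("movl (%[addr]), %[ret]"
                      : [ret] "=r" (ret)
                      : [addr] "r" (addr)   /* address as an integer, not a dereference */
                      : "memory");          /* asm may access arbitrary memory */
#else
    ret = *(const int *)addr;               /* portable fallback */
#endif
    return ret;
}
```

The cost is that the operand is no longer formatted as a memory reference,
so the template has to spell out the addressing mode (and any segment
prefix) by hand, and the "memory" clobber is much blunter than a precise
"m" operand.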

--Andy
