Backporting KAsan patches to 4.9 branch

2014-09-18 Thread Yury Gribov
Hi all,

Kernel Asan patches are currently being discussed in LKML. One of the 
points raised during review was that KAsan requires GCC 5.0 which is 
presumably unstable (e.g. compilation of kernel modules has been broken 
for two months due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848).


Would it make sense to backport Kasan-related patches to 4.9 branch to 
make this feature more accessible to kernel developers? Quick analysis 
showed that at the very least this would require

* r211091 (BUILT_IN_ASAN_REPORT_LOAD_N and friends)
* r211092 (instrument unaligned accesses)
* r211713 and r211699 (New asan-instrumentation-with-call-threshold 
parameter)

* r213367 (initial support for -fsanitize=kernel-address)
and also maybe ~10 bugfix patches.

Is it ok to backport these to 4.9? Note that I would discard patches for 
other sanitizers (UBsan, Tsan).


-Y


Re: Backporting KAsan patches to 4.9 branch

2014-09-18 Thread Jakub Jelinek
On Thu, Sep 18, 2014 at 01:46:21PM +0400, Yury Gribov wrote:
> Kernel Asan patches are currently being discussed in LKML. One of the points
> raised during review was that KAsan requires GCC 5.0 which is presumably
> unstable (e.g. compilation of kernel modules has been broken for two months
> due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848).
> 
> Would it make sense to backport Kasan-related patches to 4.9 branch to make
> this feature more accessible to kernel developers? Quick analysis showed
> that at the very least this would require
> * r211091 (BUILT_IN_ASAN_REPORT_LOAD_N and friends)
> * r211092 (instrument unaligned accesses)
> * r211713 and r211699 (New asan-instrumentation-with-call-threshold
> parameter)
> * r213367 (initial support for -fsanitize=kernel-address)
> and also maybe ~10 bugfix patches.
> 
> Is it ok to backport these to 4.9? Note that I would discard patches for
> other sanitizers (UBsan, Tsan).

I'd say so, if it doesn't need any library changes (especially not any ABI
visible ones, guess bugfixes could be acceptable).

What asan related patches are still pending review (sorry for missing some)?
Do we have any known regressions in 5 from 4.9?  Those would need to be
resolved first.

Jakub


Re: Backporting KAsan patches to 4.9 branch

2014-09-18 Thread Yury Gribov
On 09/18/2014 01:57 PM, Jakub Jelinek wrote:

On Thu, Sep 18, 2014 at 01:46:21PM +0400, Yury Gribov wrote:

Kernel Asan patches are currently being discussed in LKML. One of the points
raised during review was that KAsan requires GCC 5.0 which is presumably
unstable (e.g. compilation of kernel modules has been broken for two months
due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848).

Would it make sense to backport Kasan-related patches to 4.9 branch to make
this feature more accessible to kernel developers? Quick analysis showed
that at the very least this would require
* r211091 (BUILT_IN_ASAN_REPORT_LOAD_N and friends)
* r211092 (instrument unaligned accesses)
* r211713 and r211699 (New asan-instrumentation-with-call-threshold
parameter)
* r213367 (initial support for -fsanitize=kernel-address)
and also maybe ~10 bugfix patches.

Is it ok to backport these to 4.9? Note that I would discard patches for
other sanitizers (UBsan, Tsan).


I'd say so, if it doesn't need any library changes (especially not any ABI
visible ones, guess bugfixes could be acceptable).


Cool! I'll go for it then.


What asan related patches are still pending review (sorry for missing some)?


Np, AFAIK there are just two:
* add -fasan-shadow-offset 
(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01170.html)
* enable -fsanitize-recover for KAsan by default 
(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01169.html)



Do we have any known regressions in 5 from 4.9?


Not that I know of.

-Y


[RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Yury Gribov
Hi all,

Current semantics of memory constraints in GCC inline asm (i.e. "m", 
"v", etc.) is somewhat loose in that it tells GCC that asm code _may_ 
access given amount of bytes but is not guaranteed to do so. This is 
(ab)used by e.g. glibc (and also some pieces of kernel):

__STRING_INLINE void *
__rawmemchr (const void *__s, int __c)
{
...
  __asm__ __volatile__
("cld\n\t"
 "repne; scasb\n\t"
...
   "m" ( *(struct { char __x[0xfff]; } *)__s)

Imprecise size specification prevents code analysis tools from 
understanding semantics of inline asm (without parsing inline asm 
instructions which e.g. Asan in Clang tries to do). In particular we 
can't automatically instrument inline asm in kernel with Kasan because 
we can not determine exact access size (see e.g. discussion in 
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02530.html).
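
To make the mismatch concrete, here is a minimal sketch (not the glibc code). It assumes GCC or Clang. The names declared_region, declared_bytes and scanned_bytes are invented for illustration, and a plain C loop stands in for "repne; scasb" so that it builds on any target. The asm template is left empty on purpose: only the "m" operand matters here.

```c
#include <stddef.h>

/* Same operand type as in the snippet above: it declares 0xfff bytes. */
struct declared_region { char bytes[0xfff]; };

/* What a tool learns from the constraint alone: the declared operand size. */
size_t declared_bytes(void)
{
    return sizeof(struct declared_region);
}

/* Hypothetical stand-in for __rawmemchr.  The caller must guarantee that
   c occurs in the buffer.  Returns the number of bytes actually read. */
__attribute__((noinline))
size_t scanned_bytes(const void *s, int c)
{
    const char *p = s;

    /* "m" only says the asm *may* read this region; the empty template
       reads nothing, so no byte beyond the real buffer is touched. */
    __asm__ __volatile__("" : : "m"(*(const struct declared_region *)s));

    while (*p != (char)c)
        p++;
    return (size_t)(p - (const char *)s) + 1;
}
```

With these definitions declared_bytes() is 0xfff (4095), while scanned_bytes("hello", 'l') is 3. The constraint describes a region the asm may touch, not the bytes it does touch, which is why a tool cannot derive a check from it.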


Would it make sense to add another constraint modifier (like "=", "&", 
etc.) that would tell compiler/tool that memory access in asm is 
_guaranteed_ to have the specified size?


-Y


Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Jakub Jelinek
On Thu, Sep 18, 2014 at 03:09:34PM +0400, Yury Gribov wrote:
> Current semantics of memory constraints in GCC inline asm (i.e. "m", "v",
> etc.) is somewhat loose in that it tells GCC that asm code _may_ access
> given amount of bytes but is not guaranteed to do so. This is (ab)used by
> e.g. glibc (and also some pieces of kernel):
> __STRING_INLINE void *
> __rawmemchr (const void *__s, int __c)
> {
> ...
>   __asm__ __volatile__
> ("cld\n\t"
>  "repne; scasb\n\t"
> ...
>    "m" ( *(struct { char __x[0xfff]; } *)__s)
> 
> Imprecise size specification prevents code analysis tools from understanding
> semantics of inline asm (without parsing inline asm instructions which e.g.
> Asan in Clang tries to do). In particular we can't automatically instrument
> inline asm in kernel with Kasan because we can not determine exact access
> size (see e.g. discussion in
> https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02530.html).
> 
> Would it make sense to add another constraint modifier (like "=", "&", etc.)
> that would tell compiler/tool that memory access in asm is _guaranteed_ to
> have the specified size?

CCing Richard/Jeff on this for thoughts.

Would that modifier mean that the inline asm is unconditionally reading
resp. writing that memory? "m"/"=m" right now is always about might read or
might write, not must.

In any case, as no GCC versions support that, you'd need to heavily macroize
it in the kernel, not sure the kernel people would like that very much.

Jakub


Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Yury Gribov
On 09/18/2014 03:09 PM, Yury Gribov wrote:

Hi all,

Current semantics of memory constraints in GCC inline asm (i.e. "m",
"v", etc.) is somewhat loose in that it tells GCC that asm code _may_
access given amount of bytes but is not guaranteed to do so. This is
(ab)used by e.g. glibc (and also some pieces of kernel):
__STRING_INLINE void *
__rawmemchr (const void *__s, int __c)
{
...
   __asm__ __volatile__
 ("cld\n\t"
  "repne; scasb\n\t"
...
"m" ( *(struct { char __x[0xfff]; } *)__s)

Imprecise size specification prevents code analysis tools from
understanding semantics of inline asm (without parsing inline asm
instructions which e.g. Asan in Clang tries to do). In particular we
can't automatically instrument inline asm in kernel with Kasan because
we can not determine exact access size (see e.g. discussion in
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02530.html).

Would it make sense to add another constraint modifier (like "=", "&",
etc.) that would tell compiler/tool that memory access in asm is
_guaranteed_ to have the specified size?

-Y



Added kernel folks.


Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Yury Gribov
On 09/18/2014 03:16 PM, Jakub Jelinek wrote:

On Thu, Sep 18, 2014 at 03:09:34PM +0400, Yury Gribov wrote:

Current semantics of memory constraints in GCC inline asm (i.e. "m", "v",
etc.) is somewhat loose in that it tells GCC that asm code _may_ access
given amount of bytes but is not guaranteed to do so. This is (ab)used by
e.g. glibc (and also some pieces of kernel):
__STRING_INLINE void *
__rawmemchr (const void *__s, int __c)
{
...
   __asm__ __volatile__
 ("cld\n\t"
  "repne; scasb\n\t"
...
"m" ( *(struct { char __x[0xfff]; } *)__s)

Imprecise size specification prevents code analysis tools from understanding
semantics of inline asm (without parsing inline asm instructions which e.g.
Asan in Clang tries to do). In particular we can't automatically instrument
inline asm in kernel with Kasan because we can not determine exact access
size (see e.g. discussion in
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02530.html).

Would it make sense to add another constraint modifier (like "=", "&", etc.)
that would tell compiler/tool that memory access in asm is _guaranteed_ to
have the specified size?


CCing Richard/Jeff on this for thoughts.

Would that modifier mean that the inline asm is unconditionally reading
resp. writing that memory? "m"/"=m" right now is always about might read or
might write, not must.


Yes, that's what I had in mind. Many inline asms (at least in kernel) do 
read memory region unconditionally.



In any case, as no GCC versions support that, you'd need to heavily macroize
it in the kernel, not sure the kernel people would like that very much.


They said they could think about it.

-Y



Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Jeff Law
On 09/18/14 05:19, Yury Gribov wrote:


Would that modifier mean that the inline asm is unconditionally reading
resp. writing that memory? "m"/"=m" right now is always about might
read or might write, not must.


Yes, that's what I had in mind. Many inline asms (at least in kernel) do
read memory region unconditionally.

That's precisely what I'd expect such a modifier to mean.  Right now 
memory modifiers are strictly "may" but I can see a use case for "must".


I think the question is will the kernel or glibc folks use that new 
capability and if so, do we get a significant improvement in the amount 
of checking we can do.  So I think both those groups need to be looped 
into this conversation.


From an implementation standpoint, are you thinking a different 
modifier (my first choice)?  That wouldn't allow us to say something 
like the first X bytes of this memory region are written and the 
remaining Y bytes may be written, but I suspect that's not a use case 
we're likely to care about.


jeff



Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Richard Biener
On September 18, 2014 3:36:24 PM CEST, Jeff Law  wrote:
>On 09/18/14 05:19, Yury Gribov wrote:
>>>
>>> Would that modifier mean that the inline asm is unconditionally reading
>>> resp. writing that memory? "m"/"=m" right now is always about might
>>> read or might write, not must.
>>
>> Yes, that's what I had in mind. Many inline asms (at least in kernel) do
>> read memory region unconditionally.
>That's precisely what I'd expect such a modifier to mean.  Right now 
>memory modifiers are strictly "may" but I can see a use case for "must".
>
>I think the question is will the kernel or glibc folks use that new 
>capability and if so, do we get a significant improvement in the amount
>of checking we can do.  So I think both those groups need to be looped
>into this conversation.
>
> From an implementation standpoint, are you thinking a different 
>modifier (my first choice)?  That wouldn't allow us to say something 
>like the first X bytes of this memory region are written and the 
>remaining Y bytes may be written, but I suspect that's not a use case 
>we're likely to care about.

It would also enable us to do more DSE as the asm stmt is then known to kill a 
specific part of memory.  Maybe we even want to constrain the effective type of 
the memory accesses so we can do TBAA against inline asms?

Richard.

>jeff




Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Yury Gribov
On 09/18/2014 05:36 PM, Jeff Law wrote:

On 09/18/14 05:19, Yury Gribov wrote:


Would that modifier mean that the inline asm is unconditionally reading
resp. writing that memory? "m"/"=m" right now is always about might
read or might write, not must.


Yes, that's what I had in mind. Many inline asms (at least in kernel) do
read memory region unconditionally.

That's precisely what I'd expect such a modifier to mean.  Right now
memory modifiers are strictly "may" but I can see a use case for "must".

I think the question is will the kernel or glibc folks use that new
capability and if so, do we get a significant improvement in the amount
of checking we can do.  So I think both those groups need to be looped
into this conversation.


Right. Should I x-post or better send separate emails and then report 
feedback on GCC list?



 From an implementation standpoint, are you thinking a different
modifier (my first choice)?


So we have constraints ("m", "v", "<", etc.) and modifiers which can be 
attached to arbitrary constraints ("+", "=", "&", etc.). I thought about 
adding a new modifier so that it could be attached to any memory 
constraint as needed.
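
For readers less familiar with the distinction, a small sketch of how existing modifiers attach to constraints. It assumes GCC or Clang; the function names are made up for illustration. The templates are empty so the code builds on any target. The proposed "must access" modifier is deliberately not shown, since no GCC release supports it.

```c
/* "r" is a register constraint; the "+" modifier marks the operand as
   both read and written.  With an empty template this acts as an
   optimization barrier: the compiler must assume v may have changed. */
int launder_int(int v)
{
    __asm__ __volatile__("" : "+r"(v));
    return v;
}

/* "m" is a memory constraint; "+" again marks it read-write.  This
   forces v into memory and tells the compiler the asm may read and may
   write it.  The empty template leaves the value unchanged. */
int roundtrip_memory(int v)
{
    __asm__ __volatile__("" : "+m"(v));
    return v;
}
```

Both functions return their argument unchanged. The point is only that the same modifier character combines with different constraints, which is how a new "strict" modifier would be expected to combine with "m", "v" and so on.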



That wouldn't allow us to say something
like the first X bytes of this memory region are written and the
remaining Y bytes may be written, but I suspect that's not a use case
we're likely to care about.


Yeah, I don't think anyone needs this.

-Y


Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Dmitry Vyukov
On Thu, Sep 18, 2014 at 4:09 AM, Yury Gribov  wrote:
> Hi all,
>
> Current semantics of memory constraints in GCC inline asm (i.e. "m", "v",
> etc.) is somewhat loose in that it tells GCC that asm code _may_ access
> given amount of bytes but is not guaranteed to do so. This is (ab)used by
> e.g. glibc (and also some pieces of kernel):
> __STRING_INLINE void *
> __rawmemchr (const void *__s, int __c)
> {
> ...
>   __asm__ __volatile__
> ("cld\n\t"
>  "repne; scasb\n\t"
> ...
>    "m" ( *(struct { char __x[0xfff]; } *)__s)
>
> Imprecise size specification prevents code analysis tools from understanding
> semantics of inline asm (without parsing inline asm instructions which e.g.
> Asan in Clang tries to do). In particular we can't automatically instrument
> inline asm in kernel with Kasan because we can not determine exact access
> size (see e.g. discussion in
> https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02530.html).
>
> Would it make sense to add another constraint modifier (like "=", "&", etc.)
> that would tell compiler/tool that memory access in asm is _guaranteed_ to
> have the specified size?

Hi,

What is the number of cases it will fix for kasan?

It won't fix the memchr function because the size is indeed not known
statically. So it's a bad example.

My impression was that the kernel has a relatively small amount of assembly,
out of which:

1. memchr/strcpy need special handling anyway.

2. putuser/getuser must not be instrumented.

3. atomic operations need special handling for ktsan, so kasan can
just reuse the same manual instrumentation.

And the rest is just not interesting enough. Am I missing something?


Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Jeff Law
On 09/18/14 08:38, Yury Gribov wrote:

On 09/18/2014 05:36 PM, Jeff Law wrote:

On 09/18/14 05:19, Yury Gribov wrote:


Would that modifier mean that the inline asm is unconditionally reading
resp. writing that memory? "m"/"=m" right now is always about might
read or might write, not must.


Yes, that's what I had in mind. Many inline asms (at least in kernel) do
read memory region unconditionally.

That's precisely what I'd expect such a modifier to mean.  Right now
memory modifiers are strictly "may" but I can see a use case for "must".

I think the question is will the kernel or glibc folks use that new
capability and if so, do we get a significant improvement in the amount
of checking we can do.  So I think both those groups need to be looped
into this conversation.


Right. Should I x-post or better send separate emails and then report
feedback on GCC list?

I think cross posting is fine.  Most of us don't necessarily watch the 
kernel or glibc lists -- and in this case I think those cross list 
discussions could be extremely valuable.


Jeff


Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Jeff Law
On 09/18/14 08:32, Richard Biener wrote:

On September 18, 2014 3:36:24 PM CEST, Jeff Law wrote:

On 09/18/14 05:19, Yury Gribov wrote:


Would that modifier mean that the inline asm is unconditionally reading
resp. writing that memory? "m"/"=m" right now is always about might
read or might write, not must.


Yes, that's what I had in mind. Many inline asms (at least in kernel) do
read memory region unconditionally.

That's precisely what I'd expect such a modifier to mean.  Right now
memory modifiers are strictly "may" but I can see a use case for "must".

I think the question is will the kernel or glibc folks use that new
capability and if so, do we get a significant improvement in the amount
of checking we can do.  So I think both those groups need to be looped
into this conversation.

From an implementation standpoint, are you thinking a different
modifier (my first choice)?  That wouldn't allow us to say
something like the first X bytes of this memory region are written
and the remaining Y bytes may be written, but I suspect that's not
a use case we're likely to care about.


It would also enable us to do more DSE as the asm stmt is then known
to kill a specific part of memory.  Maybe we even want to constrain
the effective type of the memory accesses so we can do TBAA against
inline asms?

Yea, but I suspect there aren't that many opportunities to do DSE that 
are enabled by seeing the must-write in an ASM.


Then again, one might argue that even if they aren't common, if they do 
occur, they're important as (in theory) folks shouldn't be using ASMs if 
the code isn't hot.


jeff


LTO testsuite - single test execution

2014-09-18 Thread Martin Liška
Hello.

I would like to introduce a new test case for an issue (PR63270). I was looking 
for *.exp files and I expected that another test, located in 
./gcc/testsuite/g++.dg/lto/pr63166_0.ii, could be executed with: make check -k 
RUNTESTFLAGS="lto.exp=pr63166*"

But without success. Another interesting issue is running 'make check-lto', 
which gave me:
make: *** No rule to make target `check-lto'.  Stop.

Can you please help me with LTO test integration?

Thanks,
Martin


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Tobias Ulmer
On Wed, Sep 17, 2014 at 01:26:48PM -0400, Ian Grant wrote:
> The reason I'm doing this is that I want to understand why the total
> size of the binaries grew from around 10MB (gcc v 4.5) to over 70MB in
> 4.9
> 
> I can compile the first stage OK, and the binaries are quite modest:
> 
> -rwxr-xr-x  1 ian  ian  17.2M Sep  6 03:47 prev-gcc/cc1
> -rwxr-xr-x  1 ian  ian   1.2M Sep  6 04:24 prev-gcc/cpp
> -rwxr-xr-x  1 ian  ian   1.2M Sep  6 04:24 prev-gcc/xgcc

Gcc 4.9 binaries on OpenBSD/amd64 are reasonable:

-r-xr-xr-x  1 root  bin  11.6M Sep  9 03:02 cc1
-r-xr-xr-x  1 root  bin  15.4M Sep  9 03:02 gnat1
-r-xr-xr-x  1 root  bin   749K Sep  9 03:02 ecpp

There is indeed a problem with huge binaries on OpenBSD/arm, which I've
not yet figured out, but i386/amd64/sparc64 are fine.

Are you trying to build gcc from the vanilla sources? If so, you're in
for a treat...

> 
> The 2nd stage doesn't compile however, because the Intel library
> doesn't support OpenBSD. The host/target is i386-unknown-openbsd5.4:
> 
> ../.././libcilkrts/runtime/os-unix.c:69:5: error: #error "Unsupported OS"
>  #   error "Unsupported OS"
>  ^
> ../.././libcilkrts/runtime/os-unix.c: In function
> '__cilkrts_hardware_cpu_count':
> ../.././libcilkrts/runtime/os-unix.c:386:2: error: #error "Unknown 
> architecture"
>  #error "Unknown architecture"
>   ^
> Makefile:691: recipe for target 'os-unix.lo' failed
> 
> My questions are, is this what I should expect in terms of file sizes?:
> 
> ian3@jaguar:~/build/guile-2.0.11$ ls -l ~/usr/bin/gcc
> ~/usr/libexec/gcc/i686-pc-linux-gnu/4.9.0/cc1
> -rwxr-xr-x 3 ian3 ian3  2538426 2014-08-03 01:18 /home/ian3/usr/bin/gcc
> -rwxr-xr-x 1 ian3 ian3 66149541 2014-08-03 01:18
> /home/ian3/usr/libexec/gcc/i686-pc-linux-gnu/4.9.0/cc1
> ian3@jaguar:~/build/guile-2.0.11$
> 
> And is there any way to disable the Intel library? The fact that the
> first stage bootstrap works without it indicates that it might be
> possible.
> 
> Thanks
> Ian


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Ian Grant
On Thu, Sep 18, 2014 at 5:37 PM, Ian Grant  wrote:
>
> On Thu, Sep 18, 2014 at 5:22 PM, Tobias Ulmer  wrote:
>>
>> On Wed, Sep 17, 2014 at 01:26:48PM -0400, Ian Grant wrote:
>> > The reason I'm doing this is that I want to understand why the total
>> > size of the binaries grew from around 10MB (gcc v 4.5) to over 70MB in
>> > 4.9
>>
>> There is indeed a problem with huge binaries on OpenBSD/arm, which I've
>> not yet figured out, but i386/amd64/sparc64 are fine.
>
>
> I don't have huge binaries (c.f. "Buster Gonad and his Infeasibly Large 
> Testicles") on OpenBSD. I have them on i686-pc-linux-gnu.
>
> -rwxr-xr-x 1 ian3 ian3  64M 2014-08-03 01:18 cc1
> -rwxr-xr-x 1 ian3 ian3  65M 2014-08-03 01:18 cc1obj
> -rwxr-xr-x 1 ian3 ian3  68M 2014-08-03 01:18 cc1plus
> -rwxr-xr-x 1 ian3 ian3 1.8M 2014-08-03 01:18 collect2
> -rwxr-xr-x 1 ian3 ian3  65M 2014-08-03 01:18 f951
>
>> Are you trying to build gcc from the vanilla sources? If so, you're in
>> for a treat...
>
>
> I didn't know there was chocolate source! Where is it?! And why is it a 
> secret?
>
> Ian
>


gcc-4.8-20140918 is now available

2014-09-18 Thread gccadmin
Snapshot gcc-4.8-20140918 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20140918/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch 
revision 215364

You'll find:

 gcc-4.8-20140918.tar.bz2 Complete GCC

  MD5=c7ec8bf43b10eb40b650e1c6f7fa733b
  SHA1=8d6fe878bcd315918aadceb45a12aa200a6f99e4

Diffs from 4.8-20140911 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Ian Grant
On Thu, Sep 18, 2014 at 5:22 PM, Tobias Ulmer  wrote:
> On Wed, Sep 17, 2014 at 01:26:48PM -0400, Ian Grant wrote:
>> I can compile the first stage OK, and the binaries are quite modest:
>>
>> -rwxr-xr-x  1 ian  ian  17.2M Sep  6 03:47 prev-gcc/cc1
>> -rwxr-xr-x  1 ian  ian   1.2M Sep  6 04:24 prev-gcc/cpp
>> -rwxr-xr-x  1 ian  ian   1.2M Sep  6 04:24 prev-gcc/xgcc
>
> Gcc 4.9 binaries on OpenBSD/amd64 are reasonable:
>
> -r-xr-xr-x  1 root  bin  11.6M Sep  9 03:02 cc1
> -r-xr-xr-x  1 root  bin  15.4M Sep  9 03:02 gnat1
> -r-xr-xr-x  1 root  bin   749K Sep  9 03:02 ecpp

I think we need to be able to explain this. It's an increase of over
60%, I wouldn't expect that to be due to the relative inefficiency of
Intel instruction encoding over AMD.  And it is not due to the
inclusion of libcilkrts (it's much easier to say "Intel library", how
many other libraries are there in GCC that were written by Intel?)
because that is not in the stage1 bootstrap.

Ian


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Jonathan Wakely
On 18 September 2014 23:46, Ian Grant wrote:
> On Thu, Sep 18, 2014 at 5:22 PM, Tobias Ulmer  wrote:
>> On Wed, Sep 17, 2014 at 01:26:48PM -0400, Ian Grant wrote:
>>> I can compile the first stage OK, and the binaries are quite modest:
>>>
>>> -rwxr-xr-x  1 ian  ian  17.2M Sep  6 03:47 prev-gcc/cc1
>>> -rwxr-xr-x  1 ian  ian   1.2M Sep  6 04:24 prev-gcc/cpp
>>> -rwxr-xr-x  1 ian  ian   1.2M Sep  6 04:24 prev-gcc/xgcc
>>
>> Gcc 4.9 binaries on OpenBSD/amd64 are reasonable:
>>
>> -r-xr-xr-x  1 root  bin  11.6M Sep  9 03:02 cc1
>> -r-xr-xr-x  1 root  bin  15.4M Sep  9 03:02 gnat1
>> -r-xr-xr-x  1 root  bin   749K Sep  9 03:02 ecpp
>
> I think we need to be able to explain this. It's an increase of over
> 60%, I wouldn't expect that to be due to the relative inefficiency of
> Intel instruction encoding over AMD.  And it is not due to the
> inclusion of libcilkrts (it's much easier to say "Intel library", how
> many other libraries are there in GCC that were written by Intel?)

liboffload might get added soon.


> because that is not in the stage1 bootstrap.

Are you looking at stripped binaries or unstripped?

Have you compared the binaries using size(1) instead of ls(1)?


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Ian Grant
On Thu, Sep 18, 2014 at 6:54 PM, Jonathan Wakely  wrote:
> On 18 September 2014 23:46, Ian Grant wrote:
>> On Thu, Sep 18, 2014 at 5:22 PM, Tobias Ulmer  wrote:
>>> On Wed, Sep 17, 2014 at 01:26:48PM -0400, Ian Grant wrote:
>>>> I can compile the first stage OK, and the binaries are quite modest:
>>>>
>>>> -rwxr-xr-x  1 ian  ian  17.2M Sep  6 03:47 prev-gcc/cc1
>>>> -rwxr-xr-x  1 ian  ian   1.2M Sep  6 04:24 prev-gcc/cpp
>>>> -rwxr-xr-x  1 ian  ian   1.2M Sep  6 04:24 prev-gcc/xgcc
>>>
>>> Gcc 4.9 binaries on OpenBSD/amd64 are reasonable:
>>>
>>> -r-xr-xr-x  1 root  bin  11.6M Sep  9 03:02 cc1
>>> -r-xr-xr-x  1 root  bin  15.4M Sep  9 03:02 gnat1
>>> -r-xr-xr-x  1 root  bin   749K Sep  9 03:02 ecpp
>>
>> I think we need to be able to explain this. It's an increase of over
>> 60%, I wouldn't expect that to be due to the relative inefficiency of
>> Intel instruction encoding over AMD.  And it is not due to the
>> inclusion of libcilkrts (it's much easier to say "Intel library", how
>> many other libraries are there in GCC that were written by Intel?)
>
> liboffload might get added soon.

I don't know what that is. I'll look it up later maybe.

>> because that is not in the stage1 bootstrap.
>
> Are you looking at stripped binaries or unstripped?

I don't know. How should I find out, read the Makefile? :-) Doesn't
the stage-1 get stripped? I'm not a GCC developer, I'm a 'user.'

> Have you compared the binaries using size(1) instead of ls(1)?

Yes, they're a lot smaller. Are you suggesting the filesystem size is
just holes in the file? I would want to know what data is in there.
Think of this as a security audit.

Ian


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Ian Grant
On Thu, Sep 18, 2014 at 6:54 PM, Jonathan Wakely  wrote:
> On 18 September 2014 23:46, Ian Grant wrote:
>> On Thu, Sep 18, 2014 at 5:22 PM, Tobias Ulmer  wrote:

> Have you compared the binaries using size(1) instead of ls(1)?

Actually, when I look at the output of size I realise I don't know
what it means:

ian3@jaguar:~/usr/libexec/gcc$ size i686-pc-linux-gnu/4.9.0/{cc1,f951}
    text    data     bss      dec     hex filename
14965183   23708  744944 15733835  f0144b i686-pc-linux-gnu/4.9.0/cc1
15882830   29264  750832 16662926  fe418e i686-pc-linux-gnu/4.9.0/f951

The phrase "dangerous GNU crap" comes to mind :-)

Ian


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Jonathan Wakely
On 19 September 2014 00:07, Ian Grant wrote:
>
> Actually, when I look at the output of size I realise I don't know
> what it means:
>
> ian3@jaguar:~/usr/libexec/gcc$ size i686-pc-linux-gnu/4.9.0/{cc1,f951}
>     text    data     bss      dec     hex filename
> 14965183   23708  744944 15733835  f0144b i686-pc-linux-gnu/4.9.0/cc1
> 15882830   29264  750832 16662926  fe418e i686-pc-linux-gnu/4.9.0/f951
>
> The phrase "dangerous GNU crap" comes to mind :-)

If you say so. The size command is older than GNU, or BSD for that matter.

Your OS probably has a man page for it.


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Ian Grant
On Thu, Sep 18, 2014 at 8:32 PM, Jonathan Wakely  wrote:
>> ian3@jaguar:~/usr/libexec/gcc$ size i686-pc-linux-gnu/4.9.0/{cc1,f951}
>>     text    data     bss      dec     hex filename
>> 14965183   23708  744944 15733835  f0144b i686-pc-linux-gnu/4.9.0/cc1
>> 15882830   29264  750832 16662926  fe418e i686-pc-linux-gnu/4.9.0/f951
>>
>> The phrase "dangerous GNU crap" comes to mind :-)

> If you say so. The size command is older than GNU, or BSD for that matter.

It's OK. A GNU can't be blamed for crapping, it's natural :-).

> Your OS probably has a man page for it.

It does. "Copyright (c) 1991-2013 Free Software Foundation, Inc."
But the man page doesn't tell me what the column headings actually
mean. And even if it did, why should I have to look up the manual to
find out what the headings mean? What's the point of headings? Since
they aren't even aligned, it might just as well leave them out
altogether: they're just a waste of screen space.

But I don't want to argue about GNU crap, it's a natural and
understandable phenomenon, and not particularly interesting. I want to
know what's in the GCC binaries. So let's focus on that.

What was the reason you suggested I look at the output of the size
command? What does that tell me about what is the cause of the  holes
in the file, or the extra padding, or whatever it is you think is the
explanation for this phenomenon?

Ian


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Ian Grant
In case it isn't obvious, what I am interested in is how easily we can
know the problem of infeasibly large binaries isn't an instance of
this one:


http://livelogic.blogspot.com/2014/08/beware-insiduous-penetrator-my-son.html

Ian


RE: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Joe Buck
(delurking)

Ian Grant writes:

> In case it isn't obvious, what I am interested in is how easily we can know 
> the problem of infeasibly large binaries isn't an instance of this one:

>
> http://livelogic.blogspot.com/2014/08/beware-insiduous-penetrator-my-son.html

Ah, this is commonly called the Thompson hack, since Ken Thompson actually 
produced a successful demo:

http://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html

The only way that the Thompson hack can survive a three-stage bootstrap is if 
the compiler used for the stage 1 build has the bad code.  The comparison 
between stages 2 and 3 requires an exact match, and any imperfection in the 
object code injection would reveal itself.

So, you can build GCC with LLVM or Intel's compiler or Microsoft's or IBM's or 
Sun's, doing cross-compilation where necessary.  The basic idea is:

1: Build gcc with a 3-stage bootstrap, starting with a compiler that you suspect 
might be infected.  Call the result A.
2: Do it again, starting with a different compiler that you think is 
independent of the compiler you used in step 1.  Call it B.
3: Compare A to B.  If they differ, you've found something that should be 
investigated.  If they don't, then either A and B are both clean, or A and B 
both have the identical inserted object code. Maybe they have a common ancestor?
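
The comparison in step 3 is a plain byte-for-byte check. A minimal sketch of it in C follows; the function names and return codes are invented for illustration, and in practice cmp(1) does the same job.

```c
#include <stdio.h>
#include <stddef.h>

/* Offset of the first differing byte between two buffers, or -1 if they
   have the same length and contents.  If one buffer is a prefix of the
   other, the length of the shorter one is returned. */
long first_difference(const unsigned char *a, size_t na,
                      const unsigned char *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return na == nb ? -1 : (long)n;
}

/* Compare two build results (e.g. A/cc1 and B/cc1) chunk by chunk.
   Returns -1 if identical, -2 on I/O error, otherwise the offset of the
   first difference. */
long compare_files(const char *path_a, const char *path_b)
{
    unsigned char buf_a[4096], buf_b[4096];
    long offset = 0, result = -1;
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");

    if (!fa || !fb) {
        result = -2;
    } else {
        for (;;) {
            size_t na = fread(buf_a, 1, sizeof buf_a, fa);
            size_t nb = fread(buf_b, 1, sizeof buf_b, fb);
            long d = first_difference(buf_a, na, buf_b, nb);
            if (ferror(fa) || ferror(fb)) { result = -2; break; }
            if (d >= 0) { result = offset + d; break; }
            if (na == 0) break;          /* both at EOF, no difference */
            offset += (long)na;
        }
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return result;
}
```

A result of -1 for every installed binary is the "A and B match" outcome of step 3; any non-negative offset is the place to start investigating.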

Note that if you build gcc with a cross-compiler the object code will be 
different.  You have to use the cross-compiler to build one more time to 
"normalize": GCC 4.9.0 built with GCC 4.9.0 on operating system X should always 
be the same.

As far as I know no one has been paranoid enough to put in the time to do the 
experiment on a large scale, and it's harder because you can't build a modern 
GCC (or LLVM for that matter) with an ancient compiler.  But you can create a 
chain: grab an ancient gcc version off a 15-year-old CD, and build newer 
versions with it until you get up  to the present.  The result should be 
byte-for-byte identical with what you get when building the current compiler 
with a recent version.  If it is, then either the infection is 15 years old or 
does not exist.  Try it again by building cross-compilers from a Microsoft 
system.  Don't trust Apple: they used to use GCC, so maybe all their LLVM 
binaries caught the bug.


BTW, if "size" is reporting much smaller size than the executable file itself 
and that motivates this concern, most of the difference is likely to be debug 
info, which is bigger since gcc switched to C++.  Might want to try "strip".
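
To put a number on how much of a file "size" does not account for, one can compare its report with the on-disk size. A rough Python sketch, assuming the default Berkeley output format of binutils "size" (a header line, then text, data, bss, dec, hex and filename columns); the helper names are invented for illustration:

```python
def parse_size_output(output):
    """Parse Berkeley-format `size` output into {filename: {text, data, bss}}."""
    rows = {}
    lines = output.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # not a data row
        text, data, bss = (int(value) for value in fields[:3])
        rows[fields[5].strip()] = {"text": text, "data": data, "bss": bss}
    return rows


def unaccounted_bytes(file_size, row):
    """Bytes on disk beyond text + data (bss takes up no space in the file)."""
    return file_size - (row["text"] + row["data"])
```

Feed `parse_size_output` the captured output of `size`, and pass `os.path.getsize(path)` as `file_size`. The remainder is only an upper bound on debug info, since headers, symbol tables and other sections are counted in it too; running the same check before and after "strip" shows how much of it was removable.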



Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Ian Grant
On Thu, Sep 18, 2014 at 9:37 PM, Joe Buck wrote:
> (delurking)

> Ah, this is commonly called the Thompson hack, since Ken Thompson
> actually produced a successful demo:

How do you know Thompson's attempt was the first instance? The
document I refer to in the blog is the "Unknown Air Force Report"
Thompson refers to. It was written by Roger Schell (cc'ed).

> http://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html

> The only way that the Thompson hack can survive a three-stage
> bootstrap is if the compiler used for the stage 1 build has the bad
> code.

This is the overwhelmingly likely (probability 1) case. How else would
the stage-2 and stage-3 compilers get the bad code?

> The comparison between stages 2 and 3 requires an exact match,
> and any imperfection in the object code injection would reveal itself.

How? In the output of a utility, or a system device driver, on a
system booted from a boot loader and using standard libraries such
as libc, all compiled by the same bug in the compilers which compiled the
stage 1, 2 and 3 C compilers?

> So, you can build GCC with LLVM or Intel's compiler or Microsoft's or IBM's
> or Sun's, doing cross-compilation where necessary.

Do these compilers all support cross-OS compilation to any OS? It
sounds a bit hard to me. I just can't imagine MS, say, going to a
great deal of trouble to make sure that their compiler targets Linux
and OpenBSD. GCC needs quite a lot of library and OS support, doesn't
it?

People will have to help me a bit with this, I've not yet managed to
cross-compile anything. This thread started because I was just trying
to build gcc from Vanilla gcc-4.9 sources on OpenBSD, and it doesn't
work. See the earlier messages. I was next going to try to build
gcc-4.9 on OpenBSD, cross-targeting Linux on the same physical
machine (i.e. same CPU) but I don't imagine this will be at all easy,
given I can't even build the vanilla sources. People say there is
chocolate source, but no-one has told me where it is yet!

> The basic idea is:
>
> 1: build gcc with 3-stage bootstrap, starting with a compiler that you
> suspect might be infected.  call the result A.
> 2: do it again, starting with a different compiler that you think is
> independent of the compiler you used in step 1.  call it B.
> 3: compare A to B.  If they differ, you've found something that should
> be investigated.  If you don't, then either A and B are both clean, or A
> and B both have the identical inserted object code. Maybe they have
> a common ancestor?
>
> Note that if you build gcc with a cross-compiler the object code will be 
> different.
> You have to use the cross-compiler to build one more time to "normalize":
> GCC 4.9.0 built with GCC 4.9.0 on operating system X should always be
> the same.

Yes, but the problem is when the object code bug is not in the
compiler binaries, it's something injected into the compiler binaries
from the infected ld.so, or glibc, or the IDE disk device driver, and
it infects the source to those programs.

> As far as I know no one has been paranoid enough to put in the time to do
> the experiment on a large scale, and it's harder because you can't build
> a modern GCC (or LLVM for that matter) with an ancient compiler.  But
> you can create a chain: grab an ancient gcc version off a 15-year-old CD,

When did you last try grabbing an ancient gcc off a 15-year-old CD and
getting it to run on a modern OS? Was it easy?

> and build newer versions with it until you get up  to the present.

And the rest of the chain, are they easier still?

> The result should be byte-for-byte identical with what you get when
>  building the current compiler with a recent version.

And what does that tell me, really?

> If it is, then either the infection is 15 years old or does not exist.

How do you figure that?

>  Try it again by building cross-compilers from a Microsoft system.
> Don't trust Apple, they used to use GCC so maybe all their LLVM
> binaries caught the bug.

Interesting idea.

> BTW, if "size" is reporting much smaller size than the executable
> file itself and that motivates this concern, most of the difference
> is likely to be debug info, which is bigger since gcc switched to
> C++.  Might want to try "strip".

Great. As I said, the exercise we are engaged in here is to convince
as many people as possible that GCC does NOT suffer from this problem
on any OS (Windows, OpenBSD, FreeBSD, Solaris, or Linux) or on
any arch., including IBM System z.

So can someone tell me the quickest way to build a new set of
binaries, stripped, or just how to tell whether the stage-1 binaries
are in fact stripped or not?

And can anyone tell me what are the 'non-vanilla' sources?

Ian


RE: Fwd: Building gcc-4.9 on OpenBSD

2014-09-18 Thread Thomas Preud'homme
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Ian Grant
> 
> And can anyone tell me what are the 'non-vanilla' sources?

"Vanilla source" refers to unmodified source (as distributed on gcc.gnu.org in 
the case of gcc). This is in contrast to modified source, for instance from a 
distribution, which will usually add some patches.

Best regards,

Thomas






Re: [RFC] Add asm constraint modifier to mark strict memory accesses

2014-09-18 Thread Yury Gribov
On 09/18/2014 09:33 PM, Dmitry Vyukov wrote:

> What is the number of cases it will fix for kasan?


Re-added kernel people again.

AFAIR silly instrumentation that assumed all memory accesses in inline 
asm are must-accesses (instead of may-accesses) resulted in only one 
false positive. We haven't performed extensive testing, though.



> It won't fix the memchr function because the size is indeed not known
> statically. So it's a bad example.


Sure, we will _not_ be able to instrument memchr. But being able to 
identify "safe" inline asms would allow us to instrument those (and my 
gut feeling is that they are the vast majority).



> My impression was that kernel has relatively small amount of assembly,


Well,
$ grep -r '"[=+]\?[moVv<>]" *(' ~/src/linux-stable/ | wc -l
1133

And also
$ grep -r '"[=+]\?[moVv<>]" *(' ~/src/ffmpeg-2.2.2/ | wc -l
211
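
For anyone who wants the same count without grep, here is a rough Python equivalent of the commands above. The regular expression is a direct translation of the grep pattern; the function name is hypothetical, and binary files are decoded leniently line by line rather than reported once the way GNU grep does, so totals could differ slightly:

```python
import os
import re

# Translation of the grep pattern '"[=+]\?[moVv<>]" *(' used above:
# a quoted one-letter memory-style constraint, optionally prefixed by = or +,
# followed by optional spaces and an opening parenthesis.
MEM_CONSTRAINT = re.compile(r'"[=+]?[moVv<>]" *\(')


def count_matching_lines(root):
    """Count lines under root that contain a memory-style asm constraint."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as fh:
                    count += sum(1 for line in fh if MEM_CONSTRAINT.search(line))
            except OSError:
                continue  # unreadable file: skip it
    return count
```

Like the grep one-liner, this is only a heuristic: it counts source lines that look like memory operands, not inline asm statements, so it gives an order of magnitude rather than an exact figure.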

> And the rest is just not interesting enough.

Now that may be the case. But how do we know without trying?

-Y