Re: Source Code for Profile Guided Code Positioning
On 01/15/2016 06:53 PM, vivek pandya wrote: Hello GCC Developers, Are the 'Profile Guided Code Positioning' algorithms from the Pettis and Hansen paper (http://dl.acm.org/citation.cfm?id=93550) implemented in gcc? If yes, kindly help me with the code file location in the gcc source tree.

There's some stuff on the Google branch: https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html -Y
Re: Source Code for Profile Guided Code Positioning
On 01/15/2016 08:44 PM, vivek pandya wrote: Thanks Yury for this link: https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html It implements procedure reordering as a linker plugin. I have some questions: 1) Can you point me to some documentation for "how to write a plugin for linkers"? I have not seen docs for the structs with the 'ld_' prefix (i.e. those defined in plugin-api.h). 2) There is one more algorithm for basic block ordering with execution frequency counts in the PH paper. Is there any implementation available for it?

Quite frankly - I don't know (I've only learned about the Google implementation recently). I've added Sriram to maybe comment. -Y
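Since there is little documentation beyond plugin-api.h itself, a minimal sketch of the plugin entry point may help. This is only an illustration of the interface shape (not a working reordering plugin); the tags and typedefs shown are the ones declared in plugin-api.h:

  /* Minimal linker plugin sketch.  The linker dlopens the plugin and
     calls onload() with a transfer vector of capabilities terminated
     by LDPT_NULL. */
  #include <plugin-api.h>

  static ld_plugin_message message;  /* logging callback supplied by the linker */

  enum ld_plugin_status
  onload (struct ld_plugin_tv *tv)
  {
    for (; tv->tv_tag != LDPT_NULL; tv++)
      if (tv->tv_tag == LDPT_MESSAGE)
        message = tv->tv_u.tv_message;

    if (message)
      message (LDPL_INFO, "reordering plugin loaded");
    return LDPS_OK;
  }

A real reordering plugin would additionally register claim-file and all-symbols-read hooks (LDPT_REGISTER_CLAIM_FILE_HOOK etc.) to observe and reorder input sections.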
Re: Option handling (support) of -fsanitize=use-after-scope
On 05/11/2016 04:18 PM, Martin Liška wrote: Hello. I've been working on use-after-scope sanitizer enablement in the GCC compiler ([1]) and as I've read the following submit request ([2]), the LLVM compiler started to utilize the following option: -mllvm -asan-use-after-scope=1

My initial attempt was to introduce a new option value for the -fsanitize option (which would make the LLVM and GCC options compatible). Following the current behavior of LLVM, I would instead have to add a new --param, which would lead to a divergence. Is the suggested approach acceptable to the LLVM community?

I would also suggest the following default behavior:
- If -fsanitize=address or -fsanitize=kernel-address is enabled, the use-after-scope sanitization should be enabled
- Similarly, providing -fsanitize=use-after-scope should enable address sanitization (either user-space or kernel-space)

Thank you for feedback, Martin
[1] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00468.html
[2] http://reviews.llvm.org/D19347

Cc-ed Google folks.
Improving Asan code on ARM targets
Hi all,

I've recently noticed that GCC generates suboptimal code for Asan on ARM targets. E.g. for a 4-byte memory access check

  (shadow_val != 0) & (last_byte >= shadow_val)

we get the following sequence:

  mov r2, r0, lsr #3
  and r3, r0, #7
  add r3, r3, #3
  add r2, r2, #536870912
  ldrb r2, [r2]  @ zero_extendqisi2
  sxtb r2, r2
  cmp r3, r2
  movlt r3, #0
  movge r3, #1
  cmp r2, #0
  moveq r3, #0
  cmp r3, #0
  bne .L5
  ldr r0, [r0]

Obviously shorter code is possible:

  mov r3, r0, lsr #3
  and r1, r0, #7
  add r1, r1, #4
  add r3, r3, #536870912
  ldrb r3, [r3]  @ zero_extendqisi2
  sxtb r3, r3
  cmp r3, #0
  cmpne r1, r3
  bgt .L5
  ldr r0, [r0]

A 30% improvement looked quite important given that Asan usually increases code size by 1.5-2x, so I decided to investigate this. It turned out that the ARM backend already has full support for dominated comparisons (the cmp-cmpne-bgt sequence above) and can generate efficient code if we provide it with a slightly more explicit gimple sequence:

  (shadow_val != 0) & (last_byte + 1 > shadow_val)

Ideally the backend should be able to perform this transform itself. But I'm not sure this is possible: it needs to know that last_byte + 1 cannot overflow, and this info is not available in RTL (because we don't have a VRP pass there).

I have attached a simple patch which changes the Asan pass to generate the ARM-friendly code. I've only bootstrapped/regtested on x64 but I can perform additional tests on ARM if the patch makes sense. As far as I can tell it does not worsen sanitized code on other platforms (x86/x64) while significantly improving ARM (15% less code for bzip).

The patch is certainly not ideal:
* it makes target-specific changes in machine-independent code
* it does not help with 1-byte accesses (the forwprop pass thinks that it's always beneficial to convert x + 1 > y to x >= y, so it reverts my change)
* it only improves Asan code, whereas it would be great if the ARM backend could improve generic RTL code
but it achieves a significant improvement on ARM without hurting other platforms.

So my questions are:
* is this kind of target-specific tweaking acceptable in the middle-end?
* if not - what would be a better option?

-Y

2014-04-29  Yury Gribov

        * asan.c (build_check_stmt): Change generated code to improve
        code generated for ARM.

diff --git a/gcc/asan.c b/gcc/asan.c
index d7c282e..f00705a 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1543,18 +1543,17 @@ build_check_stmt (location_t location, tree base, gimple_stmt_iterator *iter,
     {
       /* Slow path for 1, 2 and 4 byte accesses.
          Test (shadow != 0)
-         & ((base_addr & 7) + (size_in_bytes - 1)) >= shadow).  */
+         & ((base_addr & 7) + size_in_bytes) > shadow).  */
       gimple_seq seq = NULL;
       gimple shadow_test = build_assign (NE_EXPR, shadow, 0);
       gimple_seq_add_stmt (&seq, shadow_test);
       gimple_seq_add_stmt (&seq, build_assign (BIT_AND_EXPR, base_addr, 7));
       gimple_seq_add_stmt (&seq, build_type_cast (shadow_type,
                                                   gimple_seq_last (seq)));
-      if (size_in_bytes > 1)
-        gimple_seq_add_stmt (&seq,
-                             build_assign (PLUS_EXPR, gimple_seq_last (seq),
-                                           size_in_bytes - 1));
-      gimple_seq_add_stmt (&seq, build_assign (GE_EXPR, gimple_seq_last (seq),
+      gimple_seq_add_stmt (&seq,
+                           build_assign (PLUS_EXPR, gimple_seq_last (seq),
+                                         size_in_bytes));
+      gimple_seq_add_stmt (&seq, build_assign (GT_EXPR, gimple_seq_last (seq),
                                                shadow));
       gimple_seq_add_stmt (&seq, build_assign (BIT_AND_EXPR, shadow_test,
                                                gimple_seq_last (seq)));
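For readers unfamiliar with the shadow encoding, here is a plain-C sketch (not GCC source) of the check the patch rewrites; it assumes the usual Asan convention that a nonzero shadow byte holds the number of addressable bytes in the corresponding 8-byte word:

  /* Return nonzero if an access of size_in_bytes at addr must be reported. */
  int must_report (unsigned long addr, int size_in_bytes, signed char shadow)
  {
    int low_bits = addr & 7;
    /* Original form: shadow != 0 && low_bits + size_in_bytes - 1 >= shadow. */
    /* The form below is equivalent because low_bits + size_in_bytes cannot
       overflow, but it maps directly onto the cmp/cmpne/bgt sequence. */
    return shadow != 0 && low_bits + size_in_bytes > shadow;
  }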
Re: Improving Asan code on ARM targets
Andrew wrote:
> Does the patch series located at:
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01405.html
> fix this code generation issue? I suspect it does and improves more
> than just the above code.

No, they don't help as is. -Y
Re: Improving Asan code on ARM targets
Andrew wrote:
> I think it would be good to figure out how to improve this code gen
> with the above patches rather than changing asan. I suspect it might
> be easy to expand them to handle this case too.

True, let me take a closer look and get back to you. When is this expected to land in trunk, btw? -Y
Re: Improving Asan code on ARM targets
Andrew Pinski wrote:
> Yury Gribov wrote:
>> Andrew Pinski wrote:
>>> Yury Gribov wrote:
>>>> I've recently noticed that GCC generates suboptimal code
>>>> for Asan on ARM targets. E.g. for a 4-byte memory access check
>>>
>>> Does the patch series located at:
>>> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html
>>> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01405.html
>>> fix this code generation issue? I suspect it does and improves more
>>> than just the above code.
>>
>> No, they don't help as is.
>
> I think it would be good to figure out how to improve this code gen
> with the above patches rather than changing asan.
> I suspect it might be easy to expand them to handle this case too.

I was indeed able to reuse Zhenqiang's work. After updating the select_ccmp_cmp_order hook to also return suggestions on how to change comparisons to allow better code generation (so it sounds more like select_ccmp_cmp_layout now), I was able to use this information in expand_ccmp_expr to generate optimal code. The patch is still a draft (it only supports Asan's case) and I think I'll wait until Zhenqiang's conditional compare patches get into trunk before going deeper (not sure when this is going to happen though...). -Y
Re: Cross-testing libsanitizer
Christophe,

> Indeed, when testing on my laptop, execution tests fail because
> libsanitizer wants to allocate 8GB of memory (I am using qemu as
> execution engine).

Is this 8G of RAM? If yes - I'd be curious to know which part of libsanitizer needs so much memory. -Y
Re: Cross-testing libsanitizer
>> Is this 8G of RAM? If yes - I'd be curious to know which part of libsanitizer needs so much memory.
>
> Here is what I have in gcc.log:
>
>   ==12356==ERROR: AddressSanitizer failed to allocate 0x200001000 (8589938688) bytes at address ff000 (errno: 12)
>   ==12356==ReserveShadowMemoryRange failed while trying to map 0x200001000 bytes. Perhaps you're using ulimit -v

Interesting. AFAIK Asan maps shadow memory with the NORESERVE flag so it should not consume any RAM at all... -Y
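For context, shadow reservation boils down to an mmap call roughly like the sketch below (an illustration, not the actual libsanitizer source). MAP_NORESERVE exempts the region from commit accounting, so the 8G region costs no RAM until pages are touched; however 'ulimit -v' (RLIMIT_AS) still counts raw address space, which would explain the ENOMEM (errno 12) seen under qemu:

  #include <sys/mman.h>

  /* Reserve a large shadow region without consuming RAM: physical pages
     are only allocated when first touched. */
  static void *reserve_shadow (void *addr, unsigned long size)
  {
    return mmap (addr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE,
                 -1, 0);
  }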
Re: Prototype of a --report-bug option
On 07/30/2014 11:56 AM, Richard Biener wrote: On Tue, Jul 29, 2014 at 8:35 PM, David Malcolm wrote:
> At Cauldron on the Sunday morning there was a Release Management BoF
> session, replacing the specRTL talk (does anyone know what happened to
> the latter?)
>
> One of the topics was bug triage, and how many bug reports lacked basic
> metadata on e.g. host/build/target, reproducer etc.

Heh... I was hoping this would be a patch to the driver directly (thus not a python script). Note that I don't care too much about the reproducing/-save-temps and backtrace for the driver option. Of course in case of an ICE, producing a proper bug-url with the backtrace info included would be even better.

All, we've been trying to upstream a patch for something like this for the last month. It doesn't bring you to Bugzilla but at least generates a repro with host/target information and call stack. Could someone take a look? We could certainly enhance it to generate user-friendly links like in David's script. https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01649.html -Y
Re: ASAN test failures make compare_tests useless
On 08/16/2014 04:37 AM, Manuel López-Ibáñez wrote: On the compile farm, ASAN tests seem to fail a lot like:

  FAIL: c-c++-common/asan/global-overflow-1.c -O0 output pattern test, is ==31166==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
  ==31166==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
  , should match READ of size 1 at 0x[0-9a-f]+ thread T0.*(

The problem is that those addresses and sizes are very random, so when I compare the test results of a pristine trunk with a patched one, I get:

  New tests that FAIL:
  unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output pattern test, is ==12875==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
  unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output pattern test, is ==18428==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
  [... hundreds of ASAN tests that failed ...]

  Old tests that failed, that have disappeared: (Eeek!)
  unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output pattern test, is ==30142==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
  unix//-m64: c-c++-common/asan/global-overflow-1.c -O0 output pattern test, is ==31166==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
  [... the same hundreds of tests that already failed before ...]

The above makes it very difficult to identify failures caused by my patch. Can we remove the "==" part of the error? This way compare_tests will ignore the failures. Alternatively, I could patch compare_tests to sed out that part before comparing. Would that be acceptable? Cheers, Manuel.

Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and addresses could be turned off.
Re: ASAN test failures make compare_tests useless
On 08/18/2014 09:42 AM, Yury Gribov wrote: [same message as above, snipped]
> Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and addresses could be turned off.

Ok, this time actually added them.
Re: ASAN test failures make compare_tests useless
On 08/18/2014 06:36 PM, Alexander Potapenko wrote:
>> Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and addresses could be turned off.
> Could you please name a reason for that?

Reproducibility? -Y
Re: non-reproducible g++.dg/ubsan/align-2.C -Os execution failure
On 09/04/2014 11:12 AM, Tom de Vries wrote: > I ran into this non-reproducible failure while testing a non-bootstrap > build on x86_64: > ... > PASS: g++.dg/ubsan/align-2.C -Os (test for excess errors) Added UBSan folks. Can this be related to http://llvm.org/bugs/show_bug.cgi?id=20721 ? It has been causing sporadic align-4 errors. -Y
Re: [PATCH] RE: gcc parallel make check
On 09/09/2014 10:51 AM, VandeVondele Joost wrote:
> Attached is an extended version of the patch,
> it brings a 100% improvement in make -j32 -k check-gcc

First of all, many thanks for working on this.

+# ls -1 | ../../../contrib/generate_tcl_patterns.sh 300 "dg.exp=gfortran.dg/"

How does this work with subdirectories? Can we replace ls with find?

-check_p_numbers=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
+check_p_numbers=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 \
+  21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

$(shell seq 1 40) ?

+ if (_assert_exit) exit 1

Haven't you already exited above?

> A second part of the patch is a new file 'contrib/generate_tcl_patterns.sh'
> which generates the needed regexp

Can we provide a Makefile target to automatically update Makefile.in? -Y
Re: [PATCH] RE: gcc parallel make check
On 09/09/2014 06:14 PM, VandeVondele Joost wrote:
> I certainly don't want to claim that the patch I have now is perfect, it is rather an incremental improvement on the current setup.

I'd second this. Writing patterns manually seems rather inefficient and error-prone (not undoable of course, but unnecessarily complicated). And even with the current (crippled) version Joost already got a 100% test time improvement. -Y
Re: [PATCH] RE: gcc parallel make check
On 09/09/2014 06:33 PM, Jakub Jelinek wrote: On Tue, Sep 09, 2014 at 06:27:10PM +0400, Yury Gribov wrote: [previous exchange snipped]
> But if there are jobs that just take 1s to complete, then clearly it doesn't make sense to split them off as a separate job. I think we don't need a 100% even split, but at least a roughly even one is highly desirable.

You mean enhancing the script to split across arbitrarily long prefixes? That would be great. -Y
Backporting KAsan patches to 4.9 branch
Hi all,

Kernel Asan patches are currently being discussed in LKML. One of the points raised during review was that KAsan requires GCC 5.0 which is presumably unstable (e.g. compilation of kernel modules has been broken for two months due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848).

Would it make sense to backport Kasan-related patches to the 4.9 branch to make this feature more accessible to kernel developers? Quick analysis showed that at the very least this would require
* r211091 (BUILT_IN_ASAN_REPORT_LOAD_N and friends)
* r211092 (instrument unaligned accesses)
* r211713 and r211699 (new asan-instrumentation-with-call-threshold parameter)
* r213367 (initial support for -fsanitize=kernel-address)
and also maybe ~10 bugfix patches.

Is it ok to backport these to 4.9? Note that I would discard patches for other sanitizers (UBsan, Tsan). -Y
Re: Backporting KAsan patches to 4.9 branch
On 09/18/2014 01:57 PM, Jakub Jelinek wrote: On Thu, Sep 18, 2014 at 01:46:21PM +0400, Yury Gribov wrote: [backport proposal quoted in full, snipped]
> I'd say so, if it doesn't need any library changes (especially not any ABI visible ones, guess bugfixes could be acceptable).

Cool! I'll go for it then.

> What asan related patches are still pending review (sorry for missing some)?

Np, AFAIK there are just two:
* add -fasan-shadow-offset (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01170.html)
* enable -fsanitize-recover for KAsan by default (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01169.html)

> Do we have any known regressions in 5 from 4.9?

Not that I know of. -Y
[RFC] Add asm constraint modifier to mark strict memory accesses
Hi all,

The current semantics of memory constraints in GCC inline asm (i.e. "m", "v", etc.) is somewhat loose in that it tells GCC that asm code _may_ access the given amount of bytes but is not guaranteed to do so. This is (ab)used by e.g. glibc (and also some pieces of the kernel):

  __STRING_INLINE void *
  __rawmemchr (const void *__s, int __c)
  {
    ...
    __asm__ __volatile__
      ("cld\n\t"
       "repne; scasb\n\t"
       ...
       "m" ( *(struct { char __x[0xfff]; } *)__s)

Imprecise size specification prevents code analysis tools from understanding the semantics of inline asm (without parsing the inline asm instructions, which e.g. Asan in Clang tries to do). In particular we can't automatically instrument inline asm in the kernel with Kasan because we cannot determine the exact access size (see e.g. the discussion in https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02530.html).

Would it make sense to add another constraint modifier (like "=", "&", etc.) that would tell the compiler/tool that a memory access in asm is _guaranteed_ to have the specified size? -Y
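To make the proposal concrete, here is a hypothetical example. The '$' modifier below is invented purely for illustration (no GCC version accepts it); it would mark the memory operand as accessed in full, unconditionally, so a tool could instrument exactly 16 bytes:

  static inline void zero16 (void *p)
  {
    unsigned long n = 16;
    __asm__ __volatile__ ("cld\n\t"
                          "rep; stosb"
                          : "+D" (p), "+c" (n),
                            /* hypothetical '$': all 16 bytes are written */
                            "=$m" (*(struct { char x[16]; } *)p)
                          : "a" (0)
                          : "cc");
  }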
Re: [RFC] Add asm constraint modifier to mark strict memory accesses
On 09/18/2014 03:09 PM, Yury Gribov wrote: [original RFC quoted in full, snipped]

Added kernel folks.
Re: [RFC] Add asm constraint modifier to mark strict memory accesses
On 09/18/2014 03:16 PM, Jakub Jelinek wrote: On Thu, Sep 18, 2014 at 03:09:34PM +0400, Yury Gribov wrote: [original RFC quoted, snipped]
>> Would it make sense to add another constraint modifier (like "=", "&", etc.) that would tell the compiler/tool that a memory access in asm is _guaranteed_ to have the specified size?
> CCing Richard/Jeff on this for thoughts. Would that modifier mean that the inline asm is unconditionally reading resp. writing that memory? "m"/"=m" right now is always about might read or might write, not must.

Yes, that's what I had in mind. Many inline asms (at least in the kernel) do read their memory region unconditionally.

> In any case, as no GCC versions support that, you'd need to heavily macroize it in the kernel, not sure the kernel people would like that very much.

They said they could think about it. -Y
Re: [RFC] Add asm constraint modifier to mark strict memory accesses
On 09/18/2014 05:36 PM, Jeff Law wrote: On 09/18/14 05:19, Yury Gribov wrote:
>>> Would that modifier mean that the inline asm is unconditionally reading resp. writing that memory? "m"/"=m" right now is always about might read or might write, not must.
>> Yes, that's what I had in mind. Many inline asms (at least in the kernel) do read their memory region unconditionally.
> That's precisely what I'd expect such a modifier to mean. Right now memory modifiers are strictly "may", but I can see a use case for "must". I think the question is will the kernel or glibc folks use that new capability and if so, do we get a significant improvement in the amount of checking we can do. So I think both those groups need to be looped into this conversation.

Right. Should I x-post, or better send separate emails and then report feedback on the GCC list?

> From an implementation standpoint, are you thinking a different modifier (my first choice)?

So we have constraints ("m", "v", "<", etc.) and modifiers which can be attached to arbitrary constraints ("+", "=", "&", etc.). I thought about adding a new modifier so that it could be added to an arbitrary memory constraint as needed.

> That wouldn't allow us to say something like the first X bytes of this memory region are written and the remaining Y bytes may be written, but I suspect that's not a use case we're likely to care about.

Yeah, I don't think anyone needs this. -Y
Re: [RFC] Add asm constraint modifier to mark strict memory accesses
On 09/18/2014 09:33 PM, Dmitry Vyukov wrote:
> What is the number of cases it will fix for kasan?

Re-added kernel people again. AFAIR a silly instrumentation that assumed all memory accesses in inline asm are must-accesses (instead of may-accesses) resulted in only one false positive. We haven't performed extensive testing though.

> It won't fix the memchr function because the size is indeed not known statically. So it's a bad example.

Sure, we will _not_ be able to instrument memchr. But being able to identify "safe" inline asms would allow us to instrument those (and my gut feeling is that they are a vast majority).

> My impression was that kernel has relatively small amount of assembly,

Well,

  $ grep -r '"[=+]\?[moVv<>]" *(' ~/src/linux-stable/ | wc -l
  1133

And also

  $ grep -r '"[=+]\?[moVv<>]" *(' ~/src/ffmpeg-2.2.2/ | wc -l
  211

> And the rest is just not interesting enough.

Now that may be the case. But how do we know without trying? -Y
Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
On 09/24/2014 12:31 PM, Richard Biener wrote: On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd wrote:
> Hans-Peter Nilsson: I should have listened to you back when you raised concerns about this. My apologies for ever doubting you.
>
> In summary:
> - The "trick" in the docs for using an arbitrarily sized struct to force register flushes for inline asm does not work.
> - Placing the inline asm in a separate routine can sometimes mask the problem with the trick not working.
> - The sample that has been in the docs forever performs an unhelpful, unexpected, and probably unwanted stack allocation + memcpy.
>
> Details: Here is the text from the docs:
> ---
> One trick to avoid [using the "memory" clobber] is available if the size of the memory being accessed is known at compile time. For example, if accessing ten bytes of a string, use a memory input like:
> "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )

Well - this can't work because you essentially are using a _value_ here (looking at the GIMPLE - I'm not sure if a statement expression evaluates to an lvalue). It should work if you simply do this without a stmt expression:

  "m" (*(struct { char x[10]; } *)ptr)

because that's clearly an lvalue (and the GIMPLE correctly says so):

  <bb 2>:
  c.a = 1;
  c.b = 2;
  __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0" &c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
  printf ("%u %u\n", 1, 2);

note that we still constant propagated 1 and 2 for the reason that the asm didn't get any VDEF. That's because you do not have any memory output! So while it keeps 'c' live it doesn't consider it modified by the asm. You'd still need to clobber the memory, but "m" clobbers are not supported, only "memory". Thus fixed asm:

  __asm__ __volatile__ ("rep; stosb"
                        : "=D" (Dest), "+c" (Count)
                        : "0" (&c), "a" (0), "m" (*(struct foo { char x[8]; } *)&c)
                        : "memory");

where I'm not 100% sure if the "m" input is now pointless (that is, if a "memory" clobber also constitutes a use of all memory). Or maybe even

  __asm__ __volatile__ ("rep; stosb"
                        : "=D" (Dest), "+c" (Count), "+m" (*(struct foo { char x[8]; } *)&c)
                        : "0" (&c), "a" (0));

to avoid the big-hammer memory clobber? -Y
Re: Testing Leak Sanitizer?
On 09/30/2014 07:15 PM, Christophe Lyon wrote:
> Hello, After I've recently enabled Address Sanitizer for AArch64 in GCC, I'd like to enable Leak Sanitizer. I'd like to know what are the requirements wrt testing it? IIUC there are no lsan tests in the GCC testsuite so far. Should I just test a few sample programs to check if basic functionality is OK? The patch seems to be a 1-line patch, I just want to check the acceptance criteria.

AFAIK the compiler-rt testsuite supports running under a non-Clang compiler. Don't ask me how to set up the beast though.
Re: msan and gcc ?
On 10/01/2014 10:39 PM, Kostya Serebryany wrote: On Wed, Oct 1, 2014 at 11:38 AM, Toon Moene wrote: On 10/01/2014 08:00 PM, Kostya Serebryany wrote: -gcc folks. Why not use clang then? It offers many more nice features. What's the Fortran front-end called for clang (or do you really think we are going to write Weather Forecasting codes in C :-) ) Oh, crap. :) Well, there's always f2c ;)
Re: msan and gcc ?
On 10/02/2014 11:35 AM, Jakub Jelinek wrote: On Thu, Oct 02, 2014 at 11:30:50AM +0400, Yury Gribov wrote: On 10/01/2014 10:39 PM, Kostya Serebryany wrote: On Wed, Oct 1, 2014 at 11:38 AM, Toon Moene wrote: On 10/01/2014 08:00 PM, Kostya Serebryany wrote: -gcc folks. Why not use clang then? It offers many more nice features. What's the Fortran front-end called for clang (or do you really think we are going to write Weather Forecasting codes in C :-) ) Oh, crap. :) Well, there's always f2c ;) You mean for performance critical code? Fortran has different aliasing rules than C, so it is hard to express those in C... No-no, I only meant debugging.
Re: bug report - libsanitizer compilation fail
On 10/06/2014 03:09 PM, Daniel Doron wrote: Hi, I am sending this bug report here because I can't register an account in bugzilla... gcc version: gcc-linaro-4.9-2014.09 (I checked also the main repo git, the code is the same), kernel: 2.6.37

"home/daniel/Downloads/.build/src/gcc-custom/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc:675:43: error: 'EVIOCGPROP' was not declared in this scope"

This happens when compiling with kernel 2.6.37 headers.

  #if EV_VERSION > (0x01)
    unsigned IOCTL_EVIOCGKEYCODE_V2 = EVIOCGKEYCODE_V2;
    unsigned IOCTL_EVIOCGPROP = EVIOCGPROP(0);
    unsigned IOCTL_EVIOCSKEYCODE_V2 = EVIOCSKEYCODE_V2;
  #else
    unsigned IOCTL_EVIOCGKEYCODE_V2 = IOCTL_NOT_PRESENT;
    unsigned IOCTL_EVIOCGPROP = IOCTL_NOT_PRESENT;
    unsigned IOCTL_EVIOCSKEYCODE_V2 = IOCTL_NOT_PRESENT;
  #endif

Although in kernel 2.6.37 EV_VERSION is indeed > (0x01), the EVIOCGPROP define is missing and only appears in 2.6.38 onwards.

You'll probably want to report this to the upstream project (which is compiler-rt). -Y
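One possible direction for a fix (a sketch only, not tested against compiler-rt): guard on the ioctl macro itself instead of EV_VERSION, since 2.6.37 already reports EV_VERSION > 0x01 without providing EVIOCGPROP:

  /* Key on the macro rather than the version: #ifdef works for
     function-like macros such as EVIOCGPROP(len) too. */
  #ifdef EVIOCGPROP
  unsigned IOCTL_EVIOCGPROP = EVIOCGPROP(0);
  #else
  unsigned IOCTL_EVIOCGPROP = IOCTL_NOT_PRESENT;
  #endif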
Re: Backporting KAsan patches to 4.9 branch
On 09/18/2014 01:57 PM, Jakub Jelinek wrote:
> On Thu, Sep 18, 2014 at 01:46:21PM +0400, Yury Gribov wrote:
>> Kernel Asan patches are currently being discussed in LKML. One of the points
>> raised during review was that KAsan requires GCC 5.0 which is presumably
>> unstable (e.g. compilation of kernel modules has been broken for two months
>> due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848).
>>
>> Would it make sense to backport Kasan-related patches to 4.9 branch to make
>> this feature more accessible to kernel developers? Quick analysis showed
>> that at the very least this would require
>> ...
>> Is it ok to backport these to 4.9? Note that I would discard patches for
>> other sanitizers (UBsan, Tsan).
>
> I'd say so, if it doesn't need any library changes
> (especially not any ABI
> visible ones, guess bugfixes could be acceptable).

Finally got time to look into this. I've successfully backported 22 patches to 4.9:
* bugfixes (12 patches)
* install Asan headers (1 patch)
* libsanitizer merge (1 patch) - this is questionable, see below for discussion
* BUILT_IN_ASAN_REPORT_{LOAD,STORE}_N (2 patches)
* instrumentation with calls (1 patch)
* optimize strlen instrumentation (1 patch)
* move inlining to sanopt pass (2 patches)
* Kasan (2 patches)

One problem is that for the BUILT_IN_ASAN_REPORT_{LOAD,STORE}_N patch I need libsanitizer APIs (__asan_loadN, __asan_storeN) which were introduced in a giant libsanitizer merge in 5.0. In the current patchset I backport the whole merge patch (and a bunch of cherry-picks which followed it) but it changes the libsanitizer ABI (new version of __asan_init_vXXX, etc.) which is probably undesirable. Another option would be to backport just the necessary minimum (__asan_loadN, __asan_storeN). How should I proceed?

Another question: should I update patch ChangeLog dates for the backported patches? If not - should I insert them into the ChangeLogs in chronological order or just stack them on top of the previous contents? -Y
Re: Backporting KAsan patches to 4.9 branch
On 10/14/2014 03:19 PM, Dmitry Vyukov wrote: On Tue, Oct 14, 2014 at 3:07 PM, Yury Gribov wrote: [previous message quoted in full, snipped]
>> [...] Another option would be to backport just the necessary minimum (__asan_loadN, __asan_storeN). How should I proceed?
> Backporting only __asan_loadN/__asan_storeN looks like the safest option to me.

This would break forward compatibility of 4.9's libsanitizer, which seems to be unacceptable. -Y
Re: [RFC] Adjusted VRP
On 10/30/2014 01:27 PM, Richard Biener wrote:
> Well, VRP is not path-insensitive - it is the value-ranges we are able to retain after removing the ASSERT_EXPRs VRP inserts. Why can't you do the ASAN optimizations in the VRP transform phase?

I think this is not Asan-specific: Marat's point was that allowing basic-block-precise ranges would generally allow the middle-end to produce better code. -Y
Re: [RFC] Adjusted VRP
On 10/30/2014 04:19 PM, Marat Zakirov wrote: On 10/30/2014 02:32 PM, Jakub Jelinek wrote: On Thu, Oct 30, 2014 at 02:16:04PM +0300, Yury Gribov wrote:
>>> Well, VRP is not path-insensitive - it is the value-ranges we are able to retain after removing the ASSERT_EXPRs VRP inserts. Why can't you do the ASAN optimizations in the VRP transform phase?
>> I think this is not Asan-specific: Marat's point was that allowing basic-block-precise ranges would generally allow the middle-end to produce better code.
> The reason for get_range_info in the current form is that it is cheap, and unless we want to make some SSA_NAMEs non-propagatable [*], IMHO it should stay that way. Now that we have ASAN_ internal calls, if you want to optimize away ASAN_CHECK calls where VRP suggests that e.g. an array index will be within the right bounds, and you'd optimize away an ASAN_CHECK to a VAR_DECL access if the index was constant (say minimum or maximum of the range), you can do so in VRP and it is the right thing to do it there.
>
> [*] - that is something I've been talking about for __builtin_unreachable () etc., whether it would be worth it if the range_info of a certain SSA_NAME that VRP would want to remove again was significantly better than the range info of the base SSA_NAME, to keep that SSA_NAME around and e.g. block forwprop etc. from propagating the SSA_NAME copy, unless something other than an SSA_NAME has been propagated into it. Richard was against that though.

We didn't find reasonable performance gains from using VRP in asan. But even if we had found them we couldn't use it, because it is not safe for asan: it makes some optimistic conclusions invalid for asan. The adjusted VRP memory upper bound is #{trees that are compared} x nblocks, which could be reduced by some threshold.

Do you have some concrete numbers at hand? -Y
Re: [RFC] UBSan unsafely uses VRP
On 11/11/2014 05:15 PM, Jakub Jelinek wrote:
>> There is also some unsafe code in the functions ubsan_expand_si_overflow_addsub_check and ubsan_expand_si_overflow_mul_check, which use get_range_info to reduce the number of checks. As seen before, vrp usage for sanitizers may decrease the quality of error detection.
> Using VRP is completely intentional there, we don't want to generate too slow code if you decide you want to optimize your code (for -O0 VRP isn't performed of course).

On the other hand, detection quality is probably more important than performance, regardless of optimization level. When I use a checker, I don't want it to miss bugs due to overly aggressive optimization. I wish we had some test to check that sanitizer optimizations are indeed conservative. -Y
Re: ubsan, asan testing is broken due to coloring
[CC-ing sanitizer team.]

On 11/12/2014 08:02 AM, Andrew Pinski wrote:
> With some configurations (looks like out of tree testing more than in tree testing), all of the ubsan and asan tests fail due to libsanitizer using coloring, which confuses the dejagnu pattern matching.

Right, we fix new errors like this every now and then but they keep popping up.

> I don't have time to look fully into how to fix this issue and I don't care much about coloring anyway, so I disabled it in the source for my own use and the tests now pass.

First, we could run with ASAN_OPTIONS=color=0. I think we once tracked this error to QEMU incorrectly returning 1 to ASan's isatty() but never bothered to fix it because fixing the tests is so much easier. -Y
Re: [RFC] UBSan unsafely uses VRP
On 11/12/2014 11:45 AM, Marek Polacek wrote: On Wed, Nov 12, 2014 at 11:42:39AM +0300, Yury Gribov wrote: [earlier exchange snipped]
>> On the other hand, detection quality is probably more important than performance, regardless of optimization level. When I use a checker, I don't want it to miss bugs due to overly aggressive optimization.
> Yes, but as said above, VRP is only run with >= -O2 and -Os.

Hm, I must be missing something. 99% of users will only run their code under -O2 because it'll be too slow otherwise. Why should we penalize them for this by lowering analysis quality? Isn't error detection the main goal of sanitizers (performance being secondary at best)?

>> I wish we had some test to check that sanitizer optimizations are indeed conservative.
> I think most of the tests we have are tested with various optimization levels.

Existing tests are really a joke when we consider interblock optimization. Most don't even contain any non-trivial control flow. -Y
Re: [RFC] UBSan unsafely uses VRP
On 11/12/2014 04:26 PM, Jakub Jelinek wrote:
> But, if -O0 isn't too slow for them, having unnecessary bloat even at -O2 is bad all the same. And by not using VRP at all, you are giving up all the cases where you know something won't overflow because you e.g. sign extend or zero extend from some smaller type, sum such values, add something with a constant, or you can use cheaper code to multiply etc.

Sure, I was not suggesting anything like that - just pointing out that we must be careful about potential loss of precision and do all we can to avoid it. Faster code should not justify lower quality (as it used to be in the 1960s).

> Turning off -faggressive-loop-optimizations is certainly the right thing for -fsanitize=undefined (any undefined I'd say), so are perhaps selected other optimizations.

Totally agree. -Y
Re: [RFC] UBSan unsafely uses VRP
On 11/12/2014 04:26 PM, Jakub Jelinek wrote: On Wed, Nov 12, 2014 at 12:58:37PM +0300, Yury Gribov wrote: [quote of the preceding exchange snipped]
>> Hm, I must be missing something. 99% of users will only run their code under -O2 because it'll be too slow otherwise. Why should we penalize them for this by lowering analysis quality? Isn't error detection the main goal of sanitizers (performance being secondary at best)?

But, if -O0 isn't too slow for them, having unnecessary bloat even at -O2 is bad all the same. And by not using VRP at all, you are giving up all the cases where you know something won't overflow because you e.g. sign extend or zero extend from some smaller type, sum such values, add something with a constant, or you can use cheaper code to multiply etc. Turning off -faggressive-loop-optimizations is certainly the right thing for -fsanitize=undefined (any undefined I'd say), so are perhaps selected other optimizations.

Jakub
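A concrete example of the safe case Jakub describes (a sketch, assuming -O2 with -fsanitize=signed-integer-overflow): both operands are sign-extended from short, so VRP can prove the sum fits in int and the runtime overflow check would be pure bloat:

  int sum (short a, short b)
  {
    /* Ranges: a, b in [-32768, 32767], so a + b is in [-65536, 65534],
       which always fits in int - signed overflow is impossible here. */
    return (int) a + (int) b;
  }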
Re: limiting call clobbered registers for library functions
On 01/29/2015 08:32 PM, Richard Henderson wrote: On 01/29/2015 02:08 AM, Paul Shortis wrote:
>> I've ported GCC to a small 16 bit CPU that has single bit shifts. So I've handled variable / multi-bit shifts using a mix of inline shifts and calls to assembler support functions. The calls to the asm library functions clobber only one (by const) or two (variable) registers but of course calling these functions causes all of the standard call clobbered registers to be considered clobbered, thus wasting lots of candidate registers for use in expressions surrounding these shifts and causing unnecessary register saves in the surrounding function prologue/epilogue. I've scrutinized and cloned the actions of other ports that do the same, however I'm unable to convince the various passes that only r1 and r2 can be clobbered by these library calls. Is anyone able to point me in the proper direction for a solution to this problem?
> You wind up writing a pattern that contains a call, but isn't represented in rtl as a call.

Could it be useful to provide a pragma for specifying function register usage? This would allow e.g. a library writer to write a hand-optimized assembly version and then inform the compiler of its binary interface. Currently a surrogate of this can be achieved by putting inline asm code in static inline functions in public library headers, but this has its own disadvantages (e.g. code bloat). -Y
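To illustrate the idea with purely hypothetical syntax (no such pragma exists in GCC; the pragma name and the helper below are invented for this sketch): the declaration would tell the register allocator that a call to the helper clobbers nothing but r1 and r2, so it need not be treated as a full call:

  /* Hypothetical, for illustration only. */
  #pragma GCC clobber_list ("r1", "r2")
  extern unsigned short __lshrhi3 (unsigned short x, int n);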
Re: limiting call clobbered registers for library functions
On 01/30/2015 11:16 AM, Matthew Fortune wrote: Yury Gribov writes: [earlier discussion of limiting call-clobbered registers snipped]
>> Could it be useful to provide a pragma for specifying function register usage? [...]
> This sounds like a good idea in principle. I seem to recall seeing something similar to this in other compiler frameworks that allow a number of special calling conventions to be defined and enable functions to be attributed to use one of them. I.e. not quite so general as specifying an arbitrary clobber list but some sensible pre-defined alternative conventions.

FYI a colleague from the kernel mentioned that they already achieve this by wrapping the actual call with inline asm, e.g.

  static inline int foo(int x)
  {
    asm(".global foo_core\n"
        // foo_core accepts a single parameter in %rax,
        // returns the result in %rax and
        // clobbers %rbx
        "call foo_core\n"
        : "+a"(x)
        :
        : "rbx");
    return x;
  }

We still can't mark inline asm with things like __attribute__((pure)), etc. though, so it's not an ideal solution. -Y
Re: gcc wiki project
On 03/24/2015 03:20 PM, Jonathan Wakely wrote: On Mon, Mar 23, 2015 at 06:14:30PM -0500, David Kunsman wrote:
>> Hello, I was just reading through the current projects wiki page and I noticed how out of date pretty much all of them are. So I was planning on doing "spring cleaning" by going down the list, tracking down what has been done and what needs to be done, and updating all the wikis. Do you think this is something that is worthwhile to work on?
> Yes, I think that would be very useful.

On 24 March 2015 at 12:16, Martin Jambor wrote:
> Yes, I think that even just moving hopelessly outdated stuff to some "Archive" section,

I don't see any need to move pages (that would break old links). So why not fix links as well? -Y
Re: gcc addresssanitizer in MIPS
> Does someone using addresssanitizer on another platform (i386/x64/arm/ppc)
> suffer this problem?

Hi Jean,

Yes, we do see this error on ARM. Full description and suggested patch are available at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543

I'm curious whether the suggested patch is going to work for Andrew. -Y
Re: gcc addresssanitizer in MIPS
> Yes, we do see this error on ARM.

Here is another instance of the same bug: http://permalink.gmane.org/gmane.comp.debugging.address-sanitizer/531

> Full description and suggested patch are available at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543
> I'm curious whether the suggested patch is going to work for Andrew.

-Y
Re: gcc addresssanitizer in MIPS
> Hi Yury, try to use the patch for asan.c to see if it solves your problem.

I tried but unfortunately it did not work for me. Could you try the patch suggested in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543 (I've attached it) when you have time? This was verified against the gcc testsuite on x64 and ARM.

> My test was to use attached time.cpp for asan test.

BTW I can't reproduce your error on ARM (using gcc trunk):

  $ /home/ygribov/install/gcc-master-arm-full/bin/arm-none-linux-gnueabi-gcc -fsanitize=address -O2 time.cpp
  $ qemu-arm -R 0 -L /home/ygribov/install/gcc-master-arm-full/arm-none-linux-gnueabi/sys-root -E LD_LIBRARY_PATH=/lib:/usr/lib:/home/ygribov/install/gcc-master-arm-full/arm-none-linux-gnueabi/lib a.out

-Y

diff --git a/gcc/asan.c b/gcc/asan.c
index 32f1837..acb00ea 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -895,7 +895,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
-  addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  addr = copy_to_reg (force_reg (Pmode, XEXP (shadow_mem, 0)));
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);
Re: gcc addresssanitizer in MIPS
> "copy_to_mode_reg (Pmode, XEXP (shadow_mem, 0))" would be more direct. > But it looks good to me with that change FWIW. Thanks, Richard. Note that Jakub has proposed an optimized patch on gcc-patches ML (in Re: [PATCH] Invalid unpoisoning of stack redzones on ARM). -Y
Re: Report on the bounded pointers work
> If you're referring to mudflap (Frank Eigler's work),
> ...
> It never reached a point where interoperability across objects with and without mudflap instrumentation worked

Jeff,

Could you add more details? E.g. I don't see how mudflap interoperability is different from AddressSanitizer, which seems to be the state of the art. -Y
Asm volatile causing performance regressions on ARM
Hi all,

We have recently run into a performance/code size regression on ARM targets after the transition from GCC 4.7 to GCC 4.8 (this regression is also present in 4.9). The following code snippet uses Linux-style compiler barriers to protect memory writes:

  #define barrier() __asm__ __volatile__ ("": : :"memory")
  #define write(v,a) { barrier(); *(volatile unsigned *)(a) = (v); }

  #define v1 0x0010
  #define v2 0xaabbccdd

  void test(unsigned base)
  {
    write(v1, base + 0x100);
    write(v2, base + 0x200);
    write(v1, base + 0x300);
    write(v2, base + 0x400);
  }

Code generated by GCC 4.7 under -Os (all good):

  mov r2, #7340032
  str r2, [r0, #3604]
  ldr r3, .L2
  str r3, [r0, #3612]
  str r2, [r0, #3632]
  str r3, [r0, #3640]

(note that the compiler decided to load v2 from the constant pool). Now the code generated by GCC 4.8/4.9 under -Os is much larger because v1 and v2 are reloaded before every store:

  mov r3, #7340032
  str r3, [r0, #3604]
  ldr r3, .L2
  str r3, [r0, #3612]
  mov r3, #7340032
  str r3, [r0, #3632]
  ldr r3, .L2
  str r3, [r0, #3640]

v1 and v2 are constant literals and can't really be changed by the user, so I would expect the compiler to combine the loads. After some investigation, we discovered that this behavior is caused by a big hammer in gcc/cse.c:

  /* A volatile ASM or an UNSPEC_VOLATILE invalidates everything.  */
  if (NONJUMP_INSN_P (insn) && volatile_insn_p (PATTERN (insn)))
    flush_hash_table ();

This code (introduced in http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=193802) aborts CSE after seeing a volatile inline asm. Is this compiler behavior reasonable? AFAIK GCC documentation only says that __volatile__ prevents the compiler from removing the asm, but it does not mention that it suppresses optimization of all surrounding expressions.

If this behavior is not intended, what would be the best way to fix performance? I could teach GCC to not remove constant RTXs in flush_hash_table() but this is probably very naive and won't cover some corner cases. -Y
Re: Asm volatile causing performance regressions on ARM
Richard Biener wrote:
>> If this behavior is not intended, what would be the best way to fix performance? I could teach GCC to not remove constant RTXs in flush_hash_table() but this is probably very naive and won't cover some corner cases.
> That could be a good starting point though. Though with modifying "machine state" you can modify constants as well, no?

Valid point, but this would mean relying on the compiler to always load all constants from memory (instead of, say, generating them via movhi/movlo) for such a piece of code, which looks extremely unstable.

What is the general attitude towards volatile asm? Are people interested in making it more defined/performant, or should we just leave this can of worms as is? I can try to improve the generated code, but my patches will be doomed if there is no consensus on what volatile asm actually means... -Y
Re: linux says it is a bug
Richard wrote: > volatile __asm__("":::"memory") > > is a memory barrier and a barrier for other volatile instructions. AFAIK asm without output arguments is implicitly marked as volatile. So it may not be needed in barrier() at all. -Y
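A quick illustration of that point (the two definitions should be equivalent, since the lack of output operands makes the first form implicitly volatile anyway):

  /* Explicitly volatile barrier. */
  #define barrier_explicit() __asm__ __volatile__ ("" : : : "memory")
  /* No outputs, so volatile is implied. */
  #define barrier_implicit() __asm__ ("" : : : "memory")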
Re: linux says it is a bug
>> Asms without outputs are automatically volatile. So there ought to be zero change
>> with and without the explicit use of the __volatile__ keyword.
>
> That’s what the documentation says but it wasn’t actually true
> as of a couple of releases ago, as I recall.

Looks like 2005:

  $ git annotate gcc/c/c-typeck.c
  ...
  89552023 (bonzini 2005-10-05 12:17:16 +0000 9073)  /* asm statements without outputs, including simple ones, are treated
  89552023 (bonzini 2005-10-05 12:17:16 +0000 9074)     as volatile.  */
  89552023 (bonzini 2005-10-05 12:17:16 +0000 9075)  ASM_INPUT_P (args) = simple;
  89552023 (bonzini 2005-10-05 12:17:16 +0000 9076)  ASM_VOLATILE_P (args) = (noutputs == 0);

-Y
Re: linux says it is a bug
> What are volatile instructions? Can you give us an example?

Check volatile_insn_p. AFAIK there are two classes of volatile instructions:
* volatile asm
* unspec volatiles (target-specific instructions for e.g. protecting function prologues)

-Y
Re: gcc-4.9: How to generate Makefile.in from a modified Makefile.am?
> You must use autoconf 2.65, exactly.

Perhaps we could update http://gcc.gnu.org/wiki/Regenerating_GCC_Configuration ? -Y
Re: gcc-4.9: How to generate Makefile.in from a modified Makefile.am?
>> You must use autoconf 2.65, exactly. > configure.ac:27: error: Please use exactly Autoconf 2.64 instead of 2.69. Hm... -Y