Re: TYPE_BINFO and canonical types at LTO
On 02/13/2014 04:48 PM, Jan Hubicka wrote:
> all bases are also fields within the type, so the second loop should notice
> all the types seen by the first loop if I am correct?

Yes, except that empty bases don't get fields because they have no data.
But since they have no data they aren't interesting to aliasing either, so
you should be OK just looking at field types.

Jason
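[To illustrate the point about empty bases - an editorial sketch, not part of
the original mail; the exact field layout is what the C++ front end usually
produces, so treat it as an assumption rather than a guarantee:]

struct Empty {};
struct A { int a; };
struct B : Empty, A { int b; };

/* With an empty base, B's TYPE_FIELDS is expected to contain the
   (artificial) field for the A base and the field 'b', but nothing for
   Empty, since Empty holds no data.  A field-only walk therefore still
   records every base type that actually occupies storage.  */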
Re: TYPE_BINFO and canonical types at LTO
On Fri, 14 Feb 2014, Jan Hubicka wrote:
> Hi,
> I have noticed that record_component_aliases is called during LTO time and it
> examines contents of BINFO:
>
> 0x5cd7a5 record_component_aliases(tree_node*)
>         ../../gcc/alias.c:1005
> 0x5cd4a9 get_alias_set(tree_node*)
>         ../../gcc/alias.c:895
> 0x5cc67a component_uses_parent_alias_set_from(tree_node const*)
>         ../../gcc/alias.c:548
> 0x5ccc42 reference_alias_ptr_type_1
>         ../../gcc/alias.c:660
> 0x5ccf93 get_alias_set(tree_node*)
>         ../../gcc/alias.c:740
> 0xb823d8 indirect_refs_may_alias_p
>         ../../gcc/tree-ssa-alias.c:1125
> 0xb82d8d refs_may_alias_p_1(ao_ref*, ao_ref*, bool)
>         ../../gcc/tree-ssa-alias.c:1279
> 0xb848df stmt_may_clobber_ref_p_1(gimple_statement_base*, ao_ref*)
>         ../../gcc/tree-ssa-alias.c:2013
> 0xb85d27 walk_non_aliased_vuses(ao_ref*, tree_node*, void* (*)(ao_ref*, tree_node*, unsigned int, void*), void* (*)(ao_ref*, tree_node*, void*), void*)
>         ../../gcc/tree-ssa-alias.c:2411
> 0xc509f3 vn_reference_lookup(tree_node*, tree_node*, vn_lookup_kind, vn_reference_s**)
>         ../../gcc/tree-ssa-sccvn.c:2063
> 0xc52ea4 visit_reference_op_store
>         ../../gcc/tree-ssa-sccvn.c:2970
> 0xc55404 extract_and_process_scc_for_name
>         ../../gcc/tree-ssa-sccvn.c:3825
>
> This smells bad, since it is given a canonical type that is after the
> structural equivalency merging that ignores BINFOs, so it may be a completely
> different class with completely different bases than the original.  Bases are
> structurally merged, too, and may be exchanged for normal fields because
> DECL_ARTIFICIAL (that separates bases and fields) does not seem to be part of
> the canonical type definition in LTO.

Can you elaborate on that DECL_ARTIFICIAL thing?  That is, what is broken
by considering all fields during that merging?

Note that the BINFO walk below only adds extra aliasing - it should
be harmless correctness-wise, no?

> I wonder if that code is needed after all:
>     case QUAL_UNION_TYPE:
>       /* Recursively record aliases for the base classes, if there are any.  */
>       if (TYPE_BINFO (type))
>         {
>           int i;
>           tree binfo, base_binfo;
>
>           for (binfo = TYPE_BINFO (type), i = 0;
>                BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
>             record_alias_subset (superset,
>                                  get_alias_set (BINFO_TYPE (base_binfo)));
>         }
>       for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
>         if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
>           record_alias_subset (superset, get_alias_set (TREE_TYPE (field)));
>       break;
> all bases are also fields within the type, so the second loop should notice
> all the types seen by the first loop if I am correct?
> So perhaps the loop can be dropped in the first place.

Yeah, I remember seeing that code and thinking the same multiple times.
Though I also vaguely remember that removing that loop regressed something.
How is virtual inheritance represented in the fields?

But I'd be happy if this BINFO user would go away ;)

(similar in general for the get_alias_set langhook - with LTO we
only preserve extra alias-set zero answers from that)

Richard.
Need help: Is a VAR_DECL type builtin or not?
Given a specific VAR_DECL tree node, I need to find out whether its type is
built in or not.  Up to now I have

  tree tn = TYPE_NAME (TREE_TYPE (var_decl));
  if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn))
    {
      ...
    }

This if-condition is true for both

  int x;
  const int x;
  ...

and

  typedef int i_t;
  i_t x;
  const i_t x;
  ...

I need to weed out the class of VAR_DECLs that directly use built-in types.

Ciao

Dominik ^_^  ^_^

--
Dominik Vogt
IBM Germany
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 13, 2014 at 09:07:55PM -0800, Torvald Riegel wrote: > That depends on what your goal is. A compiler that we don't need to fight in order to generate sane code would be nice. But as Linus said; we can continue to ignore you lot and go on as we've done.
MSP430 in gcc4.9 ... enable interrupts?
I have built a cross-compiler for the MSP430, using a gcc 4.9 snapshot
(gcc-4.9-20140112), and the compiler seems OK and builds a simple "blinky"
LED flashing example.

But my slightly larger example, originally built using Peter Bigot's mspgcc
backend, no longer compiles ... mspgcc had a number of intrinsic functions,
such as __nop(), __eint() and __dint(); calling these would execute a nop,
enable interrupts and disable interrupts respectively.  Others such as
__bis_status_register() and __bic_status_register() would manipulate the
system status, low power modes etc.

Now in the MSP430 port for gcc 4.9, these intrinsic functions have gone.
Perusing the config/msp430 source files, e.g. config/msp430/msp430.md, I can
see evidence that the _functionality_ is still there, e.g.

(define_insn "enable_interrupts"
  [(unspec_volatile [(const_int 0)] UNS_EINT)]
  ""
  "EINT"
)
...
(define_insn "bis_SR"
  [(unspec_volatile [(match_operand 0 "nonmemory_operand" "ir")] UNS_BIS_SR)]
  ""
  "BIS.W\t%0, %O0(SP)"
)
...

but how do I access it?  In other words, what C code fragment would cause the
"enable_interrupts" instruction to be emitted, and generate "EINT" in the
assembler or object output?

- Brian
[RFC] Rationale for passing vectors by value in SIMD registers
MIPS is currently evaluating the benefit of using SIMD registers to pass
vector data by value.  It is currently unclear how important it is for vector
data to be passed in SIMD registers, i.e. the need for passing vector data by
value in real-world code is not immediately obvious, and the performance
advantage is therefore also unclear.

Can anyone offer insight into the rationale behind the design decisions made
for other architectures' ABIs?  For example, the x86 and x86_64 calling
convention for vector data types presumes that they will be passed in SSE/AVX
registers and raises warnings if they are passed when SSE/AVX support is not
enabled.  This is what MIPS is currently considering, however there are two
concerns:

1) What about the ability to create architecture/implementation-independent
APIs that may include vector types in the prototypes?  Such APIs may be built
for varying levels of hardware support to make the most of a specific
architecture implementation, but be called from otherwise
implementation-agnostic code.  To support such a scenario we would need to
use a common calling convention usable on all architecture variants.

2) Although vector types are not specifically covered by existing ABI
definitions for MIPS, we have unfortunately got a de facto standard for how
to pass these by value: vector types are simply considered to be small
structures and passed as such following normal ABI rules.  This is still a
concern even though it is generally accepted that there is some room for
change when it comes to vector data types in an existing ABI.

If anyone could offer a brief history of the x86 ABI with respect to vector
data types, that would also be interesting.  One question would be whether
the use of vector registers in the calling convention was only enabled by
default once there was a critical mass of implementations, and therefore the
default ABI was changed to start making assumptions about the availability of
features like SSE and AVX.

Comments from any other architecture that has had to make such changes over
time would also be welcome.

Thanks in advance,
Matthew
Re: [RFC] Rationale for passing vectors by value in SIMD registers
On Fri, Feb 14, 2014 at 2:17 AM, Matthew Fortune wrote: > MIPS is currently evaluating the benefit of using SIMD registers to pass > vector data by value. It is currently unclear how important it is for vector > data to be passed in SIMD registers. I.e. the need for passing vector data by > value in real world code is not immediately obvious. The performance > advantage is therefore also unclear. > > Can anyone offer insight in the rationale behind decision decisions made for > other architectures ABIs? For example, the x86 and x86_64 calling convention > for vector data types presumes that they will passed in SSE/AVX registers and > raises warnings if passed when sse/avx support is not enabled. This is what > MIPS is currently considering however there are two concerns: > > 1) What about the ability to create architecture/implementation independent > APIs that may include vector types in the prototypes. Such APIs may be built > for varying levels of hardware support to make the most of a specific > architecture implementation but be called from otherwise implementation > agnostic code. To support such a scenario we would need to use a common > calling convention usable on all architecture variants. > 2) Although vector types are not specifically covered by existing ABI > definitions for MIPS we have unfortunately got a defacto standard for how to > pass these by value. Vector types are simply considered to be small > structures and passed as such following normal ABI rules. This is still a > concern even though it is generally accepted that there is some room for > change when it comes to vector data types in an existing ABI. > > If anyone could offer a brief history the x86 ABI with respect to vector data > types that may also be interesting. One question would be whether the use of > vector registers in the calling convention was only enabled by default once > there was a critical mass of implementations, and therefore the default ABI > was changed to start making assumptions about the availability of features > like SSE and AVX. > > Comments from any other architecture that has had to make such changes over > time would also be welcome. PPC and arm and AARCH64 are common targets where vectors are passed/return via value. The idea is simple, sometimes you have functions like vector float vsinf(vector float a) where you want to be faster and avoid a round trip to L1 (or even L2). These kind of functions are common for vector programming. That is extending the scalar versions to the vector versions. Thanks, Andrew Pinski > > Thanks in advance, > Matthew >
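[To make the vsinf() example concrete, here is an editorial sketch using
GCC's generic vector extensions; the typedef name and prototypes are
illustrative, not from the original mail.  With a register-based vector ABI
the argument arrives in a SIMD register and the result is returned in one,
so the call avoids a round trip through memory.]

/* Illustrative only: a vector sine with a vector calling convention. */
typedef float v4sf __attribute__ ((vector_size (16)));

v4sf vsinf (v4sf a);             /* hypothetical vector math routine */

v4sf scale_and_sin (v4sf x)
{
  v4sf two = { 2.0f, 2.0f, 2.0f, 2.0f };
  return vsinf (x * two);        /* with a SIMD-register ABI this call needs no stack traffic */
}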
Re: gnattools cannot be built for freestanding/bare metal environment without hacking up the build machinery
Luke A. Guest archeia.com> writes:
>
> Hi,
>
> I've been over this before and have got nowhere with it.
> Say you want to build an Ada compiler for embedded work, ...
> You can build it with "make all-gcc" and install with "make install-gcc"
> ...
> But what about the gnattools? Not buildable. A message in the ml
> archives states to build them with "make -C gcc gnattools," but this
> fails:
> ...
> You can't disable libada using the command line as that also disables
> gnattools - I don't know the reason for requiring this behaviour.

Revisiting the MSP430 as it's now an official gcc target, this is still a
problem here too, so there are at least 3 targets for which it's a problem.

Looking at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19959
and comparing comments #6 and #14, perhaps this command should be
"make -C gcc gnattools-cross"?  I'll try to do a clean build and test this
today.

But it would be better if the build "just worked", so at

> if test "${ENABLE_LIBADA}" != "yes" ; then
>   noconfigdirs="$noconfigdirs"
> fi

we need a better test here (also checking for a cross-compiler build).

> I think that we need a configure flag to disable libada and not
> gnattools for these bare board targets.

Following comment #15 in bug 19959, perhaps it's time to open a bug against
--disable-libada.

- Brian
Optimizing bit extract
Hello gcc

I have been looking at optimizations of pixel-format conversion recently and
have noticed that gcc does not take advantage of SSE4a extrq, BMI1 bextr, TBM
bextri or BMI2 pext instructions when they could be useful.

As far as I can tell it should not be that hard.  A bextr expression can
typically be recognized as ((x >> s) & mask) or ((x << s1) >> s2).  But I am
unsure where to do such a matching, since the mask needs to have a specific
form to be valid for bextr, so it seems it needs to be done before
instruction selection.

Secondly, the bextr instruction in itself only replaces two already fast
instructions, so it is very minor (unless extracting variable bit-fields,
which is harder to recognize).  The real optimization comes from being able
to use pext (parallel bit extract), which can implement several bextr
expressions in parallel.

So, where would be the right place to implement such instructions?  Would it
make sense to recognize bextr early, before we get to i386 code, or would it
be better to recognize it late?  And where do I put such instruction
selection optimizations?

Motivating example:

unsigned rgb32_to_rgb16(unsigned rgb32) {
    unsigned char red = (rgb32 >> 19) & 0x1f;
    unsigned char green = (rgb32 >> 10) & 0x3f;
    unsigned char blue = rgb32 & 0x1f;
    return (red << 11) | (green << 5) | blue;
}

can be implemented as pext(rgb32, 0x001f3f1f)

Best regards
`Allan Sandfeld
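[For reference, an editorial sketch of the BMI2 form using the _pext_u32
intrinsic, not from the original mail.  Note that for the exact shifts and
masks in rgb32_to_rgb16 above the selection mask works out to 0x00f8fc1f
(bits 19-23, 10-15 and 0-4), so it differs from the constant quoted in the
message; the principle is the same.]

#include <immintrin.h>   /* requires -mbmi2 */

/* One pext gathers the three bit-fields extracted separately above:
   bits 19..23 (red), 10..15 (green) and 0..4 (blue) are packed, lowest
   mask bit first, into the low 16 bits of the result - exactly
   (red << 11) | (green << 5) | blue.  */
unsigned rgb32_to_rgb16_pext (unsigned rgb32)
{
  return _pext_u32 (rgb32, 0x00f8fc1fu);
}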
Re: Need help: Is a VAR_DECL type builtin or not?
On Fri, Feb 14, 2014 at 9:59 AM, Dominik Vogt wrote: > Given a specific VAR_DECL tree node, I need to find out whether > its type is built in or not. Up to now I have > > tree tn = TYPE_NAME (TREE_TYPE (var_decl)); > if (tn != NULL_TREE && TREE_CODE (tn) == TYPE_DECL && DECL_NAME (tn)) > { > ... > } > > This if-condition is true for both, > > int x; > const int x; > ... > > and > > typedef int i_t; > i_t x; > const i_t x; > ... > > I need to weed out the class of VAR_DECLs that directly use built > in types. Try DECL_IS_BUILTIN. But I question how you define "builtin" here? Richard. > Ciao > > Dominik ^_^ ^_^ > > -- > > Dominik Vogt > IBM Germany >
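[An editorial sketch of how Richard's suggestion might be applied to the
original test - untested; whether the front end gives the TYPE_DECLs of
built-in types a BUILTINS_LOCATION, which is what DECL_IS_BUILTIN checks, is
exactly the thing to verify.]

/* Hypothetical refinement of the original test: keep only VAR_DECLs whose
   type's TYPE_DECL was written in the source (e.g. a typedef), skipping
   compiler-generated ones.  DECL_IS_BUILTIN compares the decl's source
   location against BUILTINS_LOCATION.  */
tree tn = TYPE_NAME (TREE_TYPE (var_decl));
if (tn != NULL_TREE
    && TREE_CODE (tn) == TYPE_DECL
    && DECL_NAME (tn)
    && !DECL_IS_BUILTIN (tn))
  {
    /* ... user-declared type ... */
  }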
Re: Optimizing bit extract
On Fri, Feb 14, 2014 at 2:23 PM, Allan Sandfeld Jensen wrote: > Hello gcc > > I have been looking at optimizations of pixel-format conversion recently and > have noticed that gcc does take advantage of SSE4a extrq, BMI1 bextr TBM > bextri or BMI2 pext instructions when it could be useful. > > As far as I can tell it should not be that hard. A bextr expression can > typically be recognized as ((x >> s) & mask) or ((x << s1)) >> s2). But I am > unsure where to do such a matching since the mask needs to have specific form > to be valid for bextr, so it seems it needs to be done before instruction > selection. > > Secondly the bextr instruction in itself only replace two already fast > instructions so is very minor (unless extracting variable bit-fields which is > harder recognize). The real optimization comes from being able to use pext > (parallel bit extract), which can implement several bextr expressions in > parallel. > > So, where would be the right place to implement such instructions. Would it > make sense to recognize bextr early before we get to i386 code, or would it be > better to recognize it late. And where do I put such instruction selection > optimizations? > > Motivating example: > > unsigned rgb32_to_rgb16(unsigned rgb32) { > unsigned char red = (rgb32 >> 19) & 0x1f; > unsigned char green = (rgb32 >> 10) & 0x3f; > unsigned char blue = rgb32 & 0x1f; >return (red << 11) | (green << 5) | blue; > } > > can be implemented as pext(rgb32, 0x001f3f1f) We have a special pass that already deals with similar patterns, the "bswap" pass in tree-ssa-math-opts.c. It does symbolic execution to produce the composition of a value. It currently handles byte-shifts only I think (not shifting by 19 or 10) but this is certainly the way I'd recognize pext() (and other generic shuffles supported by vector ISAs). You'd have to extend the representation it uses to handle these more arbitrary shifts/masks of course. Richard. > Best regards > `Allan Sandfeld
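[For context, an editorial example of the kind of pattern the bswap pass in
tree-ssa-math-opts.c already recognizes - a value composed from byte-sized
pieces of another value.  Extending its symbolic execution to track the
5/6-bit pieces from the rgb32_to_rgb16 example is what is being suggested
above.]

/* A manual 32-bit byte swap; the bswap pass can turn this into a single
   bswap instruction on targets that have one.  */
unsigned int swap32 (unsigned int x)
{
  return (x >> 24)
       | ((x >> 8) & 0x0000ff00u)
       | ((x << 8) & 0x00ff0000u)
       | (x << 24);
}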
Re: gnattools cannot be built for freestanding/bare metal environment without hacking up the build machinery
On Fri, 2014-02-14 at 12:11 +, Brian Drummond wrote:
> Revisiting the MSP430 as it's now an official gcc target, this is still a
> problem here too, so there are at least 3 targets for which it's a problem.

It's a problem for all targets.

> Looking at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19959
> and comparing comments #6 and #14, perhaps this command should be
> "make -C gcc gnattools-cross" ? I'll try to do a clean build and test this
> today.

It won't work, I've tried all those so-called workarounds.  They rely on
having libada built; it will complain to say that it can't be built because
there is no libada.  See the attachments.

> But it would be better if the build "just worked" so at
>
> > if test "${ENABLE_LIBADA}" != "yes" ; then
> >   noconfigdirs="$noconfigdirs"
> > fi
>
> we need a better test here (also checking for a crosscompiler build).

I have added one.  It does work, sort of.

> > I think that we need a configure flag to disable libada and not
> > gnattools for these bare board targets.
>
> Following comment #15 in bug 19959, perhaps it's time to open a bug against
> --disable-libada.

The attachments allow you to configure the toolchain for bare metal, no RTS.
This flag is --enable-cross-gnattools; it checks if this is a cross compiler
and if it is, disables libada, but not gnattools.

There is an issue and I don't know why it does this, but it won't allow you
to build the gnattools straight away.  You have to use:

make all-gcc
make -C gnattools/ gnattools
rm gcc/stamp-tools
make -C gcc cross-gnattools
make install-gcc

This will then build.  I have not checked to see if you can still build gcc
for a normal hosted configuration with this patch, neither have I checked to
make sure --disable-libada works in hosted either.

I have built the following with these patches:

arm-none-eabi
mips-elf
msp430-elf
x86_64-elf

Not one installs libada and I have gnattools:

$ ls ~/opt/tinyada/lib/gcc/arm-none-eabi/4.9.0/
include  include-fixed  install-tools  plugin
$ ls ~/opt/tinyada/lib/gcc/mips-elf/4.9.0/
include  include-fixed  install-tools  plugin
$ ls ~/opt/tinyada/lib/gcc/msp430-elf/4.9.0/
include  include-fixed  install-tools  plugin
$ ls ~/opt/tinyada/lib/gcc/x86_64-elf/4.9.0/
include  include-fixed  install-tools  plugin

Luke.

>From f8c74c16b9f7ef3be02a9a7d3480baf88d09efd6 Mon Sep 17 00:00:00 2001
From: "Luke A. Guest"
Date: Fri, 14 Feb 2014 13:53:27 +
Subject: [PATCH 1/2] Set the target for a bare metal environment.

---
 gnattools/configure    | 9 +
 gnattools/configure.ac | 9 +
 2 files changed, 18 insertions(+)

diff --git a/gnattools/configure b/gnattools/configure
index 883b705..b5bf03d 100755
--- a/gnattools/configure
+++ b/gnattools/configure
@@ -2090,6 +2090,15 @@ case "${target}" in
 indepsw.adb

>From a2b4516f93f4d99e5ddae4c1eed78f2014f0875b Mon Sep 17 00:00:00 2001
From: "Luke A. Guest"
Date: Fri, 14 Feb 2014 13:54:29 +
Subject: [PATCH 2/2] Added --enable-cross-gnattools flag for bare metal environment.
---
 configure    | 25 ++--
 configure.ac | 63 
 2 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/configure b/configure
index 749a35e..c0d706d 100755
--- a/configure
+++ b/configure
@@ -749,6 +749,7 @@ enable_ld
 enable_libquadmath
 enable_libquadmath_support
 enable_libada
+enable_cross_gnattools
 enable_libssp
 enable_libstdcxx
 enable_static_libjava
@@ -1466,6 +1467,10 @@ Optional Features:
   --disable-libquadmath-support
                           disable libquadmath support for Fortran
   --enable-libada         build libada directory
+  --enable-cross-gnattools
+                          Enable cross gnattools for cross-compiler for
+                          freestanding environment, --disable-libada is set
+                          automatically
   --enable-libssp         build libssp directory
   --disable-libstdcxx     do not build libstdc++-v3 directory
   --enable-static-libjava[=ARG]
@@ -3068,8 +3073,24 @@ else
   ENABLE_LIBADA=yes
 fi

-if test "${ENABLE_LIBADA}" != "yes" ; then
-  noconfigdirs="$noconfigdirs gnattools"
+
+# Check whether --enable-cross-gnattools was given.
+if test "${enable_cross_gnattools+set}" = set; then :
+  enableval=$enable_cross_gnattools; ENABLE_CROSS_GNATTOOLS=$enableval
+else
+  ENABLE_CROSS_GNATTOOLS=yes
+fi
+
+
+if test "${is_cross_compiler}" = "yes" && test "${ENABLE_CROSS_GNATTOOLS}" = "yes" ; then
+  if test "${target_vendor}" = "none" || test "${target_vendor}" = "unknown" ; then
+    enable_libada=no
+    ENABLE_LIBADA=$enable_libada
+  fi
+else
+  if test "${ENABLE_LIBADA}" != "yes" ; then
+    noconfigdirs="$noconfigdirs gnattools"
+  fi
 fi

 # Check whether --enable-libssp was given.
diff --git a/configure.ac b/configure.ac
index b24b33d..4fcac1a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -113,11 +113,11 @@ extra_host_args=
 ### or a host dependent tool. Then put it into the appropriate list
Re: [RFC] Offloading Support in libgomp
2014-01-31 22:03 GMT+04:00 Jakub Jelinek :
> Implicit map(tofrom: a) on #pragma omp target is what the standard
> requires, so I don't see a bug on the compiler side.
> Jakub

There is an exception in the standard (page 177, lines 17-21):

> If a corresponding list item of the original list item is in the enclosing
> device data environment, the new device data environment uses the
> corresponding list item from the enclosing device data environment. No
> additional storage is allocated in the new device data environment and
> neither initialization nor assignment is performed, regardless of the
> map-type that is specified.

So, the pointer 'a' should inherit map-type ALLOC from the enclosing device
data environment.

-- Ilya
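[A small OpenMP sketch of the situation being discussed - editorial
illustration, the variable names are mine, not from the original report.
When the list item is already present in an enclosing device data
environment, the inner construct's implicit map should neither allocate nor
copy.]

#include <stdlib.h>

#define N 1024

int main (void)
{
  float *a = malloc (N * sizeof *a);

  /* The array section enters the device data environment here...  */
  #pragma omp target data map(alloc: a[0:N])
  {
    /* ...so the implicit map(tofrom: a) on this construct refers to a list
       item already present in the enclosing device data environment: per the
       quoted wording, no new storage and no copies for it.  */
    #pragma omp target
    {
      for (int i = 0; i < N; i++)
        a[i] = i;
    }
  }

  free (a);
  return 0;
}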
Re: [RFC] Offloading Support in libgomp
On Fri, Feb 14, 2014 at 07:24:16PM +0400, Ilya Verbin wrote: > 2014-01-31 22:03 GMT+04:00 Jakub Jelinek : > > Implicit map(tofrom: a) on #pragma omp target is what the standard > > requires, so I don't see a bug on the compiler side. > > Jakub > > There is an exception in the standard (page 177, lines 17-21): > > > If a corresponding list item of the original list item is in the enclosing > > device data > > environment, the new device data environment uses the corresponding list > > item from the > > enclosing device data environment. No additional storage is allocated in > > the new device > > data environment and neither initialization nor assignment is performed, > > regardless of > > the map-type that is specified. > > So, the pointer 'a' should inherit map-type ALLOC from the enclosing > device data environment. The standard itself is very unclear. I'll cite my omp-lang mail from September: > Ok, I'm for now implementing this refcounted model. > > One still unclear thing is what is supposed to happen if multiple host > threads > enter a target data construct mapping at least one same object with different > > map kind. > > Say thread A enters #pragma omp target data map(tofrom:p[:64]), then > > thread B enters #pragma omp target data map(alloc:p[:64]) while thread A is > > still running the body of it's target data (so, the mapping just increments > > refcount of the p[:64] array section), then thread A leaves the target data > > construct, decrements p[:64] refcount, but as it is non-zero, doesn't > > deallocate it, and finally thread B enters end of its target data construct > and > unmaps p[:64]. The question is, when (if ever) is the array section supposed > > to be copied back to host? Shall it be done at the end of thread's A target > > data section, or at the end of thread's B target data section (i.e. propagate > > the flag, has at least one of the mapping's requested copy from the device to > > host at the end of it's lifetime), or not copied at all? > > What if thread B doesn't request the whole array section, but only a portion > > thereof map(alloc:p[:32]) ? Would it copy the whole p[:64] array section > > back, or just a portion of it? Though, admittedly, this latter case of a > subset > might be harder to construct valid non-racy testcase for, one needs to make > > sure one of the target data constructs is always entered before the other; > > though perhaps with #pragma omp atomic and spinning it might be doable. > and will just paraphrase the Sep 9th answer I got for that, because not sure I'm allowed to repost it. The answer was that on entry the standard is pretty clear what happens, the first encountering thread/data construct allocates and optionally copies based on the flags, all others when it is already mapped do nothing. On exit, the standard is silent and none of the solutions are right, the committee will discuss it further. So, for now the implementation choice was to or in the copy from device bit. Now, you could argue this case is different, because it is not different threads, but the same thread, just nested construct on the same thread. But how to reliably differentiate that? Even if you stored some thread identification into the tree along with each mapping (what thread mapped this in), what if some other thread also does the same (outer #pragma omp target data, inner
Re: gnattools cannot be built for freestanding/bare metal environment without hacking up the build machinery
On Fri, 2014-02-14 at 15:32 +, Brian Drummond wrote: > OK I'll take a look. > Too many make and install targets; I have no idea how this process > interacts with the process specified here: > https://sourceforge.net/apps/mediawiki/mspgcc/index.php?title=Install:redhat > for the msp430, using (in short) Well, like I've said before that process won't work when you enable Ada, my patches will make it work, but you need to add the extra/new flag to configure. > make all-host > sudo make install-host > (Configure and build Newlib for MSP430) > make all-target > sudo make install-target > > Maybe that split build process actually resolves your issue... > But who can tell? : I can't say I've found adequate documentation. It really doesn't. > With the original (avr/msp430) patch/hack, gnattools were automatically > built and installed by make all-host, make install-host. Yes, but that hack is from avr-ada which, if you look back in this ml, I specifically state that it breaks normal builds. My patches, in theory, should not. Trust me, I've been trying to build these style tools for around 10 years, on and off, and they have never built correctly for these targets and any attempt to put in a bug report about it is a waste of time as someone comes along states it works and closes it, even though it doesn't. Luke.
Issue with CRTP generation under 4.8.1
I created a CRTP (curiously recurring template pattern) class and added
non-static member variables to my base class, and that works without issue.
But when I add non-static member variables to the subclass, the variables in
that class don't get initialized properly and also don't consistently keep
the same values.

I checked the sizeof the object and it does seem to have the correct size,
that is, the base class member variables plus the subclass member variables,
and the this pointer is consistently the same value.  It's just that when
dumping the values to the screen, the ones in the subclass aren't the values
that I'm expecting within my template specialization methods of the subclass.
Since multiple instances of my object are dumping the same values regardless
of the object, I'm guessing that the computation of the offset of these
variables within the object is being calculated incorrectly.

Note that 472 should have been printed and not 0.

Example of bug (the template parameter lists were mangled by the archive and
have been restored here):

#include <cstdio>

struct ParamOne { double val {0.0}; };
struct ParamTwo { int val {0}; };

template <class P, class Data, class Other>
class Baseclass
{
public:
    using subclass_type = P;
    using data_type     = Data;
    using other_type    = Other;

    bool Method( const Data &data );

public:
    int m_BaseClassValue { 304 };
};

template <class P> using pdata_type  = typename P::data_type;
template <class P> using pother_type = typename P::other_type;

template <class P, class Data, class Other>
bool Baseclass<P, Data, Other>::Method( const Data &data )
{
    P& Subclass = static_cast<P&>( *this );
    pother_type<P> other;
    other.val = 11;
    return Subclass.SubclassMethod( data, other );
}

template <class Data, class Other>
class Subclass : public Baseclass<Subclass<Data, Other>, Data, Other>
{
public:
    using data_type  = Data;
    using other_type = Other;

    bool SubclassMethod( const Data &data, Other &other );

public:
    int m_SubClassValue { 472 };
};

template <class Data, class Other>
bool Subclass<Data, Other>::SubclassMethod( const Data &data, Other &other )
{
    return true;
}

template<>
bool Subclass<ParamOne, ParamTwo>::SubclassMethod( const ParamOne &data, ParamTwo &other )
{
    printf( "The this pointer is %lx with a size of %ld and values of %d and %d\n",
            (long)this, sizeof(*this), m_BaseClassValue, m_SubClassValue );
    return true;
}

int main(int argc, char **argv)
{
    ParamOne one;
    one.val = 5.0;

    Baseclass<Subclass<ParamOne, ParamTwo>, ParamOne, ParamTwo> test;
    test.Method(one);

    return 0;
}

Output is:

The this pointer is 7fffc4e87670 with a size of 8 and values of 304 and 0

Compile options are: -g;-Wall;-std=c++11;-O0
Re: Issue with CRTP generation under 4.8.1
On 14 February 2014 07:46, Andrew Stern wrote: > I created a CRTP (Curiously recurring template pattern) > and added non-static member variables to my base class and that works without > issue. This mailing list is for development of GCC, not for bug reports or help using it. I would suggest the gcc-help list instead, but you should consider reformatting your code before you ask anyone to look at it, so it is readable. And maybe not have variables with the same names as class templates.
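[As an aside, an editorial CRTP sketch, not from the thread: in the posted
main() the object constructed is a Baseclass<Subclass<...>, ...>, not a
Subclass, so the static_cast in Method() downcasts an object that is not
actually a Subclass.  That alone could explain the stray value, independent
of any compiler issue.  The usual pattern instantiates the derived type:]

#include <cstdio>

template <class P>
struct Base
{
    bool Method()
    {
        // Valid only because *this really is the Base subobject of a P.
        return static_cast<P&>(*this).SubclassMethod();
    }
    int m_BaseClassValue { 304 };
};

struct Derived : Base<Derived>
{
    bool SubclassMethod()
    {
        std::printf("%d %d\n", m_BaseClassValue, m_SubClassValue);  // prints 304 472
        return true;
    }
    int m_SubClassValue { 472 };
};

int main()
{
    Derived d;        // the derived class is the thing constructed
    return d.Method() ? 0 : 1;
}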
Re: TYPE_BINFO and canonical types at LTO
> > This smells bad, since it is given a canonical type that is after the
> > structural equivalency merging that ignores BINFOs, so it may be a completely
> > different class with completely different bases than the original.  Bases are
> > structurally merged, too, and may be exchanged for normal fields because
> > DECL_ARTIFICIAL (that separates bases and fields) does not seem to be part of
> > the canonical type definition in LTO.
>
> Can you elaborate on that DECL_ARTIFICIAL thing?  That is, what is broken
> by considering all fields during that merging?

To make the code work with LTO, one can not merge

struct B {struct A a;};
struct B: A {};

these IMO differ only by the DECL_ARTIFICIAL flag on the fields.

> Note that the BINFO walk below only adds extra aliasing - it should
> be harmless correctness-wise, no?

If it is needed for the second case, then we will produce wrong code with LTO
when we merge the first and second case together.  If it is not needed, then
we are safe, but we don't really need the loop then.

Having a testcase where the BINFO walk is needed for correctness, I can turn
it into wrong code in LTO by interposing the class by a structure.

I am rebuilding firefox with sanity checking that the second loop never adds
anything useful.  Lets see.

> > I wonder if that code is needed after all:
> >     case QUAL_UNION_TYPE:
> >       /* Recursively record aliases for the base classes, if there are any.  */
> >       if (TYPE_BINFO (type))
> >         {
> >           int i;
> >           tree binfo, base_binfo;
> >
> >           for (binfo = TYPE_BINFO (type), i = 0;
> >                BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
> >             record_alias_subset (superset,
> >                                  get_alias_set (BINFO_TYPE (base_binfo)));
> >         }
> >       for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
> >         if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
> >           record_alias_subset (superset, get_alias_set (TREE_TYPE (field)));
> >       break;
> > all bases are also fields within the type, so the second loop should notice
> > all the types seen by the first loop if I am correct?
> > So perhaps the loop can be dropped in the first place.
>
> Yeah, I remember seeing that code and thinking the same multiple times.
> Though I also vaguely remember that removing that loop regressed something.
> How is virtual inheritance represented in the fields?

struct A {int a;};
struct B: virtual A {};
struct C: virtual A {};
struct D: B, C {};

struct D *d;
struct B *b;

int
t(void)
{
  d->a=1;
  return b->a;
}

It is represented as (the tree dump was mangled by the archive; the node
headers are missing):

constant 192> unit size constant 24> align 64 symtab 0 alias set 6 canonical type 0x76c58000 fields unit size align 64 symtab 0 alias set 3 canonical type 0x76c452a0 fields context full-name "struct B" needs-constructor X() X(constX&) this=(X&) n_parents=1 use_template=0 interface-unknown pointer_to_this chain > ignored decl_6 BLK file t.C line 4 col 8 size unit size align 64 offset_align 128 offset bit offset context chain ignored decl_6 BLK file t.C line 4 col 8 size unit size align 64 offset_align 128 offset bit offset context chain >> context full-name "struct D" needs-constructor X() X(constX&) this=(X&) n_parents=2 use_template=0 interface-unknown pointer_to_this chain >

You may end up with A being a field, too, but that only bypasses the
recursion.

> But I'd be happy if this BINFO user would go away ;)

Me too, indeed.

> (similar in general for the get_alias_set langhook - with LTO we
> only preserve extra alias-set zero answers from that)

I see, I was surprised we construct alias sets at LTO time; I believed we
always stream them in&out.

Honza

> Richard.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 13, 2014 at 08:43:01PM -0800, Torvald Riegel wrote:
> On Thu, 2014-02-13 at 18:01 -0800, Paul E. McKenney wrote:

[ . . . ]

> > Another option would be to flag the conditional expression, prohibiting
> > the compiler from optimizing out any conditional branches.  Perhaps
> > something like this:
> >
> >     r1 = atomic_load(x, memory_order_control);
> >     if (control_dependency(r1))
> >       atomic_store(y, memory_order_relaxed);
>
> That's the one I had in mind and talked to you about earlier today.  My
> gut feeling is that this is preferable over the other because it "marks"
> the if-statement, so the compiler knows exactly which branches matter.
> I'm not sure one would need the other memory order for that, if indeed
> all you want is relaxed -> branch -> relaxed.  But maybe there are
> corner cases (see the weaker-than-relaxed discussion in SG1 today).

Linus, Peter, any objections to marking places where we are relying on
ordering from control dependencies against later stores?  This approach
seems to me to have significant documentation benefits.

							Thanx, Paul
Re: [RFC] Offloading Support in libgomp
On 02/14/2014 07:43 AM, Jakub Jelinek wrote:
> So, perhaps we should just stop for now oring the copyfrom in and just use
> the copyfrom from the very first mapping only, and wait for what the
> committee actually agrees on.
>
> Richard, your thoughts on this?

I think stopping the or'ing until the issue is resolved is the best plan.

r~
Re: MSP430 in gcc4.9 ... enable interrupts?
The constructs in the *.md files are for the compiler's internal use
(i.e. there are function attributes that trigger those).  You don't need
compiler support for these opcodes at the user level; the right way is to
implement those builtins as inline assembler in a common header file:

static inline __attribute__((always_inline))
void __nop()
{
  asm volatile ("NOP");
}

static inline __attribute__((always_inline))
void __eint()
{
  asm volatile ("EINT");
}

Or more simply:

#define __eint() asm("EINT")
#define __nop() asm("NOP")

For opcodes with parameters, you use a more complex form of inline
assembler:

static inline __attribute__((always_inline))
void BIC_SR(const int x)
{
  asm volatile ("BIC.W %0,R2" :: "i" (x));
}
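[A short usage sketch - editorial, not DJ's; __dint() is defined here by
analogy with the helpers above, and the NOP-after-DINT idiom is the usual
MSP430 recommendation, so check the device manual before relying on it.]

static inline __attribute__((always_inline))
void __dint(void)
{
  asm volatile ("DINT");
}

void update_shared_counter(volatile unsigned int *counter)
{
  __dint();                 /* enter critical section: interrupts off */
  __nop();                  /* DINT is usually said to take effect one
                               instruction later, hence the NOP */
  *counter += 1;
  __eint();                 /* leave critical section: interrupts back on */
}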
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, 2014-02-14 at 10:50 +0100, Peter Zijlstra wrote: > On Thu, Feb 13, 2014 at 09:07:55PM -0800, Torvald Riegel wrote: > > That depends on what your goal is. First, I don't know why you quoted that, but without the context, quoting it doesn't make sense. Let me repeat the point. The standard is the rule set for the compiler. Period. The compiler does not just serve the semantics that you might have in your head. It does have to do something meaningful for all of its users. Thus, the goal for the compiler is to properly compile programs in the language as specified. If there is a deficiency in the standard (bug or missing feature) -- and thus the specification, we need to have a patch for the standard that fixes this deficiency. If you think that this is the case, that's where you fix it. If your goal is to do wishful thinking, imagine some kind of semantics in your head, and then assume that magically, implementations will do just that, then that's bound to fail. > A compiler that we don't need to fight in order to generate sane code > would be nice. But as Linus said; we can continue to ignore you lot and > go on as we've done. I don't see why it's so hard to understand that you need to specify semantics, and the place (or at least the base) for that is the standard. Aren't you guys the ones replying "send a patch"? :) This isn't any different. If you're uncomfortable working with the standard, then say so, and reach out to people that aren't. You can surely ignore the specification of the language(s) that you are depending on. But that won't help you. If you want a change, get involved. (Oh, and claiming that the other side doesn't get it doesn't count as getting involved.) There's no fight between people here. It's just a technical problem that we have to solve in the right way.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, 2014-02-14 at 09:29 -0800, Paul E. McKenney wrote: > On Thu, Feb 13, 2014 at 08:43:01PM -0800, Torvald Riegel wrote: > > On Thu, 2014-02-13 at 18:01 -0800, Paul E. McKenney wrote: > > [ . . . ] > > > > Another option would be to flag the conditional expression, prohibiting > > > the compiler from optimizing out any conditional branches. Perhaps > > > something like this: > > > > > > r1 = atomic_load(x, memory_order_control); > > > if (control_dependency(r1)) > > > atomic_store(y, memory_order_relaxed); > > > > That's the one I had in mind and talked to you about earlier today. My > > gut feeling is that this is preferably over the other because it "marks" > > the if-statement, so the compiler knows exactly which branches matter. > > I'm not sure one would need the other memory order for that, if indeed > > all you want is relaxed -> branch -> relaxed. But maybe there are > > corner cases (see the weaker-than-relaxed discussion in SG1 today). > > Linus, Peter, any objections to marking places where we are relying on > ordering from control dependencies against later stores? This approach > seems to me to have significant documentation benefits. Let me note that at least as I'm concerned, that's just a quick idea. At least I haven't looked at (1) how to properly specify the semantics of this, (2) whether it has any bad effects on unrelated code, (3) and whether there are pitfalls for compiler implementations. It looks not too bad at first glance, though.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 9:29 AM, Paul E. McKenney wrote: > > Linus, Peter, any objections to marking places where we are relying on > ordering from control dependencies against later stores? This approach > seems to me to have significant documentation benefits. Quite frankly, I think it's stupid, and the "documentation" is not a benefit, it's just wrong. How would you figure out whether your added "documentation" holds true for particular branches but not others? How could you *ever* trust a compiler that makes the dependency meaningful? Again, let's keep this simple and sane: - if a compiler ever generates code where an atomic store movement is "visible" in any way, then that compiler is broken shit. I don't understand why you even argue this. Seriously, Paul, you seem to *want* to think that "broken shit" is acceptable, and that we should then add magic markers to say "now you need to *not* be broken shit". Here's a magic marker for you: DON'T USE THAT BROKEN COMPILER. And if a compiler can *prove* that whatever code movement it does cannot make a difference, then let it do so. No amount of "documentation" should matter. Seriously, this whole discussion has been completely moronic. I don't understand why you even bring shit like this up: > > r1 = atomic_load(x, memory_order_control); > > if (control_dependency(r1)) > > atomic_store(y, memory_order_relaxed); I mean, really? Anybody who writes code like that, or any compiler where that "control_dependency()" marker makes any difference what-so-ever for code generation should just be retroactively aborted. There is absolutely *zero* reason for that "control_dependency()" crap. If you ever find a reason for it, it is either because the compiler is buggy, or because the standard is so shit that we should never *ever* use the atomics. Seriously. This thread has devolved into some kind of "just what kind of idiotic compiler cesspool crap could we accept". Get away from that f*cking mindset. We don't accept *any* crap. Why are we still discussing this idiocy? It's irrelevant. If the standard really allows random store speculation, the standard doesn't matter, and sane people shouldn't waste their time arguing about it. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 11:50 AM, Linus Torvalds wrote: > > Why are we still discussing this idiocy? It's irrelevant. If the > standard really allows random store speculation, the standard doesn't > matter, and sane people shouldn't waste their time arguing about it. Btw, the other part of this coin is that our manual types (using volatile and various architecture-specific stuff) and our manual barriers and inline asm accesses are generally *fine*. The C11 stuff doesn't buy us anything. The argument that "new architectures might want to use it" is prue and utter bollocks, since unless the standard gets the thing *right*, nobody sane would ever use it for some new architecture, when the sane thing to do is to just fill in the normal barriers and inline asms. So I'm very very serious: either the compiler and the standard gets things right, or we don't use it. There is no middle ground where "we might use it for one or two architectures and add random hints". That's just stupid. The only "middle ground" is about which compiler version we end up trusting _if_ it turns out that the compiler and standard do get things right. From Torvald's explanations (once I don't mis-read them ;), my take-away so far has actually been that the standard *does* get things right, but I do know from over-long personal experience that compiler people sometimes want to be legalistic and twist the documentation to the breaking point, at which point we just go "we'd be crazy do use that". See our use of "-fno-strict-aliasing", for example. The C standard aliasing rules are a mistake, stupid, and wrong, and gcc uses those stupid type-based alias rules even when statically *proving* the aliasing gives the opposite result. End result: we turn the shit off. Exact same deal wrt atomics. We are *not* going to add crazy "this here is a control dependency" crap. There's a test, the compiler *sees* the control dependency for chrissake, and it still generates crap, we turn that broken "optimization" off. It really is that simple. Linus
Re: MSP430 in gcc4.9 ... enable interrupts?
On Fri, 2014-02-14 at 14:17 -0500, DJ Delorie wrote: > The constructs in the *.md files are for the compiler's internal use > (i.e. there are function attributes that trigger those). You don't > need compiler support for these opcodes at the user level; the right > way is to implement those builtins as inline assembler > static inline __attribute__((always_inline)) > void __nop() > { > asm volatile ("NOP"); > } Thanks for the answer. I thought I was missing something, but apparently not. Now it's clear that inline assembler is the official way, I can adapt that approach to my situation. - Brian
Re: TYPE_BINFO and canonical types at LTO
> > > This smells bad, since it is given a canonical type that is after the
> > > structural equivalency merging that ignores BINFOs, so it may be a completely
> > > different class with completely different bases than the original.  Bases are
> > > structurally merged, too, and may be exchanged for normal fields because
> > > DECL_ARTIFICIAL (that separates bases and fields) does not seem to be part of
> > > the canonical type definition in LTO.
> >
> > Can you elaborate on that DECL_ARTIFICIAL thing?  That is, what is broken
> > by considering all fields during that merging?
>
> To make the code work with LTO, one can not merge
>
> struct B {struct A a;};
> struct B: A {};
>
> these IMO differ only by the DECL_ARTIFICIAL flag on the fields.
>
> > Note that the BINFO walk below only adds extra aliasing - it should
> > be harmless correctness-wise, no?
>
> If it is needed for the second case, then we will produce wrong code with LTO
> when we merge the first and second case together.  If it is not needed, then
> we are safe, but we don't really need the loop then.
>
> Having a testcase where the BINFO walk is needed for correctness, I can turn
> it into wrong code in LTO by interposing the class by a structure.
>
> I am rebuilding firefox with sanity checking that the second loop never adds
> anything useful.  Lets see.

The code really seems to be adding only cases of zero-sized classes.  I use
the following hack in my tree.  I do not know how to discover a zero-sized
class, so I test for a unit size of 1, but I think if there were other cases
we would notice anyway.

The reason why I am looking into this is that I am trying to evaluate how
expensive BINFOs are.  I am trying to keep only those that matter for debug
info and devirtualization, and avoid streaming the others.

Honza

Index: alias.c
===================================================================
--- alias.c     (revision 20)
+++ alias.c     (working copy)
@@ -995,20 +996,40 @@ record_component_aliases (tree type)
     case RECORD_TYPE:
     case UNION_TYPE:
     case QUAL_UNION_TYPE:
-      /* Recursively record aliases for the base classes, if there are any.  */
-      if (TYPE_BINFO (type))
-	{
-	  int i;
-	  tree binfo, base_binfo;
-
-	  for (binfo = TYPE_BINFO (type), i = 0;
-	       BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
-	    record_alias_subset (superset,
-				 get_alias_set (BINFO_TYPE (base_binfo)));
-	}
-      for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
-	if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
-	  record_alias_subset (superset, get_alias_set (TREE_TYPE (field)));
+      {
+#ifdef ENABLE_CHECKING
+	bitmap_obstack my_obstack;
+	bitmap_obstack_initialize (&my_obstack);
+	bitmap added = BITMAP_ALLOC (&my_obstack);
+#endif
+	alias_set_type t;
+	for (field = TYPE_FIELDS (type); field != 0; field = DECL_CHAIN (field))
+	  if (TREE_CODE (field) == FIELD_DECL && !DECL_NONADDRESSABLE_P (field))
+	    {
+	      t = get_alias_set (TREE_TYPE (field));
+#ifdef ENABLE_CHECKING
+	      bitmap_set_bit (added, t);
+#endif
+	      record_alias_subset (superset, t);
+	    }
+#ifdef ENABLE_CHECKING
+	/* Recursively record aliases for the base classes, if there are any.  */
+	if (!in_lto_p && TYPE_BINFO (type))
+	  {
+	    int i;
+	    tree binfo, base_binfo;
+
+	    for (binfo = TYPE_BINFO (type), i = 0;
+		 BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
+	      if (!bitmap_bit_p (added, get_alias_set (BINFO_TYPE (base_binfo))))
+		{
+		  gcc_assert (integer_onep (TYPE_SIZE_UNIT (BINFO_TYPE (base_binfo))));
+		}
+	  }
+	BITMAP_FREE (added);
+	bitmap_obstack_release (&my_obstack);
+#endif
+      }
       break;

     case COMPLEX_TYPE:
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 12:02:23PM -0800, Linus Torvalds wrote: > On Fri, Feb 14, 2014 at 11:50 AM, Linus Torvalds > wrote: > > > > Why are we still discussing this idiocy? It's irrelevant. If the > > standard really allows random store speculation, the standard doesn't > > matter, and sane people shouldn't waste their time arguing about it. > > Btw, the other part of this coin is that our manual types (using > volatile and various architecture-specific stuff) and our manual > barriers and inline asm accesses are generally *fine*. > > The C11 stuff doesn't buy us anything. The argument that "new > architectures might want to use it" is prue and utter bollocks, since > unless the standard gets the thing *right*, nobody sane would ever use > it for some new architecture, when the sane thing to do is to just > fill in the normal barriers and inline asms. > > So I'm very very serious: either the compiler and the standard gets > things right, or we don't use it. There is no middle ground where "we > might use it for one or two architectures and add random hints". > That's just stupid. > > The only "middle ground" is about which compiler version we end up > trusting _if_ it turns out that the compiler and standard do get > things right. From Torvald's explanations (once I don't mis-read them > ;), my take-away so far has actually been that the standard *does* get > things right, but I do know from over-long personal experience that > compiler people sometimes want to be legalistic and twist the > documentation to the breaking point, at which point we just go "we'd > be crazy do use that". > > See our use of "-fno-strict-aliasing", for example. The C standard > aliasing rules are a mistake, stupid, and wrong, and gcc uses those > stupid type-based alias rules even when statically *proving* the > aliasing gives the opposite result. End result: we turn the shit off. > > Exact same deal wrt atomics. We are *not* going to add crazy "this > here is a control dependency" crap. There's a test, the compiler > *sees* the control dependency for chrissake, and it still generates > crap, we turn that broken "optimization" off. It really is that > simple. >From what I can see at the moment, the standard -generally- avoids speculative stores, but there are a few corner cases where it might allow them. I will be working with the committee to see exactly what the situation is. Might be that I am confused and that everything really is OK, might be that I am right but the corner cases are things that no sane kernel developer would do anyway, it might be that the standard needs a bit of repair, or it might be that the corner cases are somehow inherent and problematic (but I hope not!). I will let you know what I find, but it will probably be a few months. In the meantime, agreed, we keep doing what we have been doing. And maybe in the long term as well, for that matter. One way of looking at the discussion between Torvald and myself would be as a seller (Torvald) and a buyer (me) haggling over the fine print in a proposed contract (the standard). Whether that makes you feel better or worse about the situation I cannot say. ;-) Thanx, Paul
Re: asking your advice about bug
On 02/12/2014 11:51 AM, Roman Gareev wrote:

Hi Roman,

thanks for the quick feedback!

> I've found out that this bug appeared in revision 189156
> (svn://gcc.gnu.org/svn/gcc/trunk) and a similar error message appeared in
> revision 191757 (svn://gcc.gnu.org/svn/gcc/trunk) (maybe it's because of
> changes in diagnostic.c).  If subtract_commutative_associative_deps, a
> function located in gcc/graphite-dependences.c, is commented out, the error
> will disappear.  I am trying to find a bug in this function now.  Could you
> please answer a few questions about it?
>
> 1) Where can I find the algorithm for finding associative commutative
> reductions, which was used in subtract_commutative_associative_deps?  It
> seems as if there is no such description available.

This is a problem by itself and we should probably add documentation about
what is going on.  Unfortunately, I did not write the code and I also don't
really get what it is doing.  The intuition seems to be that the dependences
between a set of reduction statements are computed and those dependences are
then removed from the overall set of dependences.  This is in general a good
idea, but the code could need some improvements and fixes.  Several things
look shady here.  Here is one example:

1) We only remove dependences, but we should also add new ones

Let 'a->a' be reduction dependences; then dependences between 'a' can only be
removed in case we add new dependences between 'b1' and all 'a', and between
all 'a' and 'b2'.

b1
 |
 a -> a -> a
 |
b2

This should probably be fixed, but I don't think this is the problem of the
current bug report.  In fact, to fix the bug report, I don't even think we
need to understand the full algorithm.  The first question to ask is: Why are
we segfaulting?  Which statement is causing the segfault?

> 2) What is the number returned by isl_union_map_compute_flow?  (I haven't
> found its description in "Integer Set Library: Manual")
>
> 3) I've found the following terms in subtract_commutative_associative_deps:
> "may accesses", "must access".  "Integer Set Library: Manual" gives the
> following definition: "If any of the source accesses are marked as being may
> accesses, then there will be a dependence to the last must access and to any
> may access that follows this last must access".  Could you please describe
> their meaning?  Are they related to transitively-covered dependences?

Thanks Sven for answering those.

Tobias
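[For readers unfamiliar with the terminology, an editorial example of the
kind of commutative/associative reduction this function is concerned with;
how graphite actually models it is exactly what the questions above ask.]

/* A plain sum reduction: each iteration reads and writes 's', creating the
   a -> a -> a chain of dependences mentioned above.  Because integer
   addition is commutative and associative, the iterations may be reordered
   without changing the result, which is why such dependences can be treated
   specially ('s = 0' plays the role of b1, 'return s' the role of b2).  */
long sum_array (const int *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}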
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 6:08 PM, Paul E. McKenney wrote: > > One way of looking at the discussion between Torvald and myself would be > as a seller (Torvald) and a buyer (me) haggling over the fine print in > a proposed contract (the standard). Whether that makes you feel better > or worse about the situation I cannot say. ;-) Oh, I'm perfectly fine with that. But we're not desperate to buy, and I actually think the C11 people are - or at least should be - *way* more desperate to sell their model to us than we are to take it. Which means that as a buyer you should say "this is what we want, if you don't give us this, we'll just walk away". Not try to see how much we can pay for it. Because there is very little upside for us, and _unless_ the C11 standard gets things right it's just extra complexity for us, coupled with new compiler fragility and years of code generation bugs. Why would we want that extra complexity and inevitable compiler bugs? If we then have to fight compiler writers that point to the standard and say "..but look, the standard says we can do this", then at that point it went from "extra complexity and compiler bugs" to a whole 'nother level of frustration and pain. So just walk away unless the C11 standard gives us exactly what we want. Not "something kind of like what we'd use". EXACTLY. Because I'm not in the least interested in fighting compiler people that have a crappy standard they can point to. Been there, done that, got the T-shirt and learnt my lesson. And the thing is, I suspect that the Linux kernel is the most complete - and most serious - user of true atomics that the C11 people can sell their solution to. If we don't buy it, they have no serious user. Sure, they'll have lots of random other one-off users for their atomics, where each user wants one particular thing, but I suspect that we'll have the only really unified portable code base that handles pretty much *all* the serious odd cases that the C11 atomics can actually talk about to each other. Oh, they'll push things through with or without us, and it will be a collection of random stuff, where they tried to please everybody, with particularly compiler/architecture people who have no f*cking clue about how their stuff is used pushing to make it easy/efficient for their particular compiler/architecture. But we have real optimized uses of pretty much all relevant cases that people actually care about. We can walk away from them, and not really lose anything but a small convenience (and it's a convenience *only* if the standard gets things right). And conversely, the C11 people can walk away from us too. But if they can't make us happy (and by "make us happy", I really mean no stupid games on our part) I personally think they'll have a stronger standard, and a real use case, and real arguments. I'm assuming they want that. That's why I complain when you talk about things like marking control dependencies explicitly. That's *us* bending over backwards. And as a buyer, we have absolutely zero reason to do that. Tell the C11 people: "no speculative writes". Full stop. End of story. Because we're not buying anything else. Similarly, if we need to mark atomics "volatile", then now the C11 atomics are no longer even a "small convenience", now they are just extra complexity wrt what we already have. So just make it clear that if the C11 standard needs to mark atomics volatile in order to get non-speculative and non-reloading behavior, then the C11 atomics are useless to us, and we're not buying. 
Remember: a compiler can *always* do "as if" optimizations - if a compiler writer can prove that the end result acts 100% the same using an optimized sequence, then they can do whatever the hell they want. That's not the issue. But if we can *ever* see semantic impact of speculative writes, the compiler is buggy, and the compiler writers need to be aware that it is buggy. No ifs, buts, maybes about it. So I'm perfectly fine with you seeing yourself as a buyer. But I want you to be a really *picky* and anal buyer - one that knows he has the upper hand, and can walk away with no downside. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 6:44 PM, Linus Torvalds wrote: > > And conversely, the C11 people can walk away from us too. But if they > can't make us happy (and by "make us happy", I really mean no stupid > games on our part) I personally think they'll have a stronger > standard, and a real use case, and real arguments. I'm assuming they > want that. I should have somebody who proof-reads my emails before I send them out. I obviously meant "if they *can* make us happy" (not "can't"). Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 06:48:02PM -0800, Linus Torvalds wrote: > On Fri, Feb 14, 2014 at 6:44 PM, Linus Torvalds > wrote: > > > > And conversely, the C11 people can walk away from us too. But if they > > can't make us happy (and by "make us happy", I really mean no stupid > > games on our part) I personally think they'll have a stronger > > standard, and a real use case, and real arguments. I'm assuming they > > want that. > > I should have somebody who proof-reads my emails before I send them out. > > I obviously meant "if they *can* make us happy" (not "can't"). Understood. My next step is to take a more detailed look at the piece of the standard that should support RCU. Depending on how that turns out, I might look at other parts of the standard vs. Linux's atomics and memory-ordering needs. Should be interesting. ;-) Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 14, 2014 at 10:35:44PM -0800, Paul E. McKenney wrote: > On Fri, Feb 14, 2014 at 06:48:02PM -0800, Linus Torvalds wrote: > > On Fri, Feb 14, 2014 at 6:44 PM, Linus Torvalds > > wrote: > > > > > > And conversely, the C11 people can walk away from us too. But if they > > > can't make us happy (and by "make us happy", I really mean no stupid > > > games on our part) I personally think they'll have a stronger > > > standard, and a real use case, and real arguments. I'm assuming they > > > want that. > > > > I should have somebody who proof-reads my emails before I send them out. > > > > I obviously meant "if they *can* make us happy" (not "can't"). > > Understood. My next step is to take a more detailed look at the piece > of the standard that should support RCU. Depending on how that turns > out, I might look at other parts of the standard vs. Linux's atomics > and memory-ordering needs. Should be interesting. ;-) And perhaps a better way to represent the roles is that I am not the buyer, but rather the purchasing agent for the -potential- buyer. -You- are of course the potential buyer. If I were to see myself as the buyer, then I must confess that the concerns you implicitly expressed in your prior email would be all too well-founded! Thanx, Paul