On Fri, Jul 28, 2017 at 12:18 PM, Georg-Johann Lay <a...@gjlay.de> wrote: > Richard Biener schrieb: > >> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay <a...@gjlay.de> wrote: >>> >>> On 27.07.2017 14:34, Richard Biener wrote: >>>> >>>> On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay <a...@gjlay.de> wrote: >>>>> >>>>> For some targets, the best place to put read-only lookup tables as >>>>> generated by -ftree-switch-conversion is not the generic address space >>>>> but some target specific address space. >>>>> >>>>> This is the case for AVR, where for most devices .rodata must be >>>>> located in RAM. >>>>> >>>>> Part #1 adds a new, optional target hook that queries the back-end >>>>> about the desired address space. There is currently only one user: >>>>> tree-switch-conversion.c::build_one_array() which re-builds value_type >>>>> and array_type if the address space returned by the backend is not >>>>> the generic one. >>>>> >>>>> Part #2 is the AVR part that implements the new hook and adds some >>>>> sugar around it. >>>> >>>> >>>> Given that switch-conversion just creates a constant initializer doesn't >>>> AVR >>>> benefit from handling those uniformly (at RTL expansion?). Not sure but >>>> I think it goes through the regular constant pool handling. >>>> >>>> Richard. >>> >>> >>> avr doesn't use constant pools because they are not profitable for >>> several reasons. >>> >>> Moreover, it's not sufficient to just patch the pool's section, you'd >>> also have to patch *all* accesses to also use the exact same address >>> space so that they use the correct instruction (LPM instead of LD). >> >> >> Sure. So then not handle it in constant pool handling but in expanding >> the initializers of global vars. I think the entry for this is >> varasm.c:assemble_variable - that sets DECL_RTL for the variable. > > > Expand and RTL is too late. The CSWTCH tables are created by some tree > pass, and it also generated accesses which use the address-space of the > element types. If any tree variable or copy thereof uses generic space, > then you will get wrong code. In the case of CSWTCH, you must get the > AS into value_type as noted by Steven > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49857#c3 > >>> Loop optimization, for example, may move addr-space pointers out of >>> loops and actually does this for some CSWTCH tables. I didn't look >>> into pool handling, but don't expect it allows to consistently >>> patch all accesses in the aftermath. >>> >>> *If* that's possible, then it should also be possible to patch vtables >>> and all of their accesses, aka. >>> >>> https://gcc.gnu.org/PR43745 >>> >>> e.g. in a target-specific tree pass? >> >> >> Ah, ok. I see what you mean. Once the address of a global var is >> taken the pointers refering to it have to use the proper address-space. > > > Yes, and all copies of these pointers etc. generated so far. That's the > point of address spaces. The binary representation of a pointer may > be exact the same for different ASes, but still accesses have to be > handled differently: Different legitimate addresses, different > instructions to access through these pointers, etc. Just converting > a pointer to a different AS will give you wrong code. > >> So yeah, it looks like this would need to be done in a (target specific) >> pass, probably an IPA pass in case the variables are used from multiple >> functions. > > > It would at least be an ABI change. If different modules are accessing > the vtable, then they must use the same AS. Having a copy of the vtable > in each module is pointless: If these modules don't agree about the AS > and host vtables in different ASes, we lost from the optimization point: > one module would have it in flash (consuming flash) and a different > module would have it in RAM (consumes RAM and flash to initialize that > RAM at startup). So this would not use 1x flash as intended, and not > 1x flash + 1x RAM like in the non optimized version, but 2x flash + > 1x RAM. Some comdat won't help because that gives wrong code. > >>> With vtables it's basically the same problem, you'd have to patch *all* >>> accesses to match the vtable's address-space. c++ need not to expose >>> address-spaces to the application, handling address-spaces internally >>> would be sufficient. >> >> >> The IPA pass should be able to handle this transparently (vtables are just >> global decls with initializers). > > > Global is bad. If the address of the object escapes, that's be wrong > code because location + all accesses must agree about the AS. > >> Of course the question is whether the linker can handle relocations >> between address-spaces for example? > > > What's the linker to do with it? Maybe there's confusion about what > I mean with "address space". I mean the named address spaces as of > > http://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html > > according to ISO/IEC DTR 18037. It's not in the C-standard and only > supported by gcc as a GNU extension. Suppose the following GNU-C > > extern const char c; > > char readc (const char *pc) > { > return c + *pc; > } > > The generic AS allows direct addressing and addressing via X=r26 for > example: > > readc: > ; (set (reg:HI 26) > ; (reg:HI 24)) > movw r26,r24 ; 18 *movhi/1 > ; (set (reg:QI 25) > ; (mem:QI (reg:HI 26) [S1 A8])) > ld r25,X ; 8 movqi_insn/4 > ; (set (reg:QI 24) > ; (mem:QI (symbol_ref:HI ("c")) [S1 A8])) > lds r24,c ; 9 movqi_insn/4 > ; (set (reg:QI 24) > ; (plus:QI (reg:QI 24) > ; (reg:QI 25))) > add r24,r25 ; 15 addqi3/1 > ret > > This locates c in RAM and *pc must also be located in RAM which > is a waste. Now use AS __flash, i.e. non-volatile memory: > > extern const __flash char c; > > char readc (const __flash char *pc) > { > return c + *pc; > } > > readc: > ; (set (reg:HI 30) > ; (reg:HI 24)) > movw r30,r24 ; 20 *movhi/1 > ; (set (reg:QI 25) > ; (mem:QI (reg:HI 30) [S1 A8 AS1])) > lpm r25,Z ; 7 movqi_insn/4 > ; (set (reg:HI 30) > ; (symbol_ref:HI ("c"))) > ldi r30,lo8(c) ; 17 *movhi/5 > ldi r31,hi8(c) > ; (set (reg:QI 24) > ; (mem:QI (reg:HI 30) [S1 A8 AS1])) > lpm r24,Z ; 8 movqi_insn/4 > ; (set (reg:QI 24) > ; (plus:QI (reg:QI 24) > ; (reg:QI 25))) > add r24,r25 ; 14 addqi3/1 > ret > > And access to __flash must use instruction LPM *,Z. > Direct addressing or using X or Y register are not allowed. > Using Z+const addressing is not allowed. Using LD or LDS on > the same binary address will lead to wrong code, for example will > read from RAM address 0x42 instead of from flash address 0x42. > > How could the linker fix an address to generic AS to one that > goes to non-generic AS? The linker doesn't even have that > information. > > >> As of the patch I think it is too specific (switch-conversion only) >> for my taste. > > > It's actually not restricted to switch-conversion. It can be used for > any object provided > > 1) The object is in static storage and located in that AS. > > 2) All accesses to that object use that AS. > > 3) The object is read-only, i.e. not modified after load-time. > >> What you can do I think is in assemble_variable handle the case of >> ! TREE_PUBLIC && ! TREE_ADDRESSABLE -- non-exported, non-address >> taken variables can be put into a different address-space transparently >> by adjusting their type to reflect the changed address-space. > > > /* Assemble everything that is needed for a variable or function > declaration. > Not used for automatic variables, and not used for function definitions. > Should not be called for variables of incomplete structure type. > > TOP_LEVEL is nonzero if this variable has file scope. > AT_END is nonzero if this is the special handling, at end of compilation, > to define things that have had only tentative definitions. > DONT_OUTPUT_DATA if nonzero means don't actually output the > initial value (that will be done by the caller). */ > > void > assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED, > int at_end ATTRIBUTE_UNUSED, int dont_output_data) > { > > So I can patch TYPE_ADDR_SPACE of /decl/ and this will by magic > diffuse to all places that access it? Wow!
For the ! TREE_ADDRESSABLE case yes, I think so (of course you need to double-check...). ! TREE_PUBLIC is because otherwise you're not seeing all accesses and thus TREE_ADDRESSABLE cannot be trusted. TREE_STATIC should be always set in this function. >> This should catch the switch-conversion case that didn't run into the >> loop case. It won't fix vtables I think because they are always exported >> unless you use LTO, they also end up address-taken I think. >> >> For vtables (a bigger chunk than switch-conversion decls) adjusting >> things in the C++ FE is probably a good idea if it helps AVR. >> >> Richard. > > > It don't think the C++ FE has an understanding of ASes at all. So even > if it doesn't ICE or error, it's likely non-generic ASes are not > preserved / respected during FE transformations. For now I'd say > that when you use C+ +/ vtables on such a µC one must live with the > consequences. Ah - I totally forgot C++ doesn't handle address-spaces... Richard. > Johann