GCC porting tutorials
Hello, My name is Radu Hobincu. I am part of a team at "Politehnica" University of Bucharest that is developing a massively parallel computing architecture, and my current job is to port the GCC compiler to this new machine. I have looked over the official GCC site at http://gcc.gnu.org/ but could not find an official porting tutorial. Is there such a thing? Perhaps with a small example for a lightweight architecture? Regards, Radu
Re: GCC porting tutorials
Hello again and thank you a lot for the quick replies! I am impressed by the number of mails I got in such a short time. You helped us loads. I will also try to document our work every step of the way, maybe it will help someone else in the future. Regards, Radu
GCC porting questions
Hello again,

I wrote here a few weeks ago asking about tutorials on GCC porting and got some very interesting replies. However, I seem to have gotten stuck on a couple of issues in spite of a lot of Googling, and I was wondering if anyone could spare a couple of minutes for some clarifications.

I am having trouble with the condition codes (cc_status). I have looked over a couple of architectures and I do not understand how they work. The machine I am porting GCC to has one 4-bit status register holding carry, zero, less than and equal. I do not have explicit comparison instructions; all of the ALU instructions modify one or more flags.

What I have figured out so far, looking over the AVR and CRIS machine descriptions, is that each instruction that modifies the flags contains an attr declaration which specifies which flags it changes. There is also a macro called NOTICE_UPDATE_CC which sets up cc_status accordingly by reading this attr. This is the part of the code I do not understand. There are certain functions for which I could not find any description, like "single_set", and macros like "SET_DEST" and "SET_SRC". Also, looking over conditions.h, I see that the CC_STATUS structure contains two rtx fields, "value1" and "value2", and an int called "flags". What do they represent? Is "flags" the contents of the machine's flag register?

Thanks in advance,
Radu
Re: GCC porting questions
Thanks for the reply. I scrolled a lot through the i386 md and c files. I notice that the i386 architecture has dedicated instructions for comparing values, and that ALU instructions only specify (clobber (reg:CC FLAGS_REG)). What I do not understand is how they specify the way ALU instructions affect the flags.

In order to set the flags, I am trying something like this:

(define_expand "addsi3"
  [(parallel [(set (match_operand:SI 0 "register_operand" "")
                   (plus:SI (match_operand:SI 1 "register_operand" "")
                            (match_operand:SI 2 "nonmemory_operand" "")))
              (set (reg:CC FLAGS_REG)
                   (compare:SI (match_dup 1) (match_dup 2)))])]
  ""
  "
  if (GET_CODE (operands[2]) == CONST_INT && INTVAL (operands[2]) == 1)
    {
      emit_insn (gen_inc (operands[0], operands[1]));
      DONE;
    }
  "
)

and to use them:

(define_insn "beq"
  [(set (pc)
        (if_then_else (eq:SI (reg:CC FLAGS_REG) (const_int 0))
                      (label_ref (match_operand 0 "" ""))
                      (pc)))]
  ""
  "jeq \\t%l0"
)

but that does not look right. The carry and zero flags should be set after the operation, and the less than and equal flags before the sum is done, since the destination register can just as well be the same as one of the sources. The "parallel" statement, afaik, tells the compiler to evaluate the operands first and then execute both sets, which means that all flags will be set from the state of the operands before the operation.

I am probably a bit confused about the compiler's behavior, since I am thinking in terms of machine execution. The compiler doesn't know the values of the operands at compile time, so it doesn't really set any flags in the condition register. How does it work then?

Sorry for the long text and thanks again for your time.

> For a new port I recommend that you avoid cc0, cc_status, and
> NOTICE_UPDATE_CC. Instead, model the condition codes as 1 or 4
> pseudo-registers. In your define_insn statements, include SET
> expressions which show how the condition code is updated. This is
> how the i386 backend works; see uses of FLAGS_REG in i386.md.
>
> As far as things like single_set, SET_DEST, and SET_SRC, you have
> reached the limits of the internal documentation. You have to open
> the source code and look at the comments. Similarly, the description
> of the CC_STATUS fields may be found in the comments above the
> definition of CC_STATUS in conditions.h.
>
> Ian
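The flag timing described in this message can be modelled in plain C, which may help when deciding what each SET in the pattern should express. What follows is only a sketch under stated assumptions: the thread does not give the bit layout of the 4-bit status register, so the `FLAG_*` positions are hypothetical, the "less than" flag is assumed to be a signed comparison, and `add_with_flags` is a made-up name. Carry and zero are derived from the result, while less-than and equal are derived from the source operands, so the model stays correct when the destination is also a source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the real layout is not given in the thread. */
enum {
    FLAG_CARRY = 1u << 0,
    FLAG_ZERO  = 1u << 1,
    FLAG_LT    = 1u << 2,
    FLAG_EQ    = 1u << 3
};

/* Model of an ALU add that also updates the status register.
   The sources are passed by value, so *dst may alias either of them:
   LT and EQ are always computed from the pre-operation values. */
unsigned add_with_flags(uint32_t *dst, uint32_t a, uint32_t b)
{
    assert(dst != 0);

    unsigned flags = 0;
    uint32_t sum = a + b;                    /* wraps modulo 2^32 */

    /* Flags taken from the source operands (assumed signed compare). */
    if ((int32_t)a < (int32_t)b)
        flags |= FLAG_LT;
    if (a == b)
        flags |= FLAG_EQ;

    /* Flags taken from the result. */
    if (sum < a)                             /* unsigned overflow */
        flags |= FLAG_CARRY;
    if (sum == 0)
        flags |= FLAG_ZERO;

    *dst = sum;
    return flags;
}
```

Calling `add_with_flags(&r, r, 5)` with `r == 5` leaves `r == 10` and reports only the equal flag, even though the destination overwrites a source. In RTL terms the same idea would presumably be written as separate expressions over the input operands inside one parallel, since all inputs of a parallel are read before any of its sets take effect.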
Re: GCC porting questions
Hello again,

I managed to get the thing working and I have two last issues to solve.

1. My machine does not have any kind of floating point instructions. When I write in the C source code

    float f = 0.5f;

the compiler crashes with "Segmentation fault". Running it under gdb, the output is

    Program received signal SIGSEGV, Segmentation fault.
    0x775e343a in vfprintf () from /lib/libc.so.6

I do not have a multiply instruction either, but when I write "int c = a * b;" the compiler properly inserts a LIBCALL to __mulsi3. Any idea what to do with the float?

2. When I try "char c = 'c';", the compiler fails an assert:

    test0.c:17: internal compiler error: in emit_move_multi_word, at expr.c:3273

This is strange: since a char is smaller than an int, it should not be calling emit_move_MULTI_word. I have

    #define UNITS_PER_WORD 4
    #define MIN_UNITS_PER_WORD 1

    /* Promote those modes that are smaller than an int, to int mode. */
    #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \
      ((GET_MODE_CLASS (MODE) == MODE_INT \
        && GET_MODE_SIZE (MODE) < UNITS_PER_WORD) \
       ? (MODE) = SImode : 0)

in my header file. Again, I do not know how to proceed.

Thank you again for your time,
R.
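For readers wondering what a LIBCALL such as the one to __mulsi3 stands for: on a machine without a multiply instruction, the compiler emits a call to a support routine that computes the product in software. The sketch below shows the classic shift-and-add approach in portable C. It is an illustration only, not the source of the real routine; the name `soft_mulsi3` is made up, and 32-bit two's complement integers are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply: for every set bit in the multiplier, add the
   correspondingly shifted multiplicand.  Working on unsigned values
   gives well-defined wraparound; the low 32 bits of the product are the
   same for signed and unsigned operands. */
int32_t soft_mulsi3(int32_t a, int32_t b)
{
    uint32_t multiplicand = (uint32_t)a;
    uint32_t multiplier = (uint32_t)b;
    uint32_t product = 0;

    while (multiplier != 0) {
        if (multiplier & 1u)
            product += multiplicand;
        multiplicand <<= 1;
        multiplier >>= 1;
    }
    return (int32_t)product;
}

/* Small self-check covering positive and negative operands. */
void soft_mulsi3_selfcheck(void)
{
    assert(soft_mulsi3(6, 7) == 42);
    assert(soft_mulsi3(-3, 5) == -15);
}
```

The loop runs once per significant bit of the multiplier, so it needs only add and shift instructions, which is why such a routine suits a target like the one described here.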
Re: GCC porting questions
>> The compiler crashes with "Segmentation fault".
>
>> 2. When I try "char c = 'c';", the compiler fails an assert:
>
> It's time to break out the debugger and look at the source code and
> figure out what the compiler is doing. Neither of these problems ring
> any sort of bell for me.
>
> Ian

All right, thank you again and sorry for the spam.

R.
GCC RTX generation question
Hello,

I wrote here a few months ago. I'm trying to port GCC to a simple RISC machine and I have two problems I don't seem to be able to fix. I'm using gcc 4.4.3 both as the host compiler and as the source code being ported.

1. I have the following code:

---
extern void doSmth();

void bugTest(){
    doSmth();
}
---

It compiles fine with -O0, but when I try to use -O3, I get the following compiler error:

---
test0.c:13: error: unrecognizable insn:
(call_insn 7 6 8 3 test0.c:12 (call (mem:SI (mem:SI (reg/f:SI 41) [0 S4 A32]) [0 S4 A32])
        (const_int 0 [0x0])) -1 (nil)
    (nil))
test0.c:13: internal compiler error: in extract_insn, at recog.c:2048
---

I don't understand why the compiler generates (call (mem (mem (reg))))... Also, I was under the impression that any address should be checked by the GO_IF_LEGITIMATE_ADDRESS macro, but I checked and the macro doesn't receive a (mem (reg)) rtx to verify. This is most likely a failure on my part to describe something correctly, but the error message isn't very explicit.

2. I have another piece of code that fails to compile with -O3.

---
struct desc{
    int int1;
    int int2;
    int int3;
};

int bugTest(struct desc *tDesc){
    return *((int*)(tDesc->int1 + 16));
}
---

This time the compiler crashes with a segmentation fault. From what I could dig up with gdb, the compiler tries to make a LIBCALL for a memcopy, but I'm not really sure why. The back-trace of the crash is at the end.

If someone could give me a hint or two, it would be greatly appreciated.

Thanks,
Radu

assign_temp (type_or_decl=0x0, keep=0, memory_required=1, dont_promote=1) at ../../gcc-4.4.3/gcc/function.c:889
889 if (DECL_P (type_or_decl))
(gdb) bt
#0 assign_temp (type_or_decl=0x0, keep=0, memory_required=1, dont_promote=1) at ../../gcc-4.4.3/gcc/function.c:889
#1 0x081312cd in emit_push_insn (x=0xb7d0a5c0, mode=SImode, type=0x0, size=0xb7c912d8, align=8, partial=0, reg=0x0, extra=0, args_addr=0xb7c92290, args_so_far=0xb7c912b8, reg_parm_stack_space=0, alignment_pad=0xb7c912b8) at ../../gcc-4.4.3/gcc/expr.c:3756
#2 0x080cf0cb in emit_library_call_value_1 (retval=, orgfun=, value=, fn_type=LCT_NORMAL, outmode=VOIDmode, nargs=3, p=0xbfffef60 "\300\245з\006") at ../../gcc-4.4.3/gcc/calls.c:3701
#3 0x080cf8ed in emit_library_call (orgfun=0xb7cce7a0, fn_type=LCT_NORMAL, outmode=VOIDmode, nargs=3) at ../../gcc-4.4.3/gcc/calls.c:3952
#4 0x08124d31 in expand_assignment (to=0xb7c940f0, from=0xb7c9a5a0, nontemporal=0 '\000') at ../../gcc-4.4.3/gcc/expr.c:4381
#5 0x08126803 in expand_expr_real_1 (exp=0xb7c95750, target=, tmode=, modifier=EXPAND_NORMAL, alt_rtl=0x0) at ../../gcc-4.4.3/gcc/expr.c:9257
#6 0x0812b4ed in expand_expr_real (exp=0xb7c95750, target=0xb7c912b8, tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0) at ../../gcc-4.4.3/gcc/expr.c:7129
#7 0x0823bd9b in expand_expr (exp=0xb7c95750) at ../../gcc-4.4.3/gcc/expr.h:539
#8 expand_expr_stmt (exp=0xb7c95750) at ../../gcc-4.4.3/gcc/stmt.c:1352
Re: GCC RTX generation question
>> I don't understand why the compiler generates (call (mem (mem (reg)
>> )))...
>
> This looks like gcc is loading the function address from memory. Is
> that required for your target? Assuming it is, then the problem seems
> to be that the operand predicate for your call instruction accepts
> (mem:SI (mem:SI (reg:SI 41))). That seems odd.

Thank you, you are right, the description of "call" was way off. I fixed it and it now works at any optimization level.

>> 2. I have another piece of code that fails to compile with -O3.
>>
>> struct desc{
>>     int int1;
>>     int int2;
>>     int int3;
>> };
>>
>> int bugTest(struct desc *tDesc){
>>     return *((int*)(tDesc->int1 + 16));
>> }
>
> That code looks awfully strange. Is that an integer or a pointer?
>
>> This time the compiler crashes with a segmentation fault. From what I
>> could dig up with gdb, the compilers tries to make a LIBCALL for a
>> memcopy, but I'm not really sure why.
>
> gcc is invoking memmove. This is happening in the return statement.
> For some reason gcc thinks that the function returns a struct. Your
> example does not return a struct. I can not explain this.

Ok, after changing both PARM_BOUNDARY and STACK_BOUNDARY from 8 to 32, the compiler no longer crashes with a segmentation fault, but it still generates a call to memmove.

To explain the code: I have a structure holding some info about a serial interface. One of the fields of the structure is the base address at which the serial interface is mapped in main memory. Offset by 16 bytes from that base is the address from which I can read the number of bytes received by the serial interface. It would probably be better practice to declare the base as (int*) rather than (int), but this should work as well. I tried both

    return *((int*)tDesc->int1 + 4);
    return *((int*)(tDesc->int1 + 16));

The result is the same: a library call. Is this in any way related to the back-end definition, which I might have gotten wrong, or is it middle-end related?

Regards,
Radu
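The two return statements tried above address the same word whenever an int is 4 bytes wide: adding 16 to the integer base, or adding 4 to the base after converting it to an int pointer. The sketch below spells that out in portable C. It rests on assumptions not found in the thread: the base field is declared as `uintptr_t` so the code also runs on hosts where a pointer does not fit in an int, the register is read through a volatile pointer as is usual for memory-mapped I/O, and the names `serial_desc`, `serial_rx_count` and `serial_rx_count_indexed` are made up.

```c
#include <assert.h>
#include <stdint.h>

struct serial_desc {
    uintptr_t base;   /* address at which the serial interface is mapped */
    int int2;
    int int3;
};

/* Byte offset of the received-byte counter, taken from the message. */
enum { RX_COUNT_OFFSET = 16 };

/* Form 1: add the byte offset to the integer base, then convert. */
int serial_rx_count(const struct serial_desc *d)
{
    assert(d != 0);
    volatile const int *reg =
        (volatile const int *)(d->base + RX_COUNT_OFFSET);
    return *reg;
}

/* Form 2: convert first, then index in units of int. */
int serial_rx_count_indexed(const struct serial_desc *d)
{
    assert(d != 0);
    volatile const int *regs = (volatile const int *)d->base;
    return regs[RX_COUNT_OFFSET / sizeof(int)];
}
```

On a host, the functions can be exercised by pointing `base` at an ordinary int array and storing a value at index `16 / sizeof(int)`; both then return that value.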
Re: GCC RTX generation question
>> I tried both
>>
>> return *((int*)tDesc->int1 + 4);
>> return *((int*)(tDesc->int1 + 16));
>>
>> The result is the same: a library call. Is this in any way related to
>> the back-end definition which I might have done wrong, or is it
>> middle-end related?
>
> I don't know. There is something very odd about the fact that gcc
> thinks that you are returning a struct when you are actually returning
> an int. In particular, as far as I can see, cfun->returns_struct is
> true. I think you need to try to figure out why that is happening.
>
> Ian

Ok, thanks again for pointing me in the right direction. It turns out that I had declared register 12 in FUNCTION_VALUE_REGNO_P, but I had not listed it in CALL_USED_REGISTERS, so the compiler tried to return the value in memory. Since the returned value was something that had to be read from memory, it probably decided to use memmove to copy the 4 bytes of the int from the return statement to the stack (I am not sure that is faster than a read and a write through an additional general register, though).

Anyway, thank you!
Radu
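The mismatch described above, a return-value register that is missing from the call-used set, is the kind of thing a small consistency check can catch. The sketch below imitates the two target macros in a standalone C file. Only the macro names and register number 12 come from the message; the register count of 32, the placeholder contents of the table, and the helper `return_regs_are_call_used` are assumptions made for illustration.

```c
#include <assert.h>

/* Assumed register count; the thread does not state it. */
#define FIRST_PSEUDO_REGISTER 32

/* Register 12 holds function return values (from the message). */
#define RETURN_VALUE_REGNUM 12
#define FUNCTION_VALUE_REGNO_P(N) ((N) == RETURN_VALUE_REGNUM)

/* 1 = clobbered by calls.  Every entry except index 12 is a
   placeholder; a real port fills this in from its ABI. */
#define CALL_USED_REGISTERS                   \
  { 0, 0, 0, 0,  0, 0, 0, 0,                  \
    0, 0, 0, 0,  1, 0, 0, 0,                  \
    0, 0, 0, 0,  0, 0, 0, 0,                  \
    0, 0, 0, 0,  0, 0, 0, 0 }

const char call_used_regs[FIRST_PSEUDO_REGISTER] = CALL_USED_REGISTERS;

/* Returns 1 if every register that can hold a return value is also
   marked call-used, 0 otherwise. */
int return_regs_are_call_used(void)
{
    for (int r = 0; r < FIRST_PSEUDO_REGISTER; r++)
        if (FUNCTION_VALUE_REGNO_P(r) && !call_used_regs[r])
            return 0;
    return 1;
}
```

With entry 12 set to 1 the check passes; clearing it would reproduce the inconsistency that Radu reports having fixed in his port.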
Dedicated logical instructions
Hello again,

I have another quick question: I have dedicated logical instructions in my RISC machine (lt - less than, gt - greater than, ult - unsigned less than, etc.). I'm also working on adding instructions for logical OR, AND, NOT and XOR. While reading the GCC internals manual, I stumbled upon this:

"Except when they appear in the condition operand of a COND_EXPR, logical and and or operators are simplified as follows: a = b && c becomes

    T1 = (bool)b;
    if (T1)
      T1 = (bool)c;
    a = T1;"

I really, really don't want this. Is there any way I can define the instructions in the .md file so that the compiler generates code for computing a boolean expression without using branches (using these dedicated insns)?

Regards,
Radu
Re: Dedicated logical instructions
Thank you, that worked out eventually.

However, now I have another problem. I have two instructions in the ISA, 'where' and 'endwhere', which modify the behavior of the instructions placed between them. I made a macro with inline assembly for each of them. The problem is that since `endwhere` doesn't have any operands and doesn't clobber any registers, the GCC optimizers reorder it and place the `endwhere` immediately after the `where`, leaving all the instructions outside the block.

A hack came to mind: specifying that the inline asm uses all the registers as operands, without actually placing them in the instruction mnemonic. The problem is that I don't know how to write that, especially when I don't know the variable names (I want to use the same macro in more than one place).

These are the macros:

    #define WHERE(_condition) \
        __asm__ __volatile__("move %0 %0, wherenz 0xf" \
                             : \
                             : "v" (_condition) \
        );

    #define ENDWHERE \
        __asm__ __volatile__("nop, endwhere");

This is the C code:

    vector doSmth(vector a, vector b){
        WHERE(LT(a, b))
        a++;
        ENDWHERE
        return a;
    }

And this is what cc1 -O3 outputs:

    ;# 113 "/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/include/connex.h" 1
    lt R31 R16 R17
    ;# 4 "/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/test0.c" 1
    move R31 R31, wherenz 0xf
    ;# 6 "/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/test0.c" 1
    nop, endwhere
    iadd R31 R16 1

You can see that the `nop, endwhere` and the `iadd ...` insns are swapped. I think this is similar to having instructions for enabling and disabling interrupts: the instructions have no operands, but the compiler shouldn't move the block in between them when optimizing.

Thank you, and please, if I waste too much of your time with random questions, tell me and I will stop. :)

Regards,
R.

>> Is there any way I can define the instructions in the .md file so the
>> compiler generates code for computing a boolean expression without
>> using branches (using these dedicated insns)?
>
> That is the only correct way to implement && and || in C, C++, and
> other similar languages. The question you should be asking is whether
> gcc will be able to put simple cases without side effects back
> together again. The answer is that, yes, it should be able to do that.
>
> You should not worry about this level of things when it comes to
> writing your backend port. Language level details like this are
> handled by the frontend, not the backend. When your port is working,
> come back to this and make sure that you get the kind of code you
> want.
>
> Ian
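Ian's point about simple cases without side effects can be checked directly in C. The sketch below places the lowered form quoted from the internals manual next to a branch-free form built from comparisons and a bitwise AND, and verifies that they agree. This is an illustration of the equivalence only, not a statement about which instructions a particular port will emit; the function names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* The lowered form quoted from the GCC internals manual. */
bool and_lowered(int b, int c)
{
    bool t1 = (bool)b;
    if (t1)
        t1 = (bool)c;
    return t1;
}

/* Branch-free form: valid here because reading b and c has no side
   effects, so evaluating c unconditionally changes nothing. */
bool and_branchless(int b, int c)
{
    return (b != 0) & (c != 0);
}

/* Compares both forms over a small range of operand values. */
void check_equivalence(void)
{
    for (int b = -2; b <= 2; b++)
        for (int c = -2; c <= 2; c++)
            assert(and_lowered(b, c) == and_branchless(b, c));
}
```

The equivalence breaks as soon as the right-hand operand has a side effect or may trap, for example `p && *p`, which is why the front end must start from the short-circuit form and leave any merging to later passes.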
Re: Dedicated logical instructions
>> The problem is that since `endwhere` doesn't have any operands and
>> doesn't clobber any registers, the GCC optimization reorders it and
>> places the `endwhere` immediately after `where` leaving all the
>> instructions outside the block.
>
> That's tricky in general. You want an absolute barrier, but gcc doesn't
> really provide one that can be used in inline asm. The closest you can
> come is by adding a clobber of "memory":
> asm volatile ("xxx" : /* outputs */ : /* inputs */ : "memory");
> That will block all instructions that load or store from memory from
> moving across the barrier. However, it does not currently block
> register changes from moving across the barrier. I don't know whether
> that matters to you.

It does matter, unfortunately. I tried the memory clobber and got the same result (the addition in the example doesn't do any memory loads or stores).

> You didn't really describe what these instructions do, but they sound
> like looping instructions which ideally gcc would generate itself. They
> have some similarity to the existing doloop pattern, q.v. If you can
> get gcc to generate the instructions itself, then it seems to me that
> you will get better code in general and you won't have to worry about
> this issue.
>
> Ian

I have 16 vector registers in the machine, R16-R31, each with 128 cells of 16 bits. They support ALU operations and loads/stores just like normal registers, but on the whole array in one clock. So "add R16 R17 R18" adds the whole R17 array to R18 (corresponding cells) and places the result in R16. The 'where' instruction places a mask on the array so that the operation is done only in the cells where a certain condition is met; in the example from the previous e-mail, where `a` is less than `b`.

I've read the description of doloop and I don't think I can use it in this case. I'll have to dig more, or settle for -O0 and cry.

Thank you anyway!
R.
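One way to approximate the "uses the registers as operands" idea from earlier in the thread is to pass the variable being modified to both asm statements as a read-write operand. The compiler then has to assume each asm reads and rewrites that variable, so the computation in between cannot be moved across either of them. The sketch below shows the pattern with an empty asm template so that it builds with GCC or Clang on an ordinary host; this is an assumption-laden illustration, not a tested fix for the port. The macro names `WHERE_BEGIN` and `WHERE_END` are made up, the generic "r" constraint stands in for the port's "v" class, and on the real machine the templates would contain the where and endwhere instructions.

```c
#include <assert.h>

/* Opens the masked region.  'var' is an in/out operand, so its current
   value must be computed before this asm and re-read after it. */
#define WHERE_BEGIN(cond, var) \
    __asm__ __volatile__("" : "+r"(var) : "r"(cond))

/* Closes the masked region.  Again 'var' is in/out, so the update to
   'var' must be finished before this asm executes. */
#define WHERE_END(var) \
    __asm__ __volatile__("" : "+r"(var))

/* Host-side stand-in for doSmth(): with empty templates there is no
   real mask, so the increment is always applied. */
int masked_increment(int a, int b)
{
    int cond = a < b;

    assert(cond == 0 || cond == 1);
    WHERE_BEGIN(cond, a);
    a++;
    WHERE_END(a);
    return a;
}
```

The drawback is that every variable written inside the region has to be named in both macros, so this does not give the anonymous, operand-free barrier asked for; it only pins the values that are listed. Whether it survives all optimization passes of GCC 4.4.3 on this target is untested.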