Re: Could we start accepting rich-text postings on the gcc lists?
Hi,

On Fri, Nov 23, 2012 at 12:12:17PM -0800, Andrew Pinski wrote:
> On Fri, Nov 23, 2012 at 11:53 AM, Diego Novillo wrote:
> > In this day and age of rich-text capable mailers, restricting postings
> > to be text-only seems quaint and antiquated. Are there any hard
> > requirements that force us to only accept plain text messages?
>
> I think it is a bad idea to accept non plain text messages (except for
> attachments).

Just for the record, I'm also against rich text reaching gcc mailing lists. I use mutt for work email and consider such messages irritating in one or (usually) more ways whenever I get them.

Martin
The Linux binutils 2.23.51.0.6 is released
This is the beta release of binutils 2.23.51.0.6 for Linux, which is based on binutils 2012 1123 in CVS on sourceware.org plus various changes. It is purely for Linux.

All relevant patches in patches/ have been applied to the source tree. You can take a look at patches/README to see what has been applied and in what order.

Starting from the 2.21.51.0.3 release, you must remove .ctors/.dtors section sentinels when building glibc or other C run-time libraries. Otherwise, you will run into:

http://sourceware.org/bugzilla/show_bug.cgi?id=12343

Starting from the 2.21.51.0.2 release, the BFD linker has working LTO plugin support. It can be used with GCC 4.5 and above. For GCC 4.5, you need to configure GCC with --enable-gold to enable LTO plugin support.

Starting from the 2.21.51.0.2 release, binutils fully supports compressed debug sections. However, compressed debug sections aren't turned on by default in the assembler. I am planning to turn them on for the x86 assembler in a future release, which may lead to Linux kernel bug messages like

WARNING: lib/ts_kmp.o (.zdebug_aranges): unexpected non-allocatable section.

But the resulting kernel works fine.

Starting from the 2.20.51.0.4 release, no diffs against the previous release will be provided.

You can enable both gold and bfd ld with --enable-gold=both. Gold will be installed as ld.gold and bfd ld will be installed as ld.bfd. By default, ld.bfd will be installed as ld. You can use the configure option --enable-gold=both/gold to choose gold as the default linker, ld. IA-32 and x86_64 binary tarballs are configured with --enable-gold=both/ld --enable-plugins --enable-threads.

Starting from the 2.18.50.0.4 release, the x86 assembler no longer accepts

	fnstsw %eax

fnstsw stores 16 bits into %ax, and the upper 16 bits of %eax are unchanged. Please use

	fnstsw %ax

Starting from the 2.17.50.0.4 release, the default output section LMA (load memory address) has changed for allocatable sections from being equal to VMA (virtual memory address) to keeping the difference between LMA and VMA the same as the previous output section in the same region. For

	.data.init_task : { *(.data.init_task) }

the LMA of the .data.init_task section is equal to its VMA with the old linker. With the new linker, it depends on the previous output section. You can use

	.data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) }

to ensure that the LMA of the .data.init_task section is always equal to its VMA. The linker script in the older 2.6 x86-64 kernel depends on the old behavior. You can add AT (ADDR(section)) to force the LMA of the .data.init_task section to equal its VMA. It will work with both old and new linkers. The x86-64 kernel linker script in kernel 2.6.13 and above is OK.

The new x86_64 assembler no longer accepts

	monitor %eax,%ecx,%edx

You should use

	monitor %rax,%ecx,%edx

or

	monitor

which works with both old and new x86_64 assemblers. They should generate the same opcode.

The new i386/x86_64 assemblers no longer accept instructions for moving between a segment register and a 32-bit memory location, i.e.,

	movl (%eax),%ds
	movl %ds,(%eax)

To generate instructions for moving between a segment register and a 16-bit memory location without the 16-bit operand size prefix, 0x66,

	mov (%eax),%ds
	mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The assembler starting from 2.16.90.0.1 will also support

	movw (%eax),%ds
	movw %ds,(%eax)

without the 0x66 prefix. Patches for 2.4 and 2.6 Linux kernels are available at

http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch
http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch

The ia64 assembler now defaults to tuning for Itanium 2 processors. To build a kernel for Itanium 1 processors, you will need to add

ifeq ($(CONFIG_ITANIUM),y)
CFLAGS += -Wa,-mtune=itanium1
AFLAGS += -Wa,-mtune=itanium1
endif

to arch/ia64/Makefile in your kernel source tree.

Please report any bugs related to binutils 2.23.51.0.6 to hjl.to...@gmail.com and http://www.sourceware.org/bugzilla/

Changes from binutils 2.23.51.0.5:
1. Update from binutils 2012 1123.
2. Fix 64-bit jecxz encoding regression in x86 assembler. PR 14859.
3. Revert an accidental linker change. PR 14862.
4. Fix x32 TLS LD to LE optimization in gold. PR 14858.
5. Add "-z global" option to set DF_1_GLOBAL to ld.
6. Improve ld plugin error handling.
7. Port ld lib32 arrangement from Debian.
8. Properly set the output maxpagesize when rewriting program header. PR 14493.
9. Add additional DF_1_XXX support to readelf.
10. Improve nacl support with separate code segments.
11. Improve macos support.
12. Improve arm support.
13. Improve microblaze support.
14. Improve mips support.
15. Improve ppc support.
16. Improve sparc support.

Changes from binutils 2.23.51.0.4:
1. Update from binutils 2012 1110.
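For readers tracking down the LMA change described above, a minimal linker-script sketch of the portable AT() form (the address and section contents here are made up for illustration, not taken from any kernel script):

```
SECTIONS
{
  .text 0x100000 : { *(.text) }

  /* With the old default, LMA would always equal VMA here; with the new
     default, LMA - VMA is carried over from the previous output section.
     Pinning it with AT(ADDR(...)) gives the same layout under both
     old and new linkers. */
  .data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) }
}
```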
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck wrote:
> On 11/04/2012 11:54 AM, Richard Biener wrote:
>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford wrote:
>>> Kenneth Zadeck writes: I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less than 2 HWIs. my patch changes this to one instance of the code that works no matter how large the data passed to it is. you have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWI. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size that the template is instantiated.
>>>
>>> I think richi's argument was that wide_int should be split into two. There should be a "bare-metal" class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for:
>>>
>>>     typedef foo<2> double_int;
>>>
>>> and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation.
>>>
>>> So the wide_int class would still provide "as wide as we need" arithmetic, as in your rtl patch. I don't think he was objecting to that.
>>
>> That summarizes one part of my complaints / suggestions correctly. In other mails I suggested to not make it a template but a constant over object lifetime 'bitsize' (or maxlen) field. Both suggestions likely require more thought than I put into them. The main reason is that with C++ you can abstract from where wide-int information pieces are stored and thus use the arithmetic / operation workers without copying the (source) "wide-int" objects. Thus you should be able to write adaptors for double-int storage, tree or RTX storage.
>
> We had considered something along these lines and rejected it. I am not really opposed to doing something like this, but it is not an obvious winning idea and is likely not to be a good idea. Here was our thought process:
>
> if you abstract away the storage inside a wide int, then you should be able to copy a pointer to the block of data from either the rtl level integer constant or the tree level one into the wide int. It is certainly true that making a wide_int from one of these is an extremely common operation and doing this would avoid those copies.
>
> However, this causes two problems:
> 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make the object. it created the base object and then it allocated the array. Richard S noticed that we could just allocate one CONST_WIDE_INT that had the array in it. Doing it this way saves one ggc allocation and one indirection when accessing the data within the CONST_WIDE_INT. Our plan is to use the same trick at the tree level. So to avoid the copying, you seem to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.

I did not propose having a pointer to the data in the RTX or tree int. Just the short-lived wide-ints (which are on the stack) would have a pointer to the data - which can then obviously point into the RTX and tree data.

> 2) You are now stuck either ggcing the storage inside a wide_int when they are created as part of an expression or you have to play some game to represent the two different storage plans inside of wide_int.

Hm? wide-ints are short-lived and thus never live across a garbage collection point. We create non-GCed objects pointing to GCed objects all the time and everywhere this way.

> Clearly this is where you think that we should be going by suggesting that we abstract away the internal storage. However, this comes at a price: what is currently an array access in my patches would (i believe) become a function call.

No, the workers (that perform the array accesses) will simply get a pointer to the first data element. Then whether it's embedded or external is of no interest to them.

> From a performance point of view, i believe that this is a non starter. If you can figure out how to design this so that it is not a function call, i would consider this a viable option.
>
> On the other side of this you are clearly correct that we are copying the data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs. But this is why we represent data inside of
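The single-allocation trick described for CONST_WIDE_INT can be sketched in a few lines of C. This is a hedged illustration of the technique, not GCC's actual code; the type and function names (`hwi`, `wide_const`, `make_wide_const`) are made up:

```c
/* Sketch of the "one allocation with a trailing array" layout: the length
   header and the HWI array live in a single block, saving one allocation
   and one indirection compared to a header that points at a separate
   array.  All names here are hypothetical. */
#include <stdlib.h>
#include <string.h>

typedef long hwi;               /* stand-in for HOST_WIDE_INT */

struct wide_const
{
  unsigned len;                 /* number of significant HWIs */
  hwi val[];                    /* trailing flexible array member */
};

static struct wide_const *
make_wide_const (const hwi *src, unsigned len)
{
  /* One allocation covers both the header and the array.  */
  struct wide_const *w = malloc (sizeof *w + len * sizeof (hwi));
  w->len = len;
  memcpy (w->val, src, len * sizeof (hwi));
  return w;
}
```

Accessing `w->val[i]` is then a direct array access into the same block as the header, which is the saved indirection the thread mentions.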
Re: Unused field in graphds.h struct graph?
On Tue, Nov 20, 2012 at 11:29 PM, Lawrence Crowl wrote:
> In graphds.h, struct graph has a field "htab_t indices". As near as I
> can tell, it is completely unused. It builds and tests fine with the
> field #if'd out.
>
> Shall I remove the field?

Sure. Please make sure to have graphite enabled when building, it is one of the graphds users.

Thanks,
Richard.

> --
> Lawrence Crowl
Re: Dependences for call-preserved regs on exposed pipeline target?
On 11/25/12 23:33, Maxim Kuvyrkov wrote:
> You essentially need a fix-up pass just before the end of compilation (machine-dependent reorg, if memory serves me right) to space instructions consuming values from CPRs from the CALL_INSNS that set those CPRs. I.e., for the 99% of compilation you don't care about this restriction, it's only the very last VLIW bundling and delay slot passes that need to know about it.
>
> You, probably, want to make the 2nd scheduler pass run as machine-dependent reorg (as ia64 does) and enable an additional constraint (through a scheduling bypass) for the scheduler DFA to space CALL_INSNs from their consumers for at least 2 cycles. One challenge here is that the scheduler operates on basic blocks, and it is difficult to track dependencies across basic block boundaries. To work around the basic-block scope of the scheduler you could emit dummy instructions at the beginning of basic blocks that have predecessors that end with CALL_INSNs. These dummy instructions would set the appropriate registers (probably just assign the register to itself), and you will have a bypass (see define_bypass) between these dummy instructions and consumers to guarantee the 2-cycle delay.

Thanks for the advice. We're already on the same page--I have most of what you recommend: I only schedule once from machine_dependent_reorg, after splitting loads/stores and calls/branches into "init" and "fini" phases bound at fixed clock offsets by record_delay_slot_pair(). I already have a fixup pass to handle inter-EBB hazards. (The selective scheduler would handle interblock automatically, but I had trouble with it initially with split loads/stores. I want to revisit that.) Regarding CPRs, I strongly desire to avoid kludgy fixups for schedules created with an incomplete dependence graph when the generic scheduler can do the job perfectly with a complete dependence graph.

G
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On 11/26/2012 10:03 AM, Richard Biener wrote:
> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck wrote:
>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford wrote:
>>>> Kenneth Zadeck writes: I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less than 2 HWIs. my patch changes this to one instance of the code that works no matter how large the data passed to it is. you have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWI. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size that the template is instantiated.
>>>>
>>>> I think richi's argument was that wide_int should be split into two. There should be a "bare-metal" class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for:
>>>>
>>>>     typedef foo<2> double_int;
>>>>
>>>> and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation.
>>>>
>>>> So the wide_int class would still provide "as wide as we need" arithmetic, as in your rtl patch. I don't think he was objecting to that.
>>>
>>> That summarizes one part of my complaints / suggestions correctly. In other mails I suggested to not make it a template but a constant over object lifetime 'bitsize' (or maxlen) field. Both suggestions likely require more thought than I put into them. The main reason is that with C++ you can abstract from where wide-int information pieces are stored and thus use the arithmetic / operation workers without copying the (source) "wide-int" objects. Thus you should be able to write adaptors for double-int storage, tree or RTX storage.
>>
>> We had considered something along these lines and rejected it. I am not really opposed to doing something like this, but it is not an obvious winning idea and is likely not to be a good idea. Here was our thought process:
>>
>> if you abstract away the storage inside a wide int, then you should be able to copy a pointer to the block of data from either the rtl level integer constant or the tree level one into the wide int. It is certainly true that making a wide_int from one of these is an extremely common operation and doing this would avoid those copies.
>>
>> However, this causes two problems:
>> 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make the object. it created the base object and then it allocated the array. Richard S noticed that we could just allocate one CONST_WIDE_INT that had the array in it. Doing it this way saves one ggc allocation and one indirection when accessing the data within the CONST_WIDE_INT. Our plan is to use the same trick at the tree level. So to avoid the copying, you seem to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>
> I did not propose having a pointer to the data in the RTX or tree int. Just the short-lived wide-ints (which are on the stack) would have a pointer to the data - which can then obviously point into the RTX and tree data.

There is the issue then what if some wide-ints are not short lived. It makes me nervous to create internal pointers to gc ed memory.

>> 2) You are now stuck either ggcing the storage inside a wide_int when they are created as part of an expression or you have to play some game to represent the two different storage plans inside of wide_int.
>
> Hm? wide-ints are short-lived and thus never live across a garbage collection point. We create non-GCed objects pointing to GCed objects all the time and everywhere this way.

Again, this makes me nervous but it could be done. However, it does mean that now the wide ints that are not created from rtxes or trees will be more expensive because they are not going to get their storage "for free", they are going to alloca it. however, it still is not clear, given that 99% of the wide ints are going to fit in a single hwi, that this would be a noticeable win.

>> Clearly this is where you think that we should be going by suggesting that we abstract away the internal storage. However, this comes at a price: what is currently an array access in my patches would (i believe) become a function call.
>
> No, the workers (that perform the array accesses) will simply get a pointer to the first data element. Then whether it's embedded or external is of no interest to them.

so is your plan that the wide int constructors from rtx or
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck wrote:
> On 11/26/2012 10:03 AM, Richard Biener wrote:
>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck wrote:
>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford wrote:
>>>>> Kenneth Zadeck writes: I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less than 2 HWIs. my patch changes this to one instance of the code that works no matter how large the data passed to it is. you have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWI. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size that the template is instantiated.
>>>>>
>>>>> I think richi's argument was that wide_int should be split into two. There should be a "bare-metal" class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for:
>>>>>
>>>>>     typedef foo<2> double_int;
>>>>>
>>>>> and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation.
>>>>>
>>>>> So the wide_int class would still provide "as wide as we need" arithmetic, as in your rtl patch. I don't think he was objecting to that.
>>>>
>>>> That summarizes one part of my complaints / suggestions correctly. In other mails I suggested to not make it a template but a constant over object lifetime 'bitsize' (or maxlen) field. Both suggestions likely require more thought than I put into them. The main reason is that with C++ you can abstract from where wide-int information pieces are stored and thus use the arithmetic / operation workers without copying the (source) "wide-int" objects. Thus you should be able to write adaptors for double-int storage, tree or RTX storage.
>>>
>>> We had considered something along these lines and rejected it. I am not really opposed to doing something like this, but it is not an obvious winning idea and is likely not to be a good idea. Here was our thought process:
>>>
>>> if you abstract away the storage inside a wide int, then you should be able to copy a pointer to the block of data from either the rtl level integer constant or the tree level one into the wide int. It is certainly true that making a wide_int from one of these is an extremely common operation and doing this would avoid those copies.
>>>
>>> However, this causes two problems:
>>> 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make the object. it created the base object and then it allocated the array. Richard S noticed that we could just allocate one CONST_WIDE_INT that had the array in it. Doing it this way saves one ggc allocation and one indirection when accessing the data within the CONST_WIDE_INT. Our plan is to use the same trick at the tree level. So to avoid the copying, you seem to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>>
>> I did not propose having a pointer to the data in the RTX or tree int. Just the short-lived wide-ints (which are on the stack) would have a pointer to the data - which can then obviously point into the RTX and tree data.
>
> There is the issue then what if some wide-ints are not short lived. It makes me nervous to create internal pointers to gc ed memory.

I thought they were all short-lived.

>>> 2) You are now stuck either ggcing the storage inside a wide_int when they are created as part of an expression or you have to play some game to represent the two different storage plans inside of wide_int.
>>
>> Hm? wide-ints are short-lived and thus never live across a garbage collection point. We create non-GCed objects pointing to GCed objects all the time and everywhere this way.
>
> Again, this makes me nervous but it could be done. However, it does mean that now the wide ints that are not created from rtxes or trees will be more expensive because they are not going to get their storage "for free", they are going to alloca it.

No, those would simply use the embedded stor
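The design point argued in this thread -- workers that only ever see a length and a pointer to the first HWI element, so embedded and external storage look identical -- can be sketched in C++. This is a hedged illustration of the idea, not GCC's wide-int implementation; all names (`uhwi`, `add_worker`, `wide_embedded`, `wide_view`) are hypothetical:

```cpp
// Sketch: an arithmetic worker over raw HWI arrays.  It neither knows nor
// cares whether the data is embedded in the object or points into GCed
// RTX/tree storage -- no function-call indirection per element access.
#include <cassert>
#include <cstdint>

typedef uint64_t uhwi;          // stand-in for unsigned HOST_WIDE_INT

// Multi-word addition with carry propagation over len HWIs.
static void add_worker (uhwi *res, const uhwi *a, const uhwi *b, unsigned len)
{
  uhwi carry = 0;
  for (unsigned i = 0; i < len; i++)
    {
      uhwi s = a[i] + carry;
      uhwi c1 = (s < carry);    // overflow from adding the carry
      res[i] = s + b[i];
      uhwi c2 = (res[i] < s);   // overflow from adding b[i]
      carry = c1 | c2;          // at most one of the two can overflow
    }
}

// Embedded storage, as in the single-allocation CONST_WIDE_INT trick.
struct wide_embedded
{
  unsigned len;
  uhwi val[4];
};

// Adaptor "view": same length + pointer shape, but the data lives
// elsewhere (e.g. inside an RTX), so constructing it copies nothing.
struct wide_view
{
  unsigned len;
  const uhwi *val;
};
```

Either representation hands `val` (a plain `const uhwi *`) to `add_worker`, which is the "pointer to the first data element" Richard describes.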
Re: RFC - Alternatives to gengtype
On Sun, Nov 25, 2012 at 10:45 AM, Richard Biener wrote:
> On Sun, Nov 25, 2012 at 4:21 PM, Diego Novillo wrote:
>> On Sun, Nov 25, 2012 at 10:09 AM, Richard Biener wrote:
>>> I'd say the most pragmatic solution is to stick with gengtype but
>>> make it more dependent on annotations (thus, explicit). That is,
>>
>> Yes. That is the direction I've been leaning towards. My preference is to transitionally move to manual markers (http://gcc.gnu.org/wiki/cxx-conversion/gc-alternatives#Do_GC_marking_manually) and over time transition to memory pool management.
>
> Note that the most GCed thing is a 'tree' and the solution is not to move trees to memory pools but to use less trees in the first place!

True, but you are describing an orthogonal problem. There are other data structures in GC. Long term, I would like to move all of them out of GC.

> Improving things wrt tree usage also means to isolate C/C++ frontend IL from the middle-end. I once proposed to cp tree.[ch] and at gimplification time re-allocate and copy from the frontend tree "kind" to the gimple tree "kind". Of course our FE / middle-end separation enemy (debug info) makes this not so viable at the moment.

Right.

>>> I suppose I agree that garbage collection is not technically required for writing a compiler, but getting rid of GC in GCC entirely will be a hard and error-prone task (even if you factor out PCH which is an entirely different mess).
>>
>> Agreed. As far as PCH is concerned, my preferred long term approach is to move to streamable types. We have an almost working implementation in the PPH branch and we already have a streaming framework in LTO.
>
> Of course that's not all we preserve in PCH ... (look for "interesting" global data marked as GC root just for the sake of PCH).

That's fine. We can stream that data as well. Identifying all that is also helpful to realize just how much loose global state we have. Coalescing that global state would be a good cleanup too.

Diego.
Re: Time for GCC 5.0? (TIC)
> Marketing loves high numbers after all!

If you truly think this way, we're going to have to revoke your hacker's license ;-)
Re: Dependences for call-preserved regs on exposed pipeline target?
On 27/11/2012, at 4:34 AM, Greg McGary wrote:
> On 11/25/12 23:33, Maxim Kuvyrkov wrote:
>> You essentially need a fix-up pass just before the end of compilation (machine-dependent reorg, if memory serves me right) to space instructions consuming values from CPRs from the CALL_INSNS that set those CPRs. I.e., for the 99% of compilation you don't care about this restriction, it's only the very last VLIW bundling and delay slot passes that need to know about it.
>>
>> You, probably, want to make the 2nd scheduler pass run as machine-dependent reorg (as ia64 does) and enable an additional constraint (through a scheduling bypass) for the scheduler DFA to space CALL_INSNs from their consumers for at least 2 cycles. One challenge here is that the scheduler operates on basic blocks, and it is difficult to track dependencies across basic block boundaries. To work around the basic-block scope of the scheduler you could emit dummy instructions at the beginning of basic blocks that have predecessors that end with CALL_INSNs. These dummy instructions would set the appropriate registers (probably just assign the register to itself), and you will have a bypass (see define_bypass) between these dummy instructions and consumers to guarantee the 2-cycle delay.
>
> Thanks for the advice. We're already on the same page--I have most of what you recommend: I only schedule once from machine_dependent_reorg, after splitting loads/stores and calls/branches into "init" and "fini" phases bound at fixed clock offsets by record_delay_slot_pair(). I already have a fixup pass to handle inter-EBB hazards. (The selective scheduler would handle interblock automatically, but I had trouble with it initially with split loads/stores. I want to revisit that.) Regarding CPRs, I strongly desire to avoid kludgy fixups for schedules created with an incomplete dependence graph when the generic scheduler can do the job perfectly with a complete dependence graph.

I wonder if "kludgy fixups" refers to the dummy-instruction solution I mentioned above. The complete dependence graph is a myth. You cannot have a complete dependence graph for a function -- the scheduler works on DAG regions (and I doubt it will ever support anything more complex), so you would have to do something to account for inter-region dependencies anyway. It is simpler to have a unified solution that handles both inter- and intra-region dependencies, rather than implementing two different approaches.

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
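The dummy-instruction scheme Maxim suggests can be sketched as a machine-description fragment. This is a hypothetical sketch, not from any real port -- the reservation names, attribute values, and unit name are all made up; only the `define_insn_reservation`/`define_bypass` constructs themselves are GCC's:

```
;; CPR values produced by a call are not usable for 2 cycles.
(define_insn_reservation "my_call" 2
  (eq_attr "type" "call")
  "issue_slot")

;; Dummy self-copy emitted at the top of blocks whose predecessors end in
;; CALL_INSNs; it stands in for the out-of-block call when the DFA builds
;; intra-block hazards.
(define_insn_reservation "my_cpr_dummy" 2
  (eq_attr "type" "cpr_dummy")
  "issue_slot")

;; Enforce the 2-cycle spacing from the dummy (and hence from the
;; preceding call) to any consumer of the call-preserved registers.
(define_bypass 2 "my_cpr_dummy" "my_cpr_use")
```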
Re: Hash table iterators.
On 11/23/12, Andrew MacLeod wrote:
> On 11/22/2012 01:18 PM, Lawrence Crowl wrote:
>> I have found that tree-flow.h implements iteration over htab_t, while there is no current facility to do that with hash_table. Unfortunately, the specific form does not match the standard C++ approach to iterators. We have several choices.
>>
>> (1) Ignore the problem and leave all such tables as htab_t.
>>
>> (2) Write new hash_table iteration functions to match the form of the existing GCC macro/function approach.
>>
>> (3) Write new hash_table iteration functions to match the form used by the C++ standard. This approach would entail modifying the loops.
>>
>> Diego and I have a preference for (3). What do you prefer?
>
> I don't like (1) for sure.
>
> Before deciding a preference between (2) and (3), what are the actual differences? ie, is (2) doing something practical that (3) has to bend over for, or is (3)'s format better but wasn't practical before? is (2) otherwise useful going forward?

For iterating over a hash table containing elements of type T:

(2) The for statement is parameterized by an iterator variable and a variable of type T. The loop copies the element into the T variable, and that variable is used in the body.

(3) The for statement is parameterized only by an iterator variable. The loop uses "*iterator_variable" to obtain a reference to the element.

With (3), we have well-established practice for writing generic algorithms. With (2), we seem to have just for loops.

--
Lawrence Crowl
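For concreteness, the two loop shapes Lawrence describes can be sketched against a stand-in container. This is a hedged illustration, not GCC's hash_table: `int_table`, `FOR_EACH_ELEMENT`, and both `sum_*` functions are made up for the comparison:

```cpp
// Sketch contrasting style (2) -- macro parameterized by an iterator AND an
// element variable -- with style (3), the standard C++ iterator form.
#include <cassert>
#include <vector>

struct int_table
{
  std::vector<int> slots;       // pretend hash-table storage
  typedef std::vector<int>::iterator iterator;
  iterator begin () { return slots.begin (); }
  iterator end () { return slots.end (); }
};

// Style (2): the loop copies each element into VAR, which the body uses.
// (Mirrors the htab-style FOR_EACH macros in tree-flow.h.)
#define FOR_EACH_ELEMENT(TAB, ITER, VAR) \
  for (ITER = (TAB).begin (); \
       ITER != (TAB).end () && ((VAR) = *ITER, true); \
       ++ITER)

static int sum_style2 (int_table &t)
{
  int sum = 0, v;
  int_table::iterator it;
  FOR_EACH_ELEMENT (t, it, v)
    sum += v;
  return sum;
}

// Style (3): only an iterator; the body dereferences it for a reference to
// the element, so standard algorithms apply directly.
static int sum_style3 (int_table &t)
{
  int sum = 0;
  for (int_table::iterator it = t.begin (); it != t.end (); ++it)
    sum += *it;
  return sum;
}
```

Style (3) is also what lets `std::find`, `std::accumulate`, and other generic algorithms work unchanged on the container, which is the "well-established practice" point.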
GCC 4.7.2 error handling type short
I have run into a problem with both 4.6.1 and 4.7.2 of the gcc compiler handling type short. sizeof(unsigned short) returns a length of 2 as expected, but when I use a union of a character buffer and some fields including an unsigned short, the value returned is 2 bytes but the buffer pointer is moved 4 bytes.

Here is the code for the union of the fs structure with the buffer (the super block structure and union are at the bottom of the listing):

/* taken from filsys.h for Intel Xenix */

#define u_short   unsigned short
#define daddr_t   unsigned int
#define ino_t     unsigned short
#define time_t    unsigned int

#define FS_CLEAN   106

#define BMAPSIZE   994   /* Max size of CG bit map */
                         /* Equals BSIZE-sizeof(struct cylinder) */
#define MAXCGS     80    /* Max CG's per filsys */
#define MAXEXTSIZE 32    /* Max extent size */
#define FNEWCG     64    /* When a file grows beyond FNEWCG KB, allocate
                            blocks from a new cylinder group */
#define SNEWCG     512   /* Move to a new cylinder group after every
                            subsequent SNEWCG KB */

/*
 * Cylinder group header
 */
struct cylinder {
	daddr_t	cg_doffset;	/* offset to first data block from start of filsys */
	daddr_t	cg_ioffset;	/* offset to first inode block from start of filsys */
	u_short	cg_dblocks;	/* number of data blocks in cg */
	ino_t	cg_ifirst;	/* next free inode in linked list */
	char	cg_number;	/* cg sequence number in filsys */
	char	cg_currextent;	/* current extent size */
	u_short	cg_lowat;	/* if free blocks drop below cg_lowat, recompute cg_currextent */
	u_short	cg_hiwat;	/* if free blocks increase beyond cg_hiwat, recompute cg_currextent */
	u_short	cg_erotor;	/* position of next candidate block for allocation */
	char	cg_ilock;	/* inode manipulation lock */
	char	cg_reserved[9];	/* reserved field. (9 to align on word boundary) */
	char	cg_bits[BMAPSIZE];	/* bit map. 0 = allocated. 1 = free */
};

/*
 * Contains global policy information.
 * Stored in the superblock.
 */
struct cginfo {
	u_short	fs_cgincore;	/* points to buf structure containing cg header. Null if not in core */
	daddr_t	fs_cgblk;	/* disk address of cg header */
	u_short	fs_cgffree;	/* number of free data blocks in cg */
	ino_t	fs_cgifree;	/* number of free inodes in cg */
	ino_t	fs_cgdirs;	/* number of directories in cg */
};

/*
 * Super block
 */
struct filsys {
	char	fs_fname[6];	/* file system name */
	char	fs_fpack[6];	/* pack name */
	daddr_t	fs_fsize;	/* number of data blocks in fs */
	u_short	fs_cgblocks;	/* number of blocks per cg */
	daddr_t	fs_maxblock;	/* max disk block in fs */
	ino_t	fs_cginodes;	/* number of inodes per cg */
	ino_t	fs_maxino;	/* max inumber in fs */
	time_t	fs_time;	/* time last modified */
	char	fs_fmod;	/* modified flag */
	char	fs_ronly;	/* read-only fs */
	char	fs_clean;	/* fs was cleanly unmounted */
	char	fs_type;	/* fs type and version */
	u_short	fs_fnewcg;	/* contains FNEWCG */
	u_short	fs_snewcg;	/* contains SNEWCG */
	daddr_t	fs_ffree;	/* number of free data blocks in fs */
	ino_t	fs_ifree;	/* number of free inodes in fs */
	ino_t	fs_dirs;	/* number of directories in fs */
	char	fs_extentsize;	/* native extent size */
	char	fs_cgnum;	/* number of cg's in fs */
	char	fs_cgrotor;	/* next cg to be searched */
	char	fs_reserved[15];	/* reserved. (15 to align on word boundary) */
	struct	cginfo fs_cylinder[MAXCGS];	/* contains global policy information per cylinder group */
};

I use this routine to dump the info from the superblock:

void dumpsuper(void)
{
	if (*super.fs.fs_fname)
		printf("fs_fname = %s\n", super.fs.fs_fname);
	if (*super.fs.fs_fname)
		printf("fs_fpack = %s\n", super.fs.fs_fpack);
	printf("fs_fsize = %d\n", super.fs.fs_fsize);
	printf("fs_cgblocks = %d\n", super.fs.fs_cgblocks);
	printf("fs_maxblock = %d\n", super.fs.fs_maxblock);
	printf("fs_cginodes = %d\n", super.fs.fs_cginodes);
	printf("fs_maxino = %d\n", super.fs.fs_maxino);
	printf("len = %d\n", sizeof(unsigned short));
	dumphex(1024, 256);
}

When run, I get this result:

MAKI
Re: GCC 4.7.2 error handling type short
On Nov 26, 2012, at 3:57 PM, Bill Beech (NJ7P) wrote:

> I have run into a problem with both 4.6.1 and 4.7.2 of the gcc compiler
> handling type short. sizeof(unsigned short) returns a length of 2 as
> expected, but when I use a union of a character buffer and some fields
> including an unsigned short the value returned is 2 bytes but the buffer
> pointer is moved 4 bytes.
> ...
> As you can see the value at 0410 in the file, 6601 is returned as 358, which
> is correct. The 4-byte value following, 67 01 00 00, is not returned for the
> unsigned int; rather 00 00 30 00 is returned next (which equals 3145728
> decimal). While sizeof(unsigned short) returns 2 bytes, in this case the
> pointer into the unioned buffer is moved 4 bytes.
>
> This bug makes it hell to use any of your products to build emulators for
> the 16-bit processors.
>
> Is there a definition for a 16-bit quantity that will work in a union?
>
> Thanks!
>
> Bill Beech
> NJ7P

You meant struct, right, not union? Every field has a size as well as an alignment. The starting address of each field is forced to be a multiple of its alignment. In many cases, for primitive data types (like the various sized integers), the alignment equals the size; for example, a 4-byte int has alignment 4. So if you have a struct of a short followed by an int, the compiler has to insert 2 bytes of padding before the int to obey the alignment. In some cases there are types that don't have alignment == sizeof; for example, long long int on Intel is size 8 but (by default) alignment 4.

Since you mentioned 16-bit processors -- are you talking about a port for a 16-bit processor, where you want int (size 4) to be aligned to 2? (For example, that would be sensible on a PDP-11.) If so, you'd want to tell the compiler how to do that; I'm not sure of the details, but presumably they are in the GCC Internals manual. Or are you talking about an existing port which has defined the alignment of int to be 4?
If so, that might be because unaligned accesses would cause exceptions. Or it may just be a convention. In either case, you can use the "packed" attribute to override the normal alignment of fields. See the GCC documentation for details.

paul
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
Richard,

I spent a good part of the afternoon talking to Mike about this. He is on the C++ standards committee and is a much more seasoned C++ programmer than I am. He convinced me that, with a large amount of engineering and C++ "foolishness", it was indeed possible to get your proposal to POSSIBLY work as well as what we did.

But now the question is why would anyone want to do this? At the very least you are talking about instantiating two instances of wide-ints, one for the stack-allocated uses and one for the places where we just move a pointer from the tree or the rtx. Then you are talking about creating connectors so that the stack-allocated functions can take parameters of the pointer version and vice versa. Then there is the issue that rather than just saying that something is a wide int, the programmer is going to have to track its origin. In particular, where in the code right now I say

wide_int foo = wide_int::from_rtx (r1);
wide_int bar = wide_int::from_rtx (r2) + foo;

now I would have to say

wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;

Then when I want to call some function using a wide_int ref, that function must now either be overloaded to take both, or I have to choose one of the two instantiations (presumably based on which is going to be more common) and just have the compiler fix up everything (which it is likely to do). And so what is the payoff?

1) No one except the C++ elite is going to understand the code. The rest of the community will hate me and curse the ground that I walk on.

2) I will end up with a version of wide-int that can be used as a medium-life container (where I define medium life as not allowed to survive a gc, since they will contain pointers into rtxes and trees).

3) And no clients that actually wanted to do this!! I could use as an example one of your favorite passes, tree-vrp.
The current double-int could have been a medium-lifetime container since it has a smaller footprint, but in fact tree-vrp converts those double-ints back into trees for medium storage. Why? Because it needs the other fields of a tree-cst to store the entire state. Wide-ints also "suffer" this problem: their only state is the data and the three length fields. They have no type and none of the other tree info, so the most obvious client for a medium-lifetime object is really not going to be a good match even if you "solve the storage problem". The fact is that wide-ints are an excellent short-term storage class that can be very quickly converted into our two long-term storage classes.

Your proposal requires a lot of work, will not be easy to use, and as far as I can see has no payoff on the horizon. It could be that there could be future clients for a medium-lifetime value, but asking for this with no clients in hand is really beyond the scope of a reasonable review. I remind you that the purpose of these patches is to solve problems that exist in the current compiler that we have papered over for years. If someone needs wide-ints in some way that is not foreseen, then they can change it.

kenny

On 11/26/2012 11:30 AM, Richard Biener wrote:
On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck wrote:
On 11/26/2012 10:03 AM, Richard Biener wrote:
On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck wrote:
On 11/04/2012 11:54 AM, Richard Biener wrote:
On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford wrote:
Kenneth Zadeck writes:

I would like you to respond to at least point 1 of this email. In it there is code from the rtl level that was written twice, once for the case when the size of the mode is less than the size of a HWI and once for the case where the size of the mode is less than 2 HWIs. My patch changes this to one instance of the code that works no matter how large the data passed to it is.
You have made a specific requirement for wide int to be a template that can be instantiated in several sizes, one for 1 HWI, one for 2 HWIs. I would like to know how this particular fragment is to be rewritten in this model? It seems that I would have to retain the structure where there is one version of the code for each size that the template is instantiated with.

I think richi's argument was that wide_int should be split into two. There should be a "bare-metal" class that just has a length and HWIs, and the main wide_int class should be an extension on top of that that does things to a bit precision instead. Presumably with some template magic so that the length (number of HWIs) is a constant for:

typedef foo<2> double_int;

and a variable for wide_int (because in wide_int the length would be the number of significant HWIs rather than the size of the underlying array). wide_int would also record the precision and apply it after the full HWI operation. So the wide_int class would still provide "as wide as we need" arithmetic, as in your rtl patch. I don'
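The "bare-metal plus extension" split described above might look roughly like the following sketch. All names here (hwi_vec, HWI_SKETCH, wide_int_sketch) are invented for illustration and do not match the real GCC classes; this only shows the compile-time-constant versus run-time-length distinction being debated:

```cpp
#include <cstdint>
#include <cstddef>

typedef int64_t HWI_SKETCH;   // stand-in for HOST_WIDE_INT

// Hypothetical "bare-metal" storage class: N is a compile-time
// constant number of HWIs, so the length needs no run-time tracking
// and no heap allocation is involved.
template <std::size_t N>
struct hwi_vec {
    HWI_SKETCH val[N];
    static constexpr std::size_t len = N;   // fixed length
};

// The double_int-like case from the discussion:
typedef hwi_vec<2> double_int_sketch;

// A wide_int-like extension would instead track the number of
// significant HWIs at run time and record a bit precision that is
// applied after each full-HWI operation.
struct wide_int_sketch {
    HWI_SKETCH  val[4];     // some maximum number of HWIs (assumed)
    std::size_t len;        // number of significant HWIs
    unsigned    precision;  // precision in bits
};
```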
Re: embedded Linux: improvement issues
On 27/11/2012, at 4:51 PM, ETANI NORIKO wrote:

> Dear Sirs,
>
> I am researching the status quo of embedded Linux and found your website for
> "Embedded Linux Conference 2013". We are looking for an engineer on the
> distributor side in order to consult about our implementation issues and
> improve embedded Linux for our system. We have developed a high-level API for
> a many-core system based on OpenCL, sponsored by NEDO in Japan.
>
> Our development environments are as follows.
> PC: Sony VAIO
> OS: Windows 7 Professional Service Pack 1
> VM: VMware Player 4.0.3
> HOST: 32-bit Fedora 16
> TARGET: MIPS-type Linux created with GNU Linux GCC and uClibc
>
> We found the following 3 vital implementation issues in our development.
> 1. MPFR and GMP should be available for "LD" to link some object files and
> create a binary file.
> The MPFR library is a C library for multiple-precision floating-point
> computations with correct rounding. GMP is a free library for arbitrary
> precision arithmetic, operating on signed integers, rational numbers, and
> floating point numbers. These libraries are installed into the GCC compiler.
> So, a binary file executed on a device core for computing in a many-core
> system cannot use them because it is created with "LD".

It sounds like you want to create a cross-compiler toolchain (binutils, GCC) and use it to generate a Linux/uClibc rootfs for your MIPS target. I.e., the compiler will run on x86 and generate code for MIPS.

Building a cross-toolchain is a difficult task, weeks of work if you don't know exactly what you are doing. Get one of the precompiled packages if you can (google "cross toolchain for MIPS"). The MPFR and GMP libraries are used by the compiler, which is an x86 program, so you can simply install these libraries from your Fedora distribution: "yum install gmp-devel mpfr-devel libmpc-devel". Read http://gcc.gnu.org/wiki/InstallingGCC for additional details.
The main point is that there are libraries used by the target (e.g., uClibc) and by the host (e.g., GMP, MPFR, MPC).

> 2. About generation of uClibc, it should be available for a developer to
> select some functions among the Linux standard library and create uClibc.

I don't quite understand what you mean here.

> 3. Please tell us how to create our Linux for C++ because we have no
> information about it.

For this you want to specify "--enable-languages=c,c++" when configuring the compiler.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
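For reference, a minimal cross-GCC configure line enabling C and C++ might look like the sketch below. The target triplet, source path, and install prefix are assumptions for illustration; a real build also needs cross binutils, kernel headers, and a staged uClibc installed first.

```shell
# Sketch only -- adjust the triplet, paths, and prefix for your setup.
../gcc-src/configure \
    --target=mips-linux-uclibc \
    --prefix=/opt/cross-mips \
    --enable-languages=c,c++
make all-gcc && make install-gcc
```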
Re: Dependences for call-preserved regs on exposed pipeline target?
On 11/26/12 12:46, Maxim Kuvyrkov wrote:

> I wonder if "kludgy fixups" refers to the dummy-instruction solution I
> mentioned above. The complete dependence graph is a myth. You cannot have a
> complete dependence graph for a function -- the scheduler works on DAG
> regions (and I doubt it will ever support anything more complex), so you
> would have to do something to account for inter-region dependencies anyway.
>
> It is simpler to have a unified solution that would handle both inter- and
> intra-region dependencies, rather than implementing two different approaches.

I retract any implication that your bypass proposal is a kludge. I found using bypasses to be very compact and effective. Thanks for the extra nudge.

G