Problem in porting GLIBC...
Hello, We are trying to port GLIBC 2.2.5 to the ABACUS platform (a processor similar to SPARC). Much of the porting work is done, and at this stage we are trying to get 'ld-2.2.5.so' working on our own Linux kernel for the ABACUS processor. The GOT and PLT that GCC generates for ld-2.2.5.so are causing problems. At the moment we reach main() of the user test case, having jumped to the user entry point through ld.so. There must be some problem in the relocation, but we are unable to trace the flow: we have no debugger, we used to rely on print statements to debug ld.so, and now even print statements can no longer be used. Could you please give us some hints on how to proceed? As I am not a member of the group, please mail me at my personal ID, [EMAIL PROTECTED] Thanks & Warm Regards, VIVEK
Problem in GCC porting...
Hello All, We are trying to fix the already ported GCC 2.95 for the ABACUS processor. The ABACUS processor is very similar to SPARC from SUN. We are facing 2 major problems.

1. Variable initialization within a block: If we declare and initialize a variable within a block in a function, the initialization happens only the first time, not every time the code enters that block again.

    main()
    {
        --- ---
        {
            int a = 10;
            a += 10;
        }
        --- ---
    }

In the above case, if we enter the block inside main for the second time or more, the value of 'a' is carried over as if it were a 'static' variable.

2. Function pointer returning some value: When a function called through a function pointer returns a value, we are not able to collect that value in the return register or variable.

These 2 errors in the compiler are causing problems in other activities as well. If anyone could throw some light on these issues, it would help me a lot. Please send replies to my ID [EMAIL PROTECTED] as I am not a member of this group. Thanks in anticipation... VIVEK
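A minimal test for the two symptoms described above might look like the following sketch. The function names (enter_block, return_seven, call_through_pointer) are made up for illustration and are not from the original report; the comments state what a correctly working C compiler is expected to do.

```c
/* Symptom 1: a block-scoped automatic variable with an initializer
   must be re-initialized every time control enters the block. */
static int enter_block(void)
{
    int result;
    {
        int a = 10;   /* must be 10 on every entry, never carried over */
        a += 10;
        result = a;
    }
    return result;
}

/* Symptom 2: a value returned by a function that is called through a
   function pointer must arrive in the caller unchanged. */
static int return_seven(void)
{
    return 7;
}

static int call_through_pointer(int (*fp)(void))
{
    int value = fp();   /* indirect call; the result is read from the return register */
    return value;
}
```

Calling enter_block() twice should give 20 both times; a result of 30 on the second call would reproduce the first problem. call_through_pointer(return_seven) should give 7; any other value would reproduce the second.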
mirror a GCC mailing list using Google Groups?
I would like to mirror the gfortran mailing list using Google Groups, for reasons described at http://gcc.gnu.org/ml/fortran/2006-10/msg00692.html . Someone suggested I contact the "GCC steering committee" to get feedback, so I am posting here. Vivek Rao
arm-elf-gcc shared flat support
Hi, I am working on shared flat file support for uClinux (no-MMU ARM). The gcc versions I am using are 2.95 and 3.4.0. The theory of operation is similar to that implemented for m68k. One of the major requirements is to call functions via the GOT, so that this C code

    foo() {}
    main() { foo(); }

produces the compiler output

    ldr r3, .L4
    mov lr, pc
    ldr pc, [sl, r3]
    .L4:
    .word foo(GOT)

as opposed to

    bl foo(PLT)

where sl holds the address of the GOT (the binfmt_flat loader ensures that before the program starts). In gcc 3.4.0 this is somehow achieved if the function attribute __attribute__((weak)) is specified, but I have no idea how to do it for 2.95. Kindly bail me out on this one. Sincere thanks in advance. Vivek Tyagi
Re: arm-elf-gcc shared flat support
Hi Richard, Paul,

> This is the wrong list for these sorts of questions, you should really be asking on gcc-help.

The project I am working on requires changes to be made in the gcc backend (probably the front end too for a complete solution), so I thought it best to discuss it with developers.

> Is there some reason why linker-generated PLT sequences aren't a reasonable solution?

and for Paul

> Why on earth do you need to do this? Can't you get the linker to generate PLT sequences like we do for normal shared libraries?

My major work area is ARM uClinux, so my apologies if my explanation of this compiler project is erroneous. Kindly ignore it if this is already known. As per my understanding (kindly bear with me), the file format for no-MMU ARM in uClinux is the "Binary Flat format, commonly known as BFLT" (http://www.beyondlogic.org/uClinux/bflt.htm for reference). This format is produced by running the "elf2flt" tool on the ELF file generated by the cross compiler toolchain (here arm-elf-gcc). The flat files do not have a PLT. They are a very simple format with TEXT:GOT:DATA:RELOC sections (in that order), so IMHO the generic linker modification for PLT sequences cannot be done. This matter was discussed long ago on this same list by John Lee, Bernardo Innocenti et al. around 11 Jun 2004.

To provide shared library support for the uClinux environment, which is essentially without an MMU, some changes are required in the toolchain. One approach was developed by the snapgear team for m68k (http://www.ucdot.org/article.pl?sid=03/11/25/1126257&mode=thread), but there is no similar implementation (read: open source) for ARM. Just to be sure, I raised this issue at the uClinux developer forum and received the same answer. So I decided to spare some time and work on an open source implementation for ARM along the lines of what was done for m68k. The pivot is to handle the PIC register r10 (sl) and the "-R" option in ld.
The way it works is that the symbol information is copied (via the -R option) from the library to the main executable at link time. The load-time values are taken care of by the binfmt_flat loader via the GOT. For this to work, everything (data variables, functions) should be accessed via the GOT, so that the copied symbol information can be fixed up by the loader. Taking my original example

    foo() {}
    main() { foo(); }

in the actual scenario the function foo() would be part of the shared library:

    exe:
        extern foo();
        main()
        {
            foo();
        }

    shared-lib:
        foo()
        {
        }

Now the object files would be linked as

    arm-elf-gcc exe.o -Wl,-R,shared-lib.o

Here you can see the need to modify the compiler to generate the call to foo() via the GOT rather than by "bl". With "bl", the -R flag would copy a value for foo() into exe (from shared-lib), which does not make any sense. On the other hand, if foo() is called via the GOT, -R would copy the reloc value for foo() into exe. This is fixed up by the binfmt_flat loader to point to the correct address of foo() in shared-lib (the library is loaded first by the loader...). I hope this explains the requirement for indirect function calls.

The other changes required are the store/restore of the PIC register around all function calls, and loading the PIC register with the address of the relevant GOT (the GOT addresses are maintained in an array updated by the binfmt_flat loader). I have implemented the -mshared-library-id flag for ARM, so gcc -mshared-library-id=1 generates the following function prologue and epilogue (for gcc 2.95):

    mov   ip, sp
    stmdb sp!, {sl, fp, ip, lr, pc}   /* store sl on stack */
    sub   sl, sl, #8                  /* bad hack to update sl; sl is to be loaded with the address of the new GOT, which lies 2 words before the current location for lib id = 1 */
    ldr   sl, [sl]                    /* bad hack continues.. ideally this should be one instruction, i.e. ldr sl, [sl, #-8] ... it's in my TODO list */
    ...
    ...
    ldmdb fp, {sl, fp, sp, pc}        /* restore sl */

I am not sure this is a very efficient approach, but the lack of an MMU does not leave us with too many choices. Implementing this for ARM would take care of the inefficiency in ARM uClinux caused by the lack of shared library support. The above changes can also be made for the latest gcc, but I am facing some relocation issues with uClibc compiled with 3.4.0 with the PIC register modification hacks. It works fine with 2.95, so I worked with that first. All this is flexible; feel free to add your suggestions. Thanks, Vivek Tyagi
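The idea of calling foo() through a GOT slot that the loader fills in can be mimicked in ordinary C with a table of function pointers. This is only an analogy under stated assumptions, not the binfmt_flat mechanism itself: the table got, the slot numbering and the function loader_fixup are hypothetical names invented for this sketch.

```c
typedef int (*func_ptr)(void);

/* Hypothetical slot numbering; a real GOT is laid out by the toolchain. */
enum { GOT_SLOT_FOO = 0, GOT_SLOT_COUNT };

/* Stands in for the table that the PIC register sl (r10) points to. */
static func_ptr got[GOT_SLOT_COUNT];

/* Stands in for foo() living in the shared library. */
static int foo_in_shared_lib(void)
{
    return 42;
}

/* Stands in for the load-time fix-up: once the library is loaded,
   the slot is patched with the real address of foo(). */
static void loader_fixup(void)
{
    got[GOT_SLOT_FOO] = foo_in_shared_lib;
}

/* What the executable does instead of a direct "bl foo":
   fetch the address from the table, then branch to it. */
static int call_foo_via_got(void)
{
    return got[GOT_SLOT_FOO]();
}
```

Because the caller only ever names the slot, the address of foo() does not need to be known when the executable is linked; it only has to be in the table before the first call.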
cross compiling
Are there any ready-to-use gcc RPMs for host: x86 (Red Hat 9.0), target: alpha? Thanking you, vivek
benchmarks
Hello everybody, I need benchmark programs for my project. Does anybody have or know the links to C benchmarks that can be compiled using gcc? Thanking you, Vivek
guidance for GSoC 2016 under GCC
Hello GCC developers, I would like to work on one of the following ideas in GSoC 2016 for GCC:

1. Function Reordering (Improvement) with LTO
2. Inter-procedural value range propagation pass
3. Implement tree level section anchors to improve code generation at ARM/PPC

I have done some reading for the first and second topics, and I would like your guidance.

For the first topic I have read Martin's master's thesis. As far as I understand, he has currently implemented function reordering with PGO support, whereas this project would use LTO support. Am I thinking about it correctly?

For the second project I have read the IPCP.c file in the gcc source code, which implements inter-procedural constant propagation with call graphs and jump functions. According to Chapter 11, page 664 of the book Optimizing Compilers for Modern Architectures: A Dependence-based Approach, a range propagation pass can be designed by extending IPCP. Here the extensions to IPCP would be determining the ranges of variables from for loops, constant assignments or if/else statements, and modifying the jump functions so that ranges can be calculated based on operations. We may also use the data structure for ranges that is used in tree-vrp.c of gcc.

For the third project I have not started studying yet. Please suggest some readings.

Apart from this, I have learned how to write simple passes and plugins for gcc and its related data structures (learned from Diego Novillo's slides). I have also written some simple optimization passes with the LLVM libs. Please provide more information or experimental patches to study.

Sincerely, Vivek Pandya

P.S.: I tried to contact Mr. Jan Hubicka as mentioned on the ideas page, but it seems that he is not reachable at his mail address j...@suse.cz, which is why I have mailed the gcc dev list.
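To make the second idea more concrete, here is a minimal sketch of what "modifying jump functions so that ranges can be calculated based on operations" could mean. It is an illustration only: range_t and the helper functions are hypothetical names, not the data structures of ipa-cp or tree-vrp.c, and integer overflow is deliberately ignored.

```c
/* A closed integer interval [lo, hi]. */
typedef struct {
    long lo;
    long hi;
} range_t;

/* Range of a constant argument, e.g. callee(3). */
static range_t range_const(long c)
{
    range_t r = { c, c };
    return r;
}

/* Range of "a + b" given the ranges of a and b (overflow ignored). */
static range_t range_add(range_t a, range_t b)
{
    range_t r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Combine the ranges seen at different call sites: the parameter may
   take any value from either site, so take the enclosing interval. */
static range_t range_union(range_t a, range_t b)
{
    range_t r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

/* Example: callee(x) is called as callee(3) at one site and as
   callee(i + 1) inside "for (i = 0; i < 10; i++)" at another. */
static range_t callee_param_range(void)
{
    range_t i = { 0, 9 };                           /* from the loop bounds */
    range_t site1 = range_const(3);                 /* constant argument    */
    range_t site2 = range_add(i, range_const(1));   /* jump function i + 1  */
    return range_union(site1, site2);
}
```

With these two call sites the parameter x is known to lie in [1, 10], so a test such as "if (x < 0)" inside the callee could be folded away by a later pass that consumes the range.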
Re: ipa vrp implementation in gcc
> On Mon, Jan 11, 2016 at 4:07 PM, Richard Biener wrote:
>>
>> On Mon, Jan 11, 2016 at 1:38 AM, Kugan wrote:
>> > Hi All,
>> >
>> > I am looking at implementing a ipa vrp pass. Jan Hubicka also talks
>> > about this in 2013 GNU Cauldron as one of the optimization he would like
>> > to see in gcc. So my question is, is any one implementing it. If not we
>> > would like to do that.
>> >
>
> Hello, I am Vivek Pandya. I am actually working on a GSoC 2016 proposal for
> his work and it is very similar to extending the ipa-cp pass. I am also in
> touch with Jan Hubicka. These comments will certainly help me, but if it is
> urgent for anyone you can begin work on this. Jan has shown interest in
> mentoring me for this project, but any help from the community is always
> appreciated.
>
>>
>> > I also looked at the ipa-cp implementation to see how this can be done.
>> > Going by this, one of the way to implement this is (skipping all the
>> > details):
>> >
>> > - Have an early tree-vrp so that we can have value ranges for parameters
>> > at call sites.
>
> Actually a tree-vrp pass already exists. But Jan has suggested to me that
> the ipa-vrp implementation should not be too costly. So I am also thinking
> of including this work in my proposal and also using the analysis to
> improve LTO heuristics, as the project duration will be around 2.5 months.
>>
>>
>> I'd rather use the IPA analysis phase for this and use a VRP algorithm
>> that doesn't require ASSERT_EXPR insertion.
>>
>> > - Create jump functions that captures the value ranges of call sites
>> > propagate the value ranges. In 2013 talk, Jan Hubicka talks about
>> >
>> > - Modifying ipa-prop.[h|c] to handles this but wouldn't it be easier to
>> > have its own and much more simpler implementation ?
>>
>> No idea.
>>
>> > - Once we have the value ranges for parameter/return values, we could
>> > rely on tree-vrp to use this and do the optimizations
>>
>> Yep. IPA transform phase should annotate parameter default defs with
>> computed ranges.
>>
>> > Does this make any sense? Any thoughts/suggestions to work on this is
>> > highly appreciated.
>>
>> IPA alignment propagation should already be somewhat similar as in doing
>> an intersection step during propagation.
>>
>> Richard.
>>
>> > Thanks,
>> > Kugan
>
> Your comments certainly help me to develop my proposal. Please let me know
> of any updates, to avoid confusion and duplication of work.
> Sincerely,
> Vivek
Source Code for Profile Guided Code Positioning
Hello GCC Developers, Are the 'Profile Guided Code Positioning' algorithms described in this paper (Pettis and Hansen), http://dl.acm.org/citation.cfm?id=93550 , implemented in gcc? If yes, kindly point me to the location of the code in the gcc source tree. Sincerely, Vivek Pandya
Re: Source Code for Profile Guided Code Positioning
Thanks, Yury, for the link https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html which implements procedure reordering as a linker plugin. I have some questions: 1) Can you point me to some documentation on how to write plugins for linkers? I have not seen documentation for the structs with the 'ld_' prefix (i.e. those defined in plugin-api.h). 2) The PH paper has one more algorithm, for basic block ordering using execution frequency counts. Is there any implementation available for it? Sincerely, Vivek
Re: ipa vrp implementation in gcc
Vivek Pandya

On Mon, Jan 18, 2016 at 4:16 AM, Kugan wrote:
>
> > Hello I am Vivek Pandya, I am actually working on a GSoC 2016 proposal
> > for his work and it is very similar to extending ipa-cp pass. I am also
> > in touch with Jan Hubicka.
>
> Hi Vivek,
>
> Glad to know that you are planning to work on this. Could you please put
> you plan in an accessible place (or post it here) so that we know what
> you plans are. That way we can work on what you are not working. And
> also possible contribute to your plan in other ways (like testing and
> reviewing).
>

Hello Kugan,

My work will include extending the ipa-cp pass to propagate range information and then integrating this information to improve LTO optimizations (at least one). But as mentioned by Jan Hubicka, the real problem is not extending the ipa-cp pass: tree-vrp itself is a big task, and scheduling it at an early stage will cost a performance loss. So I was looking at some alternatives to Patterson's approach, and in particular I found this non-iterative method: https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf which has already been implemented in LLVM: http://homepages.dcc.ufmg.br/~fernando/publications/papers/SBLP2011_douglas.pdf

So my plan is first to implement the above-mentioned approach by 23 May 2016 (my college project) and use this local pass for value range analysis. Then, in my GSoC 2016 project, I will use this pass for the ipa-vrp pass and improve other IPA optimizations to use this information. I have not yet figured out the implementation details, though. Currently I am learning about the gcc IRs. If you have any further ideas (especially about the constraint-based method), please let me know and help me build my implementation approach.

Sincerely,
Vivek

>
> Thanks,
> Kugan
Re: ipa vrp implementation in gcc
Vivek Pandya

On Mon, Jan 18, 2016 at 11:35 AM, vivek pandya wrote:
> Vivek Pandya
>
>
> On Mon, Jan 18, 2016 at 4:16 AM, Kugan wrote:
>>
>>
>> > Hello I am Vivek Pandya, I am actually working on a GSoC 2016 proposal
>> > for his work and it is very similar to extending ipa-cp pass. I am also
>> > in touch with Jan Hubicka.
>>
>> Hi Vivek,
>>
>> Glad to know that you are planning to work on this. Could you please put
>> you plan in an accessible place (or post it here) so that we know what
>> you plans are. That way we can work on what you are not working. And
>> also possible contribute to your plan in other ways (like testing and
>> reviewing).
>>
> Hello Kugan,
>
> Actually my work will include extending the ipa-cp pass to propagate
> range information and then integrating this information to improve LTO
> optimizations (at-least one). But as mentioned by Jan Hubicka the real
> problem is not to extend ipa-cp pass but tree-vrp it self a big task
> and scheduling it at early stage will cost a performance lose.
> So actually I was looking at some alternatives to Patterson's approach
> and particularly I found this non iterative method:
> https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf which has
> already implemented in LLVM
> .http://homepages.dcc.ufmg.br/~fernando/publications/papers/SBLP2011_douglas.pdf

Also, could someone please advise me whether this non-iterative method would be a good alternative VRP? That is, would it serve our purpose of a lightweight VRP that can be used at an earlier stage?

> So my plan for this is first implementing above mentioned approach
> till 23 May , 2016 (My college project ) and then use this local pass
> for Value range analysis and then in my GSoC 2016 project I will use
> this pass for ipa-vrp pass and improving other ipa optimizations to
> use this information.
> Though in particular I have yet not figured implementation details.
> Currently I am learning about gcc IRs.
> If you have any further idea ( specially about constraints based
> method ) please let me know and help building my implementation
> approach.
>
> Sincerely,
> Vivek
>>
>> Thanks,
>> Kugan
GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
Hi, We are trying to create a memory barrier with the following testcase.

=
    #include <stdio.h>
    void Test()
    {
        float fDivident = 0.1f;
        float fResult = 0.0f;
        fResult = ( fDivident / fResult );
        __asm volatile ("mfence" ::: "memory");
        printf("\nResult: %f\n", fResult);
    }
==

'mfence' performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction.

We expect the mfence asm statement with its memory clobber to create a barrier between the division and the printf. When the testcase is compiled with optimization level O1 or above, it can be observed that the mfence instruction is reordered and precedes the division instruction. We expected that the two sets of assembly instructions, one for the division and one for the printf call, would not get mixed up by reordering in the GCC optimizer, because of the __asm volatile ("mfence" ::: "memory"); line between them. But the generated assembly, inlined below for reference, does not match that expectation.

    pushl   %ebp          # 23  *pushsi2  [length = 1]
    movl    %esp, %ebp    # 24  *movsi_internal/1  [length = 2]
    subl    $24, %esp     # 25  pro_epilogue_adjust_stack_si_add/1  [length = 3]
    mfence
    fldz                  # 20  *movxf_internal/3  [length = 2]
    fdivrs  .LC0          # 13  *fop_xf_4_i387/1  [length = 6]

Note that the mfence instruction is generated before the fdivrs instruction. Can you please let us know whether the use of "asm (mfence)" in the above testcase is the right way of creating the expected memory barrier between the two sets of instructions for the division and the printf call? If yes, then we think it is a bug in the compiler. Could you please confirm? If no, then what is the correct usage of "asm (mfence)" to achieve the memory barrier functionality expected in the above testcase?

Thanks, Vivek Kinhekar
RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
Thanks for the quick response, Alexander! Regards, Vivek Kinhekar +91-7709046470 -Original Message- From: Alexander Monakov Sent: Friday, April 13, 2018 5:58 PM To: Vivek Kinhekar Cc: gcc@gcc.gnu.org Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction On Fri, 13 Apr 2018, Vivek Kinhekar wrote: > The mfence instruction with memory clobber asm instruction should > create a barrier between division and printf instructions. No, floating-point division does not touch memory, so the asm does not (and need not) restrict its motion. Alexander
RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
Hello Alexander, In the given testcase, the generated fdivrs instruction divides a symbol ref (memory value) by an FPU stack register and stores the value in the FPU stack register. Please find below the RTL dump of the generated fdivrs instruction. It clearly accesses memory for a read!

===
    #(insn:TI 13 20 16 2 (set (reg:XF 8 st)
    #        (div:XF (float_extend:XF (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [4 S4 A32]))
    #            (reg:XF 8 st))) {*fop_xf_4_i387}
    #     (nil))
            fdivrs  .LC0    # 13  *fop_xf_4_i387/1  [length = 6]
===

Are we missing anything subtle here? Regards, Vivek Kinhekar

-Original Message-
From: Alexander Monakov
Sent: Friday, April 13, 2018 5:58 PM
To: Vivek Kinhekar
Cc: gcc@gcc.gnu.org
Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
> The mfence instruction with memory clobber asm instruction should
> create a barrier between division and printf instructions.

No, floating-point division does not touch memory, so the asm does not (and need not) restrict its motion.

Alexander
RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction
Oh! Thanks for the quick response, Jakub. Regards, Vivek Kinhekar -Original Message- From: Jakub Jelinek Sent: Friday, April 13, 2018 7:08 PM To: Vivek Kinhekar Cc: Alexander Monakov ; gcc@gcc.gnu.org Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction On Fri, Apr 13, 2018 at 01:34:21PM +, Vivek Kinhekar wrote: > Hello Alexander, > > In the given testcase, the generated fdivrs instruction performs the > division of a symbol ref (memory value) by FPU Stack Register and > stores the value in FPU Stack Register. The stack registers are not memory. > Please find the following RTL Dump of the fdivrs instruction generated. > It clearly access the memory for read access! That is a constant read, that doesn't count either. It is in memory only because the instruction doesn't support constant immediates, the memory is read-only. Jakub
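For readers who do want a register computation to be finished before the fence, one common idiom (not something proposed in this thread, so treat it as an assumption) is to make the computed value an operand of the asm statement. The sketch below uses a "+m" in/out operand; the function name divide_then_fence is made up for illustration, and the mfence variant applies to x86 targets only.

```c
/* Divide, then fence, with the quotient tied to the asm statement.
   Because result is an in/out memory operand, the compiler has to
   compute and store it before the asm, and has to reload it afterwards. */
static float divide_then_fence(float dividend, float divisor)
{
    float result = dividend / divisor;

#if defined(__x86_64__) || defined(__i386__)
    /* Hardware fence plus compiler barrier for memory accesses. */
    __asm__ volatile ("mfence" : "+m" (result) : : "memory");
#else
    /* Other targets: compiler-level barrier only, no fence instruction. */
    __asm__ volatile ("" : "+m" (result) : : "memory");
#endif

    return result;
}
```

The "memory" clobber on its own only orders accesses to memory; it is the operand that creates a dependency between the division and the asm statement. The cost is that the value is forced through a stack slot.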
optimizing a DSO
I am trying to speed up the load and startup time of a shared library. After reading Ulrich Drepper's paper "How to write shared libraries", it seems that the easiest thing to try would be to reduce the number of symbols that are globally visible. After carefully adding __attribute__((visibility ("default"))) to only the symbols that should be globally visible, and using the gcc option -fvisibility=hidden to hide all symbols by default, I managed to reduce the number of globally visible symbols. But now it seems that even though the number of symbols needing relocation has decreased, the cost of searching for a symbol in the "optimized" DSO has gone up. Here is the output from "eu-readelf -I" before and after reducing the number of globally visible symbols. It seems that the cost of both successful and unsuccessful lookups has gone up. I haven't done any profiling yet, but I am guessing that my runtime symbol lookup cost will go up. Is this to be expected?

BEFORE:

Histogram for bucket list length in section [ 1] '.gnu.hash' (total of 4099 buckets):
 Addr: 0x0158  Offset: 0x000158  Link to section: [ 2] '.dynsym'
 Symbol Bias: 652
 Bitmask Size: 4096 bytes  26% bits set  2nd hash shift: 15
 Length  Number  % of total  Coverage
      0    1123       27.4%
      1    1470       35.9%     28.1%
      2     955       23.3%     64.7%
      3     391        9.5%     87.1%
      4     132        3.2%     97.2%
      5      23        0.6%     99.4%
      6       5        0.1%    100.0%
 Average number of tests:   successful lookup: 1.617107
                          unsuccessful lookup: 1.274945

AFTER:

Histogram for bucket list length in section [ 1] '.gnu.hash' (total of 2053 buckets):
 Addr: 0x0158  Offset: 0x000158  Link to section: [ 2] '.dynsym'
 Symbol Bias: 652
 Bitmask Size: 4096 bytes  21% bits set  2nd hash shift: 15
 Length  Number  % of total  Coverage
      0     288       14.0%
      1     576       28.1%     14.7%
      2     575       28.0%     44.1%
      3     367       17.9%     72.2%
      4     165        8.0%     89.0%
      5      64        3.1%     97.2%
      6      16        0.8%     99.6%
      7       2        0.1%    100.0%
 Average number of tests:   successful lookup: 1.916007
                          unsuccessful lookup: 1.90794
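The two averages can be recomputed from the histograms, which helps to see where the increase comes from. The sketch below assumes a simple chained-bucket model: a successful lookup for the k-th symbol of a chain costs k tests, and an unsuccessful lookup walks the whole chain of a randomly chosen bucket. These formulas are an inference, not taken from the eu-readelf documentation, but they reproduce the figures reported above. The array and function names are made up for illustration.

```c
#include <stddef.h>

/* counts[len] = number of buckets whose chain holds len symbols,
   copied from the "Number" column of the two histograms. */
static const unsigned before_counts[] = { 1123, 1470, 955, 391, 132, 23, 5 };
static const unsigned after_counts[]  = { 288, 576, 575, 367, 165, 64, 16, 2 };

/* Average tests for a successful lookup: a chain of length len
   contributes 1 + 2 + ... + len tests, spread over len symbols. */
static double avg_successful(const unsigned *counts, size_t n)
{
    double tests = 0.0;
    double symbols = 0.0;
    for (size_t len = 1; len < n; len++) {
        tests   += (double)counts[len] * (double)(len * (len + 1) / 2);
        symbols += (double)counts[len] * (double)len;
    }
    return tests / symbols;
}

/* Average tests for an unsuccessful lookup: the full chain length of
   a bucket, averaged over all buckets (empty ones included). */
static double avg_unsuccessful(const unsigned *counts, size_t n)
{
    double symbols = 0.0;
    double buckets = 0.0;
    for (size_t len = 0; len < n; len++) {
        symbols += (double)counts[len] * (double)len;
        buckets += (double)counts[len];
    }
    return symbols / buckets;
}
```

Summing the rows gives 5226 hashed symbols in 4099 buckets before and 3917 symbols in 2053 buckets after. The symbol count fell by about a quarter while the bucket count was halved, so the chains got longer on average, which is what both averages reflect.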
help regarding suif
I'm using the SUIF compiler system in my project. If anybody has used this tool, please let me know how I can convert the SUIF format to Alpha code, so that it can be run on SimpleScalar. Thank you, Vivek
Doubt regarding gcc
Hey, I would like to know the latest stable version of gcc... I have gone through the website. Waiting for the reply, vivek