Problem in porting GLIBC...

2005-02-16 Thread vivek
Hello

We are trying to port GLIBC 2.2.5 on the ABACUS (processor similar to
SPARC) platform. We have done much of the porting work.

At this stage we are trying to get 'ld-2.2.5.so' working, and we are
facing problems with it. We are running it on our own
Linux kernel for the ABACUS processor. The GOT and PLT that GCC generates
for ld-2.2.5.so are causing the problems.

At this moment we reach main() of the user test case; we jump
to the user entry point through ld.so.

There must be some problem in the relocation, and now we are unable to
trace the flow: we do not have a debugger, and we used to rely on print
statements to debug ld.so. Now it is not possible to use print
statements either.

Could you please give us some hint about how to proceed further.

As I am not a member of the group, please reply to my personal ID,
[EMAIL PROTECTED]

Thanks & Warm Regards

VIVEK



Problem in GCC porting...

2005-02-17 Thread vivek
Hello All

We are trying to fix the already ported GCC 2.95 on the ABACUS processor.
ABACUS processor is very much similar to SPARC from SUN.

We are facing 2 major problems in the fixing.

1. Variable initialization within a block: If we declare and initialize a
variable within a block in a function, the initialization happens only
the first time, not every time the code enters that block.

int main(void)
{
    /* ... */
    {
        int a = 10;
        a += 10;
    }
    /* ... */
}

In the above case, if we enter the block inside main a second time or
more, the value of 'a' carries over as if it were a 'static' variable.

2. Function pointer returning a value:
When a function called through a function pointer returns something, we
are not able to collect it in the return register or variable.
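
A minimal test sketch for the two symptoms above, written in C89 style so that it can be fed to gcc 2.95. All function names are hypothetical and not from the original report, and the `volatile` qualifier is an assumption added only to keep an optimizing compiler from folding the arithmetic away.

```c
#include <stdio.h>

/* Problem 1: count how many entries into the block see a wrong value.
   'a' has automatic storage, so it must be 10 again on every entry. */
int count_bad_block_entries(int passes)
{
    int bad = 0;
    int i;

    for (i = 0; i < passes; i++) {
        {
            volatile int a = 10;   /* re-initialized each time the block is entered */
            a += 10;
            if (a != 20) {
                printf("pass %d: a == %d, expected 20\n", i, (int) a);
                bad++;
            }
        }
    }
    return bad;
}

/* Problem 2: a value returned through a function pointer must reach the caller. */
static int add_seven(int x)
{
    return x + 7;
}

int call_through_pointer(int (*fn)(int), int arg)
{
    int result = fn(arg);   /* the callee's return value is collected here */
    return result;
}

/* Returns 1 when the value came back intact, 0 otherwise. */
int return_via_pointer_ok(void)
{
    int (*fn)(int) = add_seven;
    return call_through_pointer(fn, 35) == 42;
}
```

On a correct compiler, count_bad_block_entries(3) returns 0 and return_via_pointer_ok() returns 1. Comparing the assembly generated for these small functions with the output of a known-good SPARC compiler may help narrow down where the port goes wrong.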

These 2 errors in the compiler are causing problems in other activities also.

If anyone could throw some light on these issues, it would help me a lot.

Please send your reply to my ID [EMAIL PROTECTED] as I am not a member of
this group.

Thanks in anticipation...

VIVEK



mirror a GCC mailing list using Google Groups?

2006-10-30 Thread Vivek Rao
I would like to mirror the gfortran mailing list using
Google Groups, for reasons described at
http://gcc.gnu.org/ml/fortran/2006-10/msg00692.html .
Someone suggested I contact the "GCC steering
committee" to get feedback, so I am posting here.

Vivek Rao



arm-elf-gcc shared flat support

2007-03-30 Thread vivek tyagi

Hi,

I am working on shared flat file support for uClinux (no-MMU ARM). The
gcc versions I am using are 2.95 and 3.4.0. The theory of operation is
similar to that implemented for m68k. One of the major requirements is to
call functions via the GOT, so code such as

**c-code**
foo()
{}
main()
{
foo();
}

**

is to be called as

**compiler output**

    ldr   r3, .L4
    mov   lr, pc
    ldr   pc, [sl, r3]

.L4:
    .word foo(GOT)

**

as opposed to
bl foo(PLT)

where sl holds the address of the GOT (the binfmt_flat loader ensures that
before the program starts).
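
As a rough C-level picture of the call sequence above (a sketch only, not what the compiler emits): the caller loads the target address from a table slot and jumps through it, instead of branching to foo directly. The table `got`, the slot index, and the helper names are all hypothetical; in the real scheme the table is located through sl and filled in by the loader.

```c
typedef void (*func_ptr)(void);

static int foo_calls;              /* lets a caller observe that foo() ran */

static void foo(void)
{
    foo_calls++;
}

/* Simulated GOT: one slot per function, holding its run-time address. */
enum { GOT_SLOT_FOO = 0, GOT_SLOT_COUNT = 1 };
static func_ptr got[GOT_SLOT_COUNT];

/* Stands in for the loader, which fills the slots before the program starts. */
void fill_got(void)
{
    got[GOT_SLOT_FOO] = foo;
}

/* Analogue of "ldr r3, .L4; mov lr, pc; ldr pc, [sl, r3]":
   fetch the address from the table, then call through it. */
int call_foo_via_got(void)
{
    func_ptr target = got[GOT_SLOT_FOO];
    target();
    return foo_calls;
}
```

The point of the indirection is that only the table slot needs to change when foo() ends up at a different address; the calling code itself stays untouched.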

In gcc 3.4.0 this is somehow achieved if the function attribute
__attribute__((weak)) is specified, but I have no idea how to do it for 2.95.

Kindly bail me out on this one.
Sincere Thanks in advance.

Vivek Tyagi


Re: arm-elf-gcc shared flat support

2007-03-30 Thread vivek tyagi

Hi Richard, Paul,

> This is the wrong list for these sorts of questions, you should really
> be asking on gcc-help.

The project I am working on requires changes to be made in the gcc
back end (probably the front end too, for a complete solution), so I
thought it best to discuss it with the developers.

> Is there some reason why linker-generated PLT sequences aren't a
> reasonable solution?

and for Paul

> Why on earth do you need to do this? Can't you get the linker to generate PLT
> sequences like we do for normal shared libraries?



As my major work area is ARM uClinux, my apologies if my explanation
of this compiler project is erroneous. Kindly ignore this if it is already
known.

As per my understanding (kindly bear with me), the file format for
no-MMU ARM in uClinux is the "Binary Flat format", commonly known as BFLT
(http://www.beyondlogic.org/uClinux/bflt.htm for reference). This
format is produced by running the "elf2flt" tool on the ELF file generated
by the cross-compiler toolchain (here arm-elf-gcc). Now, flat files
do not have a PLT. They are a very simple format with
TEXT:GOT:DATA:RELOC sections (in that order), so IMHO the generic
linker modification for PLT sequences cannot be done. This matter was
discussed long ago on this same list by John Lee, Bernardo Innocenti
et al. around 11 Jun 2004.
To provide shared library support for the uClinux environment, which is
essentially without an MMU, some changes are required in the
toolchain. One approach was developed by the SnapGear team for m68k
(http://www.ucdot.org/article.pl?sid=03/11/25/1126257&mode=thread), but
there is no similar implementation (read: open source) for ARM. Just
to be sure, I raised this issue at the uClinux developer forum and
received the same answer. So I proceeded to spare some time and work on
an open-source implementation for ARM along the same lines as was done
for m68k.

The pivot is the handling of the PIC register r10 (sl) and the "-R" option
in ld. The way it works is that the symbol information is copied (via the
-R option) from the library to the main executable at compile time. The
load-time values are taken care of by the binfmt_flat loader via the GOT.
For this to work, everything (data variables, functions) should be accessed
via the GOT so that the copied symbol information can be fixed up by the
linker.

Taking my original example

**c-code**
foo()
{}
main()
{
foo();
}

**



In the actual scenario the function foo() would be part of a shared library.

**exe**                 **shared-lib**
extern foo();           foo()
main()                  {
{                       }
foo();
}
**exe**                 **shared-lib**

now the object files would be linked as
arm-elf-gcc  exe.o -Wl,-R,shared-lib.o

Here you can see the need to modify the compiler to generate the
call to foo() via the GOT rather than by "bl". With "bl", the -R flag
would copy a value for foo() into exe (from shared-lib), which does not
make any sense. On the other hand, if foo() is called via the GOT, -R
copies the reloc value for foo() into exe. This is fixed up by the
binfmt_flat loader to point to the correct address of foo() in
shared-lib (the library is loaded first by the loader...).
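
The fix-up step can be pictured with a deliberately simplified model. This is not the binfmt_flat code; it only assumes that each GOT slot initially holds a link-time offset within the library and that the loader adds the address at which the library was actually loaded. The function name and the flat array layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Turn link-time offsets stored in GOT slots into run-time addresses.
   'load_base' is where the loader placed the library in memory. */
void fixup_got(uintptr_t *got, size_t count, uintptr_t load_base)
{
    size_t i;

    for (i = 0; i < count; i++)
        got[i] += load_base;
}
```

Because every access goes through a slot that is patched in this way, the code of the executable never contains an absolute address of foo() and so needs no patching of its own.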

Hope this explains the requirement of indirect function call



The other changes required are the store/restore of the PIC register for
all function calls and loading the PIC register with the address of the
relevant GOT (the GOT addresses are maintained in an array updated by the
binfmt_flat loader).

I have implemented the -mshared-library-id flag for ARM,

so gcc -mshared-library-id=1

generates the following function prologue and epilogue (for gcc 2.95):
   mov     ip, sp
   stmdb   sp!, {sl, fp, ip, lr, pc}   /* store sl on stack */
   sub     sl, sl, #8                  /* bad hack to update sl: sl is to be
                                          loaded with the address of the new
                                          GOT, which lies 2 words before the
                                          current location for lib id = 1 */
   ldr     sl, [sl]                    /* bad hack continues; ideally this
                                          should be one instruction, i.e.
                                          ldr sl, [sl, #-8] (it's on my
                                          TODO list) */

   ...
   ...
   ldmdb   fp, {sl, fp, sp, pc}        /* restore sl */


Now, I am not sure whether this is a very efficient approach, but the lack
of an MMU does not leave us with too many choices. Implementing this for
ARM would take care of the inefficiency caused in ARM uClinux by the lack
of shared library support. The above-mentioned changes can be made for the
latest gcc also, but I am facing some relocation issues with uClibc
compiled with 3.4.0 with the PIC register modification hacks. It works
fine with 2.95, so I worked with that first.


All this is flexible; feel free to add your suggestions.


Thanks
Vivek Tyagi


cross compiling

2005-02-28 Thread vivek sukumaran
Are there any ready-to-use gcc RPMs for
host: x86, Red Hat 9.0
target: alpha

thanking you 
vivek





benchmarks

2005-04-08 Thread vivek sukumaran
Hello everybody,
I need benchmark programs for my project.
Does anybody have or know the links to C benchmarks
that can be compiled using gcc?
Thanking you,
Vivek





guidance for GSoC 2016 under GCC

2016-01-03 Thread vivek pandya
Hello GCC developers,
I would like to work on one of the following ideas in GSoC 2016 for GCC.

Function Reordering (Improvement) with LTO
Inter-procedural value range propagation pass
Implement tree level section anchors to improve code generation at ARM/PPC.

I have done some reading for the first and second topics, and I would like
your guidance.
For the first topic I have read Martin's master's thesis. As far as I
understand, he has currently implemented function reordering with PGO
support, but this project would use LTO support. Am I thinking about it
correctly?

For the second project I have read the ipa-cp.c file in the gcc source code,
which implements interprocedural constant propagation with call
graphs and jump functions. According to Chapter 11, page 664 of the book
Optimizing Compilers for Modern Architectures: A Dependence-based
Approach, a range propagation pass can be designed by extending
IPCP. Here the extensions to IPCP would be determining the ranges of
variables from for loops, constant assignments, or if/else statements, and
modifying jump functions so that ranges can be calculated based on
operations. Also, we may use the data structure for ranges that is used in
tree-vrp.c of gcc.
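
To make the idea concrete, here is a small sketch of range arithmetic of the kind such a pass would need. It is an illustration only: `struct value_range` and the helper names are hypothetical, this is not the representation used in tree-vrp.c, and overflow is ignored.

```c
/* Hypothetical closed interval [min, max]. */
struct value_range {
    long min;
    long max;
};

/* Merge the ranges seen at different call sites or control-flow paths. */
struct value_range vr_union(struct value_range a, struct value_range b)
{
    struct value_range r;
    r.min = a.min < b.min ? a.min : b.min;
    r.max = a.max > b.max ? a.max : b.max;
    return r;
}

/* Narrow a range with a condition such as "if (x <= 10)".
   The result is empty when min > max. */
struct value_range vr_intersect(struct value_range a, struct value_range b)
{
    struct value_range r;
    r.min = a.min > b.min ? a.min : b.min;
    r.max = a.max < b.max ? a.max : b.max;
    return r;
}

/* Arithmetic jump function "callee_param = caller_value + c". */
struct value_range vr_add_const(struct value_range a, long c)
{
    struct value_range r;
    r.min = a.min + c;
    r.max = a.max + c;
    return r;
}

int vr_is_empty(struct value_range a)
{
    return a.min > a.max;
}
```

For example, if one call site passes a value in [1, 5] and another a value in [10, 20], the union is [1, 20]; passing that value plus 3 gives [4, 23]; and inside "if (x <= 10)" with x known non-negative, the intersection with [0, 10] gives [4, 10].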

For the third project I have not yet started studying. Please suggest
some readings.

Apart from this I have learned how to write simple passes and plugins
for gcc and its related data structures (learned from Diego Novillo's
slides). I have also written some simple optimization passes with LLVM
libs.

Please provide more information or experimental patches to study.

Sincerely,
Vivek Pandya

P.S.: Actually I tried to contact Mr. Jan Hubicka as mentioned on the ideas
page, but it seems that he is not reachable at his mail address
j...@suse.cz; that is why I have mailed the gcc dev list.


Re: ipa vrp implementation in gcc

2016-01-11 Thread vivek pandya
> On Mon, Jan 11, 2016 at 4:07 PM, Richard Biener 
> wrote:
>>
>> On Mon, Jan 11, 2016 at 1:38 AM, Kugan
>>  wrote:
>> > Hi All,
>> >
>> > I am looking at implementing a ipa vrp pass. Jan Hubicka also talks
>> > about this in 2013 GNU Cauldron as one of the optimization he would like
>> > to see in gcc. So my question is, is any one implementing it. If not we
>> > would like to do that.
>> >
>
> Hello I am Vivek Pandya, I am actually working on a GSoC 2016 proposal for
> his work and it is very similar to extending ipa-cp pass. I am also in touch
> with Jan Hubicka. These comments will certainly help me, but if it is urgent for
> anyone, you can begin work on this. Jan has shown interest in mentoring me for
> this project, but any help from the community is always appreciated.
>
>>
>> > I also looked at the ipa-cp implementation to see how this can be done.
>> > Going by this, one of the way to implement this is (skipping all the
>> > details):
>> >
>> > - Have an early tree-vrp so that we can have value ranges for parameters
>> > at call sites.
>
> Actually a tree-vrp pass already exists. But Jan has suggested to me that the
> ipa-vrp implementation should not be too costly. So I am also thinking
> of including this work in my proposal and also using the analysis to improve
> LTO heuristics, as the project duration will be around 2.5 months.
>>
>>
>> I'd rather use the IPA analysis phase for this and use a VRP algorithm
>> that doesn't require ASSERT_EXPR insertion.
>>
>> > - Create jump functions that captures the value ranges of call sites
>> > propagate the value ranges. In 2013 talk, Jan Hubicka talks about
>> >
>> > - Modifying ipa-prop.[h|c] to handles this but wouldn't it be easier to
>> > have its own and much more simpler implementation ?
>>
>> No idea.
>>
>> > - Once we have the value ranges for parameter/return values, we could
>> > rely on tree-vrp to use this and do the optimizations
>>
>> Yep.  IPA transform phase should annotate parameter default defs with
>> computed ranges.
>>
>> > Does this make any sense? Any thoughts/suggestions to work on this is
>> > highly appreciated.
>>
>> IPA alignment propagation should already be somewhat similar as in doing
>> an intersection step during propagation.
>>
>> Richard.
>>
>> > Thanks,
>> > Kugan
>
> Your comments certainly help me to develop my proposal. Please let me know
> of any updates to avoid confusion and duplication of work.
> Sincerely,
> Vivek


Source Code for Profile Guided Code Positioning

2016-01-15 Thread vivek pandya
Hello GCC Developers,

Are the 'Profile Guided Code Positioning' algorithms described in
this paper (Pettis and Hansen), http://dl.acm.org/citation.cfm?id=93550,
implemented in gcc?
If yes, kindly help me with the code file location in the gcc source tree.

Sincerely,
Vivek Pandya


Re: Source Code for Profile Guided Code Positioning

2016-01-15 Thread vivek pandya
Thanks Yury for this link:
https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html
It implements procedure reordering as a linker plugin.
I have some questions:
1) Can you point me to some documentation on how to write plugins
for linkers? I have not seen docs for the structs with the 'ld_' prefix
(i.e. defined in plugin-api.h).
2) There is one more algorithm in the PH paper, for basic block ordering
using execution frequency counts. Is there any implementation
available for it?

Sincerely,
Vivek


Re: ipa vrp implementation in gcc

2016-01-17 Thread vivek pandya
Vivek Pandya


On Mon, Jan 18, 2016 at 4:16 AM, Kugan
 wrote:
>
>
> > Hello I am Vivek Pandya, I am actually working on a GSoC 2016 proposal
> > for his work and it is very similar to extending ipa-cp pass. I am also
> > in touch with Jan Hubicka.
>
> Hi Vivek,
>
> Glad to know that you are planning to work on this. Could you please put
> you plan in an accessible place (or post it here) so that we know what
> you plans are. That way we can work on what you are not working. And
> also possible contribute to your plan in other ways (like testing and
> reviewing).
>
Hello Kugan,

Actually my work will include extending the ipa-cp pass to propagate
range information and then integrating this information to improve LTO
optimizations (at least one). But as mentioned by Jan Hubicka, the real
problem is not extending the ipa-cp pass; tree-vrp itself is a big task,
and scheduling it at an early stage will cost a performance loss.
So actually I was looking at some alternatives to Patterson's approach,
and in particular I found this non-iterative method:
https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf which has
already been implemented in LLVM:
http://homepages.dcc.ufmg.br/~fernando/publications/papers/SBLP2011_douglas.pdf
So my plan is first to implement the above-mentioned approach
by 23 May 2016 (my college project) as a local pass
for value range analysis, and then in my GSoC 2016 project to use
this pass for the ipa-vrp pass and to improve other IPA optimizations to
use this information.
However, I have not yet figured out the implementation details.
Currently I am learning about the gcc IRs.
If you have any further ideas (especially about the constraint-based
method), please let me know and help me build my implementation
approach.
Sincerely,
Vivek
>
> Thanks,
> Kugan


Re: ipa vrp implementation in gcc

2016-01-17 Thread vivek pandya
Vivek Pandya



On Mon, Jan 18, 2016 at 11:35 AM, vivek pandya  wrote:
> Vivek Pandya
>
>
> On Mon, Jan 18, 2016 at 4:16 AM, Kugan
>  wrote:
>>
>>
>> > Hello I am Vivek Pandya, I am actually working on a GSoC 2016 proposal
>> > for his work and it is very similar to extending ipa-cp pass. I am also
>> > in touch with Jan Hubicka.
>>
>> Hi Vivek,
>>
>> Glad to know that you are planning to work on this. Could you please put
>> you plan in an accessible place (or post it here) so that we know what
>> you plans are. That way we can work on what you are not working. And
>> also possible contribute to your plan in other ways (like testing and
>> reviewing).
>>
> Hello Kugan,
>
> Actually my work will include extending the ipa-cp pass to propagate
> range information and then integrating this information to improve LTO
> optimizations (at least one). But as mentioned by Jan Hubicka, the real
> problem is not extending the ipa-cp pass; tree-vrp itself is a big task,
> and scheduling it at an early stage will cost a performance loss.
> So actually I was looking at some alternatives to Patterson's approach,
> and in particular I found this non-iterative method:
> https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf which has
> already been implemented in LLVM:
> http://homepages.dcc.ufmg.br/~fernando/publications/papers/SBLP2011_douglas.pdf

Also, could someone please advise whether this non-iterative method would
be good to have as an alternative VRP or not? I.e., will it serve our
purpose of a lightweight VRP to be used at an earlier stage?

> So my plan is first to implement the above-mentioned approach
> by 23 May 2016 (my college project) as a local pass
> for value range analysis, and then in my GSoC 2016 project to use
> this pass for the ipa-vrp pass and to improve other IPA optimizations to
> use this information.
> However, I have not yet figured out the implementation details.
> Currently I am learning about the gcc IRs.
> If you have any further ideas (especially about the constraint-based
> method), please let me know and help me build my implementation
> approach.
>
> Sincerely,
> Vivek
>>
>> Thanks,
>> Kugan


GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

2018-04-13 Thread Vivek Kinhekar
Hi,


We are trying to create a memory barrier with following testcase.

=
#include <stdio.h>

void Test()
{
    float fDivident = 0.1f;
    float fResult = 0.0f;

    fResult = ( fDivident / fResult );

    __asm volatile ("mfence" ::: "memory");

    printf("\nResult: %f\n", fResult);
}
==



'mfence' performs a serializing operation on all load-from-memory and
store-to-memory instructions that were issued prior to the MFENCE instruction.
This serializing operation guarantees that every load and store instruction 
that precedes the MFENCE instruction in program order becomes globally visible 
before any load or store instruction that follows the MFENCE instruction.

The mfence instruction with memory clobber asm instruction should create a 
barrier between division and printf instructions.



When the testcase is compiled with optimization options O1 and above it can be 
observed that the mfence instruction is reordered and precedes division 
instruction.

We expected that the two sets of assembly instructions, one pertaining to 
division operation and another pertaining to the printf operation, would not 
get mixed up on reordering by the GCC compiler optimizer because of the 
presence of the __asm volatile ("mfence" ::: "memory"); line between them.



But, the generated assembly, which is inlined below for reference, isn't quite 
right as per our expectation.



pushl   %ebp            # 23  *pushsi2  [length = 1]
movl    %esp, %ebp      # 24  *movsi_internal/1  [length = 2]
subl    $24, %esp       # 25  pro_epilogue_adjust_stack_si_add/1  [length = 3]
mfence
fldz                    # 20  *movxf_internal/3  [length = 2]
fdivrs  .LC0            # 13  *fop_xf_4_i387/1  [length = 6]

You may note that the mfence instruction is generated before the fdivrs 
instruction.

Can you please let us know if the usage of the "asm (mfence)" instruction as 
given in the above testcase is the right way of creating the expected memory 
barrier between the two sets of instructions pertaining to the division and 
printf operations, respectively or not?

If yes, then we think it's a bug in the compiler. Could you please confirm?

If no, then what is the correct usage of "asm (mfence)" so as to get/ achieve 
the memory barrier functionality as expected in the above testcase?

Thanks,
Vivek Kinhekar


RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

2018-04-13 Thread Vivek Kinhekar
Thanks for the quick response, Alexander!

Regards,
Vivek Kinhekar
+91-7709046470

-----Original Message-----
From: Alexander Monakov  
Sent: Friday, April 13, 2018 5:58 PM
To: Vivek Kinhekar 
Cc: gcc@gcc.gnu.org
Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory 
barrier related instruction

On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
> The mfence instruction with memory clobber asm instruction should 
> create a barrier between division and printf instructions.

No, floating-point division does not touch memory, so the asm does not (and 
need not) restrict its motion.

Alexander


RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

2018-04-13 Thread Vivek Kinhekar
Hello Alexander,

In the given testcase, the generated fdivrs instruction performs the division 
of a symbol ref (memory value) by FPU Stack Register and stores the value in 
FPU Stack Register. 

Please find below the RTL dump of the generated fdivrs instruction. It
clearly accesses memory for reading!
===
#(insn:TI 13 20 16 2 (set (reg:XF 8 st)
#        (div:XF (float_extend:XF (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [4 S4 A32]))
#            (reg:XF 8 st)))  {*fop_xf_4_i387}
#     (nil))
fdivrs  .LC0            # 13  *fop_xf_4_i387/1  [length = 6]
===

Are we missing anything subtle here?

Regards,
Vivek Kinhekar

-----Original Message-----
From: Alexander Monakov  
Sent: Friday, April 13, 2018 5:58 PM
To: Vivek Kinhekar 
Cc: gcc@gcc.gnu.org
Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory 
barrier related instruction

On Fri, 13 Apr 2018, Vivek Kinhekar wrote:
> The mfence instruction with memory clobber asm instruction should 
> create a barrier between division and printf instructions.

No, floating-point division does not touch memory, so the asm does not (and 
need not) restrict its motion.

Alexander


RE: GCC Compiler Optimization ignores or mistreats MFENCE memory barrier related instruction

2018-04-13 Thread Vivek Kinhekar
Oh! Thanks for the quick response, Jakub.

Regards,
Vivek Kinhekar

-----Original Message-----
From: Jakub Jelinek  
Sent: Friday, April 13, 2018 7:08 PM
To: Vivek Kinhekar 
Cc: Alexander Monakov ; gcc@gcc.gnu.org
Subject: Re: GCC Compiler Optimization ignores or mistreats MFENCE memory 
barrier related instruction

On Fri, Apr 13, 2018 at 01:34:21PM +, Vivek Kinhekar wrote:
> Hello Alexander,
> 
> In the given testcase, the generated fdivrs instruction performs the 
> division of a symbol ref (memory value) by FPU Stack Register and 
> stores the value in FPU Stack Register.

The stack registers are not memory.

> Please find the following RTL Dump of the fdivrs instruction generated. 
> It clearly access the memory for read access! 

That is a constant read, that doesn't count either.  It is in memory only 
because the instruction doesn't support constant immediates, the memory is 
read-only.

Jakub
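
Following the replies above, one way to keep the division ahead of the fence is to make the asm statement itself depend on the result. The sketch below is an illustration, not something proposed in the thread: it uses the standard GCC extended-asm "+m" constraint so that the asm reads and writes `result` in memory, which obliges the compiler to compute and store the quotient before emitting the fence. The function name and the FENCE_INSN macro are hypothetical, and the empty-string fallback for non-x86 targets is an assumption that leaves only a compiler-level barrier.

```c
#if defined(__i386__) || defined(__x86_64__)
#define FENCE_INSN "mfence"
#else
#define FENCE_INSN ""   /* assumption: compiler-only barrier on other targets */
#endif

float divide_then_fence(float dividend, float divisor)
{
    float result = dividend / divisor;

    /* "+m" (result): the asm both reads and writes 'result' in memory,
       so the division cannot be moved past this statement. */
    __asm__ volatile (FENCE_INSN : "+m" (result) : : "memory");

    return result;
}
```

Declaring the result `volatile` would be another way to turn the computation into a memory access that the "memory" clobber has to respect; which is preferable depends on what the barrier is really meant to protect.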


optimizing a DSO

2010-05-10 Thread Vivek Verma
I am trying to speed up the load and startup time of a shared library.
After reading Ulrich Drepper's paper on "How to write shared libraries", 
it seems that the easiest thing to try would be to reduce the number of 
symbols that are  globally visible. After carefully adding 
__attribute__((visibility ("default"))) to only the symbols that should 
be globally visible and using the gcc option -fvisibility=hidden to hide 
all symbols by default, I managed to reduce the number of globally 
visible symbols. But now, it seems that even though the number of 
symbols needing relocation has decreased, the cost of searching for a 
symbol in the "optimized" dso has gone up. Here is the output from 
"eu-readelf -I" before and after reducing the number of globally visible 
symbols. It seems that the cost of both successful and unsuccessful 
lookup has gone up. I haven't yet done any profiling but I am guessing 
that my runtime symbol lookup cost will go up. Is this to be expected?


BEFORE:

Histogram for bucket list length in section [ 1] '.gnu.hash' (total of 
4099 buckets):

Addr: 0x0158  Offset: 0x000158  Link to section: [ 2] '.dynsym'
Symbol Bias: 652
Bitmask Size: 4096 bytes  26% bits set  2nd hash shift: 15
 Length  Number  % of total  Coverage
      0    1123       27.4%
      1    1470       35.9%     28.1%
      2     955       23.3%     64.7%
      3     391        9.5%     87.1%
      4     132        3.2%     97.2%
      5      23        0.6%     99.4%
      6       5        0.1%    100.0%
 Average number of tests:   successful lookup: 1.617107
                          unsuccessful lookup: 1.274945


AFTER:

Histogram for bucket list length in section [ 1] '.gnu.hash' (total of 
2053 buckets):

Addr: 0x0158  Offset: 0x000158  Link to section: [ 2] '.dynsym'
Symbol Bias: 652
Bitmask Size: 4096 bytes  21% bits set  2nd hash shift: 15
 Length  Number  % of total  Coverage
      0     288       14.0%
      1     576       28.1%     14.7%
      2     575       28.0%     44.1%
      3     367       17.9%     72.2%
      4     165        8.0%     89.0%
      5      64        3.1%     97.2%
      6      16        0.8%     99.6%
      7       2        0.1%    100.0%
 Average number of tests:   successful lookup: 1.916007
                          unsuccessful lookup: 1.90794
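
The two averages can be recomputed from the histograms alone, which shows where the increase comes from. The sketch below assumes the usual chain-walk model: an unsuccessful lookup walks a whole chain, and a successful lookup for the k-th symbol in a chain needs k tests. These formulas reproduce the figures printed above; the function and array names are hypothetical.

```c
#include <stddef.h>

/* count[len] = number of buckets whose chain holds 'len' symbols. */
const unsigned long before_hist[] = { 1123, 1470, 955, 391, 132, 23, 5 };
const unsigned long after_hist[]  = { 288, 576, 575, 367, 165, 64, 16, 2 };

unsigned long hist_buckets(const unsigned long *count, size_t n)
{
    unsigned long total = 0;
    size_t len;
    for (len = 0; len < n; len++)
        total += count[len];
    return total;
}

unsigned long hist_symbols(const unsigned long *count, size_t n)
{
    unsigned long total = 0;
    size_t len;
    for (len = 0; len < n; len++)
        total += count[len] * len;
    return total;
}

/* Average tests for a symbol that is present: the k-th symbol in a chain
   costs k tests, so a chain of length len costs len * (len + 1) / 2 in total. */
double avg_successful(const unsigned long *count, size_t n)
{
    unsigned long symbols = hist_symbols(count, n);
    double tests = 0.0;
    size_t len;
    for (len = 0; len < n; len++)
        tests += (double) count[len] * (double) (len * (len + 1) / 2);
    return symbols ? tests / (double) symbols : 0.0;
}

/* Average tests for a symbol that is absent: the whole chain of a
   uniformly chosen bucket is walked. */
double avg_unsuccessful(const unsigned long *count, size_t n)
{
    unsigned long buckets = hist_buckets(count, n);
    return buckets ? (double) hist_symbols(count, n) / (double) buckets : 0.0;
}
```

With the posted data this gives 5226 symbols in 4099 buckets before (about 1.27 symbols per bucket) and 3917 symbols in 2053 buckets after (about 1.91 per bucket). So the hash table has fewer symbols but only about half as many buckets, and the longer chains account for the higher averages. These figures cover only the chain walk; they do not include the effect of the bitmask, which lets many lookups for absent symbols be rejected before any chain is touched.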





help regarding suif

2005-02-13 Thread vivek sukumaran
I'm using the SUIF compiler system in my project. If anybody has used
this tool, please let me know how I can convert the SUIF format to
Alpha code, so that it can be run on SimpleScalar.
Thank you
Vivek


Doubt regarding gcc

2009-08-07 Thread Mr. Vivek Varghese Jacob
hey,

I would like to know the latest stable version of gcc.
I have gone through the website.

waiting for the reply,

vivek