About VLIW backend

2007-11-06 Thread Li Wang
Hi,
I wonder if any efforts have been made to retarget GCC to a VLIW
backend. Is there any project trying to do that? Is it included in
mainline GCC? Thanks.

Regards,
Li Wang


Re: About VLIW backend

2007-11-06 Thread Li Wang

Hi,
I know that. But I am talking about a more _pure_ VLIW architecture,
one that relies entirely on static scheduling, rather than an EPIC
architecture. Thanks.

> Li Wang wrote:
>> Hi,
>> I wonder if any efforts have been made to retarget GCC to a VLIW
>> backend. Is there any project trying to do that? Is it included in
>> mainline GCC? Thanks.
>
> The ia64 is a VLIW architecture!


Regards,
Li Wang








Re: How to describe function units allocation

2007-11-14 Thread Li Wang

Hi,
   Thanks. As you know, I am trying to retarget GCC to a somewhat
different VLIW backend, starting from an understanding of the TMS320C6x
port code. I now know that functional unit allocation can be done in the
assembler. However, I am still interested in whether it is possible to
do this by modifying only cc1, without involving the assembler (gas).
Can it be achieved by coding only the .md, .h and .c files?


Regards,
Li Wang

>> Hi,
>> For the TI DSP TMS320C6x backend, there are four types of functional
>> units: the .L, .M, .S and .D units. Each type has two instances,
>> numbered 1 and 2 (for example .L1 and .L2), so there are 8 units in
>> total. Apart from the .M units, which serve only for multiplication,
>> the units share many functions; for example, they all support 32-bit
>> arithmetic operations. In the assembly, the functional unit used to
>> perform each operation must be indicated explicitly. For example,
>> ADD .S1 A0, A1, A2; ADD .L1 A0, A1, A2; and ADD .D1 A0, A1, A2 achieve
>> the same result using different units. So, when producing assembly, a
>> functional unit allocation somewhat like register allocation is needed.
>> I wonder how I can describe this relationship in the machine
>> description file, and whether I need to write a functional unit
>> allocation algorithm myself or whether it is done by a general-purpose
>> allocation algorithm built into GCC, as with register allocation, so
>> that I only need to supply an architecture description. Thanks in
>> advance for your kind assistance.



> IMHO, the functional units that accompany the assembly instruction are
> optional. For c6x-gcc, however, the reason cc1 doesn't allocate
> functional units is that the assembler (as part of the c6x binutils)
> does the functional unit allocation on its own. There are some notes
> about how the assembler does this in Extending the GNU Assembler for
> Texas Instruments TMS320C6x-DSP.pdf
>
> HTH,
> Pranav
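
To illustrate the kind of allocation being discussed, here is a minimal
sketch in C of a greedy functional-unit assignment for the operations of
one instruction packet. It is not how the c6x assembler or GCC does it.
The operation classes, the capability table, and the names op_class,
unit_to_string and assign_units are all assumptions made up for this
example; the table only encodes the description above, that the .M units
multiply while the .L, .S and .D units can all add.

```c
#include <assert.h>

/* Hypothetical operation classes; not taken from any real port. */
enum op_class { OP_ADD, OP_MUL, OP_CLASS_COUNT };

/* The eight units described above, in an arbitrary preference order. */
enum unit { L1, S1, D1, M1, L2, S2, D2, M2, UNIT_COUNT };

static const char *const unit_name[UNIT_COUNT] = {
    ".L1", ".S1", ".D1", ".M1", ".L2", ".S2", ".D2", ".M2"
};

/* can_run[c][u] is nonzero if unit u can execute class c (assumed table). */
static const int can_run[OP_CLASS_COUNT][UNIT_COUNT] = {
    /* OP_ADD */ { 1, 1, 1, 0, 1, 1, 1, 0 },
    /* OP_MUL */ { 0, 0, 0, 1, 0, 0, 0, 1 },
};

/* Printable name of a unit, e.g. for emitting "ADD .S1 A0, A1, A2". */
const char *unit_to_string(enum unit u)
{
    return unit_name[u];
}

/*
 * Greedily assign each of the n operations in one packet to the first
 * free unit that can run it.  Returns 0 on success, or -1 if some
 * operation cannot be placed and the packet would have to be split.
 */
int assign_units(const enum op_class *ops, int n, enum unit *out)
{
    int busy[UNIT_COUNT] = { 0 };

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int placed = 0;
        for (int u = 0; u < UNIT_COUNT && !placed; u++) {
            if (can_run[ops[i]][u] && !busy[u]) {
                busy[u] = 1;
                out[i] = (enum unit)u;
                placed = 1;
            }
        }
        if (!placed)
            return -1;
    }
    return 0;
}
```

A greedy first-fit pass like this ignores register-file sides and cross
paths, which a real allocator would have to model.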

  

Regards,
Li Wang




  




Re: How to let GCC produce flat assembly

2007-11-15 Thread Li Wang
Hi,
I may need to explain this problem more clearly. Consider a backend that
runs as a coprocessor to a host processor, similar to a GPU: it
incorporates a large number of ALUs, handles only arithmetic and a few
other simple operations, and runs in VLIW fashion to accelerate the host
processor. Let me refer to this coprocessor as a 'raw processor'. Note
that I do not mean a GPU; a GPU is similar in mechanism but more complex
than this. It has a simple ISA and no dedicated registers such as ESP
and EBP to support function calls. It fetches VLIW instructions from
instruction memory one by one and executes them. If I want GCC to
produce assembly for it, how should I write the machine description
file? Should I first let cc1 produce ELF assembly for it, and then let
binutils truncate it to flat assembly? That seems like an ugly hack.
Thanks.

Regards,
Li Wang
> Li Wang wrote:
>   
>> Hi,
>> I wonder how to let GCC produce flat assembly, say, just like a .com
>> file under DOS, without function calls and complicated executable
>> file headers, only instructions. How should I modify the machine
>> description file to achieve that? Thanks in advance.
>> 
>
> Perhaps you are asking on the wrong list.
>
> And what exactly do you want to achieve and why?
>
> What is your target system?
>
> Why is using (and appropriately configuring) the binutils (in particular
> its linker, ld, implicitly invoked by gcc) not appropriate for your
> needs? I am sure that you can configure it appropriately (binutils is
> very powerful).
>
> You will still need generated data other than the instructions,
> typically constants such as strings, and much other stuff.
>
>
>   



How to let GCC produce flat assembly

2007-11-15 Thread Li Wang
Hi,
I wonder how to let GCC produce flat assembly, say, just like a .com
file under DOS, without function calls and complicated executable
file headers, only instructions. How should I modify the machine
description file to achieve that? Thanks in advance.

Regards,
Li Wang


Re: How to let GCC produce flat assembly

2007-11-15 Thread Li Wang

Hi,
   Thanks for your attention and response. I think I have still not
described accurately what I want to do; I was too anxious and explained
it far from clearly. Permit me to use a simple example. For the simple C
program below, compiled by cc1 targeting the x86 platform, the assembly
is as follows:


int main()
{
    int a, b, c;

    a = 2;
    b = 2;
    c = a + b;
    return 0;
}

   .file   "test.c"
   .text
.globl main
   .type   main,@function
main:
   pushl   %ebp
   movl    %esp, %ebp
   subl    $24, %esp
   andl    $-16, %esp
   movl    $0, %eax
   subl    %eax, %esp
   movl    $2, -4(%ebp)
   movl    $2, -8(%ebp)
   movl    -8(%ebp), %eax
   addl    -4(%ebp), %eax
   movl    %eax, -12(%ebp)
   movl    $0, %eax
   leave
   ret

As you said, if the coprocessor has no ABI describing a stack and a
function interface, then inlining applies. But how could I inline 'main'?
I am also sorry for misusing the term 'elf assembly'; what I actually
mean is how to omit the section directives and any other information in
the cc1 output that helps the linker organize an executable. In a word,
something like the following code is what I want. Is it possible to let
cc1 produce such assembly? Thanks.


   movl    $2, -4(%ebp)
   movl    $2, -8(%ebp)
   movl    -8(%ebp), %eax
   addl    -4(%ebp), %eax

Regards,
Li Wang

> On Thu, Nov 15, 2007 at 04:20:49PM -0800, Li Wang wrote:
>> I may need to explain this problem more clearly.
>
> Yes, my earlier message directing you to gcc-help was because I thought
> you didn't grasp what the compiler should do and what the linker should
> do; sorry about that.
>
>> Consider a backend that runs as a coprocessor to a host processor,
>> similar to a GPU: it incorporates a large number of ALUs, handles only
>> arithmetic and a few other simple operations, and runs in VLIW fashion
>> to accelerate the host processor. Let me refer to this coprocessor as
>> a 'raw processor'. Note that I do not mean a GPU; a GPU is similar in
>> mechanism but more complex than this. It has a simple ISA and no
>> dedicated registers such as ESP and EBP to support function calls.
>
> But those registers aren't dedicated to support function calls on the x86
> except by convention.  If your coprocessor has no ABI to describe a stack
> and a function interface, you need to invent one, so that you can do
> function calls.  gcc can inline the calls where it makes sense, and the
> scores can be adjusted so that a lot of inlining happens if your stack is
> inefficient.
>
>> If I want GCC to produce assembly for it, how should I write the
>> machine description file? Should I first let cc1 produce ELF assembly
>> for it, and then let binutils truncate it to flat assembly? That seems
>> like an ugly hack. Thanks.
>
> gcc produces assembler code.  as turns it into object code.  ld links
> to form an executable.  That's the way that it works.
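
On the inlining point raised in the reply, a minimal sketch of forcing
a helper to be expanded at its call site with GCC's always_inline
function attribute, so that no call and no callee frame are needed. The
function names add_pair and kernel are made up for this example, and a
GCC-compatible compiler is assumed.

```c
/*
 * Hypothetical helper.  always_inline asks GCC to expand it at every
 * call site, independent of the optimization level.
 */
static inline __attribute__((always_inline)) int add_pair(int a, int b)
{
    return a + b;
}

/* Hypothetical kernel entry point that uses the helper. */
int kernel(void)
{
    int a = 2;
    int b = 2;

    return add_pair(a, b);   /* expanded in place, no call emitted */
}
```

This only removes calls between functions of the same translation unit.
The entry point itself still needs some convention for how it is
entered and left.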


  




Re: How to let GCC produce flat assembly

2007-11-16 Thread Li Wang

Dave Korn wrote:
> On 16 November 2007 05:56, Li Wang wrote:
>
>> As you said, if the coprocessor has no ABI describing a stack and a
>> function interface, then inlining applies. But how could I inline 'main'?
>> I am also sorry for misusing the term 'elf assembly'; what I actually
>> mean is how to omit the section directives and any other information in
>> the cc1 output that helps the linker organize an executable. In a word,
>> something like the following code is what I want. Is it possible to let
>> cc1 produce such assembly? Thanks.
>>
>>    movl    $2, -4(%ebp)
>>    movl    $2, -8(%ebp)
>>    movl    -8(%ebp), %eax
>>    addl    -4(%ebp), %eax
>
>   Various CPU backends (but IIRC not i386) implement a "naked" function
> attribute, which suppresses function epilogue and prologue generation.  You
> could implement something like that.
  
That seems to be what I want. Could you please give more clues? Which
backends have it, and where can I find that "naked" function attribute?
Thanks.
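
For reference, a minimal sketch of what a naked function looks like from
the user's side, assuming a GCC-compatible compiler. The x86-64 branch
is compiled only when the compiler reports support through
__has_attribute(naked); GCC gained the attribute for x86 well after this
thread, so that branch does not reflect the 2007 i386 backend. The
function name add_two_and_two is made up for the example, and other
targets fall back to plain C so that the file still compiles.

```c
#ifndef __has_attribute
#define __has_attribute(x) 0   /* older compilers: assume unsupported */
#endif

#if defined(__x86_64__) && __has_attribute(naked)
/*
 * Naked: the compiler emits no prologue or epilogue, so the body must
 * consist of basic asm that also performs the return itself.
 */
__attribute__((naked)) int add_two_and_two(void)
{
    __asm__("movl $2, %eax\n\t"
            "addl $2, %eax\n\t"
            "ret");
}
#else
/* Fallback for targets where the sketch above does not apply. */
int add_two_and_two(void)
{
    return 2 + 2;
}
#endif
```

In the naked variant the assembly output for the function contains only
the label and the three instructions written in the asm statement.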


> cheers,
>   DaveK
  

Regards,
Li Wang


Generate code for something like a stack/dataflow computer

2007-12-06 Thread Li Wang
Hi,
We are retargeting GCC to a VLIW chip that runs as a coprocessor to a
general-purpose processor. The coprocessor is responsible for
accelerating code sections that have good parallelism and no
dependences. Its ISA only allows it to fetch data sequentially, not by
random access, from an on-chip memory shared with the host processor,
through dedicated functional units named DBx. The host processor is
responsible for placing data there and telling each DBx the base address
and data length. Once data is fetched by the coprocessor, it is stored
in the coprocessor's local registers, and it stays in those registers
until the computation ends. In other words, there are no spills, and
spills are not permitted. From the coprocessor's standpoint, the
instructions support no memory operands and no addressing modes; only
register moves and arithmetic operations are supported. It looks
somewhat like a dataflow computer or a stack computer. Take the
following code as an example:

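/* Declared before use; this is the function meant for the coprocessor. */
void compute(int a[], int b[], int c[]);

int main()
{
    int a[16], b[16], c[16];

    compute(a, b, c);
    return 0;
}

void compute(int a[], int b[], int c[])
{
    for (int j = 0; j < 16; j++)
        c[j] = a[j] + b[j];
    return;
}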

We want the function compute() to execute on the coprocessor, while the
host processor organizes and places the data at the proper positions in
the on-chip memory and prepares the DBx functional units. Assume DB0 is
allocated to array a[], DB1 to b[], and DB2 to c[]. Then the assembly
code we want to generate for the coprocessor looks like this:

L3:
    if (data in DB0 not exhausted)
        goto L1;
    else
        goto L2;
L1:
    get R0, DB0;    // load a datum from the on-chip memory through DB0 to R0
    get R1, DB1;
    add R2, R0, R1;
    put R2, DB2;    // store result to DB2
    goto L3;
L2:
    end;
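
To make the intended semantics concrete, the loop above can be modeled
on the host in plain C. This is only a sketch of the behavior described
in this message, not a GCC mechanism. The struct dbx type and the
helpers db_exhausted, db_get, db_put and coprocessor_compute are
hypothetical names standing in for the DBx units, each holding a base
address, a length, and a sequential position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one DBx unit: sequential access only. */
struct dbx {
    int   *base;   /* base address set up by the host processor */
    size_t len;    /* number of elements */
    size_t pos;    /* next element to fetch or store */
};

static int db_exhausted(const struct dbx *db)
{
    return db->pos >= db->len;
}

/* Model of "get Rn, DBx": fetch the next element into a register. */
static int db_get(struct dbx *db)
{
    assert(!db_exhausted(db));
    return db->base[db->pos++];
}

/* Model of "put Rn, DBx": store a register to the next slot. */
static void db_put(struct dbx *db, int value)
{
    assert(!db_exhausted(db));
    db->base[db->pos++] = value;
}

/* The L3/L1/L2 loop from the pseudo-assembly, with R0..R2 as locals. */
void coprocessor_compute(struct dbx *db0, struct dbx *db1, struct dbx *db2)
{
    while (!db_exhausted(db0)) {     /* L3: test, otherwise fall to L2 */
        int r0 = db_get(db0);        /* get R0, DB0 */
        int r1 = db_get(db1);        /* get R1, DB1 */
        int r2 = r0 + r1;            /* add R2, R0, R1 */
        db_put(db2, r2);             /* put R2, DB2 */
    }
}
```

Note that no array index or address appears inside the loop body: every
access goes through a stream, which is the property a backend without
addressing modes would need the compiler to establish.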

Could anyone give some hints on how to implement that? Can the current
GCC internals for addressing modes in the machine description support it?

Li