Re: Generate annotations for a binary translator

2011-06-14 Thread 陳韋任
> Different targets use the machine reorg pass for all sorts of different
> things.  Most of the code in reorg.c is actually not the machine reorg
> pass, it is the delay slots pass (pass_delay_slots).  The machine reorg
> pass (pass_machine_reorg) simply calls targetm.machine_dependent_reorg,
> which is what a backend (in config/*) calls
> TARGET_MACHINE_DEPENDENT_REORG.

  Which means if config/arch does NOT define TARGET_MACHINE_DEPENDENT_REORG,
then pass_machine_reorg does NOTHING for that arch. Am I right?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667


Re: Generate annotations for a binary translator

2011-06-14 Thread 陳韋任
> Sure: the document source is gcc/doc/cfg.texi.  Thanks.

  Already sent to gcc-patches. :-)
  http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01003.html

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667


Re: Seeing gcc as an intelligent agent

2011-06-14 Thread Franck Z
In your opinion, could it be of any use in the project if I tried to merge 
ccache into gcc, so as to assess this "intelligent agent" approach ?


I can't really tell if, as for an underlying object-oriented structure that 
is already present for serialization technique in gcc's source code, despite 
it being written in C, gcc is "sort of" already built as an intelligent 
agent or if it could be worth clearly showing it in the modular structure of 
gcc.


It could lead to some comfort for the community in understanding and 
thinking on the project, as introducing intermediate languages, like gimple, 
RTL and such, did.


I haven't completely tested ccache yet, but I still expect big compilations 
to perturb my PC (freezing, many disk accesses). If part-time contributors 
to open-source projects are like me, they may also enjoy the opportunity to 
have a very silent background - yet not too slow - compilation, while they 
work on other tasks.


Stealth is also one of the reasons why intelligent agents are successful.

If I were to try this, should I work from a tagged version from the trunk, 
or is there a branch I should ask to participate in?


Thank you. 



Re: RFA (fold): PATCH for c++/49290 (folding *(T*)(ar+10))

2011-06-14 Thread Richard Guenther
On Mon, 13 Jun 2011, Jason Merrill wrote:

> On 06/13/2011 06:51 AM, Richard Guenther wrote:
> > But I suppose you want the array-ref be folded to a constant eventually?
> 
> Right.
> 
> I'm not going to keep arguing about VIEW_CONVERT_EXPR, but that brings me back
> to my original question: is it OK to add a permissive mode to the function, or
> should I copy the whole thing into the front end?

I think you should copy the whole thing into the front end for now.

Note that we want to arrive at a point where our constant folding
can handle the MEM_REF case for arbitrary constant constructors.
See fold_const_aggregate_ref in gimple-fold.c - probably not usable
from the frontend directly though.  And it doesn't yet handle
non-array constructors without having a component-ref tree.
But if we eventually have all the code in that routine you might
switch to it instead.

Richard.


GCC Optimisation status update

2011-06-14 Thread Dimitrios Apostolou

Hello list,

I've been working on my project full time since last week, and on a 
part-time basis before then. Hopefully I'll be posting updates/patches 
more often now that my exams are over. For anyone that wants to talk to 
me, I'm jimis on the IRC.


I've looked a little into hash tables, symtab and hashtab, with no 
encouraging results unfortunately. Measurements showed that too many 
collisions were happening (more than N/2 for N searches) so I was hoping 
that reducing them would make a difference. This was not the case, the 
difference was negligible, so I stopped looking into hash tables for the 
time being. The only patches worth submitting until now are probably some 
statistics printing for various hash tables when -fmem-report is passed.
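
A rough illustration of the kind of statistic mentioned above (collisions per search). This is a toy sketch, not GCC's hashtab: it assumes linear probing in a fixed-size table, whereas libiberty's hash table uses a different probing scheme, and every `toy_` name is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SIZE 8              /* toy capacity; the table must never fill up */

/* Toy open-addressing table that counts searches and collisions,
   the two numbers a statistics dump would print.  */
struct toy_htab
{
  unsigned keys[TOY_SIZE];      /* 0 marks an empty slot, so keys are nonzero */
  unsigned searches;
  unsigned collisions;
};

/* Find the slot holding KEY, inserting KEY if it is absent.
   Every probe that lands on a different key counts as one collision.  */
unsigned *
toy_find_slot (struct toy_htab *h, unsigned key)
{
  size_t i;

  assert (h != NULL && key != 0);
  i = key % TOY_SIZE;
  h->searches++;
  while (h->keys[i] != 0 && h->keys[i] != key)
    {
      h->collisions++;
      i = (i + 1) % TOY_SIZE;   /* linear probing (an assumption of this toy) */
    }
  h->keys[i] = key;
  return &h->keys[i];
}

/* Collisions per search; a value above 0.5 matches the "more than N/2
   collisions for N searches" situation described in the message.  */
double
toy_collision_ratio (const struct toy_htab *h)
{
  assert (h != NULL);
  return h->searches ? (double) h->collisions / h->searches : 0.0;
}
```

With keys 1, 9 and 17 (all congruent modulo 8) the second and third insertions probe past occupied slots, which is what drives the ratio up.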


In the future I plan to try more radical changes like changing the hash 
function (we are using Bob Jenkins' v2 hash, upgrade to v3 which is 
supposed to be faster and better) and breaking the functionality of 
htab_find_slot* functions into smaller ones. Maybe also use the symtab 
hash table outside of the preprocessor, where only strings are stored 
(file_table in dwarf2out for example). Nicola (CC'd), you'd mentioned that 
you have done work on hash tables. Which parts did you change? Have you 
seen any measurable differences?


I spent more time fiddling with dwarf2out_* functions that output the 
final assembly, and this has proven more fruitful. I measured a 
significant amount of time spent in libc's vfprintf(), mostly coming 
from ASM_OUTPUT_ASCII macros and dw2_asm_output_data(), with a format 
string mostly like "%s %#x". In order to avoid the argument parsing 
overhead and the hex conversion (implemented suboptimally in glibc) I 
changed the hottest callers to use fwrite()/fputs() and implemented a 
puthexl() function for hex conversion:


static void puthexl (unsigned long value, FILE *f)
{
  static char hex_repr[16]= {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
  static char buf[2 + 2*sizeof(value)]= "0x";
  int i;
  int j= 2;

  for (i = 8*sizeof(value)-4; i>=0; i-= 4)
    {
      char c= (value >> i) & 0xf;
      if (c!=0 || j>2)
        {
          buf[j]= hex_repr[(int)c];
          j++;
        }
    }

  if (j>2)
    fwrite(buf, 1, j, f);
  else
    putc('0', f);
}

I also performed some other minor changes like implementing 
ASM_OUTPUT_LIMITED_STRING with a function using a jump table in elfos.h. 
BTW, elfos.h is using a too complex ASM_OUTPUT_ASCII macro, it actually 
reparses the whole string to find substrings to output with 
ASM_OUTPUT_LIMITED_STRING... is that necessary?


All in all I measured a 30-50 ms speedup in cc1 run time out of ~850ms 
total. Take the numbers with a grain of salt until I fix some library 
problems with an old PC, I'll publish final numbers then.



All comments are welcome,
Dimitris


P.S. I am keeping notes at http://gcc.gnu.org/wiki/OptimisingGCC feel free 
to comment/edit on anything




PING^4 APPROVED patch for AMD64 targets running GNU/kFreeBSD, anyone?

2011-06-14 Thread Robert Millan
This patch for AMD64 targets running GNU/kFreeBSD has been approved
already, would anyone be so kind to commit it?  I'm afraid I don't have
write perms currently.

See: http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00884.html

Thank you very much :-)

2011/6/10 Richard Henderson :
> On 06/10/2011 01:59 PM, Robert Millan wrote:
>> 2011-06-02  Robert Millan  
>>
>>   * config/i386/kfreebsd-gnu.h: Resync with `config/i386/linux.h'.
>>   * config/kfreebsd-gnu.h (GNU_USER_DYNAMIC_LINKER): Resync with
>>   `config/linux.h'.
>>
>>   * config/i386/kfreebsd-gnu64.h: New file.
>>   * config.gcc (x86_64-*-kfreebsd*-gnu): Replace `i386/kfreebsd-gnu.h'
>>   with `i386/kfreebsd-gnu64.h'.
>>
>>   * config/i386/linux64.h (GNU_USER_LINK_EMULATION32)
>>   (GNU_USER_LINK_EMULATION64): New macros.
>>   * config/i386/gnu-user64.h (LINK_SPEC): Rely on
>>   `GNU_USER_LINK_EMULATION32' and `GNU_USER_LINK_EMULATION64' instead
>>   of hardcoding `elf_i386' and `elf_x86_64'.
>
> Ok.
>
>
> r~
>

-- 
Robert Millan
2011-06-02  Robert Millan  

* config/i386/kfreebsd-gnu.h: Resync with `config/i386/linux.h'.
* config/kfreebsd-gnu.h (GNU_USER_DYNAMIC_LINKER): Resync with
`config/linux.h'.

* config/i386/kfreebsd-gnu64.h: New file.
* config.gcc (x86_64-*-kfreebsd*-gnu): Replace `i386/kfreebsd-gnu.h'
with `i386/kfreebsd-gnu64.h'.

* config/i386/linux64.h (GNU_USER_LINK_EMULATION32)
(GNU_USER_LINK_EMULATION64): New macros.
* config/i386/gnu-user64.h (LINK_SPEC): Rely on
`GNU_USER_LINK_EMULATION32' and `GNU_USER_LINK_EMULATION64' instead
of hardcoding `elf_i386' and `elf_x86_64'.

Index: gcc/config/i386/kfreebsd-gnu64.h
===
--- gcc/config/i386/kfreebsd-gnu64.h(revision 0)
+++ gcc/config/i386/kfreebsd-gnu64.h(revision 0)
@@ -0,0 +1,26 @@
+/* Definitions for AMD x86-64 running kFreeBSD-based GNU systems with ELF format
+   Copyright (C) 2011
+   Free Software Foundation, Inc.
+   Contributed by Robert Millan.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#define GNU_USER_LINK_EMULATION32 "elf_i386_fbsd"
+#define GNU_USER_LINK_EMULATION64 "elf_x86_64_fbsd"
+
+#define GLIBC_DYNAMIC_LINKER32 "/lib/ld.so.1"
+#define GLIBC_DYNAMIC_LINKER64 "/lib/ld-kfreebsd-x86-64.so.1"
Index: gcc/config/i386/kfreebsd-gnu.h
===
--- gcc/config/i386/kfreebsd-gnu.h  (revision 174566)
+++ gcc/config/i386/kfreebsd-gnu.h  (working copy)
@@ -1,5 +1,5 @@
 /* Definitions for Intel 386 running kFreeBSD-based GNU systems with ELF format
-   Copyright (C) 2004, 2007, 2011
+   Copyright (C) 2011
Free Software Foundation, Inc.
Contributed by Robert Millan.
 
@@ -19,11 +19,5 @@
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#undef GNU_USER_LINK_EMULATION
 #define GNU_USER_LINK_EMULATION "elf_i386_fbsd"
-
-#undef GNU_USER_DYNAMIC_LINKER32
-#define GNU_USER_DYNAMIC_LINKER32 "/lib/ld.so.1"
-
-#undef GNU_USER_DYNAMIC_LINKER64
-#define GNU_USER_DYNAMIC_LINKER64 "/lib/ld-kfreebsd-x86-64.so.1"
+#define GLIBC_DYNAMIC_LINKER "/lib/ld.so.1"
Index: gcc/config/i386/linux64.h
===
--- gcc/config/i386/linux64.h   (revision 174566)
+++ gcc/config/i386/linux64.h   (working copy)
@@ -24,6 +24,9 @@
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
+#define GNU_USER_LINK_EMULATION32 "elf_i386"
+#define GNU_USER_LINK_EMULATION64 "elf_x86_64"
+
 #define GLIBC_DYNAMIC_LINKER32 "/lib/ld-linux.so.2"
 #define GLIBC_DYNAMIC_LINKER64 "/lib64/ld-linux-x86-64.so.2"
 
Index: gcc/config/i386/gnu-user64.h
===
--- gcc/config/i386/gnu-user64.h(revision 174566)
+++ gcc/config/i386/gnu-user64.h(working copy)
@@ -69,7 +69,8 @@
  %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}}"
 
 #undef LINK_SPEC
-#define LINK_SPEC "%{" SPEC_64 ":-m elf_x86_64} %{" SPEC_32 ":-m elf_i386} \
+#define LINK_SPEC "%{" SPEC_64 ":-m " GNU_USER_LINK_EMULATION64 "} \
+   %{" SPEC_32 ":-m " GNU_USER_LINK_EMULATION32 "} \
   %{shared:-shared} \
   %{!shared: \
 %{!static: \
Index: gcc/config/kfreebsd

Re: GCC Optimisation status update

2011-06-14 Thread Jakub Jelinek
On Tue, Jun 14, 2011 at 03:13:00PM +0300, Dimitrios Apostolou wrote:
> parsing overhead and the hex conversion (implemented suboptimally in

Can you back that up?  glibc conversion to hex representation is fairly
heavily optimized, see _itoa_word in stdio-common/_itoa.h.  In fact, I'd
say it is much better than your implementation.  The overhead
you may see almost certainly comes from the fact that printf family
have to handle lots of different stuff, including narrow/wide output,
optional grouping, localized digits and all kinds of other things.

> glibc) I changed the hottest callers with fwrite()/fputs() and
> implemented a puthexl() function for hex conversion:
> 
> static void puthexl (unsigned long value, FILE *f)
> {
>   static char hex_repr[16]= {'0', '1', '2', '3', '4', '5', '6', '7',
>'8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
>   static char buf[2 + 2*sizeof(value)]= "0x";
>   int i;
>   int j= 2;
> 
>   for (i = 8*sizeof(value)-4; i>=0; i-= 4)
> {
>   char c= (value >> i) & 0xf;
>   if (c!=0 || j>2)
>   {
> buf[j]= hex_repr[(int)c];

Why not just "0123456789abcdef"[c] instead (and make c an int,
instead of char).  Furthermore, consider instead of filling from the beginning
filling from the end.  buf shouldn't be static.
Hardcoding CHAR_BIT to 8 is not portable.

Jakub


Re: Generate annotations for a binary translator

2011-06-14 Thread Ian Lance Taylor
陳韋任  writes:

>> Different targets use the machine reorg pass for all sorts of different
>> things.  Most of the code in reorg.c is actually not the machine reorg
>> pass, it is the delay slots pass (pass_delay_slots).  The machine reorg
>> pass (pass_machine_reorg) simply calls targetm.machine_dependent_reorg,
>> which is what a backend (in config/*) calls
>> TARGET_MACHINE_DEPENDENT_REORG.
>
>   Which means if config/arch does NOT define TARGET_MACHINE_DEPENDENT_REORG
> , then pass_machine_reorg does NOTHING for that arch. Am I right?

Correct.

Ian
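
To make the answer concrete, here is a toy model of the hook mechanism. It is only a sketch under stated assumptions: the real hook vector is `targetm` and the real pass lives in reorg.c; the `toy_` names below are invented, and the idea that the pass is gated on the hook being non-null is my reading of the exchange above rather than a quote from the sources.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the target hook vector.  A backend that does not
   define TARGET_MACHINE_DEPENDENT_REORG is modelled as a null pointer.  */
struct toy_target
{
  void (*machine_dependent_reorg) (void);
};

/* Counts how often the example backend hook has run.  */
int toy_reorg_calls;

/* Example backend hook; a real one would rewrite the insn stream.  */
void
toy_arch_reorg (void)
{
  toy_reorg_calls++;
}

/* Gate: the pass is only worth running when the hook is defined.  */
int
toy_gate_machine_reorg (const struct toy_target *t)
{
  assert (t != NULL);
  return t->machine_dependent_reorg != NULL;
}

/* The pass itself: call the hook if there is one.
   Returns 1 if the hook ran and 0 if the pass did nothing.  */
int
toy_run_machine_reorg (const struct toy_target *t)
{
  if (!toy_gate_machine_reorg (t))
    return 0;
  t->machine_dependent_reorg ();
  return 1;
}
```

A target initialised with a null hook makes `toy_run_machine_reorg` return 0 without side effects, which mirrors the "does NOTHING for that arch" case.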


Re: GCC Optimisation status update

2011-06-14 Thread Dimitrios Apostolou

Hi Jakub,

On Tue, 14 Jun 2011, Jakub Jelinek wrote:
> On Tue, Jun 14, 2011 at 03:13:00PM +0300, Dimitrios Apostolou wrote:
>> parsing overhead and the hex conversion (implemented suboptimally in
>
> Can you back that up?  glibc conversion to hex representation is fairly
> heavily optimized, see _itoa_word in stdio-common/_itoa.h.  In fact, I'd
> say it is much better than your implementation.  The overhead
> you may see almost certainly comes from the fact that printf family
> have to handle lots of different stuff, including narrow/wide output,
> optional grouping, localized digits and all kinds of other things.

It's true that I saw but didn't really understand glibc's _itoa(), and 
just assumed it was slower since it works for all bases, not only base 
16 which is special. Nevertheless my measurements showed puthexl() to be 
faster, and the generality of the function is certainly an important 
factor.


If _itoa() indeed has a better algorithm I could use it.


>> glibc) I changed the hottest callers with fwrite()/fputs() and
>> implemented a puthexl() function for hex conversion:
>>
>> static void puthexl (unsigned long value, FILE *f)
>> {
>>   static char hex_repr[16]= {'0', '1', '2', '3', '4', '5', '6', '7',
>>                              '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
>>   static char buf[2 + 2*sizeof(value)]= "0x";
>>   int i;
>>   int j= 2;
>>
>>   for (i = 8*sizeof(value)-4; i>=0; i-= 4)
>>     {
>>       char c= (value >> i) & 0xf;
>>       if (c!=0 || j>2)
>>         {
>>           buf[j]= hex_repr[(int)c];
>
> Why not just "0123456789abcdef"[c] instead


Code just looked more beautiful, that's all. :-)


> (and make c an int, instead of char).


done


> Furthermore, consider instead of filling from the beginning filling from
> the end.


My first version (out of 5 others) was doing that. I think there was no 
actual performance difference, and filling from start helped skipping 
leading zeroes and appending right after "0x".



> buf shouldn't be static.


buf initialisation time I avoid that way is almost negligible, so I could 
change it if necessary. But what is the problem with it being static?



> Hardcoding CHAR_BIT to 8 is not portable.


Changed, didn't have a clue about that issue! I'll be delighted to learn 
which platform has CHAR_BIT != 8 :-)


Thanks for your comments,
Dimitris


Re: Seeing gcc as an intelligent agent

2011-06-14 Thread Ian Lance Taylor
"Franck Z"  writes:

> In your opinion, could it be of any use in the project if I tried to
> merge ccache into gcc, so as to assess this "intelligent agent"
> approach ?

I didn't really understand your description, but there have been a
couple of gcc projects in this general space: compiler server and
incremental compiler.  The latter is described at
http://gcc.gnu.org/wiki/IncrementalCompiler .  Simply integrating ccache
into gcc does not make sense to me, as nothing will be gained.

> If I were to try this, should I work from a tagged version from the
> trunk, or is there a branch I should ask to participate into ?

I suspect that any such project would be done on a branch, yes.  Anybody
with write access to the SVN repository is permitted to create a branch.
The main requirement is that all contributors to the branch have an
appropriate copyright assignment.  http://gcc.gnu.org/svnwrite.html .

Ian


Re: GCC Optimisation status update

2011-06-14 Thread Jakub Jelinek
On Tue, Jun 14, 2011 at 05:59:47PM +0300, Dimitrios Apostolou wrote:
> >>static void puthexl (unsigned long value, FILE *f)
> >>{
> >>  static char hex_repr[16]= {'0', '1', '2', '3', '4', '5', '6', '7',
> >> '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
> >>  static char buf[2 + 2*sizeof(value)]= "0x";
> >>  int i;
> >>  int j= 2;
> >>
> >>  for (i = 8*sizeof(value)-4; i>=0; i-= 4)
> >>{
> >>  char c= (value >> i) & 0xf;
> >>  if (c!=0 || j>2)
> >>{
> >>  buf[j]= hex_repr[(int)c];
> >
> >Why not just "0123456789abcdef"[c] instead
> 
> Code just looked more beautiful, that's all. :-)

Well, I find that two lines hex_repr initializer much less readable.

Anyway, what glibc does is:
static inline char * __attribute__ ((unused, always_inline))
_itoa_word (unsigned long value, char *buflim,
            unsigned int base, int upper_case)
{
  const char *digits = (upper_case ? _itoa_upper_digits : _itoa_lower_digits);

  switch (base)
    {
# define SPECIAL(Base) \
    case Base: \
      do \
        *--buflim = digits[value % Base]; \
      while ((value /= Base) != 0); \
      break

      SPECIAL (10);
      SPECIAL (16);
      SPECIAL (8);
    default:
      do
        *--buflim = digits[value % base];
      while ((value /= base) != 0);
    }
  return buflim;
}
and as it is called with constant base and constant upper_case, the
switch/modulo/division is optimized.

You'd use it as:
void
puthexl (unsigned long value, FILE *f)
{
  char buf[2 + CHAR_BIT * sizeof (value) / 4];
  if (value == 0)
    putc ('0', f);
  else
    {
      char *p = buf + sizeof (buf);
      do
        *--p = "0123456789abcdef"[value % 16];
      while ((value /= 16) != 0);
      *--p = 'x';
      *--p = '0';
      fwrite (p, 1, buf + sizeof (buf) - p, f);
    }
}

If the number is small, which is the common case,
this will iterate just a small number of times
instead of always 16 times.

Anyway, generally, I wonder if replacing lots of
fprintf calls won't lead to less readable and maintainable
code, if many of the fprintfs will need to be replaced
e.g. by two separate calls (one fwrite, one puthexl
or similar).

Plus, what I said on IRC, regarding transformation
of fprintf calls to fwrite if there are no %s in
the format string, we should leave that to the host
compiler.  It actually already does such transformations
for fprintf, but in this case we have fprintf_unlocked
due to system.h macros, and that isn't optimized by gcc
into fwrite_unlocked.  That IMHO should be fixed on the
host gcc side though.

Jakub
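
One way to check that a hand-written replacement really matches the `"%#x"`-style output it replaces is to format into a buffer and compare against the C library. The sketch below follows the fill-from-the-end approach shown above; `format_hexl`, `hexl_matches_printf` and `HEX_BUF_SIZE` are hypothetical names introduced only for this illustration, and the buffer is rounded up so it also holds a terminating NUL.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* "0x" + one hex digit per 4 bits (rounded up) + terminating NUL.  */
#define HEX_BUF_SIZE (2 + (CHAR_BIT * sizeof (unsigned long) + 3) / 4 + 1)

/* Format VALUE like printf's "%#lx" into OUT (HEX_BUF_SIZE bytes).
   Returns the length without the NUL.  Digits are filled from the end,
   so small values need only a few iterations.  */
size_t
format_hexl (unsigned long value, char *out)
{
  char buf[HEX_BUF_SIZE];
  char *end = buf + sizeof (buf) - 1;
  char *p = end;

  assert (out != NULL);
  *end = '\0';
  if (value == 0)
    *--p = '0';                 /* "%#lx" prints plain "0" for zero */
  else
    {
      do
        *--p = "0123456789abcdef"[value % 16];
      while ((value /= 16) != 0);
      *--p = 'x';
      *--p = '0';
    }
  memcpy (out, p, (size_t) (end - p) + 1);
  return (size_t) (end - p);
}

/* Stream variant with the same interface as the puthexl in the thread.  */
void
puthexl (unsigned long value, FILE *f)
{
  char buf[HEX_BUF_SIZE];
  size_t len = format_hexl (value, buf);
  fwrite (buf, 1, len, f);
}

/* Returns 1 when format_hexl agrees with the C library for VALUE.  */
int
hexl_matches_printf (unsigned long value)
{
  char ours[HEX_BUF_SIZE];
  char ref[HEX_BUF_SIZE];

  format_hexl (value, ours);
  snprintf (ref, sizeof ref, "%#lx", value);
  return strcmp (ours, ref) == 0;
}
```

Splitting formatting from output keeps the comparison independent of any `FILE *`, so the check can run over boundary values such as 0 and `ULONG_MAX` without touching the filesystem.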


Re: PING^4 APPROVED patch for AMD64 targets running GNU/kFreeBSD, anyone?

2011-06-14 Thread Uros Bizjak
Hello!

> This patch for AMD64 targets running GNU/kFreeBSD has been approved
> already, would anyone be so kind to commit it?  I'm afraid I don't have
> write perms currently.

I have committed your patch to SVN mainline after bootstrapping it on
x86_64-pc-linux-gnu.

Thanks,
Uros.


Configure gcc with --multilib=... ?

2011-06-14 Thread Matt Turner
Hi,

I'd like to ship multilib Gentoo/MIPS installations with only n32 and
n64 ABIs (ie, no o32). The reasoning is that if your system can use
either 64-bit ABI you don't have any reason to run o32, given that
o32-only installation media also exists.

I saw this mail http://gcc.gnu.org/ml/gcc/2010-01/msg00063.html
suggesting the addition of a --multilib= configure option. Has such a
thing been added? Is there a way to configure gcc to build only n32
and n64 ABIs?

Thanks,
Matt


Re: GCC Optimisation status update

2011-06-14 Thread H.J. Lu
On Tue, Jun 14, 2011 at 8:21 AM, Jakub Jelinek  wrote:
> On Tue, Jun 14, 2011 at 05:59:47PM +0300, Dimitrios Apostolou wrote:
>> >>static void puthexl (unsigned long value, FILE *f)
>> >>{
>> >>  static char hex_repr[16]= {'0', '1', '2', '3', '4', '5', '6', '7',
>> >>                         '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
>> >>  static char buf[2 + 2*sizeof(value)]= "0x";
>> >>  int i;
>> >>  int j= 2;
>> >>
>> >>  for (i = 8*sizeof(value)-4; i>=0; i-= 4)
>> >>    {
>> >>      char c= (value >> i) & 0xf;
>> >>      if (c!=0 || j>2)
>> >>    {
>> >>      buf[j]= hex_repr[(int)c];
>> >
>> >Why not just "0123456789abcdef"[c] instead
>>
>> Code just looked more beautiful, that's all. :-)
>
> Well, I find that two lines hex_repr initializer much less readable.
>
> Anyway, what glibc does is:
> static inline char * __attribute__ ((unused, always_inline))
> _itoa_word (unsigned long value, char *buflim,
>            unsigned int base, int upper_case)
> {
>  const char *digits = (upper_case ? _itoa_upper_digits : _itoa_lower_digits);
>
>  switch (base)
>    {
> # define SPECIAL(Base) \
>    case Base: \
>      do \
>        *--buflim = digits[value % Base]; \
>      while ((value /= Base) != 0); \
>      break
>
>      SPECIAL (10);
>      SPECIAL (16);
>      SPECIAL (8);
>    default:
>      do
>        *--buflim = digits[value % base];
>      while ((value /= base) != 0);
>    }
>  return buflim;
> }
> and as it is called with constant base and constant upper_case, the
> switch/modulo/division is optimized.
>
> You'd use it as:
> void
> puthexl (unsigned long value, FILE *f)
> {
>  char buf[2 + CHAR_BIT * sizeof (value) / 4];
>  if (value == 0)
>    putc ('0', f);
>  else
>    {
>      char *p = buf + sizeof (buf);
>      do
>        *--p = "0123456789abcdef"[value % 16];
>      while ((value /= 16) != 0);
>      *--p = 'x';
>      *--p = '0';
>      fwrite (p, 1, buf + sizeof (buf) - p, f);
>    }
> }
>
> If the number is small, which is the common case,
> this will iterate just small number of items
> instead of always 16 times.
>
> Anyway, generally, I wonder if replacing lots of
> fprintf calls won't lead to less readable and maintainable
> code, if many of the fprintfs will need to be replaced
> e.g. by two separate calls (one fwrite, one puthexl
> or similar).
>
> Plus, what I said on IRC, regarding transformation
> of fprintf calls to fwrite if there are no %s in
> the format string, we should leave that to the host
> compiler.  It actually already does such transformations
> for fprintf, but in this case we have fprintf_unlocked
> due to system.h macros, and that isn't optimized by gcc
> into fwrite_unlocked.  That IMHO should be fixed on the
> host gcc side though.
>

We are working on a patch which will improve decimal
itoa by up to 10X.  It will take a while to finish it.


-- 
H.J.


libgcc: problems adding asm sources (libgcc/siditi-object.mk)

2011-06-14 Thread Georg-Johann Lay
Hi, I intend to add some assembler sources to libgcc build.

Using the straight forward way in ./gcc/config/avr/t-avr

@ -52,7 +50,30 @@ LIB1ASMFUNCS = \
...
+   _ffssi2 \
+   _ffshi2 \
+   _loop_ffsqi2 \
+   _ctzsi2 \
+   _ctzhi2 \
+   _clzdi2 \
+   _clzsi2 \
+   _clz \
+   _paritydi2 \
+   _paritysi2 \
+   _parityhi2 \
+   _popcounthi2 \
+   _popcountsi2 \
+   _popcountdi2 \
+   _popcountqi2 \
+   _bswapsi2 \
+   _bswapdi2

all works fine except that I get warnings from make for hi functions:

(in /avr/libgcc)

Makefile:375: warning: overriding commands for target `_ffshi2.o'
../../../../gcc.gnu.org/trunk/libgcc/siditi-object.mk:15: warning:
ignoring old commands for target `_ffshi2.o'
Makefile:375: warning: overriding commands for target `_ctzhi2.o'
../../../../gcc.gnu.org/trunk/libgcc/siditi-object.mk:15: warning:
ignoring old commands for target `_ctzhi2.o'
Makefile:375: warning: overriding commands for target `_parityhi2.o'
../../../../gcc.gnu.org/trunk/libgcc/siditi-object.mk:15: warning:
ignoring old commands for target `_parityhi2.o'
Makefile:375: warning: overriding commands for target `_popcounthi2.o'
../../../../gcc.gnu.org/trunk/libgcc/siditi-object.mk:15: warning:
ignoring old commands for target `_popcounthi2.o'
make: `_clzhi2.o' is up to date.

Am I missing something? Adding fragments to LIB1ASMFUNCS should filter
them out in filter-out.

Adding these functions to LIB2FUNCS_EXCLUDE does not help; the warnings
persist (I see correct LIB2FUNCS_EXCLUDE in ./gcc/libgcc.mvars).

Setup is with current trunk (175011):

../../gcc.gnu.org/trunk/configure --target=avr
--prefix=/local/gnu/install/gcc-4.7 --disable-nls --disable-shared
--enable-languages=c,c++


Johann


Re: Configure gcc with --multilib=... ?

2011-06-14 Thread Joseph S. Myers
On Tue, 14 Jun 2011, Matt Turner wrote:

> I say this mail http://gcc.gnu.org/ml/gcc/2010-01/msg00063.html
> suggesting the addition of a --multilib= configure option. Has such a
> thing been added? Is there a way to configure gcc to build only n32

No, the project has not yet reached that stage (right now 9000 lines of 
patches have been pending review for over two weeks, which largely 
blocks subsequent patches) and the proposed option was a driver option, 
not a configure option; the proposal explicitly excluded issues with how 
the set of multilibs is configured.  There is no general configure support 
for adjusting the set of multilibs, although there are some fairly 
flexible options on SH and ad hoc options for other targets such as 
controlling 64-bit libraries on i686-pc-linux-gnu with 
--enable-targets=all (and see how HJ's x32 patches allow configuring 
whether x32 multilibs are enabled).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: libgcc: problems adding asm sources (libgcc/siditi-object.mk)

2011-06-14 Thread Ian Lance Taylor
Georg-Johann Lay  writes:

> Am I something missing? Adding fragments to LIB1ASMFUNCS should filter
> them out in filter-out.

I think the problem is that libgcc/config/avr/t-avr does not filter
LIB1ASMFUNCS out of the lists it generates.  You will need to adjust it
one way or another.

Ian


Re: libgcc: problems adding asm sources (libgcc/siditi-object.mk)

2011-06-14 Thread Georg-Johann Lay
Georg-Johann Lay schrieb:
> Hi, I intend to add some assembler sources to libgcc build.
> 
> Using the straight forward way in ./gcc/config/avr/t-avr
> 
> @ -52,7 +50,30 @@ LIB1ASMFUNCS = \
> ...
> +   _ffssi2 \
> +   _ffshi2 \
> +   _loop_ffsqi2 \
> +   _ctzsi2 \
> +   _ctzhi2 \
> +   _clzdi2 \
> +   _clzsi2 \
> +   _clz \
> +   _paritydi2 \
> +   _paritysi2 \
> +   _parityhi2 \
> +   _popcounthi2 \
> +   _popcountsi2 \
> +   _popcountdi2 \
> +   _popcountqi2 \
> +   _bswapsi2 \
> +   _bswapdi2
> 
> all works fine except that I get warnings from make for hi functions:
> 
> (in /avr/ligbcc)
> 
> Makefile:375: warning: overriding commands for target `_ffshi2.o'
> ../../../../gcc.gnu.org/trunk/libgcc/siditi-object.mk:15: warning:
> ignoring old commands for target `_ffshi2.o'
> Makefile:375: warning: overriding commands for target `_ctzhi2.o'
> ../../../../gcc.gnu.org/trunk/libgcc/siditi-object.mk:15: warning:
> ignoring old commands for target `_ctzhi2.o'
> Makefile:375: warning: overriding commands for target `_parityhi2.o'
> ../../../../gcc.gnu.org/trunk/libgcc/siditi-object.mk:15: warning:
> ignoring old commands for target `_parityhi2.o'
> Makefile:375: warning: overriding commands for target `_popcounthi2.o'
> ../../../../gcc.gnu.org/trunk/libgcc/siditi-object.mk:15: warning:
> ignoring old commands for target `_popcounthi2.o'
> make: `_clzhi2.o' is up to date.
> 
> Am I something missing? Adding fragments to LIB1ASMFUNCS should filter
> them out in filter-out.
> 
> Adding these function to LIB2FUNCS_EXCLUDE does not help; the warnings
> persist (I see correct LIB2FUNCS_EXCLUDE in ./gcc/libgcc.mvars).
> 
> Setup is with current trunk (175011):
> 
> ../../gcc.gnu.org/trunk/configure --target=avr
> --prefix=/local/gnu/install/gcc-4.7 --disable-nls --disable-shared
> --enable-languages=c,c++
> 

Ok, I found ./libgcc/config/avr/t-avr :-)

> Johann



Re: Seeing gcc as an intelligent agent

2011-06-14 Thread Franck Z
I'm relying more on a hunch right now than on a sound analysis. Maybe I'm 
wrong...


I have to take a closer look at how ccache proceeds to speed up compilation 
and precisely at how gcc internally uses I/O functions, to be more 
articulate on the subject.


Maybe I could slowly and publicly form a detailed answer to your question, 
as I improve my understanding of what's at stake (without being too much of 
a nuisance for this mailing list) as in the link you provide? Something like 
an "assignment agenda" as in the link?


Thank you for all the information. 



Re: Configure gcc with --multilib=... ?

2011-06-14 Thread H.J. Lu
On Tue, Jun 14, 2011 at 9:26 AM, Joseph S. Myers
 wrote:
> On Tue, 14 Jun 2011, Matt Turner wrote:
>
>> I say this mail http://gcc.gnu.org/ml/gcc/2010-01/msg00063.html
>> suggesting the addition of a --multilib= configure option. Has such a
>> thing been added? Is there a way to configure gcc to build only n32
>
> No, the project has not yet reached that stage (right now 9000 lines of
> patches have been pending review for over two weeks, which largely
> blocks subsequent patches) and the proposed option was a driver option,
> not a configure option; the proposal explicitly excluded issues with how
> the set of multilibs is configured.  There is no general configure support
> for adjusting the set of multilibs, although there are some fairly
> flexible options on SH and ad hoc options for other targets such as
> controlling 64-bit libraries on i686-pc-linux-gnu with
> --enable-targets=all (and see how HJ's x32 patches allow configuring
> whether x32 multilibs are enabled).
>

The updated initial x32 patch is at:

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01088.html

-- 
H.J.


Re: libgcc: problems adding asm sources (libgcc/siditi-object.mk)

2011-06-14 Thread Georg-Johann Lay
Ian Lance Taylor schrieb:
> Georg-Johann Lay  writes:
> 
>> Am I something missing? Adding fragments to LIB1ASMFUNCS should filter
>> them out in filter-out.
> 
> I think the problem is that libgcc/config/avr/t-avr does not filter
> LIB1ASMFUNCS out of the lists it generates.  You will need to adjust it
> one way or another.
> 
> Ian

Thanks, that works. Just removing the parts from the list appears to be
the simplest way.

Wondering why there is now just another t-target, both t-targets
containing snips of libgcc.

Johann





Re: libgcc: problems adding asm sources (libgcc/siditi-object.mk)

2011-06-14 Thread Ian Lance Taylor
Georg-Johann Lay  writes:

> Ian Lance Taylor schrieb:
>> Georg-Johann Lay  writes:
>> 
>>> Am I something missing? Adding fragments to LIB1ASMFUNCS should filter
>>> them out in filter-out.
>> 
>> I think the problem is that libgcc/config/avr/t-avr does not filter
>> LIB1ASMFUNCS out of the lists it generates.  You will need to adjust it
>> one way or another.
>
> Thanks, that works. Just removed the parts from the list appears to be
> the simplest way.
>
> Wondering why there is now just another t-target, both t-targets
> containing snips of libgcc.

There is a very slowly moving incomplete transition to move all the
libgcc configury support and sources from gcc/config/* to
libgcc/config/*.

If you are creating new files you can help that transition by creating
them in libgcc rather than gcc.

Ian


Re: GCC Optimisation status update

2011-06-14 Thread Dimitrios Apostolou

Hi Jakub,

On Tue, 14 Jun 2011, Jakub Jelinek wrote:
> You'd use it as:
> void
> puthexl (unsigned long value, FILE *f)
> {
>   char buf[2 + CHAR_BIT * sizeof (value) / 4];
>   if (value == 0)
>     putc ('0', f);
>   else
>     {
>       char *p = buf + sizeof (buf);
>       do
>         *--p = "0123456789abcdef"[value % 16];
>       while ((value /= 16) != 0);
>       *--p = 'x';
>       *--p = '0';
>       fwrite (p, 1, buf + sizeof (buf) - p, f);
>     }
> }
>
> If the number is small, which is the common case,
> this will iterate just small number of items
> instead of always 16 times.


Thanks for the explanation, I measured your version and it is indeed faster 
for the common case, so I'll be using it.




> Anyway, generally, I wonder if replacing lots of
> fprintf calls won't lead to less readable and maintainable
> code, if many of the fprintfs will need to be replaced
> e.g. by two separate calls (one fwrite, one puthexl
> or similar).
>
> Plus, what I said on IRC, regarding transformation
> of fprintf calls to fwrite if there are no %s in
> the format string, we should leave that to the host
> compiler.  It actually already does such transformations
> for fprintf, but in this case we have fprintf_unlocked
> due to system.h macros, and that isn't optimized by gcc
> into fwrite_unlocked.  That IMHO should be fixed on the
> host gcc side though.


You're probably right, it's just that for starters I'm looking into the 
easy stuff to optimise, the low hanging fruit. I think it's gonna be much 
harder to implement these optimisations into some optimising pass of the 
compiler itself, and for now I don't even know where to look for this, 
I'll probably check it later.



Thanks,
Dimitris



Re: GCC Optimisation status update

2011-06-14 Thread zoltan

> We are working on a patch which will improve decimal
> itoa by up to 10X.  It will take a while to finish it.

What's the method?

I have a function converting 32 bit unsigneds to decimal which costs one
32x32->64 multiply with a constant (a single constant, not a look-up
table) plus a max. 8-times loop involving a few 64-bit adds and shifts,
which can be unrolled for speed (there's very little in the loop body,
really). There's also an initial overhead of up to three 32-bit compare
and subtracts.

The 64 bit unsigned to decimal conversion costs two calls to the above
routine, three 32x32->64 multiplies and a few preparation steps, which
are simple 64-bit add/sub things.

The routines are used on 32-bit ARM chips where multiply is dirt cheap;
for chips with no 32x32->64 multiply they might not be feasible. The
routines are also quite simple. If they would be useful to you, they've been
released under the GPL (with an additional relaxing clause, but that's
irrelevant here). I don't know if the method is already well known; a casual
search on the Net did not find binary to decimal conversion using the
above technique at the time when I came up with it (a couple of years ago),
so it may not be that widespread.

I also have routines to convert 32 and 64 bit numbers to arbitrary base
without using division but again, they are heavily reliant on the cheap
32x32->64 multiply and cheap 64-bit shifts.
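
For readers unfamiliar with division-free conversion in general, here is a minimal sketch of a well-known relative of the idea: dividing by 10 with a single 32x32->64 multiply by a constant. This is NOT the routine described above (which is not shown in this mail); the names div10 and u32_to_dec are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Divide by 10 using one 32x32->64 multiply and a shift.
   0xCCCCCCCD is ceil(2^35 / 10); the result is exact for every
   32-bit unsigned input.  */
static uint32_t
div10 (uint32_t n)
{
  return (uint32_t) (((uint64_t) n * 0xCCCCCCCDu) >> 35);
}

/* Write N in decimal into OUT, which must hold at least 11 bytes
   (10 digits plus the terminating NUL).  Returns the digit count.  */
static int
u32_to_dec (uint32_t n, char *out)
{
  char tmp[10];
  int len = 0, i;

  assert (out != 0);
  do
    {
      uint32_t q = div10 (n);
      tmp[len++] = (char) ('0' + (n - q * 10));  /* Remainder digit.  */
      n = q;
    }
  while (n != 0);

  /* Digits were produced least significant first; reverse them.  */
  for (i = 0; i < len; i++)
    out[i] = tmp[len - 1 - i];
  out[len] = '\0';
  return len;
}
```

This still loops once per digit, so it is only meant to show how a cheap widening multiply can stand in for a division.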

Zoltan



Re: GCC Optimisation status update

2011-06-14 Thread H.J. Lu
On Tue, Jun 14, 2011 at 2:51 PM,   wrote:
>
>> We are working on a patch which will improve decimal
>> itoa by up to 10X.  It will take a while to finish it.
>
> What's the method?
>

We use SSSE3 and SSE4 instructions for shift and multiply.

-- 
H.J.


gcc-4.4-20110614 is now available

2011-06-14 Thread gccadmin
Snapshot gcc-4.4-20110614 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20110614/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 175061

You'll find:

 gcc-4.4-20110614.tar.bz2 Complete GCC

  MD5=574003db9f7c833ea3e70072ec7ab39c
  SHA1=ee07de308b8479ee7d084536715ecbe3c029e304

Diffs from 4.4-20110607 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[BUG 49411] questions

2011-06-14 Thread Quentin Neill
I wrote http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49411, and I have
some general questions:

1. I left it as normal P3, is that appropriate for an obvious ICE like this?
2. Should I open another bug for 4.6?
3. What's the best/right way to specify testcases in GCC bugs?

And the specific question about how to fix it:
I see in gcc/config/i386/sse.md that  'define_insn "xop_rotr3"'
matches operand 2 with "const_0_to_operand"; doesn't
that preclude the -1 from matching?
-- 
Quentin


[google] Merged gcc-4_6-branch into google/gcc-4_6

2011-06-14 Thread Diego Novillo
This brings google/gcc-4_6 up to rev 175007.

Validated on x86_64.


Diego.


Re: [BUG 49411] questions

2011-06-14 Thread Jonathan Wakely
On 14 June 2011 23:53, Quentin Neill wrote:
> I wrote http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49411, and I have
> some general questions:
>
> 1. I left it as normal P3, is that appropriate for an obvious ICE like this?

The GCC release managers set the priorities, not bug reporters.

> 2. Should I open another bug for 4.6?

Nope, it's easier to deal with a single PR for a single issue, not one
per release series.

> 3. What's the best/right way to specify testcases in GCC bugs?

If the testcase is small (fewer than about 30 lines) I usually just paste
it into the comments box as you did. Otherwise attach a file,
preprocessed source if appropriate (i.e. if it depends on any header
that doesn't come from gcc).


Returning unions (Was: Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated))

2011-06-14 Thread Joern Rennecke

Quoting Bernd Schmidt :


> * Most sane ABIs pass single-word structs in registers


Unfortunately, the i386 SYSV ABI (used generally for i386 elf toolchains)
is half-way insane in that respect: function return of small aggregates also
goes via a caller-passed pointer to a stack slot.  (You can avoid that using
-freg-struct-return, but that option is not safe unless you have a
full set of multilibs built with that option.)


> * For the most part, gcc runs on i686 and there it doesn't make a
>   difference.


It used to make a difference for function value return.  But apparently
we have lost that feature of transparent union somewhere between gcc 2.7.0
and gcc 4.4.5.


Re: Returning unions (Was: Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated))

2011-06-14 Thread H.J. Lu
On Tue, Jun 14, 2011 at 5:44 PM, Joern Rennecke  wrote:
> Quoting Bernd Schmidt :
>
>> * Most sane ABIs pass single-word structs in registers
>
> Unfortunately, the i386 SYSV ABI (used generally for i386 elf toolchains)
> is half-way insane in that respect: function return of small aggregates also
> goes via a caller-passed pointer to a stack slot.  (You can avoid that using
> -freg-struct-return, but that option is not safe unless you have a
> full set of multilibs built with that option.)
>
>> * For the most part, gcc runs on i686 and there it doesn't make a
>>  difference.
>
> It used to make a difference for function value return.  But apparently
> we have lost that feature of transparent union somewhere between gcc 2.7.0
> and gcc 4.4.5.
>

Do you have a testcase for i386?


-- 
H.J.


Re: Generate annotations for a binary translator

2011-06-14 Thread 陳韋任
> >> Different targets use the machine reorg pass for all sorts of different
> >> things.  Most of the code in reorg.c is actually not the machine reorg
> >> pass, it is the delay slots pass (pass_delay_slots).  The machine reorg
> >> pass (pass_machine_reorg) simply calls targetm.machine_dependent_reorg,
> >> which is what a backend (in config/*) calls
> >> TARGET_MACHINE_DEPENDENT_REORG.
> >
> >   Which means if config/arch does NOT define TARGET_MACHINE_DEPENDENT_REORG
> > , then pass_machine_reorg does NOTHING for that arch. Am I right?
> 
> Correct.
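
The relationship described in the quoted text can be pictured with a toy model. This is only a sketch of the idea, not GCC's actual code: the struct, the toy_ names and the return value are invented for illustration; only the hook name machine_dependent_reorg comes from the mail.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the target hook table.  */
struct toy_target
{
  void (*machine_dependent_reorg) (void);  /* NULL when the backend
                                              does not define it.  */
};

static int reorg_calls;  /* Counts how often the toy backend hook ran.  */

/* Toy backend implementation of the hook.  */
static void
toy_arch_reorg (void)
{
  reorg_calls++;
}

/* Toy version of the pass: it only forwards to the hook.  Returns 1
   if the hook ran, 0 if the target has none and nothing was done.  */
static int
toy_pass_machine_reorg (const struct toy_target *t)
{
  assert (t != NULL);
  if (t->machine_dependent_reorg == NULL)
    return 0;
  t->machine_dependent_reorg ();
  return 1;
}
```

With a table whose hook is NULL the toy pass is a no-op, which is the case asked about above.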

  I am looking into the config/arch backends that define
TARGET_MACHINE_DEPENDENT_REORG and trying to figure out what kind of
operations might change the CFG.

  Take the function ix86_reorg in config/i386/i386.c as an example.

1. Functions like ix86_pad_short_function, ix86_pad_returns and
   ix86_avoid_jump_mispredicts add padding, i.e., nop instructions.

2. Function move_or_delete_vzeroupper moves or deletes vzeroupper in
   different cases.

  I think the above operations only modify basic blocks in the CFG but NOT
the edges between basic blocks. So the CFG should be the same in this case,
right? Although I am not 100% sure if they do exactly what the comments
say. ;-)

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667


Re: Returning unions (Was: Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated))

2011-06-14 Thread Joern Rennecke

Quoting "H.J. Lu" :


> Do you have a testcase for i386?


struct args { int i0, i1; };

union args_u { struct args *a; } __attribute__((transparent_union));

union args_u
f (union args_u in)
{
  union args_u out;

  out.a = in.a + 1;

  return out;
}
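
A caller for this testcase might look like the sketch below. The function check_f is hypothetical and only shows how the transparent union is used from C; it does not demonstrate the return-value ABI difference under discussion, which would have to be checked by compiling for i386 and reading the generated assembly.

```c
struct args { int i0, i1; };

union args_u { struct args *a; } __attribute__((transparent_union));

union args_u
f (union args_u in)
{
  union args_u out;

  out.a = in.a + 1;

  return out;
}

/* Hypothetical caller: returns 1 if f advanced the pointer by one
   element.  The transparent_union attribute (a GNU C extension) lets
   a plain 'struct args *' be passed where the parameter has type
   'union args_u', with no cast.  */
static int
check_f (void)
{
  struct args v[2] = { { 1, 2 }, { 3, 4 } };
  union args_u r = f (&v[0]);

  return r.a == &v[1] && r.a->i0 == 3;
}
```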


Re: Generate annotations for a binary translator

2011-06-14 Thread Ian Lance Taylor
陳韋任  writes:

>   I am looking into config/arch which defines TARGET_MACHINE_DEPENDENT_REORG
> and trying to figure out what kind of operations might change the CFG.

When I want to look for a backend that does something crazy, I usually
start with sh.  And sure enough, sh_reorg calls split_branches which
creates jumps over jumps, which is a CFG change.  (There may be other
CFG changes in there, that was just the first one I saw.)
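
The difference between padding and a jump over a jump can be seen in a toy CFG model. This is a sketch under simplifying assumptions: the toy_ types and functions are invented here and have nothing to do with GCC's real basic_block and edge structures.

```c
#include <assert.h>

#define TOY_MAX_BLOCKS 8

/* Toy basic block: an instruction count plus up to two successors.  */
struct toy_block
{
  int n_insns;
  int succ[2];  /* Successor block indices, -1 when unused.  */
};

struct toy_cfg
{
  struct toy_block bb[TOY_MAX_BLOCKS];
  int n_blocks;
};

/* Block 0 branches conditionally to block 1, else falls into block 2.  */
static void
toy_init (struct toy_cfg *g)
{
  g->n_blocks = 3;
  g->bb[0].n_insns = 2; g->bb[0].succ[0] = 1; g->bb[0].succ[1] = 2;
  g->bb[1].n_insns = 1; g->bb[1].succ[0] = -1; g->bb[1].succ[1] = -1;
  g->bb[2] = g->bb[1];
}

static int
toy_count_edges (const struct toy_cfg *g)
{
  int i, k, n = 0;

  for (i = 0; i < g->n_blocks; i++)
    for (k = 0; k < 2; k++)
      if (g->bb[i].succ[k] >= 0)
        n++;
  return n;
}

/* Padding: block B grows by N_NOPS instructions, edges are untouched.  */
static void
toy_pad (struct toy_cfg *g, int b, int n_nops)
{
  g->bb[b].n_insns += n_nops;
}

/* Jump over jump: the branch edge B -> T becomes B -> J -> T, where J
   is a new block holding only an unconditional jump.  Returns J.  */
static int
toy_split_branch (struct toy_cfg *g, int b)
{
  int j;

  assert (g->n_blocks < TOY_MAX_BLOCKS);
  j = g->n_blocks++;
  g->bb[j].n_insns = 1;
  g->bb[j].succ[0] = g->bb[b].succ[0];
  g->bb[j].succ[1] = -1;
  g->bb[b].succ[0] = j;
  return j;
}
```

In this model toy_pad leaves both the block count and the edge count alone, while toy_split_branch adds one block and one edge.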

Ian


Re: Seeing gcc as an intelligent agent

2011-06-14 Thread Franck Z

Hello,

After a good night's sleep, I think I've found a way to write down more 
clearly the rationale that motivates me.


It's about the semantics gcc can derive from the command line arguments. 
They are a very efficient way of specifying the desired result of the 
compilation, but they say less about how compilation should be done 
internally by gcc.


It's possible to do so, for sure. For instance, we can tell gcc on the 
command line not to compile a header file, but to use a pre-compiled header 
instead. Still, this kind of specification is likely to be indistinctly 
set throughout the execution of a make command.


It seems to me that (some) internal data for each execution of gcc (as it 
builds itself throughout the execution of a makefile) could be made 
available to all the gcc commands, with the help of mmap().


Therefore, with the help of this unaltered semantic context, each gcc 
command could assess what has been done before it and adjust its way of 
performing its compilation sequence, without having to parse any external 
file, but for the hits/misses the use of mmap() can induce.


I understand that, for a process, to assess the state of its environment and 
make plans accordingly, is what is referred to by the phrase "intelligent 
agent".


It could affect not only duplicate compilation tasks throughout a makefile, 
but also the good use of disk resources, the disk being the slow physical 
part of the compilation process.


I'm interested in merging ccache into gcc, because it seems to already 
perform an assessment of compilation context. It could help me verify if 
mmap() is a good tool for an intelligent agent approach, without having to 
yet devise the difficult part of the agent, which I guess is the "context 
assessment" part.


However, in terms of disk use, this merge could, as of now, improve the 
ccache solution, as ccache relies on intermediary file output from gcc.


However, I still need to greatly deepen my knowledge of every part I mention 
here. I think my first step should be to try to make available some simple 
datum, like a "Hello, world!" string for instance, to all the make and gcc 
processes launched in a make session with mmap().
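
As a starting point, that first step could look like the sketch below: one process publishes the string in a file-backed MAP_SHARED mapping and a second process maps the same file by name and reads it. The helper names, the fixed size and the use of a named file (so that unrelated make and gcc processes could find it after exec) are assumptions of this sketch, not an existing gcc or ccache facility.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SHARED_SIZE 4096

/* Map PATH shared and writable; create and size the file if CREATE is
   nonzero.  Returns NULL on failure.  */
static char *
map_shared (const char *path, int create)
{
  int fd = open (path, create ? (O_RDWR | O_CREAT) : O_RDWR, 0600);
  void *p;

  if (fd < 0)
    return NULL;
  if (create && ftruncate (fd, SHARED_SIZE) != 0)
    {
      close (fd);
      return NULL;
    }
  p = mmap (NULL, SHARED_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);  /* The mapping stays valid after the descriptor is closed.  */
  return p == MAP_FAILED ? NULL : (char *) p;
}

/* Publish a string in this process, then read it back in a child that
   maps the file on its own.  Returns 1 on success, 0 otherwise.  */
static int
share_hello (const char *path)
{
  char *w;
  pid_t pid;
  int status = -1;

  assert (path != NULL);
  w = map_shared (path, 1);
  if (w == NULL)
    return 0;
  strcpy (w, "Hello, world!");

  pid = fork ();
  if (pid == 0)
    {
      char *r = map_shared (path, 0);
      _exit (r != NULL && strcmp (r, "Hello, world!") == 0 ? 0 : 1);
    }
  if (pid > 0)
    waitpid (pid, &status, 0);
  munmap (w, SHARED_SIZE);
  unlink (path);
  return pid > 0 && WIFEXITED (status) && WEXITSTATUS (status) == 0;
}
```

A real experiment would still need some agreement on the file name (for example an environment variable exported by the top-level make) and on synchronisation between writers, neither of which is addressed here.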


Best regards,
Franck Z