Bugs in operations of "long" type from GCC-3.4.6 32 bit.

2007-03-27 Thread J.C. Pizarro

Hi people,

bug_mult_and_shift_long.c
-
#include <stdio.h> // it's for public domain by J.C. Pizarro, hahahahaha
int main() {
  long long int a,b,c;
  unsigned int hi_a,lo_a,hi_b,lo_b,hi_c,lo_c;

  a = 10000000000L; // 10G
  b = 100000000L; // 100M
  c = a * b;
  hi_a = (((unsigned long)a)>>32); // <- warning? why? (>>32)
  hi_a = ((((unsigned long)a)>>31)>>1); // <- no warning! dirty workaround

  lo_a = ((((unsigned long)(a<<32))>>31)>>1); // <- warning? why? (<<32)
  lo_a = a; // <- no warning!
  hi_b = ((((unsigned long)b)>>1)>>31);
  lo_b = b & 0xFFFFFFFFL;

  hi_c = ((((unsigned long)c)>>16)>>16);
  lo_c = c & 0xFFFFFFFFL;
  printf("a = %ld\n",a);
  printf("b = %ld\n",b);
  printf("c = a x b = %ld\n",c);

  printf("high(a) = 0x%08X ; low(a) = 0x%08X\n",hi_a,lo_a);
  printf("high(b) = 0x%08X ; low(b) = 0x%08X\n",hi_b,lo_b);
  printf("high(c) = 0x%08X ; low(c) = 0x%08X\n",hi_c,lo_c);

  printf("why this multiply?\n");
  printf("It's a bug: long x long -> { 0, unsigned int } \
instead of long x long -> long\n");
  return 0;
}


run.sh
--
#!/bin/sh
gcc bug_mult_and_shift_long.c
./a.out
rm -f a.out


run.log
---
Reading specs from /usr/lib/gcc/i486-slackware-linux/3.4.6/specs
Configured with: ../gcc-3.4.6/configure --prefix=/usr --enable-shared
--enable-threads=posix --enable-__cxa_atexit --disable-checking
--with-gnu-ld --verbose --target=i486-slackware-linux
--host=i486-slackware-linux

Thread model: posix
gcc version 3.4.6
bug_mult_and_shift_long.c: In function `main':
bug_mult_and_shift_long.c:5: warning: integer constant is too large for "long" type
bug_mult_and_shift_long.c:8: warning: right shift count >= width of type

a = 1410065408
b = 100000000
c = a x b = -1486618624
high(a) = 0x00000000 ; low(a) = 0x540BE400
high(b) = 0x00000000 ; low(b) = 0x05F5E100
high(c) = 0x00000000 ; low(c) = 0xA7640000
why this multiply?

It's a bug: long x long -> { 0, unsigned int } instead of long x long -> long


Brief summary, there are 3 bugs:
1. Error and warning in the assignment of a long constant (with the L suffix)
(it's not true that a = 1410065408, high(a) = 0x00000000).
2. Warning in shifts << & >> of a long variable (I don't know if there
is an error).
3. Error in the multiplication of long variables (it's not true that c = -1486618624).

Sincerely yours, J.C. Pizarro


testing_long_GCC_march2007.tar.gz
Description: GNU Zip compressed data


Bugs in operations of "long" type from GCC 4.1.3 20070326 (prerelease) 32 bit too.

2007-03-27 Thread J.C. Pizarro

Hi people,

bug_mult_and_shift_long.c
-
#include <stdio.h> // it's for public domain by J.C. Pizarro, hahahahaha
int main() {
  long long int a,b,c;
  unsigned int hi_a,lo_a,hi_b,lo_b,hi_c,lo_c;

  a = 10000000000L; // 10G
  b = 100000000L; // 100M
  c = a * b;
  hi_a = (((unsigned long)a)>>32); // <- warning? why? (>>32)
  hi_a = ((((unsigned long)a)>>31)>>1); // <- no warning! dirty workaround

  lo_a = ((((unsigned long)(a<<32))>>31)>>1); // <- warning? why? (<<32)
  lo_a = a; // <- no warning!
  hi_b = ((((unsigned long)b)>>1)>>31);
  lo_b = b & 0xFFFFFFFFL;

  hi_c = ((((unsigned long)c)>>16)>>16);
  lo_c = c & 0xFFFFFFFFL;
 printf("a = %ld\n",a);
 printf("b = %ld\n",b);
 printf("c = a x b = %ld\n",c);

 printf("high(a) = 0x%08X ; low(a) = 0x%08X\n",hi_a,lo_a);
 printf("high(b) = 0x%08X ; low(b) = 0x%08X\n",hi_b,lo_b);
 printf("high(c) = 0x%08X ; low(c) = 0x%08X\n",hi_c,lo_c);

 printf("why this multiply?\n");
 printf("It's a bug: long x long -> { 0, unsigned int } \
instead of long x long -> long\n");
 return 0;
}


run.sh
--
#!/bin/sh
gcc bug_mult_and_shift_long.c
./a.out
rm -f a.out


run.log
---
Reading specs from /usr/lib/gcc/i486-slackware-linux/3.4.6/specs
Configured with: ../gcc-3.4.6/configure --prefix=/usr --enable-shared
--enable-threads=posix --enable-__cxa_atexit --disable-checking
--with-gnu-ld --verbose --target=i486-slackware-linux
--host=i486-slackware-linux
Thread model: posix
gcc version 3.4.6
bug_mult_and_shift_long.c: In function `main':
bug_mult_and_shift_long.c:5: warning: integer constant is too large for "long" type
bug_mult_and_shift_long.c:8: warning: right shift count >= width of type
a = 1410065408
b = 100000000
c = a x b = -1486618624
high(a) = 0x00000000 ; low(a) = 0x540BE400
high(b) = 0x00000000 ; low(b) = 0x05F5E100
high(c) = 0x00000000 ; low(c) = 0xA7640000
why this multiply?
It's a bug: long x long -> { 0, unsigned int } instead of long x long -> long



Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ./configure --prefix=/opt/gcc413
Thread model: posix
gcc version 4.1.3 20070326 (prerelease)
bug_mult_and_shift_long.c: In function 'main':
bug_mult_and_shift_long.c:5: warning: integer constant is too large for 'long' type
bug_mult_and_shift_long.c:8: warning: right shift count >= width of type
a = 1410065408
b = 100000000
c = a x b = -1486618624
high(a) = 0x00000000 ; low(a) = 0x540BE400
high(b) = 0x00000000 ; low(b) = 0x05F5E100
high(c) = 0x00000000 ; low(c) = 0xA7640000
why this multiply?
It's a bug: long x long -> { 0, unsigned int } instead of long x long -> long


Brief summary, there are 3 bugs:
1. Error and warning in the assignment of a long constant (with the L suffix)
(it's not true that a = 1410065408, high(a) = 0x00000000).
2. Warning in shifts << & >> of a long variable (I don't know if there
is an error).
3. Error in the multiplication of long variables (it's not true that c = -1486618624).

Sincerely yours, J.C. Pizarro


testing_long_GCC_march2007_2.tar.gz
Description: GNU Zip compressed data


Re: Bugs in operations of "long" type from GCC-3.4.6 32 bit.

2007-03-27 Thread J.C. Pizarro

2007/3/28, Andreas Schwab <[EMAIL PROTECTED]> wrote:


"J.C. Pizarro" <[EMAIL PROTECTED]> writes:

> Brief summary, there are 3 bugs:

None of them are bugs in the compiler, only in your program.


Sure? My program is a test case showing why the long type doesn't work, not
why my program doesn't work.


> 1. Error and Warning in assignment of a long constant (with L letter)
> (it's not true that a = 1410065408, high(a) = 0x).

The constant overflows the range of long, causing undefined behaviour.


The range of long should be -(2^63) .. +((2^63)-1).
There is no reason for undefined behaviour.


> 2. Warning in shifts << & >> of a long variable (i don't know if there
> is an error).

A shift count greater than or equal to the width of a type causes
undefined behaviour.


For shifts, the range for a long type should be 0 .. 64 (65 values), not 0 .. 31.
There is no reason for undefined behaviour.

There is another strange thing for the future: the range -63 .. -1, 0, 1 .. 64
could be made specially reversible, where the reverse of << is >>, and
vice versa. Hahahaha, is it useful? I don't know.
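For reference, a minimal sketch (not from the original thread) of how the two halves can be taken without shifting a 32-bit type by 32, by widening to unsigned long long first:

#include <stdio.h>

/* Sketch: split a 64-bit value into 32-bit halves without shifting any
 * 32-bit type by 32 bits (such a shift is what triggers the warning). */
int main(void) {
  long long a = 10000000000LL;                /* 10G, with the LL suffix */
  unsigned long long ua = (unsigned long long)a;
  unsigned int hi = (unsigned int)(ua >> 32); /* shift done in a 64-bit type */
  unsigned int lo = (unsigned int)(ua & 0xFFFFFFFFULL);
  printf("high = 0x%08X ; low = 0x%08X\n", hi, lo);
  return 0;
}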


> 3. Error in multiply of long variables. (it's not true that c = -1486618624).

%ld is not the correct format for a value of type long long, causing
undefined behaviour.


%ld means long format for %d, is it wrong?

Thanks, bye friend ;)


Re: Bugs in operations of "long" type from GCC-3.4.6 32 bit.

2007-03-27 Thread J.C. Pizarro

27 Mar 2007 16:35:16 -0700, Ian Lance Taylor <[EMAIL PROTECTED]>:

You are confusing long and long long.  This is C, not Java.  The
suffix for a long long constant is LL, not L.


Thanks a lot, LL and %lld are the solution. The GCC compiler does not have the
bugs I reported.
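For reference, a minimal corrected sketch of the test case (a reconstruction, not a message from the thread), using the LL suffix and %lld as suggested:

#include <stdio.h>

int main(void) {
  long long int a = 10000000000LL; /* 10G, LL: long long constant */
  long long int b = 100000000LL;   /* 100M */
  long long int c = a * b;         /* 10^18, fits in 64 bits */
  printf("a = %lld\n", a);
  printf("b = %lld\n", b);
  printf("c = a x b = %lld\n", c); /* prints 1000000000000000000 */
  return 0;
}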


testing_long_GCC_march2007_3.tar.gz
Description: GNU Zip compressed data


Re: Bugs in operations of "long" type from GCC-3.4.6 32 bit.

2007-03-27 Thread J.C. Pizarro

2007/3/28, Andreas Schwab <[EMAIL PROTECTED]>:

"J.C. Pizarro" <[EMAIL PROTECTED]> writes:

> The range of long should be -(2^63) ..  +((2^63)-1).

Your long has only 32 bits.

> For shifts, the range of a long type should be 0 .. 64 (65 values), not 0 .. 31.

Your long has only 32 bits.

>> %ld is not the correct format for a value of type long long, causing
>> undefined behaviour.
>
> %ld means long format for %d, is it wrong?

%ld is for long, not long long.

Andreas.


Thanks a lot, LL and %lld are the solution. The GCC compiler does not have the
bugs I reported.


testing_long_GCC_march2007_3.tar.gz
Description: GNU Zip compressed data


Is it possible to do some GCC's stages more "modular"?

2007-03-30 Thread J.C. Pizarro

Hi people

I want to discuss an interesting topic: GCC's hierarchy of subhierarchies.

For example, I want to add my own optimization option to GCC,
but I see that it's very "monolithic".
I don't see a subhierarchy for the optimization stage in the snapshot tree.

Sincerely yours, J.C. Pizarro


Re: Proposal: changing representation of memory references

2007-04-04 Thread J.C. Pizarro

It's poorly implemented, unrefactored, without formal specification,
without OO hierarchy, etc.

"The pointers are the evilness of the optimization".


Re: Integer overflow in operator new

2007-04-06 Thread J.C. Pizarro

2007/4/6, Karl Chen <[EMAIL PROTECTED]>:


Hi all, apologies if this has been discussed before, but I
couldn't find anything about this issue in gcc mailing list
archives.

Use of operator new (et al) appears to have an integer overflow;
this function:

int * allocate_int(size_t n)
{
return new int[n];
}

with gcc-4.1 on IA-32 compiles to:

_Z12allocate_intj:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    8(%ebp), %eax
        sall    $2, %eax
        movl    %eax, (%esp)
        call    _Znaj
        leave
        ret

which is equivalent to the compilation of:

int * allocate_int(size_t n)
{
return (int*) operator new[](4 * n);
}

"4 * n", unchecked, is vulnerable to integer overflow.  On IA-32,
"new int[0x4001]" becomes equivalent to "new int[1]".  I've
verified this on gcc-2.95 through 4.1.  For larger objects the
effects are exaggerated; smaller counts are needed to overflow.

This is similar to the calloc integer overflow vulnerability in
glibc, which was fixed back in 2002.  Interestingly, RUS-CERT
2002-08:02 did mention 'operator new', and so did Bugtraq 5398.
http://cert.uni-stuttgart.de/advisories/calloc.php
http://www.securityfocus.com/bid/5398/discuss

See also this 2004 article by Raymond Chen:
http://blogs.msdn.com/oldnewthing/archive/2004/01/29/64389.aspx

Possible fixes might be to abort or to pass ULONG_MAX (0xFFFFFFFF)
to 'operator new' and let it return NULL or throw bad_alloc.

At least one other compiler already specifically guards against
integer overflow in 'operator new'.

--
Karl 2007-04-06 07:30




You're right! There was nothing about this issue in the gcc mailing
list archives.
But I have more to discuss about it!

A possible workaround could be the following, but it's vulnerable if it's compiled
with -DNDEBUG:

int * allocate_int(size_t n)
{
    // it's another integer overflow, a positive can become negative.
    // n=1073741823 (0x3FFFFFFF) => n*4=-4 (0xFFFFFFFC)
    // return (int*) operator new[](-4); !!! it makes a buffer overflow easy.
    assert(0 <= (4 * n));
    // it's an assert against your integer overflow.
    assert((4ULL * n) <= ULONG_MAX);
    return (int*) operator new[](4 * n);
}
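Note that the first assert above can never fire (0 <= x is always true for an unsigned x); the second, 64-bit one does catch the wrap on a 32-bit target. A hedged sketch of the same guard written purely in size_t arithmetic (essentially what Lawrence Crowl suggests later in this thread; names of my choosing):

#include <assert.h>
#include <stddef.h>
#include <new>

// Sketch only: reject any n for which n * sizeof(int) would wrap size_t.
int * allocate_int_checked(size_t n)
{
    assert(n <= ~size_t(0) / sizeof(int)); // fires before the multiply can wrap
    return (int*) operator new[](n * sizeof(int));
}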


Re: Integer overflow in operator new

2007-04-06 Thread J.C. Pizarro

Good points.

Regarding negatives, I believe 'operator new' takes a size_t,
which is unsigned, but if it were signed, the multiplication
would indeed be in danger of creating a negative.

If possible, I would prefer a solution that's built-in to operator
new.  I was thinking it should be implemented when code is
generated, for example using jc/jo/seto on i386.

--
Karl 2007-04-06 15:41


I have a proposal for catching intruders in the code, using
an option -DCATCH_NEW_INTRUDER, for example:

   int * allocate_int(size_t n)
   {
   int *p;
#ifdef CATCH_NEW_INTRUDER
   log_and_raise_if_new_intruder_anomaly(n,4);
#endif //CATCH_NEW_INTRUDER
   p = (int*) operator new[](4 * n);
#ifdef CATCH_NEW_INTRUDER
   log_and_raise_if_new_intruder_anomaly_return_not_null(n,4,p);
#endif //CATCH_NEW_INTRUDER
       return p;
   }

J.C. Pizarro


Re: Integer overflow in operator new

2007-04-06 Thread J.C. Pizarro


The assert should not overflow.  I suggest

#include <stdint.h>
#include <assert.h>
assert( n < SIZE_MAX / sizeof(int) );

which requires two pieces of information that the programmer
otherwise wouldn't need, SIZE_MAX and sizeof(type).

Asking programmers to write extra code for rare events, has
not been very successful.  It would be better if the compiler
incorporated this check into operator new, though throwing
an exception rather than asserting.  The compiler should be
able to eliminate many of the conditionals.

--
Lawrence Crowl


operator new is reused many times, by many callers.
So programmers don't have the time to write extra
code at every call into operator new.


The compiler should be able to eliminate many of the conditionals.

Yes and no: there are cases where the compiler can't eliminate
conditionals that depend on run-time values, e.g. when "n" is a non-constant parameter.

J.C. Pizarro


Re: Integer overflow in operator new

2007-04-06 Thread J.C. Pizarro

06 Apr 2007 18:53:47 -0500, Gabriel Dos Reis <[EMAIL PROTECTED]>:

"J.C. Pizarro" <[EMAIL PROTECTED]> writes:

[...]

| > The compiler should be able to eliminate many of the conditionals.
| Yes but no, there are cases that the compiler can't eliminate the
| conditionals that depend on run-time, e.g., "n" is non-constant parameter.

What is the performance penalty compared to the actual allocation work
on "typical" modern systems?

-- Gaby



It depends on people's tastes.

If the allocator is slow, then there is no noticeable performance penalty.

But if someone implements a very fast bucket-based quick allocator, then
the performance penalty of this check is considerable.

Remember Amdahl's Law.

J.C. Pizarro


Super bad accuracy in the output of gprof when -pg is used.

2007-04-06 Thread J.C. Pizarro

I've tested the profiling of p7zip-4.44 (C++, LZMA,
linux-2.6.20.5.tar as data).

There is an almost total lack of profile timing information: there are
a lot of 0.00 entries and only a few 0.01 entries. There is no entry above 0.01 seconds.

Its output really confuses me, and probably you too.

The unit 'seconds' should be replaced by 'us' (microseconds) or 'ms'
(milliseconds).
Then there would be far fewer 0.00 entries than with 0.00 s.

Because of this timing inaccuracy, I can't write a comparison of the
application's fine-grained results and afterwards apply the speedup formula
from Amdahl's Law.

Can it use the privileged i586 RDTSC instruction to count the time in CPU cycles?
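As an aside, RDTSC is normally available from user mode on x86 unless the kernel restricts it; a minimal GCC inline-asm sketch (not from the thread) for reading the cycle counter on 32-bit x86:

#include <stdio.h>

/* Sketch: read the x86 time-stamp counter. Cycle counts are only meaningful
 * on one core at a stable frequency, so treat the numbers with care. */
static inline unsigned long long read_tsc(void) {
  unsigned int lo, hi;
  __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
  return ((unsigned long long)hi << 32) | lo;
}

int main(void) {
  unsigned long long t0 = read_tsc();
  /* ... code under measurement ... */
  unsigned long long t1 = read_tsc();
  printf("elapsed cycles: %llu\n", t1 - t0);
  return 0;
}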

Thanks

J.C. Pizarro


probing_p7zip-4.44_20070407.tar.bz2
Description: BZip2 compressed data


Re: Super bad accuracy in the output of gprof when -pg is used.

2007-04-06 Thread J.C. Pizarro

The unit 'seconds' should be replaced by 'us' (microseconds) or 'ms'
(milliseconds).
Then there would be far fewer 0.00 entries than with 0.00 s.


One piece of advice: extend it from 2 to 4 decimal digits for
better comparison between similar functions.

* 0.0000 us instead of 0.00 us
* 0.0000 ms instead of 0.00 ms

J.C. Pizarro


Re: Integer overflow in operator new

2007-04-07 Thread J.C. Pizarro

2007/4/7, Ross Ridge <[EMAIL PROTECTED]>:

Joe Buck writes:
>If a check were to be implemented, the right thing to do would be to throw
>bad_alloc (for the default new) or return 0 (for the nothrow new).

What do you do if the user has defined his own operator new that does
something else?


The callees' checkers should be optional stubs; for example, the user may want
to catch the error, log it, and send an e-mail to himself and to the data center.


>There cases where the penalty for this check could have
>an impact, like for pool allocators that are otherwise very cheap.
>If so, there could be a flag to suppress the check.

Excessive code size growth could also be problem for some programs.


A solution is to use the -shared option to generate a ".so" library.

Another future solution is to pack the big ".so" library with UPX
(Ultimate Packer for eXecutables), or to extend the ELF format to
permit packing the sections with GZ, BZ2 or LZMA.



            Ross Ridge



J.C. Pizarro.


Re: Integer overflow in operator new

2007-04-07 Thread J.C. Pizarro

2007/4/7, Robert Dewar <[EMAIL PROTECTED]>:

> A solution is using the -shared option to generate ".so" library.

That does not solve things in environments like embedded
environments where there are no shared libraries.


Use -Os and "strip --strip-all". And remove code if you don't like it.


> Another future solution is pack the big ".so" library with UPX
> (Ultimate Packer for eXecutables) or extend the ELF format to
> permit pack the sections with GZ, BZ2 or LZMA.

We are worried about code space in memory, not space on disk!


Or extend the ELF format to permit packing the many subsections, non-solidly
or solidly (=> slower stream), with GZ, BZ2, UPX or LZMA
(<= 1 MiB to uncompress, e.g.).
Their buffers are very small, to permit raising an exception.
Like squashfs instead of cramfs for embedded systems.


Re: Integer overflow in operator new

2007-04-07 Thread J.C. Pizarro

2007/4/7, Robert Dewar <[EMAIL PROTECTED]>:
> > A solution is using the -shared option to generate ".so" library.
>
> That does not solve things in environments like embedded
> environments where there are no shared libraries.

Use -Os and "strip --strip-all". And remove code if you don't like it.

> > Another future solution is pack the big ".so" library with UPX
> > (Ultimate Packer for eXecutables) or extend the ELF format to
> > permit pack the sections with GZ, BZ2 or LZMA.
>
> We are worried about code space in memory, not space on disk!

Or extend the ELF format to permit packing non-solidly the many
subsections with GZ, BZ2, UPX or LZMA (<= 1 MiB to uncompress, e.g.).
Their buffers are very small,
to permit raising an exception.
Like squashfs instead of cramfs for embedded systems.



This same idea is applicable to packing the giant /usr/lib/libgcj.so, whose
current non-packed size on disk is >= 9 MiB, sometimes >= 50 MiB.

Remember, Java is converted to C++ with gcj and compiled with g++.


Re: Integer overflow in operator new

2007-04-08 Thread J.C. Pizarro

Joe Buck wrote:

> > > inline size_t __compute_size(size_t num, size_t size) {
> > > size_t product = num * size;
> > > return product >= num ? product : ~size_t(0);
> > > }


2007/4/9, Ross Smith <[EMAIL PROTECTED]> wrote:

On Monday, 9 April 2007 10:23, Florian Weimer wrote:
> * Ross Ridge:
> > Florian Weimer writes:
> >>I don't think this check is correct.  Consider num = 0x33333334 and
> >>size = 6.  It seems that the check is difficult to perform
> >> efficiently unless the architecture provides unsigned
> >> multiplication with overflow detection, or an instruction to
> >> implement __builtin_clz.
> >
> > This should work instead:
> >
> > inline size_t __compute_size(size_t num, size_t size) {
> > if (num > ~size_t(0) / size)
> > return ~size_t(0);
> > return num * size;
> > }
>
> Yeah, but that division is fairly expensive if it can't be performed
> at compile time.  OTOH, if __compute_size is inlined in all places,
> code size does increase somewhat.

You could avoid the division in nearly all cases by checking for
reasonably-sized arguments first:

inline size_t __compute_size(size_t num, size_t size) {
static const int max_bits = sizeof(size_t) * CHAR_BITS;
int low_num, low_size;
low_num = num < ((size_t)1 << (max_bits * 5 / 8));
low_size = size < ((size_t)1 << (max_bits * 3 / 8));
if (__builtin_expect(low_num && low_size, 1)
|| num <= ~(size_t)0 / size)
return num * size;
else
return ~size_t(0);
}


This code is bigger than Joe Buck's.

-

Joe Buck's code: 10 instructions
Ross Ridge's code: 16 instructions
Ross Smith's code: 16 instructions

-

Joe Buck's code: 10 instructions
__compute_size:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        movl    %eax, %edx
        imull   12(%ebp), %edx
        cmpl    %eax, %edx
        orl     $-1, %edx
        popl    %ebp
        movl    %edx, %eax
        ret

Ross Ridge's code: 16 instructions
__compute_size:
        pushl   %ebp
        orl     $-1, %eax
        movl    %esp, %ebp
        xorl    %edx, %edx
        movl    12(%ebp), %ecx
        pushl   %ebx
        movl    8(%ebp), %ebx
        divl    %ecx
        orl     $-1, %edx
        cmpl    %eax, %ebx
        movl    %ebx, %edx
        imull   %ecx, %edx
        popl    %ebx
        movl    %edx, %eax
        popl    %ebp
        ret

Ross Smith's code: 16 instructions
__compute_size:
        pushl   %ebp
        orl     $-1, %eax
        movl    %esp, %ebp
        xorl    %edx, %edx
        movl    12(%ebp), %ecx
        pushl   %ebx
        movl    8(%ebp), %ebx
        divl    %ecx
        orl     $-1, %edx
        cmpl    %eax, %ebx
        movl    %ebx, %edx
        imull   %ecx, %edx
        popl    %ebx
        movl    %edx, %eax
        popl    %ebp
        ret
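Instruction counts aside, Florian Weimer's counterexample quoted above is worth spelling out; a minimal sketch (mine, not from the thread; it assumes a 32-bit size_t) showing how num = 0x33333334 and size = 6 slip past the product >= num test:

#include <stdio.h>
#include <stddef.h>

// Joe Buck's check, as quoted above.
static inline size_t compute_size_buck(size_t num, size_t size) {
  size_t product = num * size;
  return product >= num ? product : ~size_t(0);
}

int main() {
  // With a 32-bit size_t: 0x33333334 * 6 == 0x133333338, which truncates to
  // 0x33333338. That is still >= num, so the overflow goes undetected.
  size_t num = 0x33333334u, size = 6;
  printf("returned size = 0x%lx (wrapped, but not clamped)\n",
         (unsigned long) compute_size_buck(num, size));
  return 0;
}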


compute_size_april2007.tar.gz
Description: GNU Zip compressed data


Re: Integer overflow in operator new

2007-04-08 Thread J.C. Pizarro

Joe Buck wrote:

> > > inline size_t __compute_size(size_t num, size_t size) {
> > > size_t product = num * size;
> > > return product >= num ? product : ~size_t(0);
> > > }


2007/4/9, Ross Smith <[EMAIL PROTECTED]> wrote:

On Monday, 9 April 2007 10:23, Florian Weimer wrote:
> * Ross Ridge:
> > Florian Weimer writes:
> >>I don't think this check is correct.  Consider num = 0x33333334 and
> >>size = 6.  It seems that the check is difficult to perform
> >> efficiently unless the architecture provides unsigned
> >> multiplication with overflow detection, or an instruction to
> >> implement __builtin_clz.
> >
> > This should work instead:
> >
> > inline size_t __compute_size(size_t num, size_t size) {
> > if (num > ~size_t(0) / size)
> > return ~size_t(0);
> > return num * size;
> > }
>
> Yeah, but that division is fairly expensive if it can't be performed
> at compile time.  OTOH, if __compute_size is inlined in all places,
> code size does increase somewhat.

You could avoid the division in nearly all cases by checking for
reasonably-sized arguments first:

inline size_t __compute_size(size_t num, size_t size) {
static const int max_bits = sizeof(size_t) * CHAR_BITS;
int low_num, low_size;
low_num = num < ((size_t)1 << (max_bits * 5 / 8));
low_size = size < ((size_t)1 << (max_bits * 3 / 8));
if (__builtin_expect(low_num && low_size, 1)
|| num <= ~(size_t)0 / size)
return num * size;
else
return ~size_t(0);
}


This code is bigger than Joe Buck's.

I'm sorry, the previous 3rd source code (.c) was my error.

-

Joe Buck's code: 10 instructions
Ross Ridge's code: 16 instructions
Ross Smith's code: 23 instructions

-
Joe Buck's code: 9 instructions
__compute_size:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movl    %edx, %eax
        imull   12(%ebp), %eax
        cmpl    %edx, %eax
        orl     $-1, %eax
        popl    %ebp
        ret

Ross Ridge's code: 16 instructions
__compute_size:
        pushl   %ebp
        movl    %esp, %ebp
        orl     $-1, %eax
        xorl    %edx, %edx
        movl    12(%ebp), %ecx
        divl    %ecx
        pushl   %ebx
        movl    8(%ebp), %ebx
        orl     $-1, %edx
        cmpl    %eax, %ebx
        movl    %ebx, %edx
        imull   %ecx, %edx
        popl    %ebx
        movl    %edx, %eax
        popl    %ebp
        ret

Ross Smith's code: 23 instructions
__compute_size:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        movl    8(%ebp), %ebx
        cmpl    $1048575, %ebx
        movl    12(%ebp), %ecx
        setbe   %dl
        xorl    %eax, %eax
        cmpl    $4095, %ecx
        setbe   %al
        andb    $1, %dl
        testl   %eax, %eax
        orl     $-1, %eax
        xorl    %edx, %edx
        divl    %ecx
        orl     $-1, %edx
        cmpl    %eax, %ebx
        movl    %ebx, %edx
        imull   %ecx, %edx
        popl    %ebx
        movl    %edx, %eax
        popl    %ebp
        ret

J.C. Pizarro


Re: Integer overflow in operator new

2007-04-08 Thread J.C. Pizarro

And this tarball.

J.C. Pizarro.


compute_size_april2007_2.tar.gz
Description: GNU Zip compressed data


Re: Integer overflow in operator new

2007-04-08 Thread J.C. Pizarro

One instruction more in GCC-4.1.x vs GCC-3.4.6?

Joe Buck's code: 10 instructions   [ -Os of gcc-4.1.3-20070326 ]
__compute_size:
        pushl   %ebp
        movl    %esp, %ebp

        movl    8(%ebp), %eax
        movl    %eax, %edx
        imull   12(%ebp), %edx
        cmpl    %eax, %edx
        orl     $-1, %edx
        popl    %ebp
        movl    %edx, %eax   # <--- this extra instruction because return EAX = EDX?

        ret


Joe Buck's code: 9 instructions   [ -Os of gcc-3.4.6 ]
__compute_size:
        pushl   %ebp
        movl    %esp, %ebp

        movl    8(%ebp), %edx
        movl    %edx, %eax
        imull   12(%ebp), %eax
        cmpl    %edx, %eax
        orl     $-1, %eax
        popl    %ebp
                             # <--- no extra instruction because return EAX = EAX?

        ret


J.C. Pizarro


Re: Integer overflow in operator new

2007-04-09 Thread J.C. Pizarro

2007/4/9, Ross Ridge <[EMAIL PROTECTED]> wrote:

Florian Weimer writes:
>Yeah, but that division is fairly expensive if it can't be performed
>at compile time.  OTOH, if __compute_size is inlined in all places,
>code size does increase somewhat.

Well, I believe the assumption was that __compute_size would be inlined.
If you want to minimize code size and avoid the division then a library
function something like following might work:

void *__allocate_array(size_t num, size_t size, size_t max_num) {
if (num > max_num)
size = ~size_t(0);
else
size *= num;
return operator new[](size);
}

GCC would calculate the constant "~size_t(0) / size" and pass it as the
third argument.  You'd be trading a multiply for a couple of constant
outgoing arguments, so the code growth should be small.  Unfortunately,
you'd be trading what in most cases is a fast shift and maybe an add or
two for a slower multiply.

So long as whatever switch is used to enable this check isn't on by
default and its effect on code size and speed is documented, I don't
think it matters that much what those effects are.  Anything that works
should make the people concerned about security happy.   People more
concerned with size or speed aren't going to enable this feature.

Ross Ridge




Hi Ross Ridge,

I tuned it a little bit.

-
#include <new>       // for operator new[] (header name assumed; lost in the archive)
#include <stddef.h>  // for size_t

void *__allocate_array_of_RossRidge(size_t num, size_t size, size_t max_num) {
  if (num > max_num)
    size = ~size_t(0);
  else
    size *= num;
  return operator new[](size);
}

void *__allocate_array_of_JCPizarro(size_t num, size_t size, size_t max_num) {
  if (num > max_num) return operator new[](~size_t(0));
  return operator new[](size*num);
}

-

_Z29__allocate_array_of_RossRidgejjj:
[ gcc v3.4.6 : 9 instructions ]
        movl    4(%esp), %edx
        cmpl    12(%esp), %edx
        movl    8(%esp), %eax
        orl     $-1, %eax
        imull   %edx, %eax
        pushl   %eax
        call    _Znaj
        popl    %edx
        ret

_Z29__allocate_array_of_RossRidgejjj:
[ gcc 4.1.3 20070326 (prerelease) : 8 instructions ]
        movl    4(%esp), %eax
        orl     $-1, %ecx
        cmpl    12(%esp), %eax
        movl    8(%esp), %edx
        movl    %edx, %ecx
        imull   %eax, %ecx
        movl    %ecx, 4(%esp)
        jmp     _Znaj

_Z29__allocate_array_of_JCPizarrojjj:
[ gcc 4.1.3 20070326 (prerelease) and gcc 3.4.6 : 7 instructions ]
        movl    4(%esp), %edx
        cmpl    12(%esp), %edx
        movl    8(%esp), %eax
        movl    $-1, 4(%esp)
        imull   %edx, %eax
        movl    %eax, 4(%esp)
        jmp     _Znaj

-------------
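To make the calling convention concrete, a rough sketch (struct and names mine, not from the thread) of the caller side that Ross Ridge describes, where the front end passes the compile-time constant ~size_t(0) / sizeof(Foo) as max_num so no division happens at run time:

#include <new>
#include <stddef.h>

// Ross Ridge's helper, as quoted above.
void *__allocate_array(size_t num, size_t size, size_t max_num) {
  if (num > max_num)
    size = ~size_t(0);
  else
    size *= num;
  return operator new[](size);
}

struct Foo { char data[12]; };

// Sketch of what the compiler would emit for "new Foo[n]".
Foo *make_foos(size_t n) {
  return (Foo *) __allocate_array(n, sizeof(Foo), ~size_t(0) / sizeof(Foo));
}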

J.C. Pizarro


Re: Integer overflow in operator new

2007-04-09 Thread J.C. Pizarro




allocate_array_april2007.tar.gz
Description: GNU Zip compressed data


Re: Integer overflow in operator new

2007-04-09 Thread J.C. Pizarro

2007/4/9, J.C. Pizarro <[EMAIL PROTECTED]> wrote:


_Z29__allocate_array_of_RossRidgejjj:
[ gcc v3.4.6 : 9 instructions ]
        movl    4(%esp), %edx
        cmpl    12(%esp), %edx   # comparing and ?? i lose me
        movl    8(%esp), %eax
        orl     $-1, %eax
        imull   %edx, %eax       # signed multiply!!! 1 bit signed + unsigned 31x31!!!
        pushl   %eax
        call    _Znaj
        popl    %edx
        ret

_Z29__allocate_array_of_RossRidgejjj:
[ gcc 4.1.3 20070326 (prerelease) : 8 instructions ]
        movl    4(%esp), %eax
        orl     $-1, %ecx
        cmpl    12(%esp), %eax   # comparing and ?? i lose me
        movl    8(%esp), %edx
        movl    %edx, %ecx
        imull   %eax, %ecx       # signed multiply!!! 1 bit signed + unsigned 31x31!!!
        movl    %ecx, 4(%esp)
        jmp     _Znaj

_Z29__allocate_array_of_JCPizarrojjj:
[ gcc 4.1.3 20070326 (prerelease) and gcc 3.4.6 : 7 instructions ]
        movl    4(%esp), %edx
        cmpl    12(%esp), %edx   # comparing and ?? i lose me
        movl    8(%esp), %eax
        movl    $-1, 4(%esp)
        imull   %edx, %eax       # signed multiply!!! 1 bit signed + unsigned 31x31!!!
        movl    %eax, 4(%esp)
        jmp     _Znaj

-----

J.C. Pizarro



I don't see a conditional jump or a test of the zero flag. Am I confused?

The multiply is signed. It needs a little more research.


allocate_array_april2007.tar.gz
Description: GNU Zip compressed data


Re: Integer overflow in operator new

2007-04-09 Thread J.C. Pizarro

2007/4/9, Robert Dewar <[EMAIL PROTECTED]>:

J.C. Pizarro wrote:

> The multiply is signed. It is need more researching a little bit.

So what, the low order 32 bits are unaffected. I think this is just
confusion on your part!




Yes, I accidentally eliminated the lines containing a '.' when
removing redundant info.

--

void *__allocate_array(size_t num, size_t size, size_t max_num) {
 if (num > max_num) return mynewXX(~size_t(0));
 return mynewXX(size*num);
}

_Z16__allocate_arrayjjj:
.LFB6:
        movl    4(%esp), %edx
        cmpl    12(%esp), %edx
        movl    8(%esp), %eax
        jbe     .L11
        movl    $-1, 4(%esp)
        jmp     .L15
.L11:
        imull   %edx, %eax
        movl    %eax, 4(%esp)
.L15:
        jmp     _Z7mynewXXj

--

#include <stdio.h>   // for printf, fflush
#include <stdlib.h>  // for rand, srand
#include <time.h>    // for time

/* Objective: to detect numbers that are vulnerable to __allocate_array(..).
 *
 * Mainly about the effects of "imull %edx, %eax".
 * With 3 assumptions for good effects to research.
 *
 * All quick & dirty by J.C. Pizarro
 */

size_t my_num, my_size, my_max_num;
void* mynewXX(size_t size) {
  unsigned long long mult;

  if (my_num > my_max_num) {
    // size is ~size_t(0)
  } else {
    if (size > 200) return NULL; // 3rd assumption: the used size is <= 200 bytes of memory
    mult = (unsigned long long)my_size * (unsigned long long)my_num;
    if ((mult >= 0x80000000ULL) || (((unsigned int)size) >= 0x80000000)
        || (mult > size)) {
      printf("oh!: num=%u; size=%u; max_num=0x%08X; num*size=%u (0x%08X); long=%llu (0x%08X%08X)\n",
             my_num,
             my_size,
             my_max_num,
             size, size,
             mult, ((unsigned int)(mult>>32)), ((unsigned int)(mult&~0)));
      fflush(stdout);
    }
  }
  return NULL;
}

void *__allocate_array(size_t num, size_t size, size_t max_num) {
 if (num > max_num) return mynewXX(~size_t(0));
 return mynewXX(size*num);
}

void randattack_allocate_array_start_until_infinity(void) {
  srand(time(NULL));
  while(1) {
    my_num = rand();
    my_size = rand();
    my_max_num = (rand() << 29) + ((~0)>>(32-29));
    my_size &= 0x003F; // 1st assumption: my_size <= 63 bytes per element
    if (my_num <= my_max_num)   // 2nd assumption: my_num is <= my_max_num
       __allocate_array(my_num,my_size,my_max_num);
  }
}

int main(int argc,char *argv[]) {
  randattack_allocate_array_start_until_infinity();
}

--

# gcc version 4.1.3 20070326 (prerelease)
oh!: num=715827888; size=36; max_num=0x9FFFFFFF; num*size=192 (0x000000C0); long=25769803968 (0x00000006000000C0)
oh!: num=1762037869; size=39; max_num=0xDFFFFFFF; num*size=155 (0x0000009B); long=68719476891 (0x000000100000009B)
oh!: num=460175073; size=28; max_num=0x5FFFFFFF; num*size=156 (0x0000009C); long=12884902044 (0x000000030000009C)
oh!: num=1073741826; size=28; max_num=0xDFFFFFFF; num*size=56 (0x00000038); long=30064771128 (0x0000000700000038)
...

--

J.C. Pizarro


randattack_allocate_array_april2007.tar.gz
Description: GNU Zip compressed data


Re: Integer overflow in operator new

2007-04-09 Thread J.C. Pizarro

#include <new>       // for operator new[] (header name assumed; lost in the archive)
#include <stddef.h>  // for size_t

void *__allocate_array_of_RossRidge(size_t num, size_t size, size_t max_num) {

  if (num > max_num)
size = ~size_t(0);
  else
size *= num;
  return operator new[](size);
}

void *__allocate_array_of_JCPizarro(size_t num, size_t size, size_t max_num) {
  if (num > max_num) return operator new[](~size_t(0));
  return operator new[](size*num);
}

void *__allocate_array_of_JCPizarro2(size_t num, size_t size, size_t max_num) {
  size_t result;
  if (num > max_num) return operator new[](~size_t(0));
  __asm __volatile("mull
%%edx":"=a"(result):"a"(num),"d"(size):/*???*/); // quick & dirty
  // See http://www.cs.sjsu.edu/~kirchher/CS047/multDiv.html
  // One-operand imul:   &   Unsigned mul:
  return operator new[](result);
}

-

_Z29__allocate_array_of_RossRidgejjj:
[ gcc v3.4.6 : 11 instructions ]
        movl    4(%esp), %edx
        cmpl    12(%esp), %edx
        movl    8(%esp), %eax
        jbe     .L2
        orl     $-1, %eax
        jmp     .L3
.L2:
        imull   %edx, %eax   # signed multiply!!! 1 bit signed + unsigned 31x31!!!
.L3:
        pushl   %eax
        call    _Znaj
        popl    %edx
        ret

_Z29__allocate_array_of_RossRidgejjj:
[ gcc 4.1.3 20070326 (prerelease) : 9 instructions ]
        movl    4(%esp), %eax
        orl     $-1, %ecx
        cmpl    12(%esp), %eax
        movl    8(%esp), %edx
        ja      .L16
        movl    %edx, %ecx
        imull   %eax, %ecx   # signed multiply!!! 1 bit signed + unsigned 31x31!!!
.L16:
        movl    %ecx, 4(%esp)
        jmp     _Znaj

_Z29__allocate_array_of_JCPizarrojjj:
[ gcc 4.1.3 20070326 (prerelease) and gcc 3.4.6 : 9 instructions ]
        movl    4(%esp), %edx
        cmpl    12(%esp), %edx
        movl    8(%esp), %eax
        jbe     .L8
        movl    $-1, 4(%esp)
        jmp     .L12    # <- why not jmp _Znaj directly?
.L8:
        imull   %edx, %eax   # signed multiply!!! 1 bit signed + unsigned 31x31!!!
        movl    %eax, 4(%esp)
.L12:
        jmp     _Znaj

_Z30__allocate_array_of_JCPizarro2jjj:
[ gcc 4.1.3 20070326 (prerelease) and gcc 3.4.6 : 9 instructions ]
        movl    4(%esp), %eax
        cmpl    12(%esp), %eax
        movl    8(%esp), %edx
        jbe     .L2
        movl    $-1, 4(%esp)
        jmp     .L6    # <- why not jmp _Znaj directly?
.L2:
#APP
        mull    %edx   # unsigned 32x32!!! mul is little bit slower than imul in clock cycles.
#NO_APP
        movl    %eax, 4(%esp)
.L6:
        jmp     _Znaj

---------

J.C. Pizarro


allocate_array_april2007_2.tar.gz
Description: GNU Zip compressed data


Re: Integer overflow in operator new

2007-04-09 Thread J.C. Pizarro

2007/4/9, Ross Smith <[EMAIL PROTECTED]> wrote:

On Monday, 9 April 2007 13:09, J.C. Pizarro wrote:
>
> This code is bigger than Joe Buck's.
>
> Joe Buck's code: 10 instructions
> Ross Ridge's code: 16 instructions
> Ross Smith's code: 16 instructions

Well, yes, but it also doesn't have the bug Joe's code had. That was
sort of the whole point. If you don't care whether it gives the right
answer you might as well just leave the status quo.

--
Ross Smith  [EMAIL PROTECTED]  Auckland, New Zealand
   "Those who can make you believe absurdities can
   make you commit atrocities."-- Voltaire



I'm sorry, Ross Smith; that was due to my bug.

In http://gcc.gnu.org/ml/gcc/2007-04/msg00232.html it says:

I'm sorry, the previous 3rd source code (.c) was my error.

Joe Buck's code: 10 instructions
Ross Ridge's code: 16 instructions
Ross Smith's code: 23 instructions   # <- rectified.

J.C. Pizarro


Re: Integer overflow in operator new. Solved?

2007-04-09 Thread J.C. Pizarro

#include <new>       // for operator new[] (header name assumed; lost in the archive)
#include <stddef.h>  // for size_t

void *__allocate_array_OptionA(size_t num, size_t size) { // 1st best
  unsigned long long tmp = (unsigned long long)size * num;
  if (tmp >= 0x80000000ULL) tmp=~size_t(0);
  return operator new[](tmp);
}

void *__allocate_array_OptionB(size_t num, size_t size) { // 2nd best
  unsigned long long tmp = (unsigned long long)size * num;
  if (tmp >= 0x80000000ULL) return(operator new[](~size_t(0)));
  return operator new[](tmp);
}

-

_Z24__allocate_array_OptionAjj:
[ gcc 4.1.3 20070326 (prerelease) : 9 instructions ]
        movl    8(%esp), %eax
        mull    4(%esp)
        cmpl    $0, %edx
        ja      .L11
        cmpl    $2147483647, %eax
        jbe     .L9
.L11:
        orl     $-1, %eax
.L9:
        movl    %eax, 4(%esp)
        jmp     _Znaj

_Z24__allocate_array_OptionBjj:
[ gcc 4.1.3 20070326 (prerelease) : 10 instructions ]
        movl    8(%esp), %eax
        mull    4(%esp)
        cmpl    $0, %edx
        ja      .L4
        cmpl    $2147483647, %eax
        jbe     .L2
.L4:
        movl    $-1, 4(%esp)
        jmp     .L7    # <- why not jmp _Znaj directly?
.L2:
        movl    %eax, 4(%esp)
.L7:
        jmp     _Znaj

-

The integer overflow in operator new seems to be solved.

J.C. Pizarro.


allocate_array_longmult_april2007.tar.gz
Description: GNU Zip compressed data


Re: Integer overflow in operator new. Solved?

2007-04-09 Thread J.C. Pizarro

2007/4/9, Joe Buck <[EMAIL PROTECTED]>:

On Mon, Apr 09, 2007 at 09:47:07AM -0700, Andrew Pinski wrote:
> On 4/9/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:
> >#include 
> >
> >void *__allocate_array_OptionA(size_t num, size_t size) { // 1st best
> >   unsigned long long tmp = (unsigned long long)size * num;
> >   if (tmp >= 0x8000ULL) tmp=~size_t(0);
> >   return operator new[](tmp);
> >}
>
> First this just happens to be the best for x86, what about PPC or
> really any embedded target where people are more concern about code
> size than say x86.

It's nowhere close to best for x86.  But to get the best, you'd need
to use assembly language, and the penalty in time is one instruction:
insert a jnc (jump short if no carry), with the prediction flag set
as "taken", after the mull instruction.  This would jump over code
to load all-ones into the result.  You have to multiply, and the processor
tells you if there's an overflow.

A general approach would be to have an intrinsic for unsigned multiply
with saturation, have a C fallback, and add an efficient implemention of
the intrinsic on a per-target basis.


To optimize the x86 even more, it could still:
1. Use imul instead of mul, because it's a little bit faster in cycles.
2. Use jns/js (sign conditional jump) instead of jnc/jc (carry conditional jump).
3. Modify the C preprocessor and/or C/C++ compiler for:
  #if argument X is a constant then
     use this code specific to constant X
  #else if argument Y is not a constant then
     use this code specific to non-constant Y
  #else
     use this general code
  #endif

The 3rd option is too complex, but powerful, nearly like a Turing machine ;)
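For what it's worth, a portable fallback for the saturating unsigned multiply that Joe Buck mentions above (a sketch of mine that assumes a 32-bit size_t, so the 64-bit product cannot itself overflow; a per-target intrinsic would replace it):

#include <stddef.h>

// Sketch: multiply two size_t values, saturating to ~size_t(0) on overflow.
static inline size_t mul_saturating(size_t a, size_t b) {
  unsigned long long wide = (unsigned long long) a * b;
  return wide > (unsigned long long) ~size_t(0) ? ~size_t(0) : (size_t) wide;
}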

J.C. Pizarro


Re: Integer overflow in operator new. Solved?

2007-04-09 Thread J.C. Pizarro

4. Conditional moves (cmov).


Re: Integer overflow in operator new

2007-04-09 Thread J.C. Pizarro

2007/4/9, Lawrence Crowl <[EMAIL PROTECTED]>:

On 4/7/07, Joe Buck <[EMAIL PROTECTED]> wrote:
> Consider an implementation that, when given
>
>  Foo* array_of_foo = new Foo[n_elements];
>
> passes __compute_size(elements, sizeof Foo) instead of
> n_elements*sizeof Foo to operator new, where __compute_size is
>
> inline size_t __compute_size(size_t num, size_t size) {
> size_t product = num * size;
> return product >= num ? product : ~size_t(0);
> }
>
> This counts on the fact that any operator new implementation has to
> fail when asked to supply every single addressible byte, less one.

This statement is true only for linear address spaces.  For segmented
address spaces, it is quite feasible to have a ~size_t(0) much smaller
than addressable memory.


We're working in linear address spaces.
What about segmented address spaces? Please give me examples.


The optimization above would be wrong for such machines because
the allocation would be smaller than the requested size.


To request a size of ~size_t(0) is to request a size
of 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFFULL, to which the allocator will always say
"sorry, there is no memory of 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFFULL bytes".


> It would appear that the extra cost, for the non-overflow case, is
> two instructions (on most architectures): the compare and the
> branch, which can be arranged so that the prediction is not-taken.

That is the dynamic count.  The static count, which could affect
density of cache use, should also include the alternate return value.

--
Lawrence Crowl



With the Core Duo, the density of cache use is not our problem, because there are
L2 caches of 2 MiB, 4 MiB, and even 6 MiB!
Our main problem is to reach maximum performance in the days ahead.

J.C. Pizarro


Re: Integer overflow in operator new. Solved?

2007-04-09 Thread J.C. Pizarro

2007/4/9, Andrew Pinski <[EMAIL PROTECTED]>:

On 4/9/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:
> 3. To modify the C-preprocessor and/or C/C++ compiler for:
>#if argument X is a constant then
>   use this code specific of constant X
>#else if argument Y is not a constant then
>   use this code specific of non-constant Y
>#else
>   use this general code
>#endif

Well lets say this, we already support this to some extend, by using
__builtin_constant_p and then just inlining.  Also there exists
already an optimization pass which does IPA constant prop.

Guess you are not well into GCC development after all.

-- Pinski



Of course, I'm a novice; there are things I like and things I don't like about
the GCC development model.

What would the code using __builtin_constant_p, which I don't know, look like?
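For reference, a minimal sketch (mine, not Pinski's answer) of what __builtin_constant_p reports:

#include <stdio.h>

// __builtin_constant_p(x) is 1 when GCC can prove x is a compile-time
// constant, which lets code choose a specialized path for constants and a
// general path otherwise.
int main() {
  volatile int n = 5;  // volatile: never a known constant
  printf("literal 42 : %d\n", __builtin_constant_p(42));     // prints 1
  printf("volatile n : %d\n", __builtin_constant_p(n * 2));  // prints 0
  return 0;
}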


Re: Integer overflow in operator new. Solved?

2007-04-09 Thread J.C. Pizarro

2007/4/9, J.C. Pizarro <[EMAIL PROTECTED]> wrote:

2007/4/9, Andrew Pinski <[EMAIL PROTECTED]> wrote:
> On 4/9/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:
> Well lets say this, we already support this to some extend, by using
> __builtin_constant_p and then just inlining.  Also there exists
> already an optimization pass which does IPA constant prop.
>
> Guess you are not well into GCC development after all.
>
> -- Pinski
>

Of course, I'm a novice; there are things I like and things I don't like about
the GCC development model.

What would the code using __builtin_constant_p, which I don't know, look like?



How many species of code are there? A compiler contains code such as:

1. Middle-level code in the C language.
2. Slightly higher-level code in the C++ language.
3. Code for the C/C++ preprocessor.
4. Low-level code in the ASM language.
5. Inline asm code embedded into C/C++.
6. High-level code in the Java language.
7. Code for IPA??? <- I don't know this weird language. Is it with attributes?
8. Code for GIMPLE??? <- I don't know this weird language.
9. Code for RTL??? <- I don't know this weird language.
10. ...


Re: Integer overflow in operator new. Solved? Experimental i686 code.

2007-04-09 Thread J.C. Pizarro

#include <new>       // by J.C. Pizarro (header name assumed; lost in the archive)
#include <stddef.h>  // for size_t

...

// See http://www.cs.sjsu.edu/~kirchher/CS047/multDiv.html
// One-operand imul:   &   Unsigned mul:

// warning: 32 bit, i686, possible risk of -x * -y = valid x * y, ...
// warning: it's made quick & dirty, possible to give clobbered situations.
// warning: it is not ready for x86-64, ppc, ppc64, etc.
// NO WARRANTY!!! IT'S VERY EXPERIMENTAL!!! NOT TESTED YET!!!
void *__allocate_array_OptionC(size_t num, size_t size) {
  unsigned int result;
  __asm__ __volatile__
  (
  "orl $-1,%%ecx"
   "\n\t" "imull   %2" // See the flags OF, SF, CF, .. are affected or not.
   "\n\t" "cmovol %%ecx,%%eax" // i dude if it works or not. Not tested ...
//"\n\t" "cmovcl %%ecx,%%eax"
   :"=a"(result)
   :"a"(num),"g"(size)
   :/*???*/); // There are 0 conditional jumps!!! hehehehe!
  return operator new[](result);
}

-

* gcc version 4.1.3 20070326 (prerelease)
* 6 instructions of i686 !!! (cmovo came from i686)
* no conditional jump !!!

_Z24__allocate_array_OptionCjj:
        movl    4(%esp), %eax
#APP
        orl     $-1,%ecx
        imull   8(%esp)
        cmovol  %ecx,%eax
#NO_APP
        movl    %eax, 4(%esp)
        jmp     _Znaj

-

J.C. Pizarro


allocate_array_20070409-1.tar.gz
Description: GNU Zip compressed data


Re: Integer overflow in operator new. Solved? Experimental i686 code.

2007-04-09 Thread J.C. Pizarro

#include <new>       // by J.C. Pizarro (header name assumed; lost in the archive)
#include <stddef.h>  // for size_t

...

// This function doesn't touch the ECX register that is touched by OptionC.

__volatile__ static const int minus_one = -1;

void *__allocate_array_OptionD(size_t num, size_t size) {
  register unsigned int result;
  __asm__ __volatile__
  (
  "imull   %2" // See the flags OF, SF, CF, .. are affected or not.
   "\n\t" "cmovol %3,%%eax" // i dude if it works or not. Not tested ...
//"\n\t" "cmovcl %3,%%eax"
   :"=a"(result)
   :"a"(num),"m"(size),"m"(minus_one)
   :"%edx"/*???*/); // There are 0 conditional jumps!!! hehehehe!
  return operator new[](result);
}

-

* gcc version 4.1.3 20070326 (prerelease)
* 6 instructions of i686 !!! (cmovo came from i686)
* no conditional jump !!!

_Z24__allocate_array_OptionDjj:
        subl    $12, %esp        # <- unneeded
        movl    16(%esp), %eax
#APP
        imull   20(%esp)
        cmovol  minus_one,%eax
#NO_APP
        movl    %eax, (%esp)     # <- better movl %eax, 4(%esp)
        call    _Znaj            # <- better jmp _Znaj
        addl    $12, %esp        # <- unneeded
        ret                      # <- unneeded

minus_one:
        .long   -1

-

* hand-written
* 5 instructions of i686 !!! (cmovo came from i686)
* no conditional jump !!!

_Z24__allocate_array_OptionDjj:
        movl    4(%esp), %eax
#APP
        imull   8(%esp)
        cmovol  minus_one,%eax
#NO_APP
        movl    %eax, 4(%esp)
        jmp     _Znaj

minus_one:
        .long   -1

---------

This has reached 5 instructions.
Can anyone do it in 4 instructions?

J.C. Pizarro


allocate_array_20070409-2.tar.gz
Description: GNU Zip compressed data


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-09 Thread J.C. Pizarro

2007/4/10, Diego Novillo <[EMAIL PROTECTED]>:


Following up on the recent discussion about GIMPLE tuples
(http://gcc.gnu.org/ml/gcc/2007-03/msg01126.html), we have summarized
our main ideas and implementation proposal in the attached document.

This should be enough to get the implementation going, but there will be
many details that still need to be addressed.

Thoughts/comments on the proposal?


Thanks.




Tuple representation of the GIMPLE instructions:
HEADER:
  code     16 bits
  subcode  16 bits
  next     word
  prev     word
  bb       word
  locus    word
  block    word
BODY:
  OP0      word
  ..
  OPN      word

I want to ask you:

1. Are there fields for flags, annotations, etc. for special situations?
2. These structures are poorly specified.
   Do they have advanced structures like lists, e.g., a list of predecessor
   instructions of loops, predecessor instructions of forwarded
   jumps, etc., instead of a poor "prev"?
3. Are there fields for more debug information?
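For reference, a rough C-style sketch of the header layout quoted above (field names come from the proposal; the concrete types, and reading "word" as a pointer-sized field, are my guesses):

#include <stdint.h>

struct gimple_tuple_header {
  uint16_t code;     /* 16 bits */
  uint16_t subcode;  /* 16 bits */
  void    *next;     /* word */
  void    *prev;     /* word */
  void    *bb;       /* word */
  void    *locus;    /* word */
  void    *block;    /* word */
  /* BODY: operands OP0 .. OPN follow, one word each */
};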

Good bye

J.C. Pizarro :)


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-09 Thread J.C. Pizarro

The "conditional jumps" are sometimes bad.

However, they've appeared the "conditional moves" to don't jump
and consecuently to reduce the penalization of the conditional jump.

I've the idea of combining GS_ASSIGN... and GS_COND... to give these
following 6 new GIMPLE instructions:

GS_ASSIGN_COND
GS_ASSIGN_COND_EQ
GS_ASSIGN_COND_NE

GS_ASSIGN_COPY_COND
GS_ASSIGN_COPY_COND_EQ
GS_ASSIGN_COPY_COND_NE

There are many good "cmovXX" from Intel i686.

Please, search the word "cmov" in this url

http://webster.cs.ucr.edu/AoA/Windows/PDFs/AppendixD.pdf

and you will discover new optimization ideas. You will need good simplification.

Good bye.

J.C. Pizarro :)


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-10 Thread J.C. Pizarro

There is a need to build several HTML tables of the codes (with subcodes).
Each table would have an explanation. It's like a roadmap.


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-10 Thread J.C. Pizarro

2007/4/10, Diego Novillo <[EMAIL PROTECTED]> wrote:


More debug information?  What debug information are you looking for?



For example, worth weights, use frequencies, statistical data, ... of the GIMPLE.
Also to debug the GIMPLE itself.

How do you debug failed GIMPLE?

J.C. Pizarro


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-10 Thread J.C. Pizarro

2007/4/10, Diego Novillo <[EMAIL PROTECTED]>:

J.C. Pizarro wrote on 04/10/07 08:17:

> Is a need to build several tables in HTML of the codes (with subcodes).
> Each table has an explanation. It's like a roadmap.

Hmm, what?


Forget it, it's not so important.


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-10 Thread J.C. Pizarro

2007/4/10, Diego Novillo <[EMAIL PROTECTED]> wrote:

J.C. Pizarro wrote on 04/10/07 10:24:

> By example, worth weigths, use's frecuencies, statistical data, ... of GIMPLE.
> To debug the GIMPLE too.

That's kept separately.  Pointer maps, hash tables...

> How you debug the failed GIMPLE?

Lots of debug_*() functions available.  You also use -fdump-tree-... a
lot.  In the future, I would like us to be able to inject GIMPLE
directly at any point in the pipeline to give us the illusion of unit
testing.


Of course.

J.C. Pizarro :)


A microoptimization of isnegative or greaterthan2millions.

2007-04-10 Thread J.C. Pizarro

/* Given X an unsigned of 32 bits, and Y a bool. Try to translate optimizing
*
* Y = X >  2147483647;   to   Y = ((signed)X) < 0;
* Y = X >= 2147483648;   to   Y = ((signed)X) < 0;
*
* [ Another optimization is to Y = (X >> 31) ]
*
* The opposite (ELSE):
*
* Y = X <= 2147483647;   to   Y = ((signed)X) >= 0;
* Y = X <  2147483648;   to   Y = ((signed)X) >= 0;
*
* [ Another optimization is to Y = ((~X) >> 31) ]
*
* 2147483647=0x7FFFFFFF   2147483648=0x80000000
*
* The unsigned comparison is become to signed comparison.
*
* by J.C. Pizarro */

#include <stdio.h>

/* isnegative means greaterthan2millions */

int isnegative_1(unsigned int X) {
  int result; // Y is the conditional expression of if-else.
  if (X > 2147483647) result = 1;
  else                result = 0;
  return result;
}

int isnegative_2(unsigned int X) {
  int result; // Y is the conditional expression of if-else.
  if (X >= 2147483648U) result = 1;
  else  result = 0;
  return result;
}

int isnegative_3(unsigned int X) {
  int result; // Y is the conditional expression of if-else.
  if (X <= 2147483647) result = 0;
  else result = 1;
  return result;
}

int isnegative_4(unsigned int X) {
  int result; // Y is the conditional expression of if-else.
  if (X < 2147483648U) result = 0;
  else result = 1;
  return result;
}

int isnegative_optimized_1(unsigned int X) {
  int result; // Y is the conditional expression of if-else.
  if (((signed)X) < 0) result = 1;
  else result = 0;
  return result;
}

int isnegative_optimized_2(unsigned int X) {
  int result; // Y is the conditional expression of if-else.
  if (((signed)X) >= 0) result = 0;
  else  result = 1;
  return result;
}

int isnegative_optimized_3(unsigned int X) {
  int result; // Y is the conditional expression of if-else.
  if (X >> 31) result = 1;
  else result = 0;
  return result;
}

int isnegative_optimized_4(unsigned int X) {
  int result; // Y is the conditional expression of if-else.
  if ((~X) >> 31) result = 0;
  else            result = 1;
  return result;
}

int are_equivalent_isnegative(unsigned int X) {
  int equiv=1,isneg;
  isneg = isnegative_1(X);
  equiv = equiv && (isnegative_2(X) == isneg);
  equiv = equiv && (isnegative_3(X) == isneg);
  equiv = equiv && (isnegative_4(X) == isneg);
  equiv = equiv && (isnegative_optimized_1(X) == isneg);
  equiv = equiv && (isnegative_optimized_2(X) == isneg);
  equiv = equiv && (isnegative_optimized_3(X) == isneg);
  equiv = equiv && (isnegative_optimized_4(X) == isneg);
  return equiv;
}

int main(int argc,char *argv[]) {
  long long X;
  int testOK=1;
  for (X=0LL;(X<=0xFFFFFFFFLL)&&testOK;X++) {
     testOK = are_equivalent_isnegative((unsigned int)(X&0xFFFFFFFF));
  }
  if (testOK) printf("Full test of isnegative is PASSED.\n");
  else        printf("Full test of isnegative is FAILED.\n");
  return 0;
}

--
# gcc version 4.1.3 20070326 (prerelease)
Full test of isnegative is PASSED.

        notl    %eax
        shrl    $31, %eax
        xorl    $1, %eax

        IS WORSE THAN

        shrl    $31, %eax

-

        xorl    %eax, %eax
        cmpl    $0, 4(%esp)
        sets    %al

        IS WORSE THAN

        movl    4(%esp), %eax
        shrl    $31, %eax

--

J.C. Pizarro


isnegative_20070410-1.tar.gz
Description: GNU Zip compressed data


Re: A microoptimization of isnegative or greaterthan2millions.

2007-04-10 Thread J.C. Pizarro

I'm sorry, I meant to say 2 billion, not 2 million.

J.C. Pizarro


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-10 Thread J.C. Pizarro

Hi Diego Novillo

Your "Tuple representation" of the GIMPLE instructions was:
HEADER:
 code16 bits
 subcode 16bits
 nextword
 prevword
 bb  word
 locus   word
 block   word
BODY:
 OP0 word
 ..
 OPN word

I have a little idea.

Can I remove the "prev" word?
Thanks to "bb", I can traverse the short list of
the small basic block obtained from its hashtable.

If I do that, it's one word less :)

Same with the "block" word, reachable from "bb".

Two words less :)

My proposed initial "Tuple representation" of the GIMPLE instructions will be:
HEADER:
 code     16 bits
 subcode  8 bits
 markbits 8 bits
 next     word    # removed prev word
 bb       word
 locus    word    # removed block word
BODY:
 OP0      word
 ..
 OPN      word

Thanks friendly

J.C. Pizarro


Re: A microoptimization of isnegative or greaterthan2millions.

2007-04-10 Thread J.C. Pizarro

10 Apr 2007 10:53:08 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

As far as I can tell, you are recommending that gcc generate a
different code sequence than it currently does.  The most helpful
approach you can use for such a suggestion is to open a bug report
marked as an enhancement.  See http://gcc.gnu.org/bugs.html.

Postings to gcc@gcc.gnu.org are not wrong, but they will almost
certainly get lost.  An entry in the bug database will not get lost.

Thanks.

Ian


Thanks, bug reported as enhancement in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31531


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-10 Thread J.C. Pizarro

2007/4/10, Andrew MacLeod <[EMAIL PROTECTED]> wrote:

On Tue, 2007-04-10 at 19:54 +0200, J.C. Pizarro wrote:
> Can i remove the word "prev"?
> Thanks to "bb", i can traverse the short list of
> the small basic block getted from its hashtable.

Do you mean implement this as a single linked list and then to find the
previous instruction, start at the beginning of the block and traverse
forward? Back in the early days of tree-ssa we did such a thing with the
tree iterators. It was too slow. there are many LARGE basic blocks which
make the computation too expensive for any pass that goes backwards
through a block.

Andrew



Typically, how big are the basic blocks in the 90% of code that is
non-multimedia, using general-purpose instructions with many jumps/calls/ifs?

I think that the 10% that is multimedia code (hard to find) means a small
penalty of a few minutes. So it's not problematic, I suppose.
J.C. Pizarro


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-10 Thread J.C. Pizarro

2007/4/10, Dave Korn <[EMAIL PROTECTED]> wrote:

On 10 April 2007 20:02, Diego Novillo wrote:

>> The obvious way to make the proposed tuples position independent would
>> be to use array offsets rather than pointers.  This has the obvious
>> disadvantage that every access through a pointer requires an
>> additional memory reference.  On the other hand, it has some other
>> advantages: it may no longer be necessary to keep a previous pointer
>
> I doubt this.  We had started with single-linked chains but reverse
> traversals do occur, and they were very painful and slow.

  Reverse-traversing an array really isn't all that painful or slow!

>> in each tuple; we can delete tuples by marking them with a deleted
>> code, and we can periodically garbage collect deleted tuples and fix
>> up the next pointers.  On a 64-bit system, we do not need to burn 64
>> bits for each pointer; 32 bits will be sufficient for an array offset.
>>
>> I would like us to seriously think about this approach.  Most of the
>> details would be hidden by accessor macros when it comes to actual
>> coding.  The question is whether we can tolerate some slow down for
>> normal processing in return for a benefit to LTO.
>>
>> If anybody can see how to get the best of both worlds, that would of
>> course be even better.
>
> I've thought about this a little bit and it may not be all that onerous.
>  So, if you take the components of a tuple:
>
>   nextCould be a UID to the next tuple
>   prevLikewise


  How about delta-linked lists?  Makes your iterators bigger, but makes every
single node smaller.

cheers,
  DaveK
--
Can't think of a witty .sigline today




Delta-linked lists?
Better one iterator than complicating this with several iterators.

Function prev_of_this_instruction(word) -> word

The singleton (global structure) behind prev_of_this_instruction
contains a small cache to accelerate reverse traversal.

This cache is like a hashmap of { word this_instruction => word prev_instruction }.
If the instruction isn't in the cache, then go to the beginning of the
instruction's BB and start traversing, adding entries to the cache.
Use an LRU / oldest-used / least-frequently-used policy to replace its entries.

Having many smaller nodes in the heap is better.
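A very rough sketch (entirely hypothetical; none of these names come from GCC or from the thread) of the prev-instruction cache described above, as a small direct-mapped table refilled by a forward walk over the basic block:

#include <stddef.h>

#define PREV_CACHE_SIZE 1024

struct prev_cache_entry { void *insn; void *prev; };
static struct prev_cache_entry prev_cache[PREV_CACHE_SIZE];

static size_t hash_insn(void *insn) {
  return ((size_t) insn >> 4) % PREV_CACHE_SIZE;
}

/* first_in_bb and next_insn stand in for whatever the real IR provides. */
void *prev_of_this_instruction(void *insn, void *first_in_bb,
                               void *(*next_insn)(void *)) {
  struct prev_cache_entry *e = &prev_cache[hash_insn(insn)];
  if (e->insn == insn)
    return e->prev;                  /* cache hit */
  void *prev = NULL;                 /* stays NULL if insn is first in its BB */
  for (void *p = first_in_bb; p != NULL && p != insn; p = next_insn(p)) {
    void *succ = next_insn(p);       /* remember the predecessor of each visited insn */
    struct prev_cache_entry *slot = &prev_cache[hash_insn(succ)];
    slot->insn = succ;
    slot->prev = p;
    prev = p;
  }
  return prev;
}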

J.C. Pizarro :)


Re: RFC: GIMPLE tuples. Design and implementation proposal

2007-04-10 Thread J.C. Pizarro

2007/4/10, Andrew MacLeod <[EMAIL PROTECTED]> wrote:


Personally, just stick with the double linked lists, be it via pointers
or array index.  Any of these other suggestions either complicate the
algorithms or slow down traversal or both to save that word of memory
and slow down the initial implementation.

These details are easily changed after the initial TUPLE implementation
by replacing the meat of the next/prev iterator. There will be lots of
time for everyone to try their favorite double linked list alternative
before we write TUPLES out to a file in some production compiler and
commit ourselves to specific memory footprint.

As long as the entire thing has a clean interface, changing details like
this is trivial.  Whats important is whether the proposal meets what we
expect our future needs to be, such as LTO and such.  Have we missed
anything critical...

Andrew





Of course, I'd just stick with the double-linked lists too.

The reason is to attain fewer KLOCs of implementation and
more performance in the accesses, because I have no problem
with the 2 GiB of RAM of my old PC.

Sincerely, J.C. Pizarro :)


Re: How to control the offset for stack operation?

2007-04-16 Thread J.C. Pizarro

2007/4/16, Mohamed Shafi <[EMAIL PROTECTED]>:

hello all,

Depending on the machine mode the compiler will generate automatically
the offset required for the stack operation i.e for a machine with
word size is 32, for char type the offset is 1, for int type the
offset is 2 and so on..

Is there a way to control this ? i mean say for long long the offset
is 4 if long long is mapped to TI mode and i want the generate the
offset such that it is 2.

Is there a way to do this in gcc ?

Regards,
Shafi



For an x86 machine, the stack offset is always a multiple of 4 bytes.

long long is NOT 4 bytes, it is 8 bytes!

Sincerely J.C. Pizarro :)


Re: GCC 4.2.0 Status Report (2007-04-15)

2007-04-16 Thread J.C. Pizarro

2007/4/16, François-Xavier Coudert <[EMAIL PROTECTED]> wrote:

> You want more bugs fixed, it would seem a better way would be to build
> a better sense of community (Have bugfix-only days, etc) and encourage
> it through good behavior, not through negative reinforcement.

I do agree with that in a general way, but I think there should also
be a real effort done by the various maintainers to make sure people
indeed fix the few PRs they created. Maintainers should be able to
say, "please think of fixing this PR before submitting a patch for
that feature". That doesn't introduce administrative overhead, because
maintainers should keep track of the various PRs and patches of their
area. I think it works already for some areas of the compiler, but
doesn't work fine for the "most common" areas.

A few examples of that (maybe I'm always quoting the same examples,
but those are the ones I know that impact my own work on GCC):
  -- how can bootstrap stay broken (with default configure options) on
i386-linux for 3 weeks?
  -- how could i386-netbsd bootstrap be broken for months (PR30058),
and i386-mingw still be broken after 6 months (PR30589), when the
cause of failure is well known?

These are not rethorical "How", or finger-pointing. I think these are
cases of failure we should analyze to understand what in our
development model allows them to happen.

FX



The "mea culpa" is to permit for long time to modify "configure" instead of
"configure.ac" or "configure.in" that is used by "autoconf" and/or "automake".

Another "mea culpa" is don't update the autoconf/automake versions when
the GCC''s scripts are using very obsolete/deprecated
autoconf/automake versions.

Currently, "autoconf" is less used because of bad practices of GCC.

I propose to have the following:

* several versions of autoconf/automake in /opt, matching what the current
GCC scripts depend on, and setting PATH to the corresponding /opt/autoXXX/bin:$PATH.

* diffing the shipped configure against the configure generated by autoconf/automake
from configure.ac

* with these diffs, making the corresponding modifications to configure.ac

* repeating this to verify the scripts against recent versions of
autoconf/automake

Sincerely J.C. Pizarro


Re: GCC 4.2.0 Status Report (2007-04-15)

2007-04-16 Thread J.C. Pizarro

2007/4/16, François-Xavier Coudert <[EMAIL PROTECTED]> wrote:

> The "mea culpa" is to permit for long time to modify "configure" instead of
> "configure.ac" or "configure.in" that is used by "autoconf" and/or "automake".
>
> [...]

I'm sorry, but I don't understand at all what you propose, what your
proposal is supposed to fix or how that is related to the mail you're
answering to.

FX



Snapshot GCC-4.3 uses Autoconf 2.59 and Automake 1.9.6,
but why does "generated ... by aclocal 1.9.5" appear when it uses 1.9.6?

libdecnumber/aclocal.m4:# generated automatically by aclocal 1.9.5 -*-
Autoconf -*-

I say that the generated scripts must be updated automatically and
recursively before tarballing and distributing them, and the GCC site is
doing this task wrong.

The correct task is:
1) To update the generated configure scripts of the tarball before
distributing it.
2) Or to remove the non-updated configure scripts.

Sincerely J.C. Pizarro


Re: GCC 4.2.0 Status Report (2007-04-15)

2007-04-16 Thread J.C. Pizarro

2007/4/16, Andrew Pinski <[EMAIL PROTECTED]> wrote:

On 4/16/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:
> The "mea culpa" is to permit for long time to modify "configure" instead of
> "configure.ac" or "configure.in" that is used by "autoconf" and/or "automake".
>
> Another "mea culpa" is don't update the autoconf/automake versions when
> the GCC''s scripts are using very obsolete/deprecated
> autoconf/automake versions.

What world are you living in?  Do you even look at the source?
Even though http://gcc.gnu.org/install/prerequisites.html has not been
updated, the toplevel actually uses autoconf 2.59 already and has
since 2007-02-09.  And how can you say 2.59 is obsolete when 90-99% of
the distros ship with that version?  Plus automake 1.9.6 is actually
the latest version of 1.9.x automake.


Since 2007-02-09; that's the problem: little time for such a drastic modification.
So this drastic modification could have lost arguments or flags, or
incorrectly changed the behaviour between before and after.
Because of this, there has not been time for releasing or freezing after it.


libtool on the other hand is the older version but that is in the
progress of being fixed, don't you read the mailing lists?


> Currently, "autoconf" is less used because of bad practices of GCC.

Huh? What do you mean by that?
I don't know anyone who touches just configure and not use autoconf.
Yes at one point we had an issue with the toplevel needing an old
version of autoconf but that day has past for 2 months now.


For example, http://gcc.gnu.org/ml/gcc/2007-04/msg00525.html


...

-- Pinski



J.C. Pizarro :)


Re: GCC 4.2.0 Status Report (2007-04-15)

2007-04-16 Thread J.C. Pizarro

2007/4/16, François-Xavier Coudert <[EMAIL PROTECTED]> wrote:

> 1) To update the generated configure scripts of the tarball before
> than distributing it.

It could be done, but there's the risk that an automated process like
that might introduce problems. I'd be more in favour of a nightly
tester that check the "Generated by" headers to see if anything has an
unexpected version number.


# (run some check here, e.g. scan the "Generated by ..." header of every
#  configure script for an unexpected tool version, as suggested above)
if [ $? = 0 ]; then
  echo "OK. All configure scripts are properly generated."
else
  echo "Rename the old configure scripts XXX to non-updated_XXX"
fi

Is it complicated? I believe not.


> 2) Or to remove the non-updated configure scripts.

That's an annoyance, because it would require the autotools to build
the GCC source, which is inconvenient.

FX



The GCC scripts use the autotools, but the site doesn't use the autotools
because, it says, that would be inconvenient. What???

So, don't use the autotools, or do use the autotools? Yes or no? Or yes-and-no?

J.C. Pizarro


Re: How to control the offset for stack operation?

2007-04-16 Thread J.C. Pizarro

2007/4/16, Mohamed Shafi <[EMAIL PROTECTED]> wrote:

> > Depending on the machine mode the compiler will generate automatically
> > the offset required for the stack operation i.e for a machine with
> > word size is 32, for char type the offset is 1, for int type the
> > offset is 2 and so on..

   I was not talking about the size of long long but the offset i.e
4x32, required for stack operation.
I want gcc to generate the code such that the offset is 2 (64
bytes) and not 4 (128 bytes).



An offset in bytes? An offset in 32-bit words?
Please define "offset"; you are being confusing.

J.C. Pizarro


Re: GCC 4.2.0 Status Report (2007-04-15)

2007-04-16 Thread J.C. Pizarro

2007/4/16, Dave Korn <[EMAIL PROTECTED]> wrote:

On 16 April 2007 10:56, J.C. Pizarro wrote:


> The GCC scripts use autotools but the site don't use autotools because
> it says which is inconvenient. What???

Why don't you ever go and actually *find something out* about what
you're talking about before you spout nonsense all over the list?  This is not
a remedial class for people who can't be bothered to read the docs.

  Yes, gcc uses autoconf.  But the end-users who just want to compile gcc from
a tarball do not have to have autoconf installed, because we supply all the
generated files for them in the tarball.


cheers,
  DaveK
--
Can't think of a witty .sigline today




I follow,

The end-users who just want to compile gcc from a tarball do not
have to have autoconf installed, because we supply all the generated files
for them in the tarball. <- Well,

what happens if the generated files aren't up to date?
The users will then report, many times, broken situations like bootstrap not
working, or similar.

J.C. Pizarro


Re: GCC 4.2.0 Status Report (2007-04-15)

2007-04-16 Thread J.C. Pizarro

2007/4/16, Dave Korn <[EMAIL PROTECTED]> wrote:

On 16 April 2007 11:17, J.C. Pizarro wrote:

> I follow,

  No, not very well.

> The end-users who just want to compile gcc from a tarball do not
> have to have autoconf installed, because we supply all the generated files
> for them in the tarball. <- Well,
>
> what is the matter if the generated files aren't updated?

  This has never happened as far as I know.  Can you point to a single release
that was ever sent out with out-of-date generated files?

> The users will say many times broken situations like bootstrap doesn't
> work or else.

  I haven't seen that happening either.  Releases get tested before they are
released.  Major failures get spotted.  Occasionally, there might be a bug
that causes a problem building on one of the less-used (and hence
less-well-tested) platforms, but this is caused by an actual bug in the
configury, and not by the generated files being out of date w.r.t the source
files from which they are generated; regenerating them would only do the same
thing again.

  If you have a counter-example of where this has /actually/ happened, I would
be interested to see it.


cheers,
  DaveK
--
Can't think of a witty .sigline today




$ ./configure 
...

checking for i686-pc-linux-gnu-ld...
/usr/lib/gcc/i486-slackware-linux/3.4.6/../../../../i486-slackware-linux/bin/ld
# <-- i don't like this
...

$ grep "\-ld" configure   # <-- COMPILER_LD_FOR_TARGET appears here

$ gcc --print-prog-name=ld
/usr/lib/gcc/i486-slackware-linux/3.4.6/../../../../i486-slackware-linux/bin/ld

This absolute path has broken things for me many times, not long ago.

J.C. Pizarro


Recommend lecture about the meaning of PHI function from SSA.

2007-04-17 Thread J.C. Pizarro

For novice people, I recommend reading

http://en.wikipedia.org/wiki/Static_single_assignment_form

There you'll see the meaning of "y3 <- phi(y1,y2)" :)
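(Editorial illustration, not part of the original mail: a tiny C function and, in its comments, the SSA names a compiler would give it, which is where a "y3 <- phi(y1,y2)" comes from.)

#include <stdio.h>

/* A value assigned on two different paths and then used after the join.  */
int pick (int c)
{
  int y;
  if (c)
    y = 1;       /* SSA: y1 = 1 */
  else
    y = 2;       /* SSA: y2 = 2 */
  /* SSA: y3 = PHI (y1, y2); y3 is y1 or y2 depending on
     which edge reached this join point.  */
  return y;      /* SSA: return y3 */
}

int main (void)
{
  printf ("%d %d\n", pick (1), pick (0));
  return 0;
}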


HTML of -fdump-tree-XXXX proposal.

2007-04-17 Thread J.C. Pizarro

Hello,

I have an idea to improve the report of -fdump-tree- by using the HTML
format for its output.

I recommend XHTML-1.0 (26-Jan-2000) from http://www.w3.org/TR/xhtml1/

Note: HTML-4.01 (24-Dec-1999) from http://www.w3.org/TR/html401/
is very popular but very old, and it is not in the XML-1.0 (16-Aug-2006) format
from http://www.w3.org/TR/xml/, implying that HTML-4.01 is harder for
some parsers.

The proposal is instead of

gcc $CFLAGS -fdump-tree-ssa file.c
gcc $CFLAGS -fdump-tree-gimple file.c

to use the -html option for -fdump-tree- like so

gcc $CFLAGS -html -fdump-tree-ssa file.c
gcc $CFLAGS -html -fdump-tree-gimple file.c

and they will generate the files

file.c.tXX.ssa.html and file.c.tXX.gimple.html

instead of

file.c.tXX.ssa and file.c.tXX.gimple

Why? There is a good reason, principally for ".ssa".

1. To use "charset=utf-8" or &#number; from HTML for Greek symbols.
2. It is better to use subscripted numbers than plain numbers.
3. There is a Greek symbol for the PHI function
   ( e.g. # X_1 = PHI <...> ; )
4. Underlining or striking through the instructions, operands or labels
   marked as dead, for example.
5. Etc.

The visual representation in HTML is more effective for humans than
in text.

Sincerely, J.C. Pizarro :)


Re: HTML of -fdump-tree-XXXX proposal.

2007-04-18 Thread J.C. Pizarro

2007/4/18, J.C. Pizarro <[EMAIL PROTECTED]> wrote:

Hello,

i've an idea to improve the report of -fdump-tree- using the HTML
format for its output.

I recommend XHTML-1.0 (26-Jan-2000) from http://www.w3.org/TR/xhtml1/

Note: HTML-4.01 (24-Dec-1999) from http://www.w3.org/TR/html401/
is very popular but very old and it's not XML-1.0 (16-Aug-2006) format
from http://www.w3.org/TR/xml/ implying that HTML-4.01 is hardful for
some parsers.

The proposal is instead of

gcc $CFLAGS -fdump-tree-ssa file.c
gcc $CFLAGS -fdump-tree-gimple file.c

to use the -html option for -fdump-tree- like so

gcc $CFLAGS -html -fdump-tree-ssa file.c
gcc $CFLAGS -html -fdump-tree-gimple file.c

and they will generate the files

file.c.tXX.ssa.html and file.c.tXX.gimple.html

instead of

file.c.tXX.ssa and file.c.tXX.gimple

Why? There are a good reason for ".ssa" principally.

1. To use "charset=utf-8" or &#number; from HTML for greek symbols.
2. Is better to use subscripted numbers than numbers.
3. There is a greek symbol for PHI-function
( e.g. # X_1 = PHI ; )
4. Underlining or middlelining the instructions, operands or labels
marked like dead by example.
5. Etc.

The visual representation in HTML is more effective for humans than
in text.

Sincerely, J.C. Pizarro :)



I've used
ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20070416/gcc-core-4.1-20070416.tar.bz2
ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20070416/gcc-g++-4.1-20070416.tar.bz2

In the attachment there is a quick&dirty alpha patch, but I don't know
why the gcc compiler says "gcc: unrecognized option '-html'". ???
I don't know where to modify the gcc code to add an option.

The XHTML formatting passed to fputs is a little bad.

There are examples to test too.

The idea is to filter the output stream.

Bye, I'm not an expert, I'm a novice, and I don't have much time :s I'm busy

J.C. Pizarro


gcc-4.1-20070416_pphtml_alpha.patch
Description: Binary data


factorial.c
Description: Binary data


run.sh
Description: Bourne shell script


factorial.c.t26.ssa
Description: Binary data


Re: HTML of -fdump-tree-XXXX proposal.

2007-04-19 Thread J.C. Pizarro

2007/4/19, Brooks Moses <[EMAIL PROTECTED]> wrote:

I think it makes a lot more sense to implement this as a standalone
filter for the output stream, which takes the files that contain the
current dump-tree output and converts it to HTML.  You don't lose any
functionality by doing that, and there's no compelling reason for adding
the extra complexity to the tree-dumpers themselves if we don't need to.

Certainly it can be a useful idea to have more ways of viewing the dump
files than just reading the plaintext in a text editor, but it seems
more sensible to me to consider the plaintext-to-HTML conversion as an
aspect of a standalone "viewer" system, rather than as an aspect of the
compiler.

- Brooks


I agree. A script that does the plaintext-to-HTML conversion is far better
than modifying the complex GCC compiler, which I would have fatally broken.
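(Editorial sketch, not from the thread: a minimal standalone filter of that kind. It reads a plaintext dump on stdin and writes an escaped XHTML <pre> page on stdout; the tag layout and the dump2html name are purely illustrative.)

#include <stdio.h>

/* Escape the characters that are special in (X)HTML.  */
static void put_escaped (int c)
{
  switch (c)
    {
    case '&': fputs ("&amp;", stdout); break;
    case '<': fputs ("&lt;", stdout);  break;
    case '>': fputs ("&gt;", stdout);  break;
    default:  putchar (c);             break;
    }
}

int main (void)
{
  int c;
  fputs ("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
         "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
         "<head><title>tree dump</title></head>\n"
         "<body><pre>\n", stdout);
  while ((c = getchar ()) != EOF)
    put_escaped (c);
  fputs ("</pre></body></html>\n", stdout);
  return 0;
}

Used as, say: ./dump2html < factorial.c.t26.ssa > factorial.c.t26.ssa.html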

:)


Re: HTML of -fdump-tree-XXXX proposal.

2007-04-22 Thread J.C. Pizarro

2007/4/22, Per Bothner <[EMAIL PROTECTED]> wrote:

Without taking a position on the current proposal, using xhtml
for "dump" files has some further advantages:

* Can use hyperlinks, for example from a reference to the declaration,
   or from a variable to its type.
* Can use JavaScript to hide/unhide sections of the dump.
* Can use CSS to switch between display styles.  This can be
   done dynamically with a smidgen of JavaScript.
* Can more easily convert to other formats, or select only desired
   elements, using some suitable XML processor such as xsltproc.
--
--Per Bothner
[EMAIL PROTECTED]   http://per.bothner.com/



Your idea with JavaScript, CSS, XSLT, .. is very good! :)


Re: 2nd quarter of 2007 and no GPL code of Java from Sun.

2007-05-01 Thread J.C. Pizarro

2007/4/3, Fernando Lozano <[EMAIL PROTECTED]> wrote:

J.C. Pizarro escreveu:
> We're in 2nd quarter of 2007 and no release of the complete source
> code under GPL is put to the public!
>
> What does Sun wait to?

JavaOne, for sure. So you'll have the code in May.

Don't forget they already released important parts of the code, so you
expect them to be serious. And don't think getting agreements with
third-parties that provide code used by Sun in its Java implementation
would be easy.


2007/4/2, Andrew Pinski <[EMAIL PROTECTED]> wrote:

On 4/2/07, J.C. Pizarro <[EMAIL PROTECTED]> wrote:

 >  From http://en.wikipedia.org/wiki/Java_(programming_language)

 >  What does Sun wait to?

The 1st quarter for most companies are just starting today or rather soon :).
I am serious, for an example Sony's new fiscal year started yesterday,
April 1st.

So really just wait.


Today is the 1st of May, the workers' day.

I don't have the code in May, Fernando.

How long do I have to wait, Andrew?

From Sun, there is no notice, news, etc. about the process of GPLing
the OpenJDK.


Is there summarized table of ABI binary compatibility?

2008-02-05 Thread J.C. Pizarro
Hallo,

Is there a summarized table of the ABI binary compatibility of the following
programs compiled by ...?

1st. C (the core)
2nd. Fortran
3rd. C++
and 4th. GCJ's Java

between the 4.3.0, 4.2.3, 4.1.x, 4.0.x, 3.4.x and 3.3.x versions?

The comments are not for me; they are for everyone who needs to
maintain binary compatibility, such as programmers,
distribution creators, installers, etc.

Thanks, J.C.Pizarro


Slow GCC compiler => Very few people recompile lesser latest packages.

2008-02-10 Thread J.C. Pizarro
Hallo,

When the recent GCC compiler is very slow at compiling projects or packages,
many people stop recompiling updated versions of those projects,
people tend to test the updated versions less and less each time, there are
fewer beta testers, and finally less detection of unknown bugs.

Where in GCC are the bottlenecks that the distros' builders want removed?

J.C.Pizarro: "my fears in software are the crashes, the viruses, the
bugs, the slowdowns and the fat things. (note: fat thing =>
slow transferring => more energy for more bits of information)"


RE: RFC: GCC 4.4 criteria - add Fortran as primary language?

2008-02-21 Thread J.C. Pizarro
On Wed, 20 Feb 2008, "Weddington, Eric" <[EMAIL PROTECTED]> wrote:
> > Maybe there could be a "semi-primary" or "experimental
> > primary" status;
> > a feature could be treated as primary, but with the understanding that
> > the requirement will be waived if it causes excessive delay.  The
> > "experimental" label could be dropped after a few successful releases.
>
> Well, why not mirror the Primary and Secondary Platform lists on the
> back end, and have Primary Languages and Secondary Languages on the
> front end, with separate criteria for each category? For examples,
> Primary Languages would be C and C++, and put Ada, Java, and Fortran in
> the Secondary Languages group.

1. C   : 1 points: it's the #1, it's the god of the modern languages, it's
a super language, it's very popular, and it's heavily used in embedded systems,
operating systems and libraries.

2. C++ :  8000 points: it's C's big brother. It's the language for complex
realtime applications.

3. Fortran: 5000 points: it's the ancient mathematical brother of C. It's heavily
used in electronic applications related to EDA, Spice, simulators, LAPACK,
3D rendering, etc.

4. Java: 1500 points: it's held back because its TM is a trademark of
Sun, Sun controls the implementations of practically all the JVM binaries, the Java
compiler and the standard Java libraries, and because Sun mis-specifies its
overcomplex language for Java2, Java5, Java6 and Java7, which perform poorly.
They don't use templates as C++ does; instead, they use generics.

5. ObjC/ObjC++: 300 points: the GUI language from Apple, for Apple.

Why not use a voting strategy to get the final valuation of the Primary
 & Secondary languages?


Superfluous testresults in 4.4.0.

2008-02-22 Thread J.C. Pizarro
Hallo,

I'm comparing minor differences between the test results of 4.4/4.3 (20080221 x64)
http://gcc.gnu.org/ml/gcc-testresults/2008-02/msg01486.html
http://gcc.gnu.org/ml/gcc-testresults/2008-02/msg01487.html
and I found superfluous reporting in 4.4.0:

FAIL: foo/bar.mm (test for excess errors)
UNRESOLVED: foo/bar.mm compilation failed to produce executable

Why does it print FAIL and UNRESOLVED in 4.4, and only UNRESOLVED in 4.3?

Don't report many errors for the same failed file!
Do report only one well-chosen kind of error for each failed file!

   :)


Idea to gain the time of wider testing.

2008-02-22 Thread J.C. Pizarro
Hallo!

I have ideas for when there are repetitive processes such as the testing processes.

Q1. Why lose time testing the same files over and over when they have always worked
for many months or years?
A1. Cache the testsuite files that have worked for many
months or years and put that into the snapshot (worked = passed, not failed,
no warnings). It means that, statistically, not re-testing the passed
files works "blindly" OK 99.9% of the time. It can reduce the time of this testing
strategy if you don't want to lose many unnecessary hours.
In the opposite direction too: if -O3 -O2 -Os -O0 all failed on the
same file, then we can cache that and reduce the number of
parameters to only one or two, e.g. "if -Os failed on the known failing file,
then we don't need to lose time testing the same thing for -O3 -O2 -O0, because
we know it has been failing for many months!".

Q2. Why not double the timeout to the old half hour instead of a quarter of an hour?
A2. It's good for the few CPU-intensive files, mostly from long optimization times,
which we want to extend if we think about how critical the optimized code is.

Q3. Can we extend the number of tested languages to four, C, C++, Fortran
& Ada, instead of two, C & C++?
A3. Yes, because we reduced the time in Q1.


 Make a gift for everyone before you fight against death! :)


When the R.I.P. of 4.1.x branch for?

2008-02-25 Thread J.C. Pizarro
The 4.0.x branch was R.I.P.ed.

Committing to 4.1.x, 4.2.x, 4.3.x and 4.4.x means 4 times the effort rather than 3 times.
They are very similar in design; they use Tree-SSA, autovectorization, etc.

It's recommended to keep only the 4.2.x, 4.3.x and 4.4.x branches online.

I want to see a comparison of the performance of 4.1.x, 4.2.x and 4.4.x
to know how they have evolved.

   ;)


Re: optimizing predictable branches on x86

2008-02-26 Thread J.C. Pizarro
Compiling and executing the code of Nick Piggin at
http://gcc.gnu.org/ml/gcc/2008-02/msg00601.html

in my old Athlon64 Venice 3200+ 2.0 GHz,
3 GiB DDR400, 32-bit kernel, gcc 3.4.6, i got

$ gcc -O3 -falign-functions=64 -falign-loops=64 -falign-jumps=64
-falign-labels=64 -march=i686 foo.c -o foo
$ ./foo
 no deps,   predictable -- C    code took  10.08ns per iteration
 no deps,   predictable -- cmov code took  11.07ns per iteration
 no deps,   predictable -- jmp  code took  11.25ns per iteration
has deps,   predictable -- C    code took  26.66ns per iteration
has deps,   predictable -- cmov code took  35.44ns per iteration
has deps,   predictable -- jmp  code took  18.89ns per iteration
 no deps, unpredictable -- C    code took  10.17ns per iteration
 no deps, unpredictable -- cmov code took  11.07ns per iteration
 no deps, unpredictable -- jmp  code took  22.51ns per iteration
has deps, unpredictable -- C    code took  104.02ns per iteration
has deps, unpredictable -- cmov code took  107.19ns per iteration
has deps, unpredictable -- jmp  code took  176.18ns per iteration
$

This machine concludes that ( > means slightly better than, >> better )
1. jmp >> C >> cmov when it's predictable and has data dependencies.
2. C > cmov > jmp when it's predictable and has not data dependencies.
3. C > cmov >> jmp when it's unpredictable and has not data dependencies.
4. C > cmov >> jmp when it's unpredictable and has not data dependencies.

* Be careful, jmp is the worst when it's unpredictable
 (with or without data dependencies).
* But conditional jmp is the best when it's
 predictable AND has data dependencies.

   ;)


Re: optimizing predictable branches on x86

2008-02-26 Thread J.C. Pizarro
On 2008/2/26, J.C. Pizarro <[EMAIL PROTECTED]>, i wrote:
>  4. C > cmov >> jmp when it's unpredictable and has not data dependencies.

I'm sorry for my typo; the correct line is (without the "not"):
4. C > cmov >> jmp when it's unpredictable and has data dependencies.

And my forgotten 3rd annotation:
* cmov is the worst when it's
predictable AND has data dependencies.


Re: optimizing predictable branches on x86

2008-02-26 Thread J.C. Pizarro
A final summary for good performance on the tested machines:

  + unpredictable:             * don't use conditional jmp (the worst).
                               * use cmov or the C version.

  + predictable: + no deps:    * use cmov or the C version.
                 + has deps:   * don't use cmov (the worst).
                               * use conditional jmp (the best).


Re: optimizing predictable branches on x86

2008-02-26 Thread J.C. Pizarro
On Tuesday 26 February 2008 21:14, Jan Hubicka wrote:
> Only cases we do so quite reliably IMO are:
>   1) loop branches that are not interesting for cmov conversion
>   2) branches leading to noreturn calls, also not interesting
>   3) builtin_expect mentioned.
>   4) when profile feedback is around to some degree (ie we know when the
>   branch is very likely or very unlikely. We don't simulate what
>   hardware will do on it).

Without a profiler, we can blindly estimate that simple loop branches
are predictable, under the assumption that we know that loops
(of many iterations) are potential enemies of the CPU
(they consume many cycles).

For example, for this simple loop without a profiler, human prediction is easy:

for (;;) { /* predictably NOT a branch to the end of the for loop */
start:
  ... // hundreds of iterations (e.g. 99% branching inside, <1% branching outside)
} /* or a predictable branch to the start of the for loop, depending on code gen. */
end:
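(Editorial aside, not from the original mail: when the programmer already knows a branch is this predictable, __builtin_expect, mentioned earlier in the thread, is the usual way to tell GCC so. A minimal sketch with invented names:)

#include <stdio.h>

#define likely(x)   __builtin_expect (!!(x), 1)
#define unlikely(x) __builtin_expect (!!(x), 0)

/* Sum an array: the loop-exit test fails hundreds of times before it
   succeeds once, so tell GCC the "keep looping" side is the likely one.  */
static long sum (const int *a, int n)
{
  long s = 0;
  for (int i = 0; likely (i < n); i++)
    s += a[i];
  return s;
}

int main (void)
{
  int a[300];
  for (int i = 0; i < 300; i++)
    a[i] = i;
  printf ("%ld\n", sum (a, 300));
  return 0;
}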

But for this complex loop nest (mutually nested), predictability without a
profiler is very hard!

loop1:
   ...
   if (cond1) then goto loop2 else loop3 endif  // to loop2 or loop3? prediction is very hard
   ...
loop2:
   ...
   if (cond2) then goto loop1 else loop2 endif  // to loop1 or loop2? prediction is very hard
   ...
loop3:
   ...
   if (cond3) then goto loop1 else loop3 endif  // to loop1 or loop3? prediction is very hard
   ...
   goto loop1

There are things that are humanly predictable, but there are humanly unpredictable things too!

   Sincerely yours ;)


Benchmarks: 7z, bzip2 & gzip.

2008-02-29 Thread J.C. Pizarro
Here are the results of benchmarks of 3 compressors: 7z, bzip2 and gzip, and
GCCs 3.4.6, 4.1.3-20080225, 4.2.4-20080227, 4.3.0-20080228 & 4.4.0-20080222.

--
# User's time is taken, machine is Ath64 3200+ 2.0 GHz x1, 64+64K L1, 512K L2.
# Every GCC-4 compilers were compiled so
mkdir build ; cd build
../configure --prefix=/opt/gcc? --disable-shared \
--disable-threads --disable-checking --enable-__cxa_atexit \
--enable-languages=c,c++,fortran --enable-bootstrap
time make "CFLAGS=-O3 -fomit-frame-pointer -march=i686 -pipe \
-mno-sse3 -msse2 -msse -mno-mmx -mno-3dnow \
-funroll-loops -finline-functions -fpeel-loops" bootstrap
make install
--
1. gcc-3.4.6 (the default of the distro)
--
2. gcc-4.1.3-20080225 ( built in 40 minutes, 110 MiB /opt/gcc41320080225 )
export PATH=/opt/gcc41320080225/bin:$PATH
export LD_LIBRARY_PATH=/opt/gcc41320080225//lib
--
3. gcc-4.2.4-20080227 ( built in 65 minutes, 150 MiB /opt/gcc42420080227 )
export PATH=/opt/gcc42420080227/bin:$PATH
export LD_LIBRARY_PATH=/opt/gcc42420080227/lib
--
4. gcc-4.3.0-20080228 ( built in 85 minutes, 114 MiB /opt/gcc43020080228 )
export PATH=/opt/gcc43020080228/bin:$PATH
export LD_LIBRARY_PATH=/opt/gcc43020080228/lib
--
5. gcc-4.4.0-20080222 ( built in 79 minutes, 112 MiB /opt/gcc44020080222 )
export PATH=/opt/gcc44020080222/bin:$PATH
export LD_LIBRARY_PATH=/opt/gcc44020080222/lib
--
  p7zip-4.57
  --
1. tar -jxf p7zip_4.57_src_all.tar.bz2
2. Edit CPP/7zip/Bundles/Alone/makefile adding
LOCAL_FLAGS+= below ...
3. time make ; strip --strip-all bin/7za ; ls -l bin/7za ; size bin/7za
4. linux-2.4.36.2.tar.gz ( 38'720'979 bytes, 175'237'120 bytes .tar,
   24'471'728 bytes .tar.7z )
   time gzip -cd ../linux-2.4.36.2.tar.gz | ./bin/7za a -t7z -m0=lzma -mx=9 \
   -mfb=273 -md=48m -ms=on -si trash.7z  ; ls -l *7z
--
LOCAL_FLAGS+=-O3 -fomit-frame-pointer -march=i686 \
 -mno-sse3 -msse2 -msse -mno-mmx -mno-3dnow \
 -funroll-loops -finline-functions -fpeel-loops

1. 1m50s compile, 1630164 file, 1618639 text, 6120 data, 27168 bss, 5m50s run.
2. 1m53s compile, 1665952 file, 1649829 text, 4668 data, 29160 bss, 6m04s run.
3. 2m08s compile, 1629088 file, 1613313 text, 4672 data, 29160 bss, 5m54s run.
4. 2m36s compile, 2063216 file, 2047420 text, 4380 data, 29160 bss, 6m14s run.
5. 2m30s compile, 1976228 file, 1960164 text, 4380 data, 29160 bss, 6m12s run.
--
LOCAL_FLAGS+=-Os -fomit-frame-pointer -march=i686 \
 -mno-sse3 -mno-sse2 -mno-sse -mno-mmx -mno-3dnow

1. 1m21s compile, 1123476 file, 971 text, 6120 data, 27168 bss, 6m11s run.
2. 1m01s compile,  723872 file,  707806 text, 4672 data, 29224 bss, 6m39s run.
3. 1m11s compile,  720264 file,  705082 text, 4220 data, 29004 bss, 6m15s run.
4. 1m01s compile,  721688 file,  706557 text, 3928 data, 28908 bss, 6m12s run.
5. 1m01s compile,  700612 file,  685009 text, 3928 data, 28940 bss, 6m12s run.
--
LOCAL_FLAGS+=-Os -fomit-frame-pointer -march=i686 \
 -mno-sse3 -mno-sse2 -mno-sse -mno-mmx -mno-3dnow \
 -funroll-loops -finline-functions -fpeel-loops

3. 1m20s compile,  908496 file,  893314 text, 4220 data, 29004 bss, 6m38s run.
5. 1m14s compile,  969228 file,  953629 text, 3928 data, 28940 bss, 6m18s run.
--
  bzip2-1.0.4
  ---
1. tar -zxf bzip2-1.0.4.tar.gz
2. time make "CFLAGS=-Wall -Winline -D_FILE_OFFSET_BITS=64 ...below... " \
   ; strip --strip-all bzip2 ; ls -l bzip2 ; size bzip2
3. linux-2.4.36.2.tar.gz ( 38'720'979 bytes, 175'237'120 bytes .tar,
   24'471'728 .tar.7z, 31'095'864 bytes .tar.bz2 )
   time gzip -cd ../linux-2.4.36.2.tar.gz | ./bzip2 -9 > /dev/null
--
-O3 -fomit-frame-pointer -march=i686 \
-mno-sse3 -msse2 -msse -mno-mmx -mno-3dnow \
-funroll-loops -finline-functions -fpeel-loops

1.  7.0s compile,  111660 file,  104530 text, 3644 data,  4400 bss, 1m15s run.
2.  7.3s compile,  107664 file,  100538 text, 3524 data,  4400 bss, 1m1

Re: Could someone please check if FSF received papers for Intel engineers?

2008-03-13 Thread J.C. Pizarro
On Thu, 13 Mar 2008 09:44:29 -0400, David Edelsohn wrote:
>   The engineers currently are not listed in the FSF copyrights
> assignment file.
>
> David

Why do they have to be listed in the FSF copyrights assignment file?

Intel released the original x86 hardware.
AMD released the original x86-64 hardware.

Intel cloned AMD's x86-64 hardware, calling it x64.
AMD cloned Intel's x86 hardware, making it compatible.

Software on hardware needs the hexadecimal specification
of the hardware for this software-hardware pair to work.
It's the ASM description of the hardware.
Otherwise, this pair won't work without knowledge of the hardware.

The problem starts when the hardware company does not want
to transfer its copyrights on the hardware documents to the software
organization, because the hardware company wants to live off the
business of licenses and copyrights, and off the lawyers set
against any software organization that didn't make a deal with it.

I don't understand how U.S. law handles this. I'm paranoid about it.

I did read about IBM lawsuits around 198x on the separation of
hardware and software. Wintel cases too.

   J.C.Pizarro


Re: Could someone please check if FSF received papers for Intel engineers?

2008-03-13 Thread J.C. Pizarro
On 2008/3/13, Robert Dewar <[EMAIL PROTECTED]> wrote:
> J.C. Pizarro wrote:
>  > On Thu, 13 Mar 2008 09:44:29 -0400, David Edelsohn wrote:
>  >>  The engineers currently are not listed in the FSF copyrights
>  >> assignment file.
>  >>
>  >> David
>  >
>  > Why they've to be listed in FSF copyrights assignment file?
>  >
>  > Intel released original x86 hardware.
>  > AMD released original x86-64 hardware.
>  >
>  > Intel cloned AMD's x86-64 hardware calling it x64.
>  > AMD cloned Intel's x86 hardware doing it compatible.
>  >
>  > The software on hardware needs the hexadecimal specification
>  > of the hardware for the working of this pair software-hardware.
>  > It's the ASM description of the hardware.
>  > Otherwise, this pair won't work without knowledge of the hardware.
>  >
>  > The problem is when it will start that the hardware company want
>  > not to transfer its copyrights of hardware documents to software
>  > organization because the hardware company wants to live of the
>  > businesses of licenses and copyrightes, and of the lawyers
>  > against any software organization who didn't dealed with it.
>  >
>  > I don't understand how it's made the U.S. law. I'm paranoid in it.
>  >
>  > I did read IBM suitcases in around 198x about the separation of
>  > hardware-software. Wintel cases too.
>  >
>  >J.C.Pizarro
>
>
> This is complete nonsense, I suggest you do a bit
>  of homework before sending messages to this list,
>  which are entirely off topic anyway.

$ grep -iR "intel\.com" . | sed 's/^[^<]*<\([^>]*\)>.*$/\1/g' | sort -u
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
$

Are they listed in FSF copyrights assignment file?

   J.C.Pizarro


Re: Could someone please check if FSF received papers for Intel engineers?

2008-03-13 Thread J.C. Pizarro
On 2008/3/13, Joe Buck <[EMAIL PROTECTED]> wrote:
> This is off-list, because you are wasting the time of the list readership.

No, it's the readers who have to waste a little of their time if they want to read
these short mails.

>  On Thu, Mar 13, 2008 at 08:14:38PM +0100, J.C. Pizarro wrote:
>  > $ grep -iR "intel\.com" . | sed 's/^[^<]*<\([^>]*\)>.*$/\1/g' | sort -u
>  > [EMAIL PROTECTED]
>  > [EMAIL PROTECTED]
>  > [EMAIL PROTECTED]
>  > [EMAIL PROTECTED]
>  > [EMAIL PROTECTED]
>  > [EMAIL PROTECTED]
>  > [EMAIL PROTECTED]
>  > [EMAIL PROTECTED]
>  > [EMAIL PROTECTED]
>  > $
>  >
>  > Are they listed in FSF copyrights assignment file?
>
>
> Yes.  Of course.  Each individual needs to file paperwork.  Without
>  paperwork, contributions aren't accepted.

Ohh, contributions aren't accepted because they had not assigned
the copyrights to the FSF.

Then, are we not working under the "GPL license" but rather under the
"GPL licence + the FSF's own policy"?

Well, if they don't want to accept those contributions from other authors,
then the other authors can fork their works for themselves.

Here there are more from intel.com, excluding the search in GCC's Ada:
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

   J.C.Pizarro


Re: Could someone please check if FSF received papers for Intel engineers?

2008-03-13 Thread J.C. Pizarro
On 2008/3/13, Andrew Pinski <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 13, 2008 at 2:38 PM, J.C. Pizarro <[EMAIL PROTECTED]> wrote:
>  >  Ohh, contributions aren't accepted because they had not assigned
>  >  the copyrights to FSF.
>  >
>  >  Then, are we not doing it due to "GPL license" instead of
>  >  "GPL licence + own FSF's policy"?
>
>
> No, please go and understand why the FSF likes to have the copyright
>  of the file.  It is easier for them to defend where the sources came
>  from when it comes to patents and copyright disputes.
>
>
>  -- Pinski

Patents? => That is covered by the seventh term of the GPL.
Copyrights => For the authors' contributions, why the FSF only?


Re: larger default page sizes...

2008-03-25 Thread J.C. Pizarro
On 2008/3/26, J.C. Pizarro <[EMAIL PROTECTED]> i wrote:
 > On Tue, 25 Mar 2008 16:22:44 -0700 (PDT), David Miller wrote:
 >  > > On Mon, 24 Mar 2008, David Miller wrote:
 >  > >
 >  > > > There are ways to get large pages into the process address space for
 >  > > > compute bound tasks, without suffering the well known negative side
 >  > > > effects of using larger pages for everything.
 >  > >
 >  > > These hacks have limitations. F.e. they do not deal with I/O and
 >  > > require application changes.
 >  >
 >  > Transparent automatic hugepages are definitely doable, I don't know
 >  > why you think this requires application changes.
 >  >
 >  > People want these larger pages for HPC apps.
 >
 >  But there is a general problem of larger pages in systems that
 >  don't support them natively (in hardware) depending in how it's
 >  implemented the memory manager in the kernel:
 >
 >"Doubling the soft page size implies
 >   halfing the TLB soft-entries in the old hardware".
 >
 >"x4 soft page size=> 1/4 TLB soft-entries, ... and so on."
 >
 >  Assuming one soft double-sized page represents 2 real-sized pages,
 >  one replacing of one soft double-sized page implies replacing
 >  2 TLB's entries containing the 2 real-sized pages.
 >
 >  The TLB is very small, its entries are around 24 entries aprox. in
 >  some processors!.
 >
 >  Assuming soft 64 KiB page using real 4 KiB pages => 1/16 TLB soft-entries.
 >  If the TLB has 24 entries then calculating 24/16=1.5 soft-entries,
 >the TLB will have only 1 soft-entry for soft 64 KiB pages!!! Weird!!!
 >
 >  The normal soft sizes are 8 KiB or 16 KiB for non-native
processors, not more.
 >   So, the TLB of 24 entries of real 4 KiB will have 12 or 6
 >  soft-entries respect.


The problem is that x86 and x64 are "crap" when we want larger page sizes
 (such as 8, 16, 32 or 64 KiB) for HPC, but not the unusual, excessive huge pages
 (2, 4, 1024 MiB).

 Stop buying the current PCs or laptops, and wait until the well-made
 PCs or laptops "Intel Core Quad 3" or "AMD Athlon AM3" with the following
 cleaner features (fewer gates and more reliable circuitry):
 1. x86-64-II instructions only, to address the usual cheaper 4 or 8 GiB of DDR RAM,
but with a changed hierarchy of x86-64 opcodes for better performance.
 2. Removed old 16-bit 8086 and old 32-bit 80386 instructions.
 3. Removed the unusual BCD instructions too.
 4. Removed PAE and the hugepages of the old 32-bit 80386 instructions.
 5. Configurable TLB for 8, 16, 32 and 64 KiB pages only.
 6. Configurable 3-level or 4-level MMU, depending on the configurable page sizes.
 7. Removed 32-bit and 80-bit floating point.
 8. Added 64-bit float and 128-bit double, IEEE 754.
 9. Removed MMX/3DNow+ instructions.
 10. Integrated SSEx instructions (auto saved/restored to/from the task's stack).
 11. Improved PIC (Position Independent Code) for shared libraries.
 12. Improved hardware x86-64 virtualization.
 13. Improved hardware x86-64 debuggability.
 14. Improved CPU counters and timestampers at variable frequencies.
 15. Realtime UTC (leap ctrl, DST, tz) clocks in ns for CPU profiling.
 16. Improved flushers (caches, TLB, MMU, ...).
 17. Improved hardware futexes, semaphores, signals, .. for multicores.
 18. MOESI protocol for multicores in SMPs & clusters.
 19. Better integrated DMAs in processors.
 20. Faster buses, not always at the fixed frequencies stated in the specs.
 21. Reliable exception handling in out-of-order termination architectures.
 22. More capabilities for soft multi-threading of the O.S. in SMPs & clusters.
 23. More capabilities for process migration of the O.S. in SMPs & clusters.
 24. More capabilities for JIT/AOT emulators of different hw such as Qemu, Java, ..
 25. And more knowledge of connected devices such as ACPI, e820, ...

 I don't see why Intel/AMD keep releasing new x86-compatible
 processors when those still remain crap by reasonable practice.

I can't be blamed for the Mail Delivery Subsystem error on LKML when I was
about to finish my comments.


Re: larger default page sizes...

2008-03-27 Thread J.C. Pizarro
On 2008/3/26, J.C. Pizarro <[EMAIL PROTECTED]> i wrote:
> On 2008/3/26, J.C. Pizarro <[EMAIL PROTECTED]> i wrote:
>   > On Tue, 25 Mar 2008 16:22:44 -0700 (PDT), David Miller wrote:
>   >  > > On Mon, 24 Mar 2008, David Miller wrote:
>   >  > >
>   >  > > > There are ways to get large pages into the process address space 
> for
>   >  > > > compute bound tasks, without suffering the well known negative side
>   >  > > > effects of using larger pages for everything.
>   >  > >
>   >  > > These hacks have limitations. F.e. they do not deal with I/O and
>   >  > > require application changes.
>   >  >
>   >  > Transparent automatic hugepages are definitely doable, I don't know
>   >  > why you think this requires application changes.
>   >  >
>   >  > People want these larger pages for HPC apps.
>   >
>   >  But there is a general problem of larger pages in systems that
>   >  don't support them natively (in hardware) depending in how it's
>   >  implemented the memory manager in the kernel:
>   >
>   >"Doubling the soft page size implies
>   >   halfing the TLB soft-entries in the old hardware".
>   >
>   >"x4 soft page size=> 1/4 TLB soft-entries, ... and so on."
>   >
>   >  Assuming one soft double-sized page represents 2 real-sized pages,
>   >  one replacing of one soft double-sized page implies replacing
>   >  2 TLB's entries containing the 2 real-sized pages.
>   >
>   >  The TLB is very small, its entries are around 24 entries aprox. in
>   >  some processors!.
>   >
>   >  Assuming soft 64 KiB page using real 4 KiB pages => 1/16 TLB 
> soft-entries.
>   >  If the TLB has 24 entries then calculating 24/16=1.5 soft-entries,
>   >the TLB will have only 1 soft-entry for soft 64 KiB pages!!! Weird!!!
>   >
>   >  The normal soft sizes are 8 KiB or 16 KiB for non-native
>  processors, not more.
>   >   So, the TLB of 24 entries of real 4 KiB will have 12 or 6
>   >  soft-entries respect.
>
>
>  The problem is that x86 and x64 is a "crap" when we want larger page sizes
>   (as 8, 16, 32 or 64 KiB) for HPC but not unusual excesive huge pages
>   (2, 4, 1024 MiB).
>
>   Stop the buying of the current PCs or Laptops, and wait until the well made
>   PCs or Laptops "Intel Code Quad 3" or "AMD Athlon AM3" with the following
>   cleaner features (lesser gates and more liable circuitry):
>   1. Instructions x86-64-II only to address usual cheaper 4 or 8 GiB of DDR 
> RAM,
> but changed the hierarchy of bytecodes of x86-64 for better performance.
>   2. Removed old 16-bit 8086 and old 32-bit 80386 instructions.
>   3. Removed unusual BCD instructions too.
>   4. Removed PAE and Hugepages of old 32-bit 80386 instructions.
>   5. Configurable TLB to only 8, 16, 32 and 64 KiB pages.
>   6. Configurable 3-level or 4-level MMU depending in the cfg'ble page sizes.
>   7. Removed 32-bit and 80-bit float points.
>   8. Added 64-bit as float point and 128-bit as double point, IEEE754.
>   9. Removed MMX/3DNow+ instructions.
>   10. Integrated SSEx instructions (auto saved/restrored to/from task's 
> stack).
>   11. Improved PIC (Position Independent Code) for shared libraries.
>   12. Improved hardware x86-64 virtualization.
>   13. Improved hardware x86-64 debuggability.
>   14. Improved CPU counters and timestampers in variabled-frequencies.
>   15. Realtime UTC (leap ctrl,DST,tz) clockers in ns for CPU-profilings.
>   16. Improved flushers (caches, TLB, MMU, ...).
>   17. Improved hardware futexes, semaphores, signals, .. for multicores.
>   18. MOESI protocol for multicores in SMPs & clusters.
>   19. Better integrated DMAs in processors.
>   20. Faster buses not always in fixed frequencies as said in the specs.
>   21. Reliable exceptions's handlings in out-of-order termination 
> architectures.
>   22. More capabilities for soft multi-threading of the O.S. in SMPs & 
> clusters.
>   23. More capabilities for processes's migrating of the O.S. in SMPs & 
> clusters.
>   24. More capabilities for JIT/AOT emulators of different hw as Qemu, Java, 
> ..
>   25. And more acknowledge of connected devices as ACPI, e820, ...
>
>   I don't see reasons of why Intel/AMD follow to release new x86-compatible
>   processors when they follow still being a crap in the reasonable practiques.
>
>
> I can't be sorry of the error in Mail Delivery Subsystem of LKML when i was
>  terminating soon of my comments.

Why do almost all microprocessors have only one small TLB, instead
of a hierarchy of TLB L1, TLB L2, TLB L3 just like Cache L1, Cache L2, Cache L3?

How foolish the hw engineers are!!!

It's a good idea to have a quick flusher of the combined TLB-Cache L1-L2-L3 for
unix-style OSes: linux, *bsd, ...

I think that many hw engineers and non-engineers will need dozens of
giant FPGAs (hundreds of millions of cells and macrocells) to simulate
the new and modern unreleased microprocessors.

   J.C.Pizarro.


Re: GSOC Student application

2008-03-30 Thread J.C. Pizarro
On Sat, 29 Mar 2008 12:39:13 +0600, "Alexey Salmin"
<[EMAIL PROTECTED]> wrote:
> Hello, here's my application. Please, leave your comments as I still
> have two days to fix it if something is wrong :)
>
> Project
> I want to make some improvements in the Lexer/cpplib area:
> 1) Change the way of file handling
>   -- Mmap file into memory if possible instead of allocating a buffer
> (if no character conversation is needed)
>   -- Find the boundaries of line which for conversation is needed
> instead of converting the whole buffer.
> 2) Replace all malloc/free functions with XNEW/XDELETE (XNEWVEC,
> XDELETEVEC) macro.
> 3) Some small miscellaneous changes
>   -- Improve the developer's documentation and comments
>   -- Add a ru.po file for the libcpp
>
> Why is it useful for GCC? (corresponding to the project items)
> 1) The compile time and, probably, memory usage will be reduced.
> 2) Hard to say anything here. I have no idea why malloc/free functions
> are still used in the code since XNEW/XDELETE are supposed to be
> there. (Or may be I'm wrong? I've asked here once about this and it
> seems that I'm right.)
> 3) A good documentation is important for understanding the source
> code. The long sequence of mails in this list called "How to
> understand the gcc source code" is demonstrative.

There are issues with the Garbage Collection from libgcc or Boehm's GC:
you possibly can't use allocators other than these defaults
unless you have control of the manager of the whole memory,
and that is too complex due to the giant size of the project.

> Why should I do this?
> 1. My knowledge in C programming language is very good.
> 2. I have some experience in tokenization.
> 3. I want to join the GCC development process independently of GSOC, I
> will continue my work and supply my code after the end of the summer
> of code.
> 4. I have some experience writing in the C++ language. Maybe it's
> not enough to develop big projects with it, but it's definitely enough to
> lex it :)
> 5. Finally I have a "Compilers: Principles, Techniques, and Tools"
> book written by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. I'm
> joking of course :)

You must know that before optimizing anything, you must profile the
whole code (-pg, gprof, ...) and study the beautiful formula of
"Amdahl's Law" for sequential machines in some books.

Having studied this law, you can optimize better than you could with your
previous knowledge.
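(Editorial note, for reference: Amdahl's Law, which the mail alludes to but does not spell out, is

\[ S_{\text{overall}} = \frac{1}{(1 - p) + p/s} \]

where p is the fraction of the run time spent in the part being improved and s is the local speedup of that part. For example, speeding up 10% of the run time by 2x gives 1 / (0.9 + 0.05), roughly 1.05 overall, which is exactly why profiling first matters.)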

> Biography
> I'm 18 years old student learning in Novosibirsk State University,
> Russia. I've been working with linux for 4 years, I enjoy writing C
> code and I always wanted to join some really great project like GCC.
>
> PS Where am I supposed to send this mail? I've seen no special address
> for GSOC applications so I sent it here. But I've seen no other
> applications in this list so I'm confused :)

Good luck, U.S.S.R. boy ;)


Re: GSOC Student application

2008-03-31 Thread J.C. Pizarro
On 2008/3/30, Alexey Salmin <[EMAIL PROTECTED]> wrote:
> > There are issues of Garbage Collection from libgcc or Boehms's GC
>  >  that you possibly can't use another allocators that these defaults,
>  >  unless you have control of the manager of the whole memory,
>  >  and it's too complex due to the gigant size of the project.
>
>
> [EMAIL PROTECTED]:~/gcc/src/include$ grep XNEW libiberty.h
>  #define XNEW(T) ((T *) xmalloc (sizeof (T)))
>  #define XNEWVEC(T, N)   ((T *) xmalloc (sizeof (T) * (N)))
>  #define XNEWVAR(T, S)   ((T *) xmalloc ((S)))
>
>  [EMAIL PROTECTED]:~/gcc/src/libiberty$ grep -A 11 '^xmalloc ('  xmalloc.c
>  xmalloc (size_t size)
>  {
>   PTR newmem;
>
>   if (size == 0)
> size = 1;
>   newmem = malloc (size);
>   if (!newmem)
> xmalloc_failed (size);
>
>   return (newmem);
>  }
>
>  So, you can see that XNEW* macro are now exactly the same as just
>  malloc function and they were added only for possible future change of
>  the memory allocator.
>  Any malloc function should be replaced with this macro AFAIK.
>  And the worst thing I can see in the code is freeing the memory
>  allocated with XNEW macro. It works fine now but it's wrong as I
>  understand.
>
>  [EMAIL PROTECTED]:~/gcc/src/libcpp$ grep XNEW * | wc -l
>  66
>  [EMAIL PROTECTED]:~/gcc/src/libcpp$ grep XDELETE * | wc -l
>  6
>  [EMAIL PROTECTED]:~/gcc/src/libcpp$ grep free * | wc -l
>  153
>  [EMAIL PROTECTED]:~/gcc/src/libcpp$ grep malloc * | wc -l
>  13
>
>
>
>  > You must know that before optimizing anything, you must profile the
>  >  whole code (-pg, gprof, ...) and study the beautiful formula of
>  >  "Amdahl's Law" for sequential machines in some books.
>  >
>  >  Studied this law, you can optimize better than your previous knowledge.
>
>
> I know what profiling is. And I know how parallel programs work,
>  thanks. I'm just talking here about distinct improvements I can do,
>  not about some abstract optimizing.
>
>
>  >
>  > Luck U.S.S.R. boy ;)
>  >
>
> Yes, I've been living in USSR for 2 first years of my life :)
>

$ objdump -t libexec/gcc/i686-pc-linux-gnu/4.4.0/cc1 | cut -c48- | \
   grep -i alloc | sort -u

There are approx. 95 symbols related to *alloc*, for example
malloc, calloc, ealloc, ecalloc, etc.

It's a good idea to thin out some of the distinct symbols in this
forest of *alloc* and use common symbols instead.


Re: gcc compiler for pdp10

2008-04-19 Thread J.C. Pizarro
On Fri, 18 Apr 2008 21:56:38 -0400, Alan Lehotsky <[EMAIL PROTECTED]> wrote:
>Martin,
>
>I did a port of GCC to the Analog Devices SHARC chip. I ended up
>supporting 3 kinds of pointers for this chip (two for address
>spaces and one for byte pointers - the chip itself is only word
>addressable (although words can be from 16 to 48 bits in size
>depending on what memory is being accessed.)
>
>I also worked on the Bliss-36 compiler at DEC, so I'm well acquainted
>with the PDP10 architecture.
>
>I don't have access to any 10/20 HW, but I'd be happy to act as a
>reviewer/advisor to your changes.
>
>Al Lehotsky
>
>On Apr 18, 2008, at 20:21, Martin Chaney wrote:
>>
>>Hi,
>>
>>I am the proprietor of a gcc compiler for the PDP10 architecture.
>>
>>(This is a compiler previously worked on by Lars Brinkhoff who
>>left XKL some while before I joined XKL. It's possible some of you
>>may have been familiar with him or the compiler from that time.)
>>
>>The compiler is currently in a state where it is synched with
>>both the 4.3 and 4.4 branches, and it passes the testsuite tests
>>(with the exception of some I've flagged as expected failures for
>>the pdp10).
>>
>>My employer is happy to release my work on the gcc compiler back
>>to the gcc community and I've sent in a request for the necessary
>>forms.
>>
>>The PDP10 architecture is unusual in various ways that
>>distinguish it from the mainstream architectures supported by the
>>gcc compiler and this has made the development of this compiler a
>>significant task. Undoubtedly I've made customizations in
>>inappropriate ways. I'm seeking contacts with people who might be
>>able to advise me on how to clean up my implementation to reduce the
>>amount of #ifdef __PDP10_H__ I've sprinkled liberally throughout the
>>source. Also, if it's possible to get simple changes made to prevent
>>breaking my PDP10 version and that are otherwise innocuous, that
>>would be wonderful. For example, the PDP10 word size is 36 bits;
>>fairly recently people have taken to writing code that assumes word
>>size is a power of 2 even when it's straightforward to write in a
>>manner that doesn't make that assumption.
>>
>>Considering the large number of files customized to get the
>>PDP10 compiler working, I'm not sure whether it's possible to get it
>>to build directly from the gcc trunk, but it would be nice to work
>>toward that goal.
>>
>>Some other things which distinguish the PDP10 architecture from
>>assumptions in the gcc code base include: its variety of formats of
>>pointers, only one of which can be viewed as an integer and that one
>>is capable of referencing only word aligned data, a functional
>>difference between signed and unsigned integers, and peculiarities
>>of the use of PDP10 byte arrays which are very difficult to
>>describe.
>>
>>Any help or advice would be appreciated.
>>
>>Martin Chaney
>>XKL, LLC

At http://pdp-10.trailing-edge.com/ there are ~6.5 GB
of PDP10 files, of which ~1.5 GB are tapes in tap.bz2 format.

It's a lot of obsolete software from more than 20 years ago;
it's only for hobbyists, because you won't find archaic 36-bit machines,
and all the current modern machines are 32- and 64-bit.


Re: IRA for GCC 4.4

2008-04-27 Thread J.C. Pizarro
On Fri 25 Apr 2008 22:22:55 -0500, Peter Bergner <[EMAIL PROTECTED]> wrote:
> On Thu, 2008-04-24 at 20:23 -0400, Vladimir Makarov wrote:
> > Hi, Peter.  The last time I looked at the conflict builder
> > (ra-conflict.c), I did not see the compressed matrix.  Is it in the
> > trunk?  What should I look at?
>
> Yes, the compressed bit matrix was committed as revision 129037 on
> October 5th, so it's been there a while.  Note that the old square
> bit matrix was used not only for testing for conflicts, but also for
> visiting an allocno's neighbors.  The new code (and all compilers I've
> worked on/with), use a {,compressed} upper triangular bit matrix for
> testing for conflicts and an adjacency list for visiting neighbors.
>
> The code that allocates and initializes the compressed bit matrix is in
> global.c.  If you remember how a upper triangular bit matrix works, it's
> just one big bit vector, where the bit number that represents the conflict
> between allocnos LOW and HIGH is given by either of these two functions:
>
>   1) bitnum = f(HIGH) + LOW
>   2) bitnum = f(LOW) + HIGH
>
> where:
>
>   1) f(HIGH) = (HIGH * (HIGH - 1)) / 2
>   2) f(LOW) = LOW * (max_allocno - LOW) + (LOW * (LOW - 1)) / 2 - LOW - 1
>
> As mentioned in some of the conflict graph bit matrix literature (actually,
> they only mention expression #1 above), the expensive functions f(HIGH) and
> f(LOW) can be precomputed and stored in an array, so to access the conflict
> graph bits only takes a load and an addition.  Below is an example bit matrix
> with initialized array:
>
>              0    1    2    3    4    5    6    7    8    9   10   11
> ---
> | -1 |  0 ||  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |
> ---
> |  9 |  1 ||| 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
> ---
> | 18 |  2 |||| 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 |
> ---
> | 26 |  3 ||||| 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 |
> ---
> | 33 |  4 |||||| 38 | 39 | 40 | 41 | 42 | 43 | 44 |
> ---
> | 39 |  5 ||||||| 45 | 46 | 47 | 48 | 49 | 50 |
> ---
> | 44 |  6 |||||||| 51 | 52 | 53 | 54 | 55 |
> ---
> | 48 |  7 ||||||||| 56 | 57 | 58 | 59 |
> ---
> | 51 |  8 |||||||||| 60 | 61 | 62 |
> ---
> | 53 |  9 ||||||||||| 63 | 64 |
> ---
> | 54 | 10 |||||||||||| 65 |
> ---
> | NA | 11 |||||||||||||
> ---
>
> As an example, if we look at the interference between allocnos 8 and 10, we
> compute "array[8] + 10" = "51 + 10" = "61", which if you look above, you will
> see is the correct bit number for that interference bit.
>
> The difference between a compressed upper triangular bit matrix from a 
> standard
> upper triangular bit matrix like the one above, is we eliminate space from the
> bit matrix for conflicts we _know_ can never exist.  The easiest case to 
> catch,
> and the only one we catch at the moment, is that allocnos that are "local" to 
> a
> basic block B1 cannot conflict with allocnos that are local to basic block B2,
> where B1 != B2.  To take advantage of this fact, I updated the code in 
> global.c
> to sort the allocnos such that all "global" allocnos (allocnos that are live 
> in
> more than one basic block) are given smaller allocno numbers than the "local"
> allocnos, and all allocnos for a given basic block are grouped together in a
> contiguous range to allocno numbers.  The sorting is accomplished by:
>
>   /* ...so we can sort them in the order we want them to receive
>  their allocnos.  */
>   qsort (reg_allocno, max_allocno, sizeof (int), regno_compare);
>
> Once we have them sorted, our conceptual view of the compressed bit matrix
> will now look like:
>
>   G    G    G    B0   B0   B0   B1   B1   B2   B2   B2   B2
>
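(Editorial sketch, not from the thread: a self-contained C illustration of the plain upper-triangular indexing quoted above, i.e. the precomputed f(LOW) array and the "one load plus one add" lookup. The names are invented; this is not GCC's actual global.c code.)

#include <stdio.h>

#define MAX_ALLOCNO 12   /* allocnos 0..11, as in the example table above */

static long f_low[MAX_ALLOCNO];

/* Precompute f(LOW) = LOW*(max_allocno - LOW) + LOW*(LOW - 1)/2 - LOW - 1
   (formula 2 quoted above), so a conflict-bit lookup is one load + one add.  */
static void init_triangular_index (void)
{
  for (long low = 0; low < MAX_ALLOCNO; low++)
    f_low[low] = low * (MAX_ALLOCNO - low) + low * (low - 1) / 2 - low - 1;
}

/* Bit number of the conflict between allocnos A and B (A != B).  */
static long conflict_bitnum (long a, long b)
{
  long low  = a < b ? a : b;
  long high = a < b ? b : a;
  return f_low[low] + high;
}

int main (void)
{
  init_triangular_index ();
  /* Reproduces the worked example above: allocnos 8 and 10 -> bit 61.  */
  printf ("f(8) = %ld, bit(8,10) = %ld\n", f_low[8], conflict_bitnum (8, 10));
  return 0;
}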
>

Re: IRA for GCC 4.4

2008-04-28 Thread J.C. Pizarro
On 2008/4/28 Ben Elliston <[EMAIL PROTECTED]> wrote:
> On Sun, 2008-04-27 at 21:45 +0200, J.C. Pizarro wrote:
>
>  > Don't be stupid!
>
>  Could you be a bit more civil, please?  It's fairly unusual for people
>  on this list to talk to each other in this way.
>
>  Thanks,
>  Ben

Excuse me, I'm not the only one, nor the first, to say "stupid" to you;
GCC did it too.
The word "stupid" can be a help, not only an offense.

gcc/cp/decl.c: and in case doing stupid register allocation.
gcc/c-aux-info.c: user may do something really stupid, like creating a brand new
gcc/reload.c: we are called from global_alloc but false when stupid register
gcc/dwarf2out.c: Now onto stupid register sets in non contiguous locations.
gcc/protoize.c: A table of conversions that may need to be made for some (stupid) older
gcc/protoize.c: Don't act too stupid here.  Somebody may try to convert an entire system
gcc/protoize.c: case it would be stupid for us to fail to realize that this one definition
gcc/protoize.c: this is stupid practice and should be discouraged.
gcc/tree-ssa-phiopt.c: anything stupid here.
gcc/c-common.c: Warn for unlikely, improbable, or stupid DECL declarations
gcc/function.c: ??? This should no longer be necessary since stupid is no longer with
gcc/gimple-low.c: don't know.  This is used only to avoid stupidly generating extra code.
gcc/genrecog.c: ??? is_unconditional is a stupid name for a tri-state function.
gcc/global.c: and it is run even when stupid register allocation is in use.
gcc/config/arm/arm.c: Prevent the user from choosing an obviously stupid PIC register.
gcc/config/ia64/ia64-modes.def: so that flow doesn't do something stupid.
gcc/config/ia64/ia64.c: stop the paradoxical subreg stupidity in the *_operand functions
gcc/config/ia64/predicates.md: Deny the stupid user trick of addressing outside the object.
gcc/config/mmix/predicates.md: FIXME: This may be a stupid trick.  What happens when GCC wants to
gcc/config/v850/v850.c: Function prototypes for stupid compilers
gcc/config/sparc/sparc.c: avoid emitting truly stupid code.
gcc/config/rs6000/darwin-fpsave.asm: be a stupid thing to do, anyway
gcc/Makefile.in: Really, really stupid make features, such as SUN's KEEP_STATE, may force
gcc/alias.c: but stupid user tricks can produce them, so don't die.
gcc/c-decl.c: and in case doing stupid register allocation.
gcc/c-decl.c: Warn for unlikely, improbable, or stupid declarations of `main'.
gcc/optabs.c: but it's nice to not be stupid about initial code gen either.
gcc/regrename.c: of a call insn, which is stupid, since these are certainly
configure: This seems to be due to autoconf 2.5x stupidity.
libstdc++-v3/doc/xml/manual/using.xml: However, you still need to not do stupid things like calling
libstdc++-v3/scripts/run_doxygen: work around a stupid doxygen bug
libstdc++-v3/scripts/run_doxygen: here's the other end of the "stupid doxygen bug" mentioned above
libstdc++-v3/testsuite/data/thirty_years_among_the_dead_preproc.txt: stupidity
configure.ac: This seems to be due to autoconf 2.5x stupidity.
ChangeLog: Makefile.tpl: Fix stupid pasto.
ChangeLog: configure: Fix stupid bug where RANLIB was mistakenly included.
ChangeLog: configure.in: Fix deeply stupid bug.

   J.C.Pizarro


Re: IRA for GCC 4.4

2008-04-28 Thread J.C. Pizarro
On 2008/4/28 Dave Korn <[EMAIL PROTECTED]> wrote:
> J.C. Pizarro wrote on :
>
>
>  > On 2008/4/28 Ben Elliston wrote:
>  >> On Sun, 2008-04-27 at 21:45 +0200, J.C. Pizarro wrote:
>  >>
>  >>  > Don't be stupid!
>  >>
>  >>  Could you be a bit more civil, please?  It's fairly unusual for people
>  >>  on this list to talk to each other in this way.
>  >>
>  >>  Thanks,
>  >>  Ben
>  >
>  > Excuse me, I'm not the only or the first person to say "stupid"; GCC
>  > did it too.
>
>   Even if that were so, two wrongs do not make a right.

That's your personal opinion.
For me, they still do not make a right when there are 7 wrongs.

>
>  >  The "stupid" word can be a help, not only an offense.
>  >
>  > gcc/cp/decl.c: and in case doing stupid register allocation.
>  > gcc/c-aux-info.c: user may do something really stupid, like
>  > creating a brand new
>
>   The crucial difference you've overlooked is that all these comments are
>  describing some /thing/ as stupid, not some /person/.  When you want to offer
>  what you hope will be /constructive/ criticism, try to de-personalise the
>  issues; it makes for more productive social interactions.

What about the stupid user in
   gcc/alias.c: but stupid user tricks can produce them, so don't die  ?

But stupid things are made by humans, never by the things themselves.

You can't de-personalise the stupid things made by humans,
 so it is better to call the persons who did stupid things stupid
 than to blame things that cannot take offense.

>
> cheers,
>   DaveK
>  --
>  Can't think of a witty .sigline today


Re: IRA for GCC 4.4

2008-04-28 Thread J.C. Pizarro
On 2008/4/27 J.C. Pizarro <[EMAIL PROTECTED]> wrote:
> On Fri 25 Apr 2008 22:22:55 -0500, Peter Bergner <[EMAIL PROTECTED]> wrote:
>  > The difference between a compressed upper triangular bit matrix from a
>  > standard upper triangular bit matrix like the one above, is we eliminate
>  > space from the bit matrix for conflicts we _know_ can never exist.  The
>  > easiest case to catch, and the only one we catch at the moment, is that
>  > allocnos that are "local" to a basic block B1 cannot conflict with
>  > allocnos that are local to basic block B2, where B1 != B2.  To take
>  > advantage of this fact, I updated the code in global.c to sort the
>  > allocnos such that all "global" allocnos (allocnos that are live in
>  > more than one basic block) are given smaller allocno numbers than the
>  > "local" allocnos, and all allocnos for a given basic block are grouped
>  > together in a contiguous range to allocno numbers.  The sorting is
>  > accomplished by:
>  >
>  >   /* ...so we can sort them in the order we want them to receive
>  >  their allocnos.  */
>  >   qsort (reg_allocno, max_allocno, sizeof (int), regno_compare);
>  >
>
>   ...
>   It's useful when there are increases or decreases in the number of BBs.

Following up on the recent "stupidity" discussion: chained upper triangulars
 and rectangulars are more flexible than one big compressed upper triangular.
 Consider how expensive it is to modify the compressed upper triangular when you
   1) add or remove basic blocks?
   2) add or remove allocnos?

In the chained case, you can call a subroutine to make the structure consistent
 after adding or removing basic blocks or allocnos; it traverses the chains and
 reallocates the many local memory spaces of the BBs.

In the compressed case, you have to write complex and expensive routines
 to reallocate the big compressed upper triangular.
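A purely illustrative C sketch of one reading of this chained layout (all type
and field names are hypothetical, not GCC code): one triangle for the
global<->global conflicts plus one chained node per basic block.

typedef unsigned long bitword;

struct bb_conflicts
{
  int bb_index;                 /* the basic block this node describes        */
  int n_local;                  /* allocnos local to this basic block         */
  bitword *local_triangle;      /* local x local conflicts, upper triangle    */
  bitword *global_rectangle;    /* global x local conflicts, rectangle        */
  struct bb_conflicts *next;    /* chain to the next basic block              */
};

struct conflict_graph
{
  int n_global;                 /* allocnos live in more than one block       */
  bitword *global_triangle;     /* global x global conflicts, upper triangle  */
  struct bb_conflicts *blocks;  /* chained per-basic-block pieces             */
};

Adding or removing a basic block then only touches one node of the chain.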

   J.C.Pizarro


Re: Database for GCC

2008-04-29 Thread J.C. Pizarro
On Tue, 29 Apr 2008 08:16:14 -0500, "Tom Browder" <[EMAIL PROTECTED]> wrote:
> A naive thought, perhaps:
>
> Would there be any advantage to using a key-value embedded database
> program for the voluminous maps needed for gcc optimization, etc.?
>
> If so, consider .
>
> I have used its predecessor, qdbm, for years and it was very fast.  TC
> is faster still.
>
> I notice many threads here about memory requirements and speed
> reductions--perhaps there is some way TC could help with those
> problems.
>
> -Tom

It's interesting, but the TC database manager, like SQLite, is a
 non-concurrent (non-parallel) library, in contrast to libraries such as
 DB4 and MySQL, which can be concurrent or parallel.

But that's not a general problem. You can still use them with file locks,
 although access to this database manager remains non-concurrent.

The applicability of TC for GCC could be enormous (a minimal usage sketch
 follows this list):
* for storing compiled objects with many attributes of information.
* for storing profiles of the runs.
* for storing logs of the compilations of experimental GCC.
* for browsing this locked database to track the internal details of GCC.
* etc.
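The usage sketch, based on my recollection of the documented Tokyo Cabinet
hash-database C API (the exact names should be verified against its headers):

#include <tcutil.h>
#include <tchdb.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  TCHDB *hdb = tchdbnew ();                        /* create the DB object */
  if (!tchdbopen (hdb, "gcc-cache.tch", HDBOWRITER | HDBOCREAT))
    fprintf (stderr, "open error: %s\n", tchdberrmsg (tchdbecode (hdb)));

  tchdbput2 (hdb, "object:foo.o", "attributes...");   /* store a record    */

  char *value = tchdbget2 (hdb, "object:foo.o");      /* retrieve it       */
  if (value)
    {
      printf ("%s\n", value);
      free (value);
    }

  tchdbclose (hdb);
  tchdbdel (hdb);
  return 0;
}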

   J.C.Pizarro


Re: IRA for GCC 4.4

2008-04-29 Thread J.C. Pizarro
On Tue, 29 Apr 2008 20:25:56 +0200, "Steven Bosscher"
<[EMAIL PROTECTED]> wrote:
> On Tue, Apr 29, 2008 at 6:22 PM, Mark Mitchell <[EMAIL PROTECTED]> wrote:
> >  Vladimir, if you feel that Peter's code cannot be used directly in IRA,
> > would you please explain why not?
>
> I think he already has explained, see
> http://gcc.gnu.org/ml/gcc/2008-04/msg00730.html
>
> Having looked at IRA a bit, I think I have to agree with Vlad that
> Peter's code is not easily adapted to IRA.  Peter's code works for a
> single, immutable conflict graph for the whole function.  IRA works
> with inter-region and intra-region conflicts (as far as I understand,
> documentation in ira-conflict.c would be welcome), so the sorting
> trick that Peter uses, doesn't translate one-to-one to Vlad's
> allocator.
>
> Having said that, I think the "square" approach with
> mirror_conflicts() that IRA currently has, is a big step backward from
> what we have now.  IRA should at least have a representation for
> conflicts that does not duplicate information unnecessary.  The bits
> that seem to be bad in this respect are build_conflict_bit_table() and
> mirror_conflicts().  It's not clear to me how these are used, but it
> looks like you can end up building a square conflict graph for the
> whole function, like GCC did before Peter's work.  This could be a
> huge memory and speed regression if it isn't addressed.
>
> Another note: IRA uses VARRAYs, and I was under the impression we are
> trying to move away from those.  Vlad, please use VECs instead.
>
> Gr.
> Steven

Use my idea of flexible chained upper triangulars & rectangulars as
 indicated briefly in

http://gcc.gnu.org/ml/gcc/2008-04/msg00707.html
http://gcc.gnu.org/ml/gcc/2008-04/msg00681.html

You can easily update these structures through subroutines that
 traverse the chains and update their chained local structures when
 needed; this is similar to the Observer pattern (when the subjects
 are modified, the views of the observers are updated too).

   J.C.Pizarro


Re: 2nd quarter of 2007 and no GPL code of Java from Sun.

2007-05-08 Thread J.C. Pizarro

2007/5/2, Casey Marshall <[EMAIL PROTECTED]> wrote:

> From Sun, there is no notice, news, etc. about the process of GPLing
> the OpenJDK.

JavaOne begins May 8th.

Cheers.



Today,
is there any news from JavaOne?


Re: GIMPLE temporary variables

2007-05-08 Thread J.C. Pizarro

Andrea Callia D'Iddio <[EMAIL PROTECTED]> wrote:

Hi all,

I'm writing a new compilation pass in gcc, and I'm working on GIMPLE
code. When gcc produces GIMPLE code, it creates new temporary
variables in order to simplify expressions and statements. For
example, if the source C file contains
a=a+b+19;
then the GIMPLE code is
D.1295 = a + b;
a = D.1295 + 19;
How can I recognize temporary variables, such as D.1295?
Thanks for the support! Bye!!


Andrea Callia D'Iddio


This GIMPLE code is bad.

It would be better without the new temporary variable D.1295:
a = a + b;
a = a + 19;
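To answer the original question, a minimal sketch of one common heuristic
(assuming GCC 4.x internals and use inside a pass; this is an illustration,
not a quote from GCC sources): temporaries created by the gimplifier carry no
source-level name and are marked artificial, and the "D.1295" spelling is
printed from the DECL_UID.

/* Sketch only: include GCC's own headers (config.h, system.h,
   coretypes.h, tree.h) inside a pass before using this.  */
static bool
looks_like_gimple_temp (tree var)
{
  return (TREE_CODE (var) == VAR_DECL
          && DECL_ARTIFICIAL (var)
          && DECL_NAME (var) == NULL_TREE);
}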


Re: GCC 4.2.0 Status Report (2007-05-11)

2007-05-11 Thread J.C. Pizarro

On 5/12/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

PR 31797: An infinite loop in the compiler while building RTEMS.


1. Can you locate the last output it produces before it gets stuck in its internal infinite loop?
2. Or is there infinite output on the console?


GCC's trunk, it is necessary to improve the timings from gprof/gcc -pg.

2007-05-15 Thread J.C. Pizarro

Hi developers,

For the current trunk of GCC, regarding gprof and GCC's -pg option:

it's important to output the data with usable accuracy, preferably
4 decimal digits instead of 2, e.g. 0.0000 ms instead of 0.00 s.

It's important so that Amdahl's Law can be applied,
mainly to improve the gain of sequential programs.
It does not have anything to do with parallelizing the software.

http://en.wikipedia.org/wiki/Amdahl's_law

To understand it, see the A & B figure and the speedup 1 / (F + (1 - F)/N).

Looking at the graph, what happens if B1 is 0.00, B2 is 0.00, .., Bn is 0.00
because of the bad accuracy of 2 decimal digits, and in seconds instead of ms?
(where B = B1 + B2 + .. + Bn).
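A toy C illustration (numbers invented, not GCC code) of how 2-decimal
rounding wipes out the small parts B1..Bn before they can be summed into the
fraction used by the speedup formula 1 / (F + (1 - F)/N):

#include <stdio.h>

int
main (void)
{
  const double b[] = { 0.004, 0.003, 0.002, 0.004, 0.001 };  /* seconds */
  double exact = 0.0, reported = 0.0;
  for (unsigned i = 0; i < sizeof b / sizeof b[0]; i++)
    {
      exact += b[i];
      /* what a per-part report rounded to 2 decimals of a second gives: */
      reported += (double) (long) (b[i] * 100.0 + 0.5) / 100.0;
    }
  printf ("exact B = %.4f s, reported B = %.2f s\n", exact, reported);
  return 0;
}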

http://gcc.gnu.org/ml/gcc/2007-04/msg00175.html
http://gcc.gnu.org/ml/gcc/2007-04/msg00176.html
http://gcc.gnu.org/ml/gcc/2007-04/msg00177.html


Re: GCC's trunk, it is necessary to improve the timings from gprof/gcc -pg.

2007-05-15 Thread J.C. Pizarro

2007/5/15, J.C. Pizarro <[EMAIL PROTECTED]> wrote:

http://en.wikipedia.org/wiki/Amdahl's_law


It's a wrong link; the following is the correct one:

http://en.wikipedia.org/wiki/Amdahl%27s_law


Re: GCC 4.2.0 Status Report (2007-05-11)

2007-05-15 Thread J.C. Pizarro

2007/5/12, Mike Stump <[EMAIL PROTECTED]> wrote:

On May 11, 2007, at 3:36 PM, J.C. Pizarro wrote:
> On 5/12/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:
>> PR 31797: An infinite loop in the compiler while building RTEMS.
>
> 1. Can you localize its last output that stops in its internal
> infinite loop?
> 2. Or, is there an infinite outputting in the console?

Did you read the referenced bug report?  I suspect not.  If not,
please consider that you consume $100 for each email you post here
and ask yourself, have I donated $100 worth of code to the project
recently to pay for the cost of the email.  If not, consider not
posting the email or donating some more code to pay for it.



Are you a president?

You are mistaken, I have my freedom of expression. I am free like any person.

I'm not a spammer.

With the current tools (e.g. bugzilla, the mail searcher, ...), if you cannot
locate the origin of a problem, then the problem will persist in the future.

Is this due to bad A.I.? A poor agent of a semantic network?


Re: GCC's trunk, it is necessary to improve the timings from gprof/gcc -pg.

2007-05-15 Thread J.C. Pizarro

2007/5/15, Joe Buck <[EMAIL PROTECTED]> wrote:

On Tue, May 15, 2007 at 10:32:09PM +0200, J.C. Pizarro wrote:
> For the current trunk of GCC, thinking about
> the related thing of gprof and option -pg of GCC,
>
> it's important to output correctly the data with non-fatal accuracy,
> preferably 4 digits decimal instead of 2, e.g 0. ms instead of 0.00 s.

On many platforms, there is only a timer tick 60 times per second.
Reporting the result to four places will still give a zero if no
timer tick occurred during the execution of a given function.
The output is printing to the approximate precision of the data,
so printing two places is the correct thing to do.


That's not well reasoned.
Is the mere existence of a timer tick the reason to use only 2 decimal digits?



For more accuracy, you need to run the program for a longer time.



That's false; it's like a false positive.
There are cases in which stretching the run time does not improve the accuracy
of the measurements.
For better accuracy, you can slow down the CPU clock to stretch the timings.

This can be obtained in several ways:
* use an old PC of a few MHz.
* lower the clock frequency of a modern PC.
* or use a slow emulator, although in reality its timings are unpredictable,
because it is hard to model all the necessary logic, such as predicting the
timing of instruction scheduling and of the memories (the caches and
the DRAMs).


Most of your queries would be better suited to gcc-help than to gcc,
because you don't understand the compiler well enough to make
useful contributions to its development.


Are you telling me that I don't understand compilers? And what if I am clever?


Re: GCC's trunk, it is necessary to improve the timings from gprof/gcc -pg.

2007-05-15 Thread J.C. Pizarro

2007/5/16, Joe Buck <[EMAIL PROTECTED]> wrote:

On Wed, May 16, 2007 at 12:02:57AM +0200, J.C. Pizarro wrote:
> 2007/5/15, Joe Buck <[EMAIL PROTECTED]> wrote:
> [ explanation of why gprof is as it is ]
> It's not well reasoned.

If you don't like my explanation, feel free to rewrite the software;
it is free software after all.  This list usually responds even to
good ideas with "patches welcome", meaning that it is up to the
proposer to do the work.

I'm not going to engage you in further discussion.



With my comments, I can help them try to improve things.

Before beginning: this would be an affair between 3 entities:
  gprof, gcc and the linux kernel.

The trick is to use CPU timing instructions like RDTSC in addition to
the timer clocks (ticks: 1 Hz or 50, 60, 100, 120, 250, 1000 Hz) and
kernel patches.
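For reference, a minimal way to read the TSC on x86 with GCC-style inline asm
(an illustration only, not part of any proposed gprof/gcc/kernel patch):

static inline unsigned long long
read_tsc (void)
{
  unsigned int lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return ((unsigned long long) hi << 32) | lo;
}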

The kernel would have to report to the user process (if requested by that
process through a system call) the needed information, writing it into a
memory cell previously requested.
No kind of flaw should be allowed to happen.

This information must contain at minimum the number (long64) of the
subcontext in a sequence uniquely identified as a result of context
switches, system calls, interrupts, signals, etc.

The kernel or a daemon must provide a detailed list of the timings of the
subcontexts that were available only to the requesting user process (user
time of the program's run only), without counting the time lost to I/O,
interrupts, signals, etc.
The list of the lost times could be requested too.

With this abundant information, the profiler can collect the data and
calculate the timings, presented as a horizontal (or vertical) bar, so that
Amdahl's Law can be applied.

For instance, this tuple stored in the memory cell of the user
process gives an idea:

* number of the quantum's subcontext: long64
* moment the subcontext started: long64 (CPU cycles, from RDTSC)
* (optional) moment the subcontext started: uint32 (Unix seconds since 1970)
* etc.

(a context is formed by several subcontexts)

Initially, it is known how long one ideal RDTSC cycle lasts (double64), or
how many RDTSC cycles make one second (long64).

Then the profiler's functions can use this tuple to check whether the
differences between RDTSC values fall inside the time quantum or not.

There are many other alternatives besides storing it in the memory cell of
the user process, such as storing the tuples in a list kept by the kernel,
by the daemon, or somewhere else.

This is only the tip of an iceberg.

Sincerely yours, J.C.


Re: GCC's trunk, it is necessary to improve the timings from gprof/gcc -pg.

2007-05-15 Thread J.C. Pizarro

2007/5/16, J.C. Pizarro <[EMAIL PROTECTED]> wrote:

For instance, this tuple stored in the memory cell of the user
process gives an idea:

* number of the quantum's subcontext: long64
* moment the subcontext started: long64 (CPU cycles, from RDTSC)
* (optional) moment the subcontext started: uint32 (Unix seconds since 1970)
* etc.


I'm sorry, I forgot to include the fields for the finishing moments of the
previous (or last) subcontext.

* number of the quantum's subcontext: long64
* moment the subcontext started: long64 (CPU cycles, from RDTSC)
* moment the subcontext finished: long64 (CPU cycles, from RDTSC)
* (optional) moment the subcontext started: uint32 (Unix seconds since 1970)
* (optional) moment the subcontext finished: uint32 (Unix seconds since 1970)
* etc.

Note: requesting the seconds-since-1970 timer clock can be slow (or not) when
things are measured in nanoseconds, so a design decision might be not to
recommend the use of this slow request.
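A hypothetical C rendering of the record described above (field names and
types are illustrative only; no such kernel interface exists):

#include <stdint.h>

struct subcontext_sample
{
  uint64_t seq;         /* number of the quantum's subcontext              */
  uint64_t tsc_start;   /* CPU cycles when the subcontext started (RDTSC)  */
  uint64_t tsc_end;     /* CPU cycles when the subcontext finished (RDTSC) */
  uint32_t unix_start;  /* optional: Unix seconds since 1970 at start      */
  uint32_t unix_end;    /* optional: Unix seconds since 1970 at end        */
};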

Sincerely yours, J.C.


I don't understand some of gcc-4.1-20070514

2007-05-17 Thread J.C. Pizarro

I suppose that there are some bugs in the snapshot gcc-4.1-20070514.

gcc/rtl.h
-

/* Register Transfer Language EXPRESSIONS CODE CLASSES */

enum rtx_class  {
 /* We check bit 0-1 of some rtx class codes in the predicates below.  */

 /* Bit 0 = comparison if 0, arithmetic is 1 #<-wrong!   v- bit 0
Bit 1 = 1 if commutative.  */#<-wrong! v- bit 1
 RTX_COMPARE,   /* 0 */   # 0 = 0 0 0 0
 RTX_COMM_COMPARE,   # 1 = 0 0 0 1
 RTX_BIN_ARITH,  # 2 = 0 0 1 0
 RTX_COMM_ARITH, # 3 = 0 0 1 1

 /* Must follow the four preceding values.  */
 RTX_UNARY, /* 4 */   # 4 = 0 1 0 0

 RTX_EXTRA,  # 5 = 0 1 0 1
 RTX_MATCH,  # 6 = 0 1 1 0
 RTX_INSN,   # 7 = 0 1 1 1

 /* Bit 0 = 1 if constant.  */
 RTX_OBJ,   /* 8 */   # 8 = 1 0 0 0
 RTX_CONST_OBJ,  # 9 = 1 0 0 1

 RTX_TERNARY,# 10= 1 0 1 0
 RTX_BITFIELD_OPS,   # 11= 1 0 1 1
 RTX_AUTOINC # 12= 1 1 0 0
};

#
# BEGIN CORRECTING (the corrected comment follows)
#
 /* We check bit 0-1 of 4 first rtx class codes in the predicates below.
Bit 0 = 1 if commutative.
Bit 1 = 0 if comparison, 1 if arithmetic.  */
#
# END CORRECTING
#
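A standalone sketch (not GCC source) of the bit layout the corrected comment
describes for the first four classes:

enum rtx_class_sketch
{
  SK_COMPARE,       /* 0 = 00 */
  SK_COMM_COMPARE,  /* 1 = 01 */
  SK_BIN_ARITH,     /* 2 = 10 */
  SK_COMM_ARITH     /* 3 = 11 */
};

/* bit 0 = 1 if commutative; bit 1 = 0 if comparison, 1 if arithmetic.  */
static int sk_commutative_p (enum rtx_class_sketch c) { return (c & 1) != 0; }
static int sk_arithmetic_p  (enum rtx_class_sketch c) { return (c & 2) != 0; }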



gcc/loop.c
--
#
# Look at who uses loop_invariant_p: it does not return a boolean,
# it returns an int with the values 0, 1 and 2.
#
# Be careful with it: when do you really take the action's body of an IF
# because the value is 1 or 2 (i.e. it's !0), and when do you not because
# the value is 0 (i.e. it's neither 1 nor 2)?
# It has to distinguish easily whether
# IT REALLY IS :
#   =0 : "NOT INVARIANT" OF THE LOOP !!!
#   =1 : "INVARIANT INCONDITIONAL" OF THE LOOP !!!
#   =2 : "INVARIANT CONDITIONAL" OF THE LOOP !!!

# Some lines use the condition ([XXX =] loop_invariant_p) and other lines
# use the equivalent condition (([XXX =] loop_invariant_p) != 0), and
# I don't understand the reason for this difference.

# Actually, in the source code, it's confusing.
# a) tem = loop_invariant_p (...)
#  What does this mean when you think about the algorithm?
#  * Does it mean that it can be "INVARIANT INCOND." or "INVARIANT COND."?
#  * Or does it mean that it's "INVARIANT INCOND." but not "INVARIANT COND."?
# b) (... && loop_invariant_p (...))
#  Do you think it will be true because of (... && true')?
#  * Does it mean that it will be true' because it was
#  "INVARIANT INCOND."(=1) or "INVARIANT COND."(=2)?
#  * Or only "INVARIANT INCOND."(=1), because true' is 1 and
#  "INVARIANT COND."(=2) doesn't count?
# c) if (!loop_invariant_p (loop, iv->add_val))
#  "! NOT INVARIANT" is truly "INVARIANT" because not(not(x))=x, but is it
#  "INVARIANT COND." only, "INVARIANT INCOND." only, or either?
# d) (... || ! loop_invariant_p (...) || loop_invariant_p (...))
#  I find it a little dangerous. || is applied to bools, but these are ints.
#  Is it "|| between bools cast from ints" or "| between ints"?

#
# BEGIN CORRECTING PROPOSAL (it will reduce the confusion)
#
To define the predicates

IS_NOT_INVARIANT_OF_LOOP_P (...)
IS_INVARIANT_INCONDITIONAL_OF_LOOP_P (...)
IS_INVARIANT_CONDITIONAL_OF_LOOP_P (...)

IS_NOT_INVARIANT__OR_INVARIANT_INCONDITIONAL_OF_LOOP_P (...)
IS_NOT_INVARIANT__OR_INVARIANT_CONDITIONAL_OF_LOOP_P (...)
IS_INVARIANT_INCONDITIONAL_OR_CONDITIONAL_OF_LOOP_P (...)

note: I don't recommend using ! with them (except in some very rare situations).
(the set is complete and reads like natural language)
#
# END CORRECTING PROPOSAL
#
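A hypothetical rendering of the proposed predicates (these macros do not exist
in GCC; they only illustrate the proposal, assuming loop_invariant_p (loop, x)
returns 0, 1 or 2 as described above):

#define IS_NOT_INVARIANT_OF_LOOP_P(loop, x) \
  (loop_invariant_p (loop, x) == 0)
#define IS_INVARIANT_INCONDITIONAL_OF_LOOP_P(loop, x) \
  (loop_invariant_p (loop, x) == 1)
#define IS_INVARIANT_CONDITIONAL_OF_LOOP_P(loop, x) \
  (loop_invariant_p (loop, x) == 2)
#define IS_INVARIANT_INCONDITIONAL_OR_CONDITIONAL_OF_LOOP_P(loop, x) \
  (loop_invariant_p (loop, x) != 0)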

/* Like rtx_equal_p, but attempts to swap commutative operands.  This is
  important to get some addresses combined.  Later more sophisticated
  transformations can be added when necessary.

  ??? Same trick with swapping operand is done at several other places.
  It can be nice to develop some common way to handle this.  */

static int
rtx_equal_for_prefetch_p (rtx x, rtx y)
{
 ...
 if (COMMUTATIVE_ARITH_P (x))
   {
 return ((rtx_equal_for_prefetch_p (XEXP (x, 0), XEXP (y, 0))
   && rtx_equal_for_prefetch_p (XEXP (x, 1), XEXP (y, 1)))
  || (rtx_equal_for_prefetch_p (XEXP (x, 0), XEXP (y, 1))
   && rtx_equal_for_prefetch_p (XEXP (x, 1), XEXP (y, 0))));
   }
 ...
}

 Why can't I add

 if (COMMUTATIVE_ARITH_P (x) || COMMUTATIVE_COMPARE_P (x) || ...)  ?

 Or would it be a problem with the order of execution, e.g. because
aliasing can affect them?

  Remember, the binary operators
'+', '*', '==', '!=', '&', '|', '^' ('&&' and '||' are an exception) ...
are mathematically commutative.  ('-', '/', '%' are not.)

a + b   <=>   b + a      a & b   <=>   b & a
a * b   <=>   b * a      a | b   <=>   b | a
a == b  <=>  b == a      a ^ b   <=>   b ^ a
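A hypothetical COMMUTATIVE_COMPARE_P in the spirit of the existing
COMMUTATIVE_ARITH_P (no such macro is in rtl.h; this only illustrates the
question above):

#define COMMUTATIVE_COMPARE_P(X) \
  (GET_RTX_CLASS (GET_CODE (X)) == RTX_COMM_COMPARE)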

Re: I don't understand some of gcc-4.1-20070514

2007-05-17 Thread J.C. Pizarro

2007/5/18, Eric Botcazou <[EMAIL PROTECTED]> wrote:

> I suppose that there are some bugs in the snapshot gcc-4.1-20070514.

Dozens, literally, just browse the bug database.  If you want to help, pick
one of them and try to fix it.

--
Eric Botcazou


How?
I cannot browse http://gcc.gnu.org/bugzilla/ because it has no
'browse' button.


Re: I don't understand some of gcc-4.1-20070514, a patch here.

2007-05-19 Thread J.C. Pizarro

Hi developers,

for this http://gcc.gnu.org/ml/gcc/2007-05/msg00451.html

here is a nice cleanup patch for gcc/loop.c that transliterates the logic
 of the uses of the loop_invariant_p (..) and consec_sets_invariant_p (..)
 functions.

I've applied the patch, built and run it, and then done it again with this
patched gcc. It's OK.

It will be more readable and comprehensible. I patched the comprehension!

To the experts only: please now read the loop algorithms that use
*_OF_LOOP_P and check whether there is any failed logic or not.

Sincerely yours, J.C.


gcc-4.1-20070514_renaming_logic_of_loop_invariant_p.patch
Description: Binary data


Re: I don't understand some of gcc-4.1-20070514, a patch here.

2007-05-19 Thread J.C. Pizarro

2007/5/19, Eric Botcazou <[EMAIL PROTECTED]> wrote:

[Please do not cross post between lists and do not send useless attachments.]

> I've patched it, builded and executed, and again again with this patched
> gcc. It's OK.

You apparently didn't read my previous message carefully.  The patch is
rejected because of the reason previously stated.

--
Eric Botcazou



Eric, what "reason previously stated"?

Have you seen my attachment gcc_build.sh? I did follow the steps of
contribute.html.

