Finding canonical names of systems

2006-11-27 Thread Ulf Magnusson

How are you supposed to find the canonical name of a system (of known
type) in CPU-Vendor-OS form in the general case? If you have access to
a system of that particular type, you can run config.guess to find
out, but you might not have, and that approach won't work for many
systems anyway. The canonical name needs to be known e.g. when
cross-compiling and building cross-compilers.

The only way I could find to get a list of canonical CPU, Vendor and
OS strings was to dig through /usr/share/gnuconfig/config.sub on my
GNU/Linux systems, which needless to say is about as bad as it gets
from a documentation perspective. Is there any other way to get a list
mapping CPUs, vendors and OSes to their canonical strings? If there
isn't, I think it's making things much more complicated than they
should be.

/Ulf Magnusson


Re: Finding canonical names of systems

2006-11-28 Thread Ulf Magnusson

On 11/28/06, Mike Stump wrote:

[ first, this is the wrong list to ask such a question; gcc-help is the
right one ]

On Nov 27, 2006, at 7:25 PM, Ulf Magnusson wrote:
> How are you supposed to find the canonical name of a system (of
> known type) in CPU-Vendor-OS form in the general case?

In the general case, you ask someone that has such a machine to run
config.guess, or failing that, you ask someone, or failing that, you
just invent the obvious string and use it.


That still feels like a very roundabout way to do it, and it won't work for
systems that no Unix runs on (like some small embedded systems).
Having to ask someone just to find this information also feels a bit silly,
when it is as crucial as it is e.g. in building cross-compilers. The correct
strings to use aren't always immediately obvious either, if you feel like
guessing. I think the lack of documentation might scare people away.



Most portable software doesn't much care just what configuration you
specify; some very non-portable software will fail to function unless
you provide exactly the right string.  gcc is of the latter type, if
you're interested in building it.



Yes, the reason I ended up here is that I wanted to build GCC as a
cross-compiler, and was disappointed that no one had even made a
comprehensible list of how to represent different CPUs, vendors and
OSes. I see this as a bug in the documentation.


> If you have access to a system of that particular type, you can run
> config.guess to find out, but you might not have, and that approach
> won't work for many systems anyway.

That approach always works on all host systems.  :-)  If it didn't,
that'd be a bug and someone would fix it.



As previously mentioned, that won't work on systems that can't even run sh, like
tiny embedded devices, and it's still a somewhat roundabout and silly method.


> The canonical name needs to be known e.g. when cross-compiling and
> building cross-compilers.

Ah, for crosses, you have to know what you want, and what you want is
what you specify.  If your question is, what do you want, well, you
want what you want.  Either, it works, or, you've not ported the
compiler.

For example, you can configure --target=arm, if you want an arm, or
--target=m68k if you want an m68k, or sparc, if you want sparc, or ppc
if you want ppc, or powerpc if you want powerpc, or x86_64, if you
want x86_64, or arm-elf, if you want arm-elf, or sparc-aout if you
want that.  The list _is_ endless.  If you are interested in a specific
target, tell us which one and we'll answer the question specifically.



Yes, and <foo> if you want Foo.
The problem is knowing what the string representation is. What I'd like to
see is a list like the following.

..
Materola 2560xx series -> ma256k
Pute1000 64 bit -> p1k64
..
Macrosoft Inc. -> mcinc
Amtel -> amt
..
FroBar OS Version X.Y -> froos-x.y
Lunix x.y -> lunix-x.y
..


If you want pre-formed ideas for targets that might be useful, you
can check out:

   http://gcc.gnu.org/install/specific.html
   http://gcc.gnu.org/buildstat.html
   http://gcc.gnu.org/ml/gcc-testresults/

I was thinking there was one other that tried to be exhaustive, but
maybe we removed the complete list years ago.

Aside from that, yes, reading through the config type files is yet
another way.



Those are helpful links, but I still think there should be an easy-to-find
comprehensive list documenting the strings to use in the canonical triplet
for particular CPUs, vendors and OSes. "How do I represent system foo?"
seems like a reasonable question to ask, and "dig through a shell script"
is not very satisfactory. If that is the only way to find a truly comprehensive
list, it should at least be mentioned in the documentation.

/Ulf Magnusson


Re: Finding canonical names of systems

2006-11-29 Thread Ulf Magnusson

On 11/29/06, Michael Eager wrote:

Ulf Magnusson wrote:
> How are you supposed to find the canonical name of a system (of known
> type) in CPU-Vendor-OS form in the general case? If you have access to
> a system of that particular type, you can run config.guess to find
> out, but you might not have, and that approach won't work for many
> systems anyway. The canonical name needs to be known e.g. when
> cross-compiling and building cross-compilers.
>
> The only way I could find to get a list of canonical CPU, Vendor and
> OS strings was to dig through /usr/share/gnuconfig/config.sub on my
> GNU/Linux systems, which needless to say is about as bad as it gets
> from a documentation perspective. Is there any other way to get a list
> mapping CPUs, vendors and OSes to their canonical strings? If there
> isn't, I think it's making things much more complicated than they
> should be.

Strictly speaking, there isn't anything that is a canonical name
for a particular configuration, if you mean a single correct name.
GCC uses the vendor name to simplify figuring out the desired target
architecture.

For most cross-compilations, the vendor name is ignored.  There may
be a multitude of vendors, for example.  (How many MIPS vendors can
you name?)



It would be helpful if the documentation said this. I believe I'm not the only
one who immediately went hunting in the docs for some kind of list with
different CPUs, vendors and OSes, or at least some guide on how to find
a string for your configuration.


Take a look at configure and config.gcc.   Find the architecture you
are interested in.  Look at the names defined for the architecture and
pick the best one.

For example, in configure you will find
   powerpc-*-eabi)
     noconfigdirs="$noconfigdirs ${libgcj}"
     ;;

This says that powerpc-*-eabi is a valid configuration.

This is further refined in config.gcc, where you will find
that powerpc-*-eabi is a bit different from powerpc-*-eabisim.



This should also be in the doc, if it is the way you have to do it.
Right now, I believe it pretty much glosses over the issue of how
to find a suitable string for the configuration you want to
represent.


If you are looking for a comprehensive list of all possible
configurations, rather than just trying to find the correct
one for your particular application, you will find that there
are an infinite number of configurations.



I understand this now, but the docs could be more helpful
in explaining how to find the string to use for your configuration.

While searching for an answer, I noticed that lots of people seem
to have had problems with cross-compilation that to me look more
like problems in the documentation, which I find a bit sad.

/Ulf Magnusson
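
The shell `case` patterns quoted from configure above use ordinary glob
matching, so whether a given target string selects a given entry can be
checked mechanically. A minimal sketch in C++, assuming a POSIX system (for
`fnmatch`); the helper name `first_match`, the pattern list and the example
target strings are illustrative and are not taken from configure or
config.gcc:

```cpp
#include <fnmatch.h>  // POSIX glob matching; not part of standard C++
#include <string>
#include <vector>

// Illustrative pattern list, using the two patterns named in the message.
const std::vector<std::string> kPatterns = {
    "powerpc-*-eabisim",
    "powerpc-*-eabi",
};

// Hypothetical helper: returns the first pattern that matches the whole
// target string, or an empty string if none does. With flags == 0,
// fnmatch treats '*' much like a shell `case` pattern does.
std::string first_match(const std::string& target) {
    for (const std::string& pattern : kPatterns) {
        if (fnmatch(pattern.c_str(), target.c_str(), 0) == 0) {
            return pattern;
        }
    }
    return "";
}
```

With this list, `first_match("powerpc-unknown-eabi")` selects
`powerpc-*-eabi` and `first_match("powerpc-unknown-eabisim")` selects
`powerpc-*-eabisim`, since a pattern has to match the whole string. It only
checks patterns; it says nothing about whether GCC actually supports the
resulting configuration.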


Re: Finding canonical names of systems

2006-11-29 Thread Ulf Magnusson

On 11/29/06, Michael Eager wrote:

Ulf Magnusson wrote:

> While searching for an answer, I noticed that lots of people seem
> to have had problems with cross-compilation that to me look more
> like problems in the documentation, which I find a bit sad.

Rather than repeatedly complain, the most constructive
contribution would be to contribute to the project.

You can feel sad all you want, but being patronizing is
not going to get much sympathy.



I'm sorry if I came off as patronizing, it's not the way it was meant
to sound. It's just that I've seen a lot of open source software that
has this problem, and I don't like it because I think it hinders the
spread of open source software.

I'd be happy to contribute some documentation on this. I just hope I
have a firm enough grip on the issue. Where should I send drafts for
review? Is there some other resource I should be aware of besides
http://gcc.gnu.org/contribute.html?

/Ulf Magnusson


Suboptimal __restrict optimization?

2011-10-01 Thread Ulf Magnusson
Hi,

Given the code

class C { void f(int *p); int q; };

void C::f(int * __restrict p) __restrict {
    q += 10;
    *p = 7;
    q += 10;
}

g++ 4.5.2 with -O3 generates the following for C::f() (prologue and
epilogue omitted):

mov    0x8(%ebp),%eax   // eax = this (= &q)
mov    0xc(%ebp),%ecx   // ecx = p
mov    (%eax),%edx      // edx = q
movl   $0x7,(%ecx)      // *p = 7
add    $0x14,%edx       // q += 20
mov    %edx,(%eax)      // save q

If C::f() is rearranged as

void C::f(int * __restrict p) __restrict {
    *p = 7;
    q += 10;
    q += 10;
}

the following is generated instead:

mov    0x8(%ebp),%eax   // eax = this (= &q)
mov    0xc(%ebp),%edx   // edx = p
movl   $0x7,(%edx)      // *p = 7
addl   $0x14,(%eax)     // q += 20

Is there some reason why GCC couldn't generate this code for the first
version of C::f()? Is this a failure of optimization, or am I missing
something in how __restrict works?

/Ulf


Re: Suboptimal __restrict optimization?

2011-10-04 Thread Ulf Magnusson
On Mon, Oct 3, 2011 at 10:22 PM, Ian Lance Taylor wrote:
> Ulf Magnusson writes:
>
>> Is there some reason why GCC couldn't generate this code for the first
>> version of C::f()? Is this a failure of optimization, or am I missing
>> something in how __restrict works?
>
> It's a failure of optimization.
>
> Ian
>

Is this something that has been improved in 4.6.x? (Sorry for the
initial non-reply-all.)


Option to make unsigned->signed conversion always well-defined?

2011-10-05 Thread Ulf Magnusson
Hi,

I've been experimenting with different methods for emulating the
signed overflow of an 8-bit CPU. The method I've found that seems to
generate the most efficient code on both ARM and x86 is

bool overflow(unsigned int a, unsigned int b) {
    const unsigned int sum = (int8_t)a + (int8_t)b;
    return (int8_t)sum != sum;
}

(The real function would probably be 'inline', of course. Regs are
stored in overlong variables, hence 'unsigned int'.)

Looking at the spec, it unfortunately seems the behavior of this
function is undefined, as it relies on signed int addition wrapping,
and that (int8_t)sum truncates bits. Is there some way to make this
guaranteed safe with GCC without resorting to inline asm? Locally
enabling -fwrapv takes care of the addition, but that still leaves the
conversion.

/Ulf


Re: Option to make unsigned->signed conversion always well-defined?

2011-10-05 Thread Ulf Magnusson
On Wed, Oct 5, 2011 at 10:11 PM, Ulf Magnusson wrote:
> Hi,
>
> I've been experimenting with different methods for emulating the
> signed overflow of an 8-bit CPU. The method I've found that seems to
> generate the most efficient code on both ARM and x86 is
>
> bool overflow(unsigned int a, unsigned int b) {
>    const unsigned int sum = (int8_t)a + (int8_t)b;
>    return (int8_t)sum != sum;
> }
>
> (The real function would probably be 'inline', of course. Regs are
> stored in overlong variables, hence 'unsigned int'.)
>
> Looking at the spec, it unfortunately seems the behavior of this
> function is undefined, as it relies on signed int addition wrapping,
> and that (int8_t)sum truncates bits. Is there some way to make this
> guaranteed safe with GCC without resorting to inline asm? Locally
> enabling -fwrapv takes care of the addition, but that still leaves the
> conversion.
>
> /Ulf
>

Is *((int8_t*)&sum) safe (assuming little endian)? Unfortunately that
seems to generate worse code. On X86 it generates the following (GCC
4.5.2):

0050 <_Z9overflow4jj>:
  50:   83 ec 10                sub    $0x10,%esp
  53:   0f be 54 24 18          movsbl 0x18(%esp),%edx
  58:   0f be 44 24 14          movsbl 0x14(%esp),%eax
  5d:   8d 04 02                lea    (%edx,%eax,1),%eax
  60:   0f be d0                movsbl %al,%edx
  63:   39 d0                   cmp    %edx,%eax
  65:   0f 95 c0                setne  %al
  68:   83 c4 10                add    $0x10,%esp
  6b:   c3                      ret

With the straight (int8_t) cast you get

  50:   0f be 54 24 08          movsbl 0x8(%esp),%edx
  55:   0f be 44 24 04          movsbl 0x4(%esp),%eax
  5a:   8d 04 02                lea    (%edx,%eax,1),%eax
  5d:   0f be d0                movsbl %al,%edx
  60:   39 c2                   cmp    %eax,%edx
  62:   0f 95 c0                setne  %al
  65:   c3                      ret

What's with the extra add/sub of ESP?

/Ulf


Re: Option to make unsigned->signed conversion always well-defined?

2011-10-06 Thread Ulf Magnusson
On Thu, Oct 6, 2011 at 12:55 AM, Pedro Pedruzzi wrote:
> Em 05-10-2011 17:11, Ulf Magnusson escreveu:
>> Hi,
>>
>> I've been experimenting with different methods for emulating the
>> signed overflow of an 8-bit CPU.
>
> You would like to check whether a 8-bit signed addition will overflow or
> not, given the two operands. Is that correct?
>
> As you used the word `emulating', I am assuming that your function will
> not run by the mentioned CPU.
>

No, it'll most likely only run on systems with a wider bitness.

> Does this 8-bit CPU use two's complement representation?

Yes, and the criterion for signed overflow is "both numbers have the
same sign, but the sign of the sum is different". Should have made
that more clear.

>
>> The method I've found that seems to
>> generate the most efficient code on both ARM and x86 is
>>
>> bool overflow(unsigned int a, unsigned int b) {
>>     const unsigned int sum = (int8_t)a + (int8_t)b;
>>     return (int8_t)sum != sum;
>> }
>>
>> (The real function would probably be 'inline', of course. Regs are
>> stored in overlong variables, hence 'unsigned int'.)
>>
>> Looking at the spec, it unfortunately seems the behavior of this
>> function is undefined, as it relies on signed int addition wrapping,
>> and that (int8_t)sum truncates bits. Is there some way to make this
>> guaranteed safe with GCC without resorting to inline asm? Locally
>> enabling -fwrapv takes care of the addition, but that still leaves the
>> conversion.
>
> I believe the cast from unsigned int to int8_t is implementation-defined
> for values that can't be represented in int8_t (e.g. 0xff). A kind of
> `undefined behavior' as well.
>
> I tried:
>
> bool overflow(unsigned int a, unsigned int b) {
>    const unsigned int sum = a + b;
>    return ((a & 0x80) == (b & 0x80)) && ((a & 0x80) != (sum & 0x80));
> }
>
> But it is not as efficient as yours.
>
> --
> Pedro Pedruzzi
>

Yeah, I tried similar bit-trickery along the lines of

bool overflow(unsigned int a, unsigned int b) {
    const uint8_t ab = (uint8_t)a;
    const uint8_t bb = (uint8_t)b;
    const uint8_t sum = ab + bb;
    return (ab ^ bb) & ~(ab ^ sum) & 0x80;
}

, but it doesn't seem to generate very efficient code.

/Ulf


Re: Option to make unsigned->signed conversion always well-defined?

2011-10-06 Thread Ulf Magnusson
On Thu, Oct 6, 2011 at 10:25 AM, Ulf Magnusson wrote:
> On Thu, Oct 6, 2011 at 12:55 AM, Pedro Pedruzzi wrote:
>> Em 05-10-2011 17:11, Ulf Magnusson escreveu:
>>> Hi,
>>>
>>> I've been experimenting with different methods for emulating the
>>> signed overflow of an 8-bit CPU.
>>
>> You would like to check whether a 8-bit signed addition will overflow or
>> not, given the two operands. Is that correct?
>>
>> As you used the word `emulating', I am assuming that your function will
>> not run by the mentioned CPU.
>>
>
> No, it'll most likely only run on systems with a wider bitness.
>
>> Does this 8-bit CPU use two's complement representation?
>
> Yes, and the criterion for signed overflow is "both numbers have the
> same sign, but the sign of the sum is different". Should have made
> that more clear.
>
>>
>>> The method I've found that seems to
>>> generate the most efficient code on both ARM and x86 is
>>>
>>> bool overflow(unsigned int a, unsigned int b) {
>>>     const unsigned int sum = (int8_t)a + (int8_t)b;
>>>     return (int8_t)sum != sum;
>>> }
>>>
>>> (The real function would probably be 'inline', of course. Regs are
>>> stored in overlong variables, hence 'unsigned int'.)
>>>
>>> Looking at the spec, it unfortunately seems the behavior of this
>>> function is undefined, as it relies on signed int addition wrapping,
>>> and that (int8_t)sum truncates bits. Is there some way to make this
>>> guaranteed safe with GCC without resorting to inline asm? Locally
>>> enabling -fwrapv takes care of the addition, but that still leaves the
>>> conversion.
>>
>> I believe the cast from unsigned int to int8_t is implementation-defined
>> for values that can't be represented in int8_t (e.g. 0xff). A kind of
>> `undefined behavior' as well.
>>
>> I tried:
>>
>> bool overflow(unsigned int a, unsigned int b) {
>>    const unsigned int sum = a + b;
>>    return ((a & 0x80) == (b & 0x80)) && ((a & 0x80) != (sum & 0x80));
>> }
>>
>> But it is not as efficient as yours.
>>
>> --
>> Pedro Pedruzzi
>>
>
> Yeah, I tried similar bit-trickery along the lines of
>
> bool overflow(unsigned int a, unsigned int b) {
>    const uint8_t ab = (uint8_t)a;
>    const uint8_t bb = (uint8_t)b;
>    const uint8_t sum = ab + bb;
>    return (ab ^ bb) & ~(ab ^ sum) & 0x80;
> }
>
> , but it doesn't seem to generate very efficient code.
>
> /Ulf
>

Might as well do

bool overflowbit(unsigned int a, unsigned int b) {
    const unsigned int sum = a + b;
    return (a ^ b) & ~(a ^ sum) & 0x80;
}

But still not very good output compared to other approaches as expected.

/Ulf


Re: Option to make unsigned->signed conversion always well-defined?

2011-10-06 Thread Ulf Magnusson
On Thu, Oct 6, 2011 at 11:04 AM, Miles Bader wrote:
> Ulf Magnusson writes:
>> Might as well do
>>
>> bool overflowbit(unsigned int a, unsigned int b) {
>>     const unsigned int sum = a + b;
>>     return (a ^ b) & ~(a ^ sum) & 0x80;
>> }
>>
>> But still not very good output compared to other approaches as expected.
>
> How about:
>
>   bool overflowbit2(unsigned int a, unsigned int b)
>   {
>       const unsigned int sum = a + b;
>       return ~(a ^ b) & sum & 0x80;
>   }
>
> ?
>
> I think it has the same results as your function...
> [I just made a table of all 8 possibilities, and checked!]
>
> -miles
>
> --
> Circus, n. A place where horses, ponies and elephants are permitted to see
> men, women and children acting the fool.
>

Oops, should have been

    return ~(a ^ b) & (a ^ sum) & 0x80;

~(a ^ b) gives 1 in the sign bit position if the signs are the same,
and (a ^ sum) gives 1 if it's different in the sum.

A clearer way of writing it (that also generates suboptimal code) is

bool overflow(unsigned int a, unsigned int b) {
    const unsigned asign   = a       & 0x80;
    const unsigned bsign   = b       & 0x80;
    const unsigned sumsign = (a + b) & 0x80;
    return (asign == bsign) && (asign != sumsign);
}

Seems bit-fiddling isn't the way to go.

Maybe I should take this to gcc-help, as it isn't really development-related.

/Ulf
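
The corrected expression can be checked against the overflow rule stated
earlier in the thread (same operand signs, different sum sign) by brute
force over all 256 x 256 byte pairs. A minimal sketch; `to_signed8`,
`overflow_reference` and `count_overflows_checked` are helper names invented
for this check:

```cpp
#include <cassert>

// Formula from the message: bit 7 is set when a and b have the same sign
// bit and the sum's sign bit differs from a's.
bool overflow_bit(unsigned int a, unsigned int b) {
    const unsigned int sum = a + b;
    return (~(a ^ b) & (a ^ sum) & 0x80u) != 0;
}

// Hypothetical reference helper: interpret the low 8 bits as a two's
// complement value without any narrowing cast.
int to_signed8(unsigned int v) {
    v &= 0xffu;
    return v < 0x80u ? static_cast<int>(v) : static_cast<int>(v) - 0x100;
}

// Reference: compute the exact sum in int and range-check it.
bool overflow_reference(unsigned int a, unsigned int b) {
    const int sum = to_signed8(a) + to_signed8(b);
    return sum < -128 || sum > 127;
}

// Compares both functions on every byte pair and returns how many pairs
// the reference reports as overflowing.
int count_overflows_checked() {
    int count = 0;
    for (unsigned int a = 0; a < 256; ++a) {
        for (unsigned int b = 0; b < 256; ++b) {
            assert(overflow_bit(a, b) == overflow_reference(a, b));
            if (overflow_reference(a, b)) {
                ++count;
            }
        }
    }
    return count;
}
```

Only the low 8 bits of `a` and `b` influence bit 7 of the result, so stray
upper bits in the overlong register variables do not change the answer.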


Re: Option to make unsigned->signed conversion always well-defined?

2011-10-06 Thread Ulf Magnusson
(I'll cross-post this to gcc and keep it on gcc-help after that.)

On Thu, Oct 6, 2011 at 4:46 PM, Andrew Haley wrote:
>
> inline int8_t as_signed_8 (unsigned int a) {
>  a &= 0xff;
>  return a & 0x80 ? (int)a - 0x100 : a;
> }
>
> int overflow(unsigned int a, unsigned int b) {
>  int sum = as_signed_8(a) + as_signed_8(b);
>  return as_signed_8(sum) != sum;
> }
>
> Andrew.
>

That's a really neat trick, and seems to generate identical code. Thanks!

It'd be interesting to know if this version produces equally efficient
code with MSVC.

To summarize what we have so far, here are four different methods along
with the code generated for X86 and ARM (GCC 4.5.2):

#include <stdint.h>

inline int8_t as_signed_8(unsigned int a) {
    a &= 0xff;
    return a & 0x80 ? (int)a - 0x100 : a;
}

bool overflow_range(unsigned int a, unsigned int b) {
    const int sum = as_signed_8(a) + as_signed_8(b);
    return sum < -128 || sum > 127;
}

bool overflow_bit(unsigned int a, unsigned int b) {
    const unsigned int sum = a + b;
    return ~(a ^ b) & (a ^ sum) & 0x80;
}

bool overflow_unsafe(unsigned int a, unsigned int b) {
    const unsigned int sum = (int8_t)a + (int8_t)b;
    return (int8_t)sum != sum;
}

bool overflow_safe(unsigned int a, unsigned int b) {
    const int sum = as_signed_8(a) + as_signed_8(b);
    return as_signed_8(sum) != sum;
}
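
One detail of the posted helper: in `a & 0x80 ? (int)a - 0x100 : a` the two
branches have types int and unsigned int, so the usual arithmetic
conversions make the whole conditional expression unsigned int. A negative
result is therefore converted to a large unsigned value before being
narrowed to int8_t, which is again an implementation-defined conversion
(GCC defines it as reduction modulo 2^N, so the generated code is
unaffected). A sketch of a variant that keeps both branches as int, assuming
the goal is to avoid that conversion entirely; the `_int` names are made up
here:

```cpp
#include <cstdint>
#include <type_traits>

// Mixing int and unsigned int in ?: yields unsigned int.
static_assert(std::is_same<decltype(true ? 1 : 1u), unsigned int>::value,
              "conditional operator applies the usual arithmetic conversions");

// Variant of the posted helper with both branches typed int, so the value
// handed to the int8_t conversion is always within [-128, 127].
inline std::int8_t as_signed_8_int(unsigned int a) {
    a &= 0xffu;
    const int v = (a & 0x80u) ? static_cast<int>(a) - 0x100
                              : static_cast<int>(a);
    return static_cast<std::int8_t>(v);  // value-preserving
}

// Same structure as overflow_safe, built on the variant helper.
bool overflow_safe_int(unsigned int a, unsigned int b) {
    const int sum = as_signed_8_int(a) + as_signed_8_int(b);
    // int -> unsigned int is well defined (modulo 2^N), and the helper
    // only looks at the low 8 bits.
    return as_signed_8_int(static_cast<unsigned int>(sum)) != sum;
}
```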



Output for X86 with -O3 -fomit-frame-pointer:

 <_Z14overflow_rangejj>:
   0:   0f be 54 24 04          movsbl 0x4(%esp),%edx
   5:   0f be 44 24 08          movsbl 0x8(%esp),%eax
   a:   8d 84 02 80 00 00 00    lea    0x80(%edx,%eax,1),%eax
  11:   3d ff 00 00 00          cmp    $0xff,%eax
  16:   0f 97 c0                seta   %al
  19:   c3                      ret
  1a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi

0020 <_Z12overflow_bitjj>:
  20:   8b 54 24 08             mov    0x8(%esp),%edx
  24:   8b 4c 24 04             mov    0x4(%esp),%ecx
  28:   89 d0                   mov    %edx,%eax
  2a:   31 c8                   xor    %ecx,%eax
  2c:   01 ca                   add    %ecx,%edx
  2e:   31 ca                   xor    %ecx,%edx
  30:   f7 d0                   not    %eax
  32:   21 d0                   and    %edx,%eax
  34:   a8 80                   test   $0x80,%al
  36:   0f 95 c0                setne  %al
  39:   c3                      ret
  3a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi

0040 <_Z15overflow_unsafejj>:
  40:   0f be 54 24 08          movsbl 0x8(%esp),%edx
  45:   0f be 44 24 04          movsbl 0x4(%esp),%eax
  4a:   8d 04 02                lea    (%edx,%eax,1),%eax
  4d:   0f be d0                movsbl %al,%edx
  50:   39 c2                   cmp    %eax,%edx
  52:   0f 95 c0                setne  %al
  55:   c3                      ret
  56:   8d 76 00                lea    0x0(%esi),%esi
  59:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

0060 <_Z13overflow_safejj>:
  60:   0f be 54 24 08          movsbl 0x8(%esp),%edx
  65:   0f be 44 24 04          movsbl 0x4(%esp),%eax
  6a:   8d 04 02                lea    (%edx,%eax,1),%eax
  6d:   0f be d0                movsbl %al,%edx
  70:   39 c2                   cmp    %eax,%edx
  72:   0f 95 c0                setne  %al
  75:   c3                      ret



Output for ARM with -O3 -fomit-frame-pointer -mthumb -march=armv7:

 <_Z14overflow_rangejj>:
   0:   b249            sxtb    r1, r1
   2:   b240            sxtb    r0, r0
   4:   1808            adds    r0, r1, r0
   6:   3080            adds    r0, #128        ; 0x80
   8:   28ff            cmp     r0, #255        ; 0xff
   a:   bf94            ite     ls
   c:   2000            movls   r0, #0
   e:   2001            movhi   r0, #1
  10:   4770            bx      lr
  12:   bf00            nop
  14:   f3af 8000       nop.w
  18:   f3af 8000       nop.w
  1c:   f3af 8000       nop.w

0020 <_Z12overflow_bitjj>:
  20:   180b            adds    r3, r1, r0
  22:   4041            eors    r1, r0
  24:   ea83 0200       eor.w   r2, r3, r0
  28:   ea22 0001       bic.w   r0, r2, r1
  2c:   f3c0 10c0       ubfx    r0, r0, #7, #1
  30:   4770            bx      lr
  32:   bf00            nop
  34:   f3af 8000       nop.w
  38:   f3af 8000       nop.w
  3c:   f3af 8000       nop.w

0040 <_Z15overflow_unsafejj>:
  40:   b242            sxtb    r2, r0
  42:   b249            sxtb    r1, r1
  44:   1888            adds    r0, r1, r2
  46:   b243            sxtb    r3, r0
  48:   1a18            subs    r0, r3, r0
  4a:   bf18            it      ne
  4c:   2001            movne   r0, #1
  4e:   4770            bx      lr

0050 <_Z13overflow_safejj>:
  50:   b242            sxtb    r2, r0
  52:   b249            sxtb    r1, r1
  54:   1888            adds    r0, r1, r2
  56:   b243            sxtb    r3, r0
  58:   1a18            subs    r0, r3, r0
  5a:   bf18            it      ne
  5c:   2001            movne   r0, #1
  5e:   4770            bx      lr


Not sure which version would be fastest on ARM (

Re: Option to make unsigned->signed conversion always well-defined?

2011-10-07 Thread Ulf Magnusson
On Thu, Oct 6, 2011 at 11:31 PM, Florian Weimer wrote:
> * Ulf Magnusson:
>
>> I've been experimenting with different methods for emulating the
>> signed overflow of an 8-bit CPU. The method I've found that seems to
>> generate the most efficient code on both ARM and x86 is
>>
>> bool overflow(unsigned int a, unsigned int b) {
>>     const unsigned int sum = (int8_t)a + (int8_t)b;
>>     return (int8_t)sum != sum;
>> }
>
> There's a GCC extension which is relevant here:
>
> | For conversion to a type of width N, the value is reduced modulo 2^N
> | to be within range of the type; no signal is raised.
>
> <http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation>
>
> Using that, you can replace the final "& 0x80" with a signed
> comparison to zero, which should give you the best possible code
> (for the generic RISC).  You only need to hunt down a copy of Hacker's
> Delight or find the right bit twiddling by other means. 8-)
>

Are you thinking of something like this?

bool overflow_bit2(unsigned int a, unsigned int b) {
    const unsigned int ashift = a << 24;
    const unsigned int bshift = b << 24;
    const unsigned int sum = a + b;
    return (int)(~(a ^ b) & (a ^ sum)) < 0;
}

That version generates

  80:   180b            adds    r3, r1, r0
  82:   4041            eors    r1, r0
  84:   ea83 0200       eor.w   r2, r3, r0
  88:   ea22 0001       bic.w   r0, r2, r1
  8c:   0fc0            lsrs    r0, r0, #31
  8e:   4770            bx      lr

Whereas the unshifted version generates

  40:   180b            adds    r3, r1, r0
  42:   4041            eors    r1, r0
  44:   ea83 0200       eor.w   r2, r3, r0
  48:   ea22 0001       bic.w   r0, r2, r1
  4c:   f3c0 10c0       ubfx    r0, r0, #7, #1
  50:   4770            bx      lr

So maybe a bit better. (I'm no ARM pro, but the compiler does seem to
take advantage of the fact that it's testing the real sign bit at
least.)

Btw, & 0x8000 generates the same code.

/Ulf
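
As posted, overflow_bit2 computes ashift and bshift but never uses them, so
the sign test looks at bit 31 of the unshifted operands (0x7f + 0x01, for
example, would not be reported), and the listing for it accordingly contains
no shift instructions. A sketch of what a shifted variant might look like,
assuming that was the intent and that unsigned int is 32 bits wide;
`overflow_bit2_shifted` and `shifted_matches_unshifted` are names invented
here, and the sign test is written as a shift of the unsigned value rather
than the `(int)... < 0` comparison so that it does not depend on the
conversion extension quoted above:

```cpp
#include <cassert>
#include <climits>

static_assert(sizeof(unsigned int) * CHAR_BIT == 32,
              "this sketch assumes a 32-bit unsigned int");

// Unshifted formula from earlier in the thread, used as the comparison.
bool overflow_bit(unsigned int a, unsigned int b) {
    const unsigned int sum = a + b;
    return (~(a ^ b) & (a ^ sum) & 0x80u) != 0;
}

// Hypothetical shifted variant: bit 7 of each operand is moved to bit 31,
// so the overflow condition ends up in the real sign bit.
bool overflow_bit2_shifted(unsigned int a, unsigned int b) {
    const unsigned int ashift = a << 24;
    const unsigned int bshift = b << 24;
    const unsigned int sum = ashift + bshift;  // unsigned, wraps modulo 2^32
    return ((~(ashift ^ bshift) & (ashift ^ sum)) >> 31) != 0;
}

// Returns true if both versions agree on every pair of byte values.
bool shifted_matches_unshifted() {
    for (unsigned int a = 0; a < 256; ++a) {
        for (unsigned int b = 0; b < 256; ++b) {
            if (overflow_bit2_shifted(a, b) != overflow_bit(a, b)) {
                return false;
            }
        }
    }
    return true;
}
```

Whether the extra shifts pay for themselves on ARM is not something this
sketch measures.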


Re: Option to make unsigned->signed conversion always well-defined?

2011-10-07 Thread Ulf Magnusson
On Fri, Oct 7, 2011 at 7:35 PM, Florian Weimer wrote:
> * Ulf Magnusson:
>
>> Are you thinking of something like this?
>>
>> bool overflow_bit2(unsigned int a, unsigned int b) {
>>     const unsigned int ashift = a << 24;
>>     const unsigned int bshift = b << 24;
>>     const unsigned int sum = a + b;
>>     return (int)(~(a ^ b) & (a ^ sum)) < 0;
>> }
>
> Yes, but rather like :
>
>  bool overflow_bit2(unsigned char a, unsigned char b) {
>    const unsigned char sum = a + b;
>    return ((signed char)(~(a ^ b) & (a ^ sum))) < 0;
>  }
>
> It still results in abysmal code, given that this should result in two
> or three instructions on most architectures.
>
> Are machine code insertions an option?
>

Tried that version, but it seems to generate worse (or bigger anyway -
haven't benchmarked it) code:

  90:   eb01 0c00       add.w   ip, r1, r0
  94:   b2c2            uxtb    r2, r0
  96:   ea82 030c       eor.w   r3, r2, ip
  9a:   ea82 0101       eor.w   r1, r2, r1
  9e:   ea23 0001       bic.w   r0, r3, r1
  a2:   f3c0 10c0       ubfx    r0, r0, #7, #1
  a6:   4770            bx      lr
  a8:   f3af 8000       nop.w
  ac:   f3af 8000       nop.w

Good machine code would be fun to see, though I might need to brush up
on my ARM.

/Ulf


Re: [C++] Possible GCC bug

2012-11-14 Thread Ulf Magnusson
On Wed, Nov 14, 2012 at 6:10 PM, Piotr Wyderski wrote:
> The following snippet:
>
> class A {};
> class B : public A {
>
>    typedef A super;
>
> public:
>
>    class X {};
> };
>
>
> class C : public B {
>
>    typedef B super;
>
>    class X : public super::X {
>
>       typedef super::X super;
>    };
> };
> };
>
> compiles without a warning on Comeau and MSVC, but GCC (4.6.1 and
> 4.7.1) fails with the following message:
>
> $ gcc -c bug.cpp
> bug.cpp:18:24: error: declaration of ‘typedef class B::X C::X::super’
> [-fpermissive]
> bug.cpp:14:14: error: changes meaning of ‘super’ from ‘typedef class B
> C::super’ [-fpermissive]
>
> Should I file a report?
>
> Best regards, Piotr

Here's a two-line TC:

typedef struct { typedef int type; } s1;
struct S2 { s1::type s1; };

Fails with GCC 4.6.3; succeeds with clang 3.0. Looks like a bug to me.

/Ulf