How to link a static lib when building a shared lib?

2009-08-08 Thread Andy
Hi all,
I understand that glibc has supported SHA in its crypt library since v2.7.

My application needs to use SHA, but updating the whole glibc is too
risky. So I want to build a separate crypt library for the module in my
application that uses the crypt function.

Simplified, the call graph looks like this:

crypt_user.c  -> calls the crypt() function in libcrypt.so

myutil.c  -> calls functions in crypt_user.c, and is built into a
shared lib, libmyutil.so

Executable program: server -> needs libmyutil.so to work.

I can ONLY use the latest static lib, libcrypt.a, from the new glibc.
How should I use it? Should I link it when building libmyutil.so or
when building server?

I tried updating my Makefile to link libcrypt.a, but the crypt function
that gets called is not the one I expected; it is still the one from the
old glibc.

Could anybody help me with the right gcc options and a working
Makefile?

Any reply will be appreciated!

Thanks,


Re: Cross Compiler Unix - Windows

2005-08-29 Thread Andy Smith
I have used MinGW on Linux to compile Windows
executables. I don't see why it could not be compiled
on other Unix variants. Try:

http://www.libsdl.org/extras/win32/cross/README.txt

and

http://www.mingw.org

Regards,
Andy

--- Ivan Novick <[EMAIL PROTECTED]> wrote:

> Can you recommend a solution for compiling Windows
> DLLs on any  
> variation of UNIX?
> 
> We currently do this with Cygwin/Windows, but would
> like to go one  
> step further and do the builds on a UNIX machine
> that produces  
> Windows DLLs.
> 
> Thanks for any advice,
> 
> Ivan
> 
> 




Re: Broken check rejecting -fcf-protection and -mindirect-branch=thunk-extern

2020-04-28 Thread Andy Lutomirski




> On Apr 28, 2020, at 9:14 AM, Peter Zijlstra  wrote:
> 
> On Tue, Apr 28, 2020 at 02:41:33PM +0100, Andrew Cooper wrote:
>> It's fine to focus on userspace first, but the kernel is far more simple.
>> 
>> Looking at that presentation, the only thing missing for kernel is the
>> notrack thunks, in the unlikely case that such code would be tolerated
>> (Frankly, I don't expect Xen or Linux to run with notrack enabled, as
>> there is no legacy code to be concerned with).
> 
> Uhhh.. ftrace and kretprobes play dodgy games with the
> return stack, doesn't that make the CET thing slightly more interesting?

It’s definitely interesting. But there isn’t legacy code involved — we can 
recompile and fix the world :)

Re: Broken check rejecting -fcf-protection and -mindirect-branch=thunk-extern

2020-04-28 Thread Andy Lutomirski



> On Apr 28, 2020, at 10:44 AM, H.J. Lu  wrote:
> 
> On Tue, Apr 28, 2020 at 10:24 AM David Woodhouse  wrote:
>> 
>> 
>> 
>>> On 28 April 2020 17:14:49 BST, Peter Zijlstra  wrote:
>>> On Tue, Apr 28, 2020 at 02:41:33PM +0100, Andrew Cooper wrote:
>>>> It's fine to focus on userspace first, but the kernel is far more
>>>> simple.
>>>>
>>>> Looking at that presentation, the only thing missing for kernel is
>>>> the notrack thunks, in the unlikely case that such code would be
>>>> tolerated (Frankly, I don't expect Xen or Linux to run with notrack
>>>> enabled, as there is no legacy code to be concerned with).
>>> 
>>> Uhhh.. ftrace and kretprobes play dodgy games with the
>>> return stack, doesn't that make the CET thing slightly more
>>> interesting?
>> 
>> Sure, there is work to do to enable CET. But Andy's point is that we 
>> deliberately fixed up retpoline to be register-based *specifically* for the 
>> purpose of being CET-compatible, so it's somewhat daft for GCC to be 
>> claiming they are incompatible.
>> 
> 
> GCC needs to be told that external thunk is CET compatible.

If I write:

void foo(void);

...

foo();

And I compile this with CET enabled, GCC is perfectly willing to assume that 
foo is CET-compatible.  If I compile with stack alignment set unusually high, 
GCC is fine with assuming that foo will preserve the high alignment. If I 
compile with unusually low alignment, GCC is fine with assuming that foo will 
not crash as a result.  If I use -mregparm, gcc will happily use it.

So why is GCC unwilling to trust that, if I explicitly ask it to call an asm 
helper that I supply, that I supplied a valid helper?

What’s special about CET? Do we need -fi-know-what-im-doing?  Do you have any 
actual reason to believe that there is even a single user of thunk-extern that 
might mess up?

> 
> -- 
> H.J.


gcc feature request / RFC: extra clobbered regs

2015-06-30 Thread Andy Lutomirski
Hi all-

I'm working on a massive set of cleanups to Linux's syscall handling.
We currently have a nasty optimization in which we don't save rbx,
rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
This works, but it makes the code a huge mess.  I'd rather save all
regs in asm and then call C code.

Unfortunately, this will add five cycles (on SNB) to one of the
hottest paths in the kernel.  To counteract it, I have a gcc feature
request that might not be all that crazy.  When writing C functions
intended to be called from asm, what if we could do:

__attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
"r15"))) void func(void);

This will save enough pushes and pops that it could easily give us our
five cycles back and then some.  It's also easy to be compatible with
old GCC versions -- we could just omit the attribute, since preserving
a register is always safe.

Thoughts?  Is this totally crazy?  Is it easy to implement?

(I'm not necessarily suggesting that we do this for the syscall bodies
themselves.  I want to do it for the entry and exit helpers, so we'd
still lose the five cycles in the full fast-path case, but we'd do
better in the slower paths, and the slower paths are becoming
increasingly important in real workloads.)
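
The extra_clobber attribute above is a proposal, not an existing GCC
feature. The closest mechanism GCC already has is the clobber list of an
extended asm statement, which tells the compiler that a register is
destroyed at that point. A minimal sketch, assuming GCC or Clang on
x86_64 with the default AT&T asm syntax (the function name
add_clobbering_r12 is made up for illustration, and other targets fall
back to plain C):

```c
/* Hypothetical helper: adds two numbers while deliberately using r12 as
 * scratch.  Listing "r12" as clobbered tells the compiler not to keep
 * live values there across the asm; because r12 is callee-saved, the
 * compiler also saves and restores it for this function's caller. */
long add_clobbering_r12(long a, long b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    long result;
    __asm__ volatile(
        "mov %1, %%r12\n\t"   /* r12 = a */
        "add %2, %%r12\n\t"   /* r12 += b */
        "mov %%r12, %0"       /* result = r12 */
        : "=r"(result)        /* written only after both inputs are read */
        : "r"(a), "r"(b)
        : "r12", "cc");       /* r12 and the flags are destroyed */
    return result;
#else
    return a + b;             /* portable fallback, no asm */
#endif
}
```

The proposal in this message would move that kind of declaration from a
single asm statement to a whole function, so that the function's own
prologue and epilogue could skip the saves.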

Thanks,
Andy


Re: gcc feature request / RFC: extra clobbered regs

2015-06-30 Thread Andy Lutomirski
On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin  wrote:
> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>
> Either that or
>
> __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>
> ... just to be shorter.  Either way, I would consider this to be
> desirable -- I have myself used this to good effect in a past life
> (*cough* Transmeta *cough*) -- but not a high priority feature.

I think I mean the per-function equivalent of -fcall-used-reg, so
hpa's "used" suggestion would do the trick.

I guess that clobbering the frame pointer is a non-starter, but five
out of six isn't so bad.  It would be nice to error out instead of
producing "disastrous results", though, if another bad reg is chosen.
(Presumably the PIC register on PIC builds would be an example of
that.)

--Andy


Re: gcc feature request / RFC: extra clobbered regs

2015-06-30 Thread Andy Lutomirski
On Tue, Jun 30, 2015 at 2:52 PM, H. Peter Anvin  wrote:
> On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
>> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin  wrote:
>>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>>> I'd say the most natural API for this would be to allow
>>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>>> Either that or
>>>
>>> __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>>
>>> ... just to be shorter.  Either way, I would consider this to be
>>> desirable -- I have myself used this to good effect in a past life
>>> (*cough* Transmeta *cough*) -- but not a high priority feature.
>>
>> I think I mean the per-function equivalent of -fcall-used-reg, so
>> hpa's "used" suggestion would do the trick.
>>
>> I guess that clobbering the frame pointer is a non-starter, but five
>> out of six isn't so bad.  It would be nice to error out instead of
>> producing "disastrous results", though, if another bad reg is chosen.
>> (Presumably the PIC register on PIC builds would be an example of
>> that.)
>>
>
> Clobbering the frame pointer is perfectly fine, as is the PIC register.
>  However, gcc might need to handle them as "fixed" rather than "clobbered".

Hmm.  True, I guess, although I wouldn't necessarily expect gcc to be
able to generate code to call a function like that.

--Andy


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Andy Lutomirski
On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov  wrote:
>
>
> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>
>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>> We currently have a nasty optimization in which we don't save rbx,
>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>> This works, but it makes the code a huge mess.  I'd rather save all
>>> regs in asm and then call C code.
>>>
>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>>> request that might not be all that crazy.  When writing C functions
>>> intended to be called from asm, what if we could do:
>>>
>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>> "r15"))) void func(void);
>>>
>>> This will save enough pushes and pops that it could easily give us our
>>> five cycles back and then some.  It's also easy to be compatible with
>>> old GCC versions -- we could just omit the attribute, since preserving
>>> a register is always safe.
>>>
>>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>>
>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>> still lose the five cycles in the full fast-path case, but we'd do
>>> better in the slower paths, and the slower paths are becoming
>>> increasingly important in real workloads.)
>>
>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>> options, which allow tweaking the calling conventions; but it is per
>> translation unit right now.  It isn't clear which of these options
>> you mean with the extra_clobber.
>> I assume you are looking for a possibility to change this to be
>> per-function, with caller with a different calling convention having to
>> adjust for different ABI callee.  To some extent, recent GCC versions
>> do that automatically with -fipa-ra already - if some call used registers
>> are not clobbered by some call and the caller can analyze that callee,
>> it can stick values in such registers across the call.
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>>
> One consequence of frequently changing the calling convention or register
> usage per function could be a GCC slowdown.  RA calculates a lot of data,
> and it takes a lot of time to recalculate it after something in the
> register usage convention is changed.

Do you mean that RA precalculates things based on the calling
convention and saves it across functions?  Hmm.  I don't think this
would be a big problem in my intended use case -- there would only be
a handful of functions using this extension, and they'd have very few
non-asm callers.

>
> Another consequence would be that RA fails to generate the code in some
> cases and, even worse, the failure might depend on the version of GCC (I
> already saw PRs where RA worked for an asm in one GCC version because a
> pseudo was changed by an equivalent constant and failed in another GCC
> version where that did not happen).
>

Would this be a problem generating code for a function with extra
"used" regs, or just a problem generating code to call such a function?
I imagine that, in the former case, RA's job would be easier, not
harder, since there would be more registers to work with.  In
practice, though, I think it would just end up changing the prologue
and epilogue.

--Andy


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Andy Lutomirski
On Wed, Jul 1, 2015 at 10:35 AM, Vladimir Makarov  wrote:
> Actually, it raises a question for me.  Suppose we declare that a function
> clobbers more than the calling convention does, then use it as a value
> (assigning it to a variable or passing it as an argument), lose track of
> it, and then call it.  How can RA know what the call actually clobbers?  So
> for a function with these attributes we should prohibit using it as a value,
> or make the attributes part of the function type, or at least say it is
> unsafe.

I think it should be part of the type.  This shouldn't compile:

void func(void) __attribute__((used_reg("r12")));
void (*x)(void);
x = func;
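
GCC already treats some attributes as part of the function type. On
x86_64 the ms_abi calling-convention attribute is one, so a function
pointer has to carry it as well. A minimal sketch, assuming GCC or Clang
(the names ALT_ABI, add_alt, alt_fn and call_through_pointer are made up
for illustration; on other targets the macro expands to nothing):

```c
/* Use a non-default calling convention where the compiler supports it. */
#if defined(__x86_64__) && defined(__GNUC__)
#define ALT_ABI __attribute__((ms_abi))
#else
#define ALT_ABI
#endif

/* A function whose type includes the calling-convention attribute. */
static int ALT_ABI add_alt(int a, int b)
{
    return a + b;
}

/* The pointer type repeats the attribute, so the types match exactly. */
typedef int (ALT_ABI *alt_fn)(int, int);

int call_through_pointer(int a, int b)
{
    alt_fn fp = add_alt;   /* same function type: no diagnostic */
    return fp(a, b);       /* call made with the ms_abi convention */
}
```

Assigning add_alt to a plain int (*)(int, int) on x86_64 Linux should
draw an incompatible-pointer-type diagnostic, which is the behaviour
asked for here for the hypothetical used_reg attribute.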

--Andy


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Andy Lutomirski
On Wed, Jul 1, 2015 at 10:43 AM, Jakub Jelinek  wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually, it raises a question for me.  Suppose we declare that a function
>> clobbers more than the calling convention does, then use it as a value
>> (assigning it to a variable or passing it as an argument), lose track of
>> it, and then call it.  How can RA know what the call actually clobbers?  So
>> for a function with these attributes we should prohibit using it as a
>> value, or make the attributes part of the function type, or at least say
>> it is unsafe.  So now I see this as a *bigger problem* with this extension.
>> Although I guess it already exists, as we have descriptions of different
>> ABIs as an extension.
>
> Unfortunately target attribute is function decl attribute rather than
> function type.  And having more attributes affect switchable targets will be
> non-fun.

Just to make sure we're on the same page here, if I write:

extern void normal_func(void);

void weird_func(void) __attribute__((used_regs("r12")))
{
  // do something
  normal_func();
  // do something
}

I'd want the code that calls normal_func() to understand that
normal_func() *will* preserve r12 despite the fact that weird_func is
allowed to clobber r12.  I think this means that the attribute would
have to be an attribute of a function, not of the RA while compiling
the function.

--Andy


RFC: adding Linux vsyscall-disable and similar backwards-incompatibility flags to ELF headers?

2015-09-01 Thread Andy Lutomirski
Hi all-

Linux has a handful of weird features that are only supported for
backwards compatibility.  The big one is the x86_64 vsyscall page, but
uselib probably belongs on the list, too, and we might end up with
more at some point.

I'd like to add a way that new programs can turn these features off.
In particular, I want the vsyscall page to be completely gone from the
perspective of any new enough program.  This is straightforward if we
add a system call to ask for the vsyscall page to be disabled, but I'm
wondering if we can come up with a non-syscall way to do it.

I think that the ideal behavior would be that anything linked against
a sufficiently new libc would be detected, but I don't see a good way
to do that using existing toolchain features.

Ideas?  We could add a new phdr for this, but then we'd need to play
linker script games, and I'm not sure that could be done in a clean,
extensible way.

--Andy


Re: RFC: adding Linux vsyscall-disable and similar backwards-incompatibility flags to ELF headers?

2015-09-01 Thread Andy Lutomirski
On Sep 1, 2015 6:53 PM, "Brian Gerst"  wrote:
>
> On Tue, Sep 1, 2015 at 8:51 PM, Andy Lutomirski  wrote:
> > Hi all-
> >
> > Linux has a handful of weird features that are only supported for
> > backwards compatibility.  The big one is the x86_64 vsyscall page, but
> > uselib probably belongs on the list, too, and we might end up with
> > more at some point.
> >
> > I'd like to add a way that new programs can turn these features off.
> > In particular, I want the vsyscall page to be completely gone from the
> > perspective of any new enough program.  This is straightforward if we
> > add a system call to ask for the vsyscall page to be disabled, but I'm
> > wondering if we can come up with a non-syscall way to do it.
> >
> > I think that the ideal behavior would be that anything linked against
> > a sufficiently new libc would be detected, but I don't see a good way
> > to do that using existing toolchain features.
> >
> > Ideas?  We could add a new phdr for this, but then we'd need to play
> > linker script games, and I'm not sure that could be done in a clean,
> > extensible way.
>
>
> The vsyscall page is mapped in the fixmap region, which is shared
> between all processes.  You can't turn it off for an individual
> process.

Why not?

We already emulate all attempts to execute it, and that's trivial to
turn off per process.  Project Zero pointed out that read access is a
problem, too, but we can flip the U/S bit in the pgd once we evict
pvclock from the fixmap.

And we definitely need to evict pvclock from the fixmap regardless.

--Andy


Re: RFC: adding Linux vsyscall-disable and similar backwards-incompatibility flags to ELF headers?

2015-09-01 Thread Andy Lutomirski
On Sep 1, 2015 6:12 PM, "Ian Lance Taylor"  wrote:
>
> On Tue, Sep 1, 2015 at 5:51 PM, Andy Lutomirski  wrote:
> >
> > Linux has a handful of weird features that are only supported for
> > backwards compatibility.  The big one is the x86_64 vsyscall page, but
> > uselib probably belongs on the list, too, and we might end up with
> > more at some point.
> >
> > I'd like to add a way that new programs can turn these features off.
> > In particular, I want the vsyscall page to be completely gone from the
> > perspective of any new enough program.  This is straightforward if we
> > add a system call to ask for the vsyscall page to be disabled, but I'm
> > wondering if we can come up with a non-syscall way to do it.
> >
> > I think that the ideal behavior would be that anything linked against
> > a sufficiently new libc would be detected, but I don't see a good way
> > to do that using existing toolchain features.
> >
> > Ideas?  We could add a new phdr for this, but then we'd need to play
> > linker script games, and I'm not sure that could be done in a clean,
> > extensible way.
>
> What sets up the vsyscall page, and what information does it have
> before doing so?
>
> I'm guessing it's the kernel that sets it up, and that all it can see
> at that point is the program headers.

Currently it's global and nothing thinks about it per-process at all.
The kernel can do whatever it likes going forward, subject to
backwards compatibility.  Doing something at ELF load time is probably
the right approach.

>
> We could pass information using an appropriate note section.  My
> recollection is that the linkers will turn an SHF_ALLOC note section
> into a PT_NOTE program header.

Oh, interesting.  I'll check that.  Glibc and competitors could add
notes to their statically-linked bits.
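
A note of this kind can be emitted from C by placing a correctly laid
out object in a .note.* section. A minimal sketch, assuming an ELF
target and GCC or Clang; the section name .note.example, the owner
string EXAMPLE, the type value 1 and the meaning of the descriptor are
all hypothetical, since no such note was defined in this thread:

```c
#include <stdint.h>

/* Layout of one ELF note: three 32-bit words, then the owner name and
 * the descriptor, each padded to a multiple of 4 bytes. */
struct example_note {
    uint32_t namesz;    /* length of name, including the trailing NUL */
    uint32_t descsz;    /* length of the descriptor in bytes */
    uint32_t type;      /* owner-defined note type (hypothetical: 1) */
    char     name[8];   /* "EXAMPLE" plus NUL, already a multiple of 4 */
    uint32_t desc;      /* hypothetical payload: 1 = opt out */
};

#if defined(__ELF__) && defined(__GNUC__)
/* Keep the object even if unreferenced and place it in a note section. */
#define NOTE_SECTION \
    __attribute__((section(".note.example"), used, aligned(4)))
#else
#define NOTE_SECTION    /* non-ELF targets: plain static data */
#endif

static const struct example_note opt_out_note NOTE_SECTION = {
    .namesz = sizeof(opt_out_note.name),
    .descsz = sizeof(opt_out_note.desc),
    .type   = 1,
    .name   = "EXAMPLE",
    .desc   = 1,
};
```

Whether the linker gathers the section into a PT_NOTE program header on
a given toolchain can be checked by running readelf -n on the result.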

The unpleasant case is a new dynamic binary linked against an old
libc, but that might be irrelevant in practice.  After all, I think
that a lot of libc competitors never supported the vsyscall page at
all, and even glibc isn't really backwards compatible that way.

We could also require that both the binary and interpreter have the
note, which would more or less solve the backwards compatibility
issue.

--Andy


Re: [musl] RFC: adding Linux vsyscall-disable and similar backwards-incompatibility flags to ELF headers?

2015-09-01 Thread Andy Lutomirski
On Tue, Sep 1, 2015 at 7:54 PM, Rich Felker  wrote:
> On Tue, Sep 01, 2015 at 05:51:44PM -0700, Andy Lutomirski wrote:
>> Hi all-
>>
>> Linux has a handful of weird features that are only supported for
>> backwards compatibility.  The big one is the x86_64 vsyscall page, but
>> uselib probably belongs on the list, too, and we might end up with
>> more at some point.
>>
>> I'd like to add a way that new programs can turn these features off.
>> In particular, I want the vsyscall page to be completely gone from the
>> perspective of any new enough program.  This is straightforward if we
>> add a system call to ask for the vsyscall page to be disabled, but I'm
>> wondering if we can come up with a non-syscall way to do it.
>>
>> I think that the ideal behavior would be that anything linked against
>> a sufficiently new libc would be detected, but I don't see a good way
>> to do that using existing toolchain features.
>>
>> Ideas?  We could add a new phdr for this, but then we'd need to play
>> linker script games, and I'm not sure that could be done in a clean,
>> extensible way.
>
> Is there a practical problem you're trying to solve? My understanding
> is that the vsyscall nonsense is fully emulated now and that the ways
> it could be used as an attack vector have been mitigated.

They've been mostly mitigated, but not fully.  See:

http://googleprojectzero.blogspot.com/2015/08/three-bypasses-and-fix-for-one-of.html

I'm also waiting for someone to find an exploit that uses one of the
vsyscalls as a ROP gadget.

>
> If this is not the case, I have what sounds like an elegant solution,
> if it works: presumably affected versions of glibc that used this used
> it for all syscalls, so if the process has made any normal syscalls
> before using the vsyscall addresses, you can assume it's a bug/attack
> and just raise SIGSEGV. If there are corner cases this doesn't
> cover, maybe the approach can still be adapted to work; it's cleaner
> than introducing header cruft, IMO.

Unfortunately, I don't think this will work.  It's never been possible
to use the vsyscalls for anything other than gettimeofday, time, or
getcpu, so I doubt we can detect affected glibc versions that way.

--Andy


Re: [musl] RFC: adding Linux vsyscall-disable and similar backwards-incompatibility flags to ELF headers?

2015-09-01 Thread Andy Lutomirski
On Tue, Sep 1, 2015 at 9:18 PM, Rich Felker  wrote:
> On Tue, Sep 01, 2015 at 08:39:27PM -0700, Andy Lutomirski wrote:
>> On Tue, Sep 1, 2015 at 7:54 PM, Rich Felker  wrote:
>> > On Tue, Sep 01, 2015 at 05:51:44PM -0700, Andy Lutomirski wrote:
>> >> Hi all-
>> >>
>> >> Linux has a handful of weird features that are only supported for
>> >> backwards compatibility.  The big one is the x86_64 vsyscall page, but
>> >> uselib probably belongs on the list, too, and we might end up with
>> >> more at some point.
>> >>
>> >> I'd like to add a way that new programs can turn these features off.
>> >> In particular, I want the vsyscall page to be completely gone from the
>> >> perspective of any new enough program.  This is straightforward if we
>> >> add a system call to ask for the vsyscall page to be disabled, but I'm
>> >> wondering if we can come up with a non-syscall way to do it.
>> >>
>> >> I think that the ideal behavior would be that anything linked against
>> >> a sufficiently new libc would be detected, but I don't see a good way
>> >> to do that using existing toolchain features.
>> >>
>> >> Ideas?  We could add a new phdr for this, but then we'd need to play
>> >> linker script games, and I'm not sure that could be done in a clean,
>> >> extensible way.
>> >
>> > Is there a practical problem you're trying to solve? My understanding
>> > is that the vsyscall nonsense is fully emulated now and that the ways
>> > it could be used as an attack vector have been mitigated.
>>
>> They've been mostly mitigated, but not fully.  See:
>>
>> http://googleprojectzero.blogspot.com/2015/08/three-bypasses-and-fix-for-one-of.html
>
> That looks like it would be mitigated by not having any mapping there
> at all and having the kernel just catch the page fault and emulate
> rather than filling it with trapping opcodes for the kernel to catch.
>

Oddly, that causes a compatibility problem.  There's a program called
pin that does dynamic instrumentation and actually expects to be able
to read the targets of calls.  The way that Linux handles this now is
to put a literal mov $NR, %rax; syscall; ret sequence at the syscall
address but to mark the whole page NX so that any attempt to call it
traps.  The trap gets fixed up if the call looks valid (properly
aligned, etc) and the process gets SIGSEGV if not.

This caught me by surprise when I implemented vsyscall emulation the first time.

>> I'm also waiting for someone to find an exploit that uses one of the
>> vsyscalls as a ROP gadget.
>
> This sounds more plausible. gettimeofday actually writes to memory
> pointed to by its arguments. The others look benign.
>
>> > If this is not the case, I have what sounds like an elegant solution,
>> > if it works: presumably affected versions of glibc that used this used
>> > it for all syscalls, so if the process has made any normal syscalls
>> > before using the vsyscall addresses, you can assume it's a bug/attack
>> > and just raise SIGSEGV. If there are corner cases this doesn't
>> > cover, maybe the approach can still be adapted to work; it's cleaner
>> > than introducing header cruft, IMO.
>>
>> Unfortunately, I don't think this will work.  It's never been possible
>> to use the vsyscalls for anything other than gettimeofday, time, or
>> getcpu, so I doubt we can detect affected glibc versions that way.
>
> I thought the idea of the old vsyscall was that you always call it
> rather than using a syscall instruction and it decides whether it can
> do it in userspace or needs to make a real syscall. But if it was only
> called from certain places, then yes, I think you're right that my
> approach doesn't work.

No, it's actually just three separate functions, one for each of
gettimeofday, time, and getcpu.
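
For reference, the three legacy entry points sit at fixed addresses,
1024 bytes apart, starting at 0xffffffffff600000. A minimal sketch that
only computes those addresses and never calls them (the macro, enum and
function names are made up for illustration):

```c
#include <stdint.h>

/* Fixed location of the legacy x86_64 vsyscall page. */
#define VSYSCALL_BASE UINT64_C(0xffffffffff600000)
/* Each entry point occupies its own 1024-byte slot in the page. */
#define VSYSCALL_SLOT UINT64_C(1024)

enum vsyscall_nr {
    VSYS_GETTIMEOFDAY = 0,
    VSYS_TIME         = 1,
    VSYS_GETCPU       = 2
};

/* Address an old binary would call for the given legacy vsyscall. */
uint64_t vsyscall_entry_addr(enum vsyscall_nr nr)
{
    return VSYSCALL_BASE + (uint64_t)nr * VSYSCALL_SLOT;
}
```

Because the addresses are the same in every process, they are useful to
an attacker as known targets, which is the ROP concern raised earlier in
the thread.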

--Andy


Re: [musl] RFC: adding Linux vsyscall-disable and similar backwards-incompatibility flags to ELF headers?

2015-09-01 Thread Andy Lutomirski
On Tue, Sep 1, 2015 at 9:55 PM, Rich Felker  wrote:
> On Tue, Sep 01, 2015 at 09:32:22PM -0700, Andy Lutomirski wrote:
>> On Tue, Sep 1, 2015 at 9:18 PM, Rich Felker  wrote:
>> > On Tue, Sep 01, 2015 at 08:39:27PM -0700, Andy Lutomirski wrote:
>> >> On Tue, Sep 1, 2015 at 7:54 PM, Rich Felker  wrote:
>> >> > On Tue, Sep 01, 2015 at 05:51:44PM -0700, Andy Lutomirski wrote:
>> >> >> Hi all-
>> >> >>
>> >> >> Linux has a handful of weird features that are only supported for
>> >> >> backwards compatibility.  The big one is the x86_64 vsyscall page, but
>> >> >> uselib probably belongs on the list, too, and we might end up with
>> >> >> more at some point.
>> >> >>
>> >> >> I'd like to add a way that new programs can turn these features off.
>> >> >> In particular, I want the vsyscall page to be completely gone from the
>> >> >> perspective of any new enough program.  This is straightforward if we
>> >> >> add a system call to ask for the vsyscall page to be disabled, but I'm
>> >> >> wondering if we can come up with a non-syscall way to do it.
>> >> >>
>> >> >> I think that the ideal behavior would be that anything linked against
>> >> >> a sufficiently new libc would be detected, but I don't see a good way
>> >> >> to do that using existing toolchain features.
>> >> >>
>> >> >> Ideas?  We could add a new phdr for this, but then we'd need to play
>> >> >> linker script games, and I'm not sure that could be done in a clean,
>> >> >> extensible way.
>> >> >
>> >> > Is there a practical problem you're trying to solve? My understanding
>> >> > is that the vsyscall nonsense is fully emulated now and that the ways
>> >> > it could be used as an attack vector have been mitigated.
>> >>
>> >> They've been mostly mitigated, but not fully.  See:
>> >>
>> >> http://googleprojectzero.blogspot.com/2015/08/three-bypasses-and-fix-for-one-of.html
>> >
>> > That looks like it would be mitigated by not having any mapping there
>> > at all and having the kernel just catch the page fault and emulate
>> > rather than filling it with trapping opcodes for the kernel to catch.
>> >
>>
>> Oddly, that causes a compatibility problem.  There's a program called
>> pin that does dynamic instrumentation and actually expects to be able
>> to read the targets of calls.  The way that Linux handles this now is
>
> Um, do people seriously need to do this dynamic instrumentation on
> ancient obsolete binaries? This sounds to me like confused
> requirements.

Unclear.  They certainly did, and I got a bug report, the first time
around.  That was a couple years ago.

I suppose we could have a sysctl that you need to set to enable that
use case.  OTOH, I think that, as long as we have a way to distinguish
new and old binaries, it's not that much harder to twiddle vsyscall
readability per process than it is to twiddle vsyscall executability
per process.

--Andy


Re: RFC: adding Linux vsyscall-disable and similar backwards-incompatibility flags to ELF headers?

2015-09-02 Thread Andy Lutomirski
On Sep 2, 2015 6:57 AM, "Brian Gerst"  wrote:
>
> On Tue, Sep 1, 2015 at 10:21 PM, Andy Lutomirski  wrote:
> > On Sep 1, 2015 6:53 PM, "Brian Gerst"  wrote:
> >>
> >> On Tue, Sep 1, 2015 at 8:51 PM, Andy Lutomirski  
> >> wrote:
> >> > Hi all-
> >> >
> >> > Linux has a handful of weird features that are only supported for
> >> > backwards compatibility.  The big one is the x86_64 vsyscall page, but
> >> > uselib probably belongs on the list, too, and we might end up with
> >> > more at some point.
> >> >
> >> > I'd like to add a way that new programs can turn these features off.
> >> > In particular, I want the vsyscall page to be completely gone from the
> >> > perspective of any new enough program.  This is straightforward if we
> >> > add a system call to ask for the vsyscall page to be disabled, but I'm
> >> > wondering if we can come up with a non-syscall way to do it.
> >> >
> >> > I think that the ideal behavior would be that anything linked against
> >> > a sufficiently new libc would be detected, but I don't see a good way
> >> > to do that using existing toolchain features.
> >> >
> >> > Ideas?  We could add a new phdr for this, but then we'd need to play
> >> > linker script games, and I'm not sure that could be done in a clean,
> >> > extensible way.
> >>
> >>
> >> The vsyscall page is mapped in the fixmap region, which is shared
> >> between all processes.  You can't turn it off for an individual
> >> process.
> >
> > Why not?
> >
> > We already emulate all attempts to execute it, and that's trivial to
> > turn off per process.  Project Zero pointed out that read access is a
> > problem, too, but we can flip the U/S bit in the pgd once we evict
> > pvclock from the fixmap.
> >
> > And we definitely need to evict pvclock from the fixmap regardless.
>
>
> Sure, you can turn off emulation per-process.  But the page mapping
> will be the same for every process because it is in the kernel part of
> the page tables which is shared by all processes.

True, but I don't think that means that the mapping has to be readable
in all processes.  Once it's the only user-readable mapping in the top
512 GB, we can turn off user access to the whole top 512 GB.

The only other user accessible thing in the top 512GB (and the only
other user accessible thing in a kernel address at all) is the KVM
pvclock mapping.  We should turn that off, too, because it's
exploitable in more or less the same way as the vsyscall page.
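
The "top 512 GB" corresponds to a single top-level page-table entry. A
minimal sketch of the arithmetic, assuming x86_64 4-level paging (the
macro names mirror the kernel's but are defined locally here, and
pgd_slot is a made-up helper):

```c
#include <stdint.h>

#define PGDIR_SHIFT  39    /* each top-level entry maps 2^39 bytes = 512 GiB */
#define PTRS_PER_PGD 512   /* 9 index bits per paging level */

/* Index of the top-level (pgd) entry that maps a virtual address. */
unsigned pgd_slot(uint64_t vaddr)
{
    return (unsigned)((vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}
```

The vsyscall page at 0xffffffffff600000 falls in slot 511, the last one,
so clearing the user bit in that one entry would cover the whole region
being discussed.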

--Andy


Quadmath

2018-09-27 Thread andy hall
Dear GCC developers,

I would just like to say a massive thanks for the work that led to the 
development of the Quadmath libraries. I have been doing my own research 
project with Matlab and last Christmas I ran out of numbers. I have developed a 
set of formulae that relate Planck’s constant, the speed of light and the 
fundamental constants of electromagnetism, Z0, epsilon0 and mu0, but I was 
stuck with only 15 significant figures. As a result of installing gcc on my 
Cygwin installation and finding the quadmath libraries I have been able to 
prove agreement of my formulae in terms of reciprocal self-consistency to 93 
significant digits. I was previously stuck with double precision based on 32 
bit assumed architecture that gave me 15 decimal digits. I now have quadmath 
on my 64 bit machine giving me way more than I ever expected and the numbers 
seem to stack up. I hope I will be able to either publish something in the 
future, or make some money such that I can make a donation. In any event it has 
been a lot of fun. Once again, thank you.

Andy




Re: should sync builtins be full optimization barriers?

2011-09-12 Thread Andy Lutomirski
On 09/12/2011 05:30 PM, Ken Raeburn wrote:
> On Sep 12, 2011, at 19:19, Andrew MacLeod wrote:
>> lets say the order of the writes turns out to be  2,4...  is it possible for 
>> both writes to be travelling around some bus and have thread 4 actually read 
>> the second one first, followed by the first one?   It would imply a lack of 
>> memory coherency in the system wouldn't it? My simple understanding is that 
>> the hardware gives us this sort of minimum guarantee on all shared memory. 
>> which means we should never see that happen.
> 
> According to section 8.2.3.5 "Intra-Processor Forwarding Is Allowed" of 
> "Intel 64 and IA-32 Architectures Software Developer's Manual" volume 3A, 
> December 2009, a processor can see its own store happening before another's, 
> though the example works on two different memory locations.  If at least one 
> of the threads reading the values was on the same processor as one of the 
> writing threads, perhaps it could see the locally-issued store first, unless 
> thread-switching is presumed to include a memory fence.  Consistency of order 
> is guaranteed *from the point of view of other processors* (8.2.3.7), which 
> is not necessarily the case here.  A total order across all processors is 
> imposed for locked instructions (8.2.3.8), but I'm not sure whether their use 
> is assumed here.  I'm still reading up on caching protocols, write-back 
> memory, etc.  Still not sure either way whether the original example can 
> work...

Presumably any sensible operating system inserts a fence whenever it
switches between threads to prevent exactly this issue.  Otherwise it
could be nearly impossible to write correct code.

(TBH, it was never entirely clear to me that mfence is guaranteed to
flush the store buffer and force everything to be re-read from the
coherency domain, but if that's not true then it's pretty much
impossible to get this right.)

--Andy


How to handle address which contain SUBREG

2009-11-25 Thread Andy H

I am fixing some reload bugs for AVR.

In a couple of situations an address is formed which includes a SUBREG
expression.
I am not sure how I should be handling these. Initial attempts produce
sub-optimal code and/or reload failures, so I thought it a good idea to
get some advice!


Either

(Subreg (Rx,0))

or

PLUS ((Subreg (Rx,0)) 5)

If I reject the subreg expression, the code produced is sub-optimal (since
the address is then calculated into a register).


For the non-strict case I believe the address should be accepted as
legitimate, on the basis that a SUBREG of a pseudo is just as valid as a REG.


For the strict case (hard register) - I could either accept if the 
simplified form is valid - or reject and then make Legitimize simplify 
the SUBREG expression. (I guess I might also have to handle it in 
legitimize_reload_address.)


What is the right way?







Cygwin support

2008-11-14 Thread Andy Scott
Hi All

Looking over the bugzilla data base and archives of this (and other)
lists I was wondering about the level of support there is for GCC on
Cygwin. (I realise that it is a weird half-way house to many people and
so does get a fair amount of "abuse" from both the Windoze &
Linux/Un*x purist camps but I like it :-) )

The reason I ask is that I would like to start contributing, and Cygwin
is the easiest target for me. But looking things over, I'm not sure it
would be the best place to start helping out.

Andy
-- 
Brain upgrade required: a working hypothalamus


Re: Cygwin support

2008-11-14 Thread Andy Scott
On 14/11/2008, Brian Dessent <[EMAIL PROTECTED]> wrote:
> Andy Scott wrote:
>
>  > Looking over the bugzilla data base and archives of this (and other)
>  > lists I was wondering about the level of support there is for GCC on
>  > Cygwin. (I realise that it is weird half-way house to many people and
>  > so does get a fair amount of "abuse" from both the Windoze &
>  > Linux/Un*x purist camps but I like it :-) )
>
>
> Cygwin has been a secondary target for a number of years.  MinGW has
>  been a secondary target since 4.3.  This generally means that they
>  should be in fairly good shape, more or less.  To quote the docs:
>
>  > Our release criteria for the secondary platforms is:
>  >
>  > * The compiler bootstraps successfully, and the C++ runtime library 
> builds.
>  > * The DejaGNU testsuite has been run, and a substantial majority of 
> the tests pass.
>

> Well, you can certainly use Cygwin as a base for contributing, however,
>  unless you are doing target-specific work[1] it doesn't make a lot of
>  sense to do so.  Running the dejagnu testsuite on Cygwin is
>  excruciatingly slow due to the penalty incurred from emulating fork.
>  Even with the overhead of vmware/colinux/virtualbox you're probably
>  looking at a reduction from 20-30 hours down to several hours for a full
>  testsuite run on an virtualized linux image compared to a native run
>  (depending on which languages are enabled.)
>
>  Brian
>
>  [1] And of course, don't get me wrong, that would be fantastic, as these
>  targets need all the TLC they can get.
>

Thanks for the information - and the heads up on the testsuite running times :-)

I tend to use weird and whacky versions of GCC for my work on embedded
devices so helping maintain it for another semi-weird platform will
stand me in good stead :-D

Andy
-- 
Brain upgrade required: a working hypothalamus


This is a Cygwin failure yeah?

2009-01-07 Thread Andy Scott
Got to building the latest stuff on Cygwin - I modified the autoconfig
script to get around some issues relating to 'ln -s' - and I then
started the build.

Got some errors: one I think is a Cygwin issue (but I want that final
1% assurance), the other I am pretty sure is a build/setup issue:

source:
 latest as of yesterday morning 10:00 GMT

config command:
   $ ../gcc/configure --enable-languages=c,c++ --enable-nls
--enable-threads=posix

build command:
   $ make >&../build_log.txt

Errors:

Cygwin one:

When it gets to stage 3 (after many hours) I get the following printed
out to the console (not redirected) -

217 [unknown (0x1B0)] conftest 3408 _cygtls::handle_exceptions: Error
while dumping state (probably corrupted stack)

By the looks of this I would say that some part of the Cygwin runtime
has failed. I've not seen this one in Cygwin at any other time than
when building GCC, which leads me to assume (which is dangerous, I
realise) that there is an issue with my version and how GCC builds,
placing the "blame" on the Cygwin runtime.

Is this a correct assumption, can anyone tell me? [Obviously if it is a
Cygwin issue then I'll track it down a bit more before posting on
their forums]

GCC Build One:

Again stage3 part of build, and this is what actually stops the build
the above issue doesn't seem to (I think it happens in stage 2), I get
the following:



  /home/andy/live-gcc/my_gcc/./gcc/xgcc
-B/home/andy/live-gcc/my_gcc/./gcc/ -B/usr/local/i686-pc-cygwin/bin/
-B/usr/local/i686-pc-cygwin/lib/ -isystem
/usr/local/i686-pc-cygwin/include -isystem
/usr/local/i686-pc-cygwin/sys-include -c -DHAVE_CONFIG_H -g -O2-I.
-I../../../gcc/libiberty/../include  -W -Wall -Wwrite-strings
-Wc++-compat -Wstrict-prototypes -pedantic
../../../gcc/libiberty/strsignal.c -o pic/strsignal.o; \
else true; fi
/home/andy/live-gcc/my_gcc/./gcc/xgcc
-B/home/andy/live-gcc/my_gcc/./gcc/ -B/usr/local/i686-pc-cygwin/bin/
-B/usr/local/i686-pc-cygwin/lib/ -isystem
/usr/local/i686-pc-cygwin/include -isystem
/usr/local/i686-pc-cygwin/sys-include -c -DHAVE_CONFIG_H -g -O2-I.
-I../../../gcc/libiberty/../include  -W -Wall -Wwrite-strings
-Wc++-compat -Wstrict-prototypes -pedantic
../../../gcc/libiberty/strsignal.c -o strsignal.o
../../../gcc/libiberty/strsignal.c:408: error: conflicting types for 'strsignal'
/usr/include/string.h:78: note: previous declaration of 'strsignal' was here
make[2]: *** [strsignal.o] Error 1
make[2]: Leaving directory `/home/andy/live-gcc/my_gcc/i686-pc-cygwin/libiberty'
make[1]: *** [all-target-libiberty] Error 2
make[1]: Leaving directory `/home/andy/live-gcc/my_gcc'
make: *** [all] Error 2

Which seems like a possible setup/build issue. If so, has anyone seen
it before, and are there any helpful hints on how to get rid of it?

Thanks.

Andy
-- 
Brain upgrade required: a working hypothalamus


Re: This is a Cygwin failure yeah?

2009-01-08 Thread Andy Scott
On 07/01/2009, Dave Korn  wrote:
> Andy Scott wrote:
>  > GCC Build One:
>  >
>  > Again stage3 part of build, and this is what actually stops the build
>  > the above issue doesn't seem to (I think it happens in stage 2),
>
>
>   That sentence contradicts itself and what you said earlier, doesn't it?
>
If it does it wasn't supposed to :-) As far as I can tell the stack
overflow happens in stage 2 - the actual error that stops the build is
in stage 3. Now it is possible that the stack overflow issue causes a
subtle error in code generation that is not picked up until stage 3
and the failure I cite later on.

re: strsignal - ah right, I will take a look at all that then.

Bernd> thanks for the information regarding stack issues on Cygwin.
Will look at them also.



Andy
-- 
Brain upgrade required: a working hypothalamus


Re: This is a Cygwin failure yeah?

2009-01-14 Thread Andy Scott
On 10/01/2009, Bernd Roesch  wrote:
> Hello Dave

>  >> Unix commad for stack increase(forget the name)
>  >
>  >  'ulimit'
>
>
> ah yes i see, I update from time and time and now its more.my bash show this
>  now.Maybe Andy can do this test what his bash show.
>
>  $ ulimit -a
>  core file size  (blocks, -c) unlimited
>  data seg size   (kbytes, -d) unlimited
>  file size   (blocks, -f) unlimited
>  open files  (-n) 256
>  pipe size(512 bytes, -p) 8
>  stack size  (kbytes, -s) 2033
>  cpu time   (seconds, -t) unlimited
>  max user processes  (-u) 63
>  virtual memory  (kbytes, -v) 2097152
>
>

For me I get:

$ ulimit -a
core file size  (blocks, -c) unlimited
data seg size   (kbytes, -d) unlimited
file size   (blocks, -f) unlimited
open files  (-n) 256
pipe size(512 bytes, -p) 8
stack size  (kbytes, -s) 2033
cpu time   (seconds, -t) unlimited
max user processes  (-u) 63
virtual memory  (kbytes, -v) 2097152



Thanks for the replies and sorry for my late answer - away from
keyboard  :-) I'm picking this back up now though.

Andy
-- 
Brain upgrade required: a working hypothalamus


Re: This is a Cygwin failure yeah?

2009-01-21 Thread Andy Scott
2009/1/18 Dave Korn :
> Andy Scott wrote:
>
>> Again stage3 part of build, and this is what actually stops the build
>> the above issue doesn't seem to (I think it happens in stage 2), I get
>> the following:
>>
>> 
>
>  < a few more lines of log deleted :) >
>
>> ../../../gcc/libiberty/strsignal.c -o strsignal.o
>> ../../../gcc/libiberty/strsignal.c:408: error: conflicting types for 
>> 'strsignal'
>> /usr/include/string.h:78: note: previous declaration of 'strsignal' was here
>> make[2]: *** [strsignal.o] Error 1
>> make[2]: Leaving directory 
>> `/home/andy/live-gcc/my_gcc/i686-pc-cygwin/libiberty'
>> make[1]: *** [all-target-libiberty] Error 2
>> make[1]: Leaving directory `/home/andy/live-gcc/my_gcc'
>> make: *** [all] Error 2
>
>Hi Andy,
>
>  I created a bugzilla entry for the failure:
>
>  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38903
>
>  I've applied a patch to GCC SVN HEAD to fix the strsignal bug (r.143487),
> and would appreciate if you could verify that it solves the build
> failure for you.
>
>thanks,
>  DaveK
>

Dave

Cheers for that. I will do when I get back to my machine tomorrow;
been laid up since Sunday with the flu so only just seen this, so
apologies for the late reply.

Andy
-- 
Brain upgrade required: a working hypothalamus


Re: This is a Cygwin failure yeah?

2009-01-26 Thread Andy Scott
On 18/01/2009, Dave Korn  wrote:
> Andy Scott wrote:
>
>  > Again stage3 part of build, and this is what actually stops the build
>  > the above issue doesn't seem to (I think it happens in stage 2), I get
>  > the following:
>  >
>  > 
>
>
>  < a few more lines of log deleted :) >
>
>
>  > ../../../gcc/libiberty/strsignal.c -o strsignal.o
>  > ../../../gcc/libiberty/strsignal.c:408: error: conflicting types for 
> 'strsignal'
>  > /usr/include/string.h:78: note: previous declaration of 'strsignal' was 
> here
>  > make[2]: *** [strsignal.o] Error 1
>  > make[2]: Leaving directory 
> `/home/andy/live-gcc/my_gcc/i686-pc-cygwin/libiberty'
>  > make[1]: *** [all-target-libiberty] Error 2
>  > make[1]: Leaving directory `/home/andy/live-gcc/my_gcc'
>  > make: *** [all] Error 2
>
>
> Hi Andy,
>
>   I created a bugzilla entry for the failure:
>
>   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38903
>
>   I've applied a patch to GCC SVN HEAD to fix the strsignal bug (r.143487),
>  and would appreciate if you could verify that it solves the build
>  failure for you.
>
> thanks,
>   DaveK
>

The two test machines I set going with this both managed successful
builds. (Even if it did take them 5hrs+ to do it :s )

Thanks again for your effort on this.

Now I'll try to get the full test suite running on them.

Andy
-- 
Brain upgrade required: a working hypothalamus


Re: Serious code generation/optimisation bug (I think)

2009-01-30 Thread Andy Armstrong

On 30 Jan 2009, at 05:11, Ross Smith wrote:

> Zoltán Kócsi wrote:
>
>> On Thu, 29 Jan 2009 08:53:10 +
>> Andrew Haley  wrote:
>>
>>> We're talking about gcc on ARM.  gcc on ARM uses 0 for the null
>>> pointer constant, therefore a linker cannot place an object at
>>> address zero. All the rest is irrelevant.
>>
>> Um, the linker *must* place the vector table at address zero, because
>> the ARM, at least the ARM7TDMI fetches all exception vectors from
>> there. Dictated by the HW, not the compiler.
>
>
> This sounds like a genuine bug in gcc, then. As far as I can see,
> Andrew is right -- if the ARM hardware requires a legitimate object
> to be placed at address zero, then a standard C compiler has to use
> some other value for the null pointer.



The ARM exception table looks like this:

0x0000 Reset
0x0004 Undefined instruction
0x0008 Software interrupt
0x000C Prefetch Abort
0x0010 Data Abort
0x0014 Reserved
0x0018 IRQ
0x001C FIQ

so only the reset vector is at 0.
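
As a quick illustration of the table above (a sketch only, assuming the
vectors sit at base address 0 with 4 bytes per entry as listed; the enum
and function names are invented for this example):

```c
#include <stdint.h>

/* Vector slots in the order shown in the table above. */
enum arm_vector {
    VEC_RESET,
    VEC_UNDEFINED,
    VEC_SWI,
    VEC_PREFETCH_ABORT,
    VEC_DATA_ABORT,
    VEC_RESERVED,
    VEC_IRQ,
    VEC_FIQ
};

/* Address of a vector slot, assuming the table starts at address 0. */
uint32_t arm_vector_address(enum arm_vector v)
{
    return (uint32_t)v * 4u;
}
```

Only arm_vector_address(VEC_RESET) comes out as 0; every other slot has a
non-zero address.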

--
Andy Armstrong, Hexten





Does backend need to worry about overlap?

2005-02-21 Thread Andy Hutchinson
If I have RTL pattern such as:
(SET (MEM...) (MEM...))
(define_insn in backend target.md)
do I need to guard against the possibility that the two operands 
overlap? Or does the front/middle end take care of any C/C++ language 
specific needs here? (perhaps by using a register as an intermediate)
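
To show the kind of overlap I am worried about, here is a small C
analogy (not RTL, and not a statement about what GCC guarantees): a
4-byte copy done through a temporary, like going via a register, versus
the same copy split into byte moves.  The function names are made up
for the illustration.

```c
#include <string.h>

/* Load all four bytes first, then store them: overlap is harmless. */
void copy4_via_temp(unsigned char *dst, const unsigned char *src)
{
    unsigned char tmp[4];
    memcpy(tmp, src, 4);
    memcpy(dst, tmp, 4);
}

/* Split into byte moves, lowest address first: if dst == src + 1,
   each store clobbers a source byte that has not been read yet. */
void copy4_bytewise(unsigned char *dst, const unsigned char *src)
{
    int i;
    for (i = 0; i < 4; i++)
        dst[i] = src[i];
}
```

With a buffer {1,2,3,4,5} and dst = buf + 1, src = buf, the first
version leaves {1,1,2,3,4} while the second leaves {1,1,1,1,1}.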

Thank you



[RFC / musing] Scoped exception handling in Linux userspace?

2013-07-18 Thread Andy Lutomirski
Windows has a feature that I've wanted on Linux forever: stack-based
(i.e. scoped) exception handling.  The upshot is that you can do,
roughly, this (pseudocode):

int callback(...)
{
  /* Called if code_that_may_fault faults.  May return "unwind to
     landing pad", "propagate the fault", or "fixup and retry" */
}

void my_function()
{
  __hideous_try_thing(callback) {
    code_that_may_fault();
  } blahblahblah {
    landing_pad_code();
  }
}

Windows calls it SEH (structured exception handling), and the
implementation on 32-bit Windows is rather gnarly.  I don't really
know how it works on 64-bit windows, but I think it's saner.

This has two really nice properties:

1. It works in libraries!

2. It's localized.  So you can mmap something, read from it *and
handle SIGBUS*, and unmap.
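
For comparison, the closest thing available today is roughly the
following: install a process-wide handler with sigaction and jump back
out with siglongjmp.  This is only a sketch under stated assumptions
(single-threaded, one global landing pad, no "fixup and retry"), and
run_guarded, on_fault and fault_env are names invented here.  It also
shows the problem: the handler is global state, so two libraries doing
this will step on each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fault_env;               /* the single landing pad */
static volatile sig_atomic_t fault_armed;  /* nonzero while fn() runs */

static void on_fault(int sig)
{
    if (fault_armed)
        siglongjmp(fault_env, 1);          /* unwind to the landing pad */
    /* Not ours: fall back to the default action. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Runs fn(); returns 0 if it completed, signo if it raised signo,
   or -1 if the handler could not be installed. */
int run_guarded(void (*fn)(void), int signo)
{
    struct sigaction sa, old;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(fault_env, 1) == 0) {    /* 1: save the signal mask */
        fault_armed = 1;
        fn();
        result = 0;
    } else {
        result = signo;                    /* arrived via siglongjmp */
    }

    fault_armed = 0;
    sigaction(signo, &old, NULL);          /* put the old handler back */
    return result;
}
```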

Could Linux support such a thing?  Here's a sketch of a way:

 - The kernel would need to have a fairly well-defined concept of
synchronous faults that can be handled with this mechanism.  Calls to
force_sig_info are probably the right thing to hook in to.

 - The userspace runtime optionally registers (via a new syscall or
prctl, say) a handler for synchronous faults.

 - When a synchronous fault happens, if the process (struct
sighand_struct) has a synchronous fault handler registered, the signal
is delivered to that handler, on the thread that faulted, instead of
via the normal signal handling mechanism.

 - The userspace runtime walks the chain of personality handlers and
gives them a chance to respond.

 - If no handler claims the fault, then the user code somehow* causes
ordinary signal delivery to happen.

* This may need kernel help, too -- if the process is going to die, it
should die for the right reason, so perhaps there should be a syscall
to redeliver the signal.  If the runtime wants to be fancy and a
signal handler is installed, then there could be a fast path.  Maybe
if we got really fancy, it could live in the vdso.

Now everyone wins!  After someone writes the libgcc support for this
(ugh!), then you can write CFI-based exception handlers in assembly!
Presumably you could write them in C++, too, if you don't care about
restarting, like this:

try {
   code_that_may_fault();
} catch (cxxabi::synchronous_kernel_fault &) {
   amazingly_dont_crash();
}

Is this worth pursuing?  I'm not touching the gcc part with a ten-foot
pole, but I could probably do some of the kernel work.  I'm a bit
scared of libgcc, too.

It's worth noting that SIGBUS isn't the only interesting signal here.
SIGFPE could work, too.  I'm not sure whether SIGPIPE would make
sense.  SIGSEGV would clearly work, but anyone using this mechanism
for SIGSEGV is probably asking for trouble.


--Andy

P.S.  Just because you can probably get away with throwing a C++
exception from a signal handler right now does not mean it's a good
idea.  Especially in a library.


Re: [RFC / musing] Scoped exception handling in Linux userspace?

2013-07-18 Thread Andy Lutomirski
On Thu, Jul 18, 2013 at 5:40 PM, David Daney  wrote:
> On 07/18/2013 05:26 PM, Andy Lutomirski wrote:
>>
>> Windows has a feature that I've wanted on Linux forever: stack-based
>> (i.e. scoped) exception handling.  The upshot is that you can do,
>> roughly, this (pseudocode):
>>
>> int callback(...)
>> {
>>/* Called if code_that_may_fault faults.  May return "unwind to
>> landing pad", "propagate the fault", or "fixup and retry" */
>> }
>>
>> void my_function()
>> {
>>__hideous_try_thing(callback) {
>>  code_that_may_fault();
>>} blahblahblah {
>>  landing_pad_code();
>>}
>> }
>
>
> How is this different than throwing exceptions from a signal handler?

Two ways.  First, exceptions thrown from a signal handler can't be
retries.  Second, and more importantly, installing a signal handler in
a library is a terrible idea.

--Andy


Re: [RFC / musing] Scoped exception handling in Linux userspace?

2013-07-18 Thread Andy Lutomirski
On Thu, Jul 18, 2013 at 6:17 PM, David Daney  wrote:
> On 07/18/2013 05:50 PM, Andy Lutomirski wrote:
>>
>> On Thu, Jul 18, 2013 at 5:40 PM, David Daney 
>> wrote:
>>>
>>> On 07/18/2013 05:26 PM, Andy Lutomirski wrote:
>>>
>>>
>>> How is this different than throwing exceptions from a signal handler?
>>
>>
>> Two ways.  First, exceptions thrown from a signal handler can't be
>> retries.
>
>
> ??

s/retries/retried, by which I mean that you can't do things like
implementing virtual memory in userspace by catching SIGSEGV, calling
mmap, and resuming.

>
>
>> Second, and more importantly, installing a signal handler in
>> a library is a terrible idea.
>
>
> The signal handler would be installed by main() before calling into the
> library.  You have to have a small amount of boiler plate code to set it up,
> but the libraries wouldn't have to be modified if they were already
> exception safe.
>
> FWIW the libgcj java runtime environment uses this strategy for handling
> NullPointerExceptions and DivideByZeroError(sp?).  Since all that code for
> the most part follows the standard C++ ABIs, it is an example of this
> technique that has been deployed in many environments.

Other way around: a *library* that wants to use exception handling
can't do so safely without the cooperation, or at least understanding,
of the main program and every other library that wants to do something
similar.  Suppose my library installs a SIGFPE handler and throws
my_sigfpe_exception and your library installs a SIGFPE handler and
throws your_sigfpe_exception.  The result: one wins and the other
crashes due to an unhandled exception.

In my particular usecase, I have code (known to the main program) that
catches all kinds of fatal signals to log nice error messages before
dying.  That means that I can't use a library that handles signals for
any other purpose.  Right now I want to have a small snippet of code
handle SIGBUS, but now I need to coordinate it with everything else.

If this stuff were unified, then everything would just work.

--Andy


Re: [RFC / musing] Scoped exception handling in Linux userspace?

2013-07-19 Thread Andy Lutomirski
On Fri, Jul 19, 2013 at 9:22 AM, David Daney  wrote:
> On 07/18/2013 08:29 PM, Andy Lutomirski wrote:
>>
>> Other way around: a *library* that wants to use exception handling
>> can't do so safely without the cooperation, or at least understanding,
>> of the main program and every other library that wants to do something
>> similar.  Suppose my library installs a SIGFPE handler and throws
>> my_sigfpe_exception and your library installs a SIGFPE handler and
>> throws your_sigfpe_exception.  The result: one wins and the other
>> crashes due to an unhandled exception.
>>
>> In my particular usecase, I have code (known to the main program) that
>> catches all kinds of fatal signals to log nice error messages before
>> dying.  That means that I can't use a library that handles signals for
>> any other purpose.  Right now I want to have a small snippet of code
>> handle SIGBUS, but now I need to coordinate it with everything else.
>>
>> If this stuff were unified, then everything would just work.
>
>
> That's right.  But I think the Linux kernel already supplies all the needed
> functionality to do this.  It is really a matter of choosing a userspace
> implementation and standardizing your entire system around it.  In the realm
> of GNU/GLibc/Linux, it is really more of social/political exercise rather
> than a technical problem.
>

The social problem could be solved by glibc (or maybe ld.so)
installing the relevant handlers automatically and taking advantage of
its sigaction wrapper to keep everything working.  But this has
technical problems:

1. Semantic changes: things like kill(pid, SIGSEGV) will no longer
result in a fatal signal, which would be a regression (albeit probably
harmless).  The results from /proc/pid/status might look a bit odd.
Separating out signals resulting from faulting instructions (vs other
causes) might be tricky.  I'm also not sure whether the ignored states
of SIGSEGV and SIGFPE are preserved across exec, but, if they are,
glibc will have trouble emulating this.

2. Unhandled signals: if SIGSEGV is handled (by, say, glibc) but there
is no exception handler that claims the signal, then there's currently
no way to tell the kernel to do everything it normally does on an
unhandled fatal signal (e.g. logging, dumping core correctly,
notifying ptracers, sending the right failure code to waitid).

--Andy


Automatic dependency file generation bug/question

2014-01-15 Thread ANDY KENNEDY
Reading <http://gcc.gnu.org/news/dependencies.html>, I find that
dependency files should be created along the lines of

%.o:  %.c ...

In gcc version 4.4.4 (Slackware64 Linux version 13.0), I execute the
following commands:

touch a.c
gcc -c -MMD -MP -MF"a.d" -MT"a.d" -o "a.o" "a.c"
cat a.c

and get the following output:

a.d a.o: a.c

which is precisely what I want.  However, I have a cross compiler using
gcc version 4.7.3 which produces the following output using the same
commands (obviously, with gcc replaced with /bin/gcc):

a.d: a.c

I have been looking for the reason for the change, but am unable to find
the rationale.

Please advise whether this is a bug, or if this is meant to be the way
gcc will work for all future releases.  As I see it, this complicates my
Makefile(s) as I have two gcc versions that behave differently.  This
implies that my Makefile(s) will now require a section specifically
dedicated to the .d file generation.  Whereas I remember that this used
to be the way I had to construct the dependency list, this was
cumbersome and the way e.g. 4.4.4 supported automatic dependency
generation is preferable to me.

Thank you for your time,
Andy


RE: Automatic dependency file generation bug/question

2014-01-21 Thread ANDY KENNEDY
Ping (with one correction).

> -Original Message-
> From: ANDY KENNEDY
> Sent: Wednesday, January 15, 2014 3:16 PM
> To: 'gcc@gcc.gnu.org'
> Subject: Automatic dependency file generation bug/question
> 
> Reading <http://gcc.gnu.org/news/dependencies.html>, I find that
> dependency files should be created along the lines of
> 
> %.o:  %.c ...
> 
> In gcc version 4.4.4 (Slackware64 Linux version 13.0), I execute the
> following commands:
> 
> touch a.c
> gcc -c -MMD -MP -MF"a.d" -MT"a.d" -o "a.o" "a.c"
> cat a.d
> 
> and get the following output:
> 
> a.d a.o: a.c
> 
> which is precisely what I want.  However, I have a cross compiler using
> gcc version 4.7.3 which produces the following output using the same
> commands (obviously, with gcc replaced with /bin/gcc):
> 
> a.d: a.c
> 
> I have been looking for the reason for the change, but am unable to find
> the rationale.
> 
> Please advise whether this is a bug, or if this is meant to be the way
> gcc will work for all future releases.  As I see it, this complicates my
> Makefile(s) as I have two gcc versions that behave differently.  This
> implies that my Makefile(s) will now require a section specifically
> dedicated to the .d file generation.  Whereas I remember that this used
> to be the way I had to construct the dependency list, this was
> cumbersome and the way e.g. 4.4.4 supported automatic dependency
> generation is preferable to me.
> 
> Thank you for your time,
> Andy


How do I disable warnings across gcc versions?

2012-05-14 Thread Andy Lutomirski
This code warns (incorrectly, but that's a whole separate issue):

double foo(double a, double b)
{
  bool option1_ok, option2_ok;
  double option1, option2;
  if (a == 0) {
    option1_ok = false;
  } else {
    option1 = b;
    option1_ok = true;
  }
  if (a == 1) {
    option2_ok = false;
  } else {
    option2 = b;
    option2_ok = true;
  }
  if (option1_ok) return option1;
  if (option2_ok) return option2;
  return 7;
}

Unfortunately, the bogus warning is -Wuninitialized in gcc 4.6 and
-Wmaybe-uninitialized in gcc 4.7.  The obvious way to silence the
warning is to wrap it in:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wuninitialized"
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
...
#pragma GCC diagnostic pop

It silences the original warning, but now gcc 4.6 says:
warning: unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]

This seems to defeat the purpose, and adding
#pragma GCC diagnostic ignored "-Wpragmas"
is a little gross.  How am I supposed to do this?
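
One workaround I can think of, offered as a sketch rather than the
blessed answer, is to guard the newer option name with a compiler
version check so that gcc 4.6 never sees it.  This assumes
-Wmaybe-uninitialized first appeared in 4.7, as described above, and
that "#pragma GCC diagnostic push" is available (4.6 or later).  The
example is written as C, hence the <stdbool.h>:

```c
#include <stdbool.h>

#if defined(__GNUC__)
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wuninitialized"
# if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)
   /* Only name the option on compilers assumed to know it. */
#  pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
# endif
#endif

double foo(double a, double b)
{
  bool option1_ok, option2_ok;
  double option1, option2;
  if (a == 0) {
    option1_ok = false;
  } else {
    option1 = b;
    option1_ok = true;
  }
  if (a == 1) {
    option2_ok = false;
  } else {
    option2 = b;
    option2_ok = true;
  }
  if (option1_ok) return option1;
  if (option2_ok) return option2;
  return 7;
}

#if defined(__GNUC__)
# pragma GCC diagnostic pop
#endif
```

It works, but the version check is about as ugly as ignoring
-Wpragmas, so the question stands.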

Thanks,
Andy


Re: C++11: new builtin to allow constexpr to be applied to performance-critical functions

2012-10-19 Thread Andy Gibbs

On Saturday, October 20, 2012 7:50 AM, Chandler Carruth wrote:

[...snip...] Let me hypothesize a different interface:

This stays the same...
constexpr int constexpr_strncmp(const char *p, const char *q, size_t n) {
  return !n ? 0 : *p != *q ? *p - *q : !*p ? 0
       : constexpr_strncmp(p+1, q+1, n-1);
}


But here we do something different on the actual declaration:
[[constexpr_alias(constexpr_strncmp)]]
int strncmp(const char *p, const char *q, size_t n);

When parsing the *declaration* of this function, we lookup the function
name passed to constexpr_alias. We must find a constexpr function with an
identical signature. Then, at function invocation substitution of strncmp,
we instead substitute the body of constexpr_strncmp.

This seems more direct (no redirection in the code), and it also provides
a specific advantage of allowing this to be easily added to an existing
declaration in a declaration-only header file without impacting or
changing the name of the runtime executed body or definition.


I'd be very happy with this solution.  I come across precisely the problem
raised by Richard on a very regular basis and have different workarounds
for both gcc and clang.  I'd love to see something "standard" emerging!

For my side, I'd still like some way of declaring a function to be used
only in a constexpr environment, meaning that the compiler gives an error
up front when a function is then used in a non-constexpr environment.  The
above proposal will provide a link-time error if the non-constexpr function
is not defined, which is half-way there.  Perhaps using the "unavailable"
attribute in conjunction with "constexpr_alias" would be the compile-time
solution...

Cheers

Andy



Re: Redundant logical operations left after early splitting

2008-02-19 Thread Andy H
After some digging, I can confirm local-alloc.c is creating OR Rx,0
instructions but not simplifying them.
local-alloc.c is not the problem, but right now it is the only help I'm
getting for post-split optimization.


This occurs when source registers are replaced with equivalent constant 
using validate_replace_rtx() (which has very minimal simplifications)


I added validate_simplify_rtx() after the normal 
update_equiv_regs/validate_replace_rtx()  and the OR Rx,0 got removed.


I also found that the limited propagation of constants is also due to 
limitations of local-alloc.c. In particular two restrictions:


1) Constants are not propagated into  operands that are both input and 
output. For example:

Ra = 0
Ra=Ra | Rb

Not sure why - maybe just deemed too difficult.

2) The method used only replaces the first use in a daisy chain of 
moves. So if we have


Ra = 0
Rb = Ra
Rc = Rc | Rb

it will only reduce to:

Rb = 0
Rc = Rc | Rb

rather than

Rc = Rc | 0

and ideally

*NOTHING*

Propagating REG_EQUIV notes across register-register moves would seem
to be an obviously simple way to fix this. Thoughts?
I am not sure local-alloc is the best place to address the overall
problem; I doubt it is intended to provide such optimizations.

An additional cse pass after split would seem a better way perhaps?
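
To spell out what I mean by following the constant through the chain of
moves, here is a toy model in plain C.  It has nothing to do with the
real RTL data structures; the instruction set, struct insn and
fold_ior_zero are all invented for the illustration, and it only
handles straight-line code with register numbers below NREGS.

```c
#include <stddef.h>

enum op { OP_SET_CONST, OP_MOVE, OP_IOR, OP_NOP };

struct insn {
    enum op op;
    int dst;    /* destination register */
    int src;    /* source register (OP_MOVE, OP_IOR) */
    long imm;   /* constant value (OP_SET_CONST) */
};

#define NREGS 8

/* One forward pass: track which registers hold a known constant,
   carrying that knowledge across reg-reg moves.  "dst |= src" becomes
   a NOP when src is known to be 0, and a plain move when dst is known
   to be 0.  Returns the number of IORs deleted outright. */
size_t fold_ior_zero(struct insn *code, size_t n)
{
    int known[NREGS] = {0};
    long value[NREGS] = {0};
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        struct insn *in = &code[i];
        switch (in->op) {
        case OP_SET_CONST:
            known[in->dst] = 1;
            value[in->dst] = in->imm;
            break;
        case OP_MOVE:
            known[in->dst] = known[in->src];
            value[in->dst] = value[in->src];
            break;
        case OP_IOR:
            if (known[in->src] && value[in->src] == 0) {
                in->op = OP_NOP;            /* x | 0 == x */
                removed++;
            } else if (known[in->dst] && value[in->dst] == 0) {
                in->op = OP_MOVE;           /* 0 | x == x */
                known[in->dst] = known[in->src];
                value[in->dst] = value[in->src];
            } else if (known[in->dst] && known[in->src]) {
                value[in->dst] |= value[in->src];
            } else {
                known[in->dst] = 0;
            }
            break;
        case OP_NOP:
            break;
        }
    }
    return removed;
}
```

On the three-instruction chain above (Ra = 0; Rb = Ra; Rc = Rc | Rb)
this deletes the IOR, and on the "Ra = 0; Ra = Ra | Rb" case it turns
the IOR into a move.  The now-dead constant loads would still need a
separate dead-code pass.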

Andy






Re: Redundant logical operations left after early splitting

2008-02-19 Thread Andy H

Dave and Jeff,

Here are more details. I have included a testcase, splitter patterns and
an RTL dump to show the problem in more detail.


The testcase is:

unsigned long f (unsigned char  *P)
{
 unsigned long C;
 C  = ((unsigned long)P[1] << 24)
| ((unsigned long)P[2] << 16)
| ((unsigned long)P[3] <<  8)
| ((unsigned long)P[4] <<  0);
 return C;
}

which normally produces horrible code, with optimisation having no
significant impact.


To solve this, back end patterns for zero_extend and the lshift by 
multiples of 8 were split into QImode moves.
The hope was that gcc would then collapse the QImode expression list 
such as  0|0|0|x or 0|0|y|0 into simple moves.

Here are splitter patterns (followed by more stuff):

;; xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x xx<---x
;; zero extend
(define_insn_and_split "zero_extendqi<mode>2"
 [(set (match_operand:HIDI 0 "nonimmediate_operand" "")
   (zero_extend:HIDI (match_operand:QI 1 "nonimmediate_operand" "")))]
 ""
 "#"
 ""
 [(const_int 0)]
 "
 int i;
   enum machine_mode dmode = GET_MODE (operands[0]);
   int dsize = GET_MODE_SIZE (dmode);
   enum machine_mode smode = GET_MODE (operands[1]);
   int ssize = GET_MODE_SIZE (smode);
   rtx dword =  simplify_gen_subreg (smode, operands[0], dmode, 0);
   emit_move_insn (dword, operands[1]);
   for (i = ssize; i < dsize; i++)
   {
   rtx dbyte =  simplify_gen_subreg (QImode, operands[0], dmode, i);
   emit_move_insn (dbyte, const0_rtx);
   }
   DONE;
 ")

;byte shift is just a series of moves
;check src OR dest is in register, so the move will be ok

(define_insn_and_split "ashl3_const2p"
 [(set (match_operand:HIDI 0 "nonimmediate_operand""")

   (ashift:HIDI (match_operand:HIDI 1 "register_operand"  "")
  (match_operand:HIDI 2 "const_int_operand" "i"))
)
   ]
 "((INTVAL (operands[2]) % 8) == 0)
  "
 "#"
 ""
 [(const_int 0)]
 {
   int i;
   enum machine_mode mode;
   mode = GET_MODE(operands[0]);
   int size = GET_MODE_SIZE(mode);
   HOST_WIDE_INT x = INTVAL (operands[2]);   
   rtx dbytes[8], sbytes[8];

   for (i = 0; i < size; i++)
   {
   dbytes[i] =  simplify_gen_subreg (QImode, operands[0], mode, i);
   sbytes[i] =  simplify_gen_subreg (QImode, operands[1], mode, i);
   }
   int shift = x / 8;
   if (shift > size) shift = size;
   for (i = shift; i < size; i++)
   {
   emit_move_insn (dbytes[i], sbytes[i - shift]);   
   }

   for (i = 0; i < shift; i++)
   {
   emit_move_insn (dbytes[i], const0_rtx);   
   }
   DONE;
 }
)

;

The result kinda works but we are left with OR x,0 (and some missed 
opportunities to propagate zero constant forward into OR)
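
For readers who do not read MD patterns, the byte-shift splitter can be modelled in plain C roughly as follows. This is a sketch under assumptions: bytes are stored least significant first, and the name shl_by_bytes is made up for the illustration. It copies from the top byte downwards so that the destination may be the same storage as the source.

```c
#include <stdint.h>

/* Shift a little-endian value of `size` bytes left by `bits`
   (assumed to be a multiple of 8) using byte moves only.  */
void
shl_by_bytes (uint8_t *dst, const uint8_t *src, int size, int bits)
{
  int shift = bits / 8;
  int i;

  if (shift > size)
    shift = size;

  /* Upper bytes come from the source, shifted up by `shift` lanes.
     Going from high to low keeps this correct when dst == src.  */
  for (i = size - 1; i >= shift; i--)
    dst[i] = src[i - shift];

  /* Vacated low bytes become zero.  */
  for (i = 0; i < shift; i++)
    dst[i] = 0;
}
```

The zero stores in the second loop are the constants that later need to be propagated into the IOR instructions.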


The splitters are matched up initially (zero_extend) or at combine - 
just as expected.


All the subregs appear as expected in split1. Naturally this produces a 
bunch of QI subregs many of which  contain zero. No real change happens 
in RTL  until local register allocation (lreg dump file). There are no 
redundant  IOR Rm,0 in dump files before lreg pass. The only note  is a 
reg dead on the pointer argument when it gets moved to a pointer 
register. (no reg equals or other dead notes until lreg pass)


In the lreg dump file I can see the propagation of many (but not all) 
constant 0s forward into the IOR instructions (e.g. Rn = 0, Rm = Rm | Rn 
=> Rm = Rm | 0). These remain in the RTL and are output into the final code. 
Loads of zero into registers which end up being unused are removed in 
later passes.


I can remove IOR Rm,0 with a targeted splitter to create a NOP - which 
is my last resort.


So here is lreg dump extract:


;; Function f (f)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 (1)
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 (1)


Pass 0

 Register 42 costs: POINTER_X_REGS:0 POINTER_Y_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:8000 SIMPLE_LD_REGS:8000 
LD_REGS:8000 NO_LD_REGS:8000 GENERAL_REGS:8000 ALL_REGS:1 MEM:2
 Register 58 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 59 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 60 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 
LD_REGS:0 NO_LD_REGS:2000 GENERAL_REGS:2000 ALL_REGS:16000 MEM:16000
 Register 61 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_REGS:0 ADDW_REGS:0 SIMPLE_LD_REGS:0 
LD_REGS:0 NO_LD_REGS:0 GENERAL_REGS:0 ALL_REGS:16000 MEM:16000
 Register 62 costs: POINTER_X_REGS:0 POINTER_Z_REGS:0 
BASE_POINTER_REGS:0 POINTER_RE

Re: Redundant logical operations left after early splitting

2008-02-20 Thread Andy H

I tried extra fwprop pass and got some very interesting results!

First "caveat": I just cut/pasted the extra pass into the list - not worrying 
about detail.


 NEXT_PASS (pass_rtl_fwprop);
 NEXT_PASS (pass_local_alloc);

To show effects here is assembler code dump (which is easier to read 
than RTL)


(1) Just splitters - normal passes for O3 - no attempt to remove OR rx,0
(without splitters it's almost 2x bigger)

 23   /* prologue: function */
 24   /* frame size = 0 */
 25  FC01  movw r30,r24
 26   .LM2:
 27 0002 9181  ldd r25,Z+1
 28 0004 80E0  ldi r24,lo8(0)
 29   .LVL1:
 30 0006 60E0  ldi r22,lo8(0)
 31 0008 2281  ldd r18,Z+2
 32 000a 30E0  ldi r19,lo8(0)
 33 000c 6060  ori r22,lo8(0)
 34 000e 762F  mov r23,r22
 35 0010 822B  or r24,r18
 36 0012 932B  or r25,r19
 37 0014 2481  ldd r18,Z+4
 38 0016 622B  or r22,r18
 39 0018 7060  ori r23,lo8(0)
 40 001a 8060  ori r24,lo8(0)
 41 001c 9060  ori r25,lo8(0)
 42 001e 2381  ldd r18,Z+3
 43 0020 40E0  ldi r20,lo8(0)
 44 0022 6060  ori r22,lo8(0)
 45 0024 722B  or r23,r18
 46 0026 832B  or r24,r19
 47 0028 942B  or r25,r20
 48   /* epilogue start */
 49   .LM3:
 50 002a 0895  ret

(2)Same code but now with fwprop:

 23   /* prologue: function */
 24   /* frame size = 0 */
 25  FC01  movw r30,r24
 26   .LM2:
 27 0002 4181  ldd r20,Z+1
 28 0004 942F  mov r25,r20
 29 0006 70E0  ldi r23,lo8(0)
 30 0008 3281  ldd r19,Z+2
 31 000a 832F  mov r24,r19
 32   .LVL1:
 33 000c 9060  ori r25,lo8(0)
 34 000e 2481  ldd r18,Z+4
 35 0010 622F  mov r22,r18
 36 0012 8060  ori r24,lo8(0)
 37 0014 2381  ldd r18,Z+3
 38 0016 722B  or r23,r18
 39   /* epilogue start */
 40   .LM3:
 41 0018 0895  ret


Much better! But note we still have OR rx,0 created. (There were none 
before the fwprop pass.) As there are still obvious propagation 
opportunities, I suspect that these are being added by local-alloc 
propagation after imperfect fwprop.
 


 (4)Now with fwprop and NOP splitter for OR rx,0

  23   /* prologue: function */
 24   /* frame size = 0 */
 25  FC01  movw r30,r24
 26   .LM2:
 27 0002 4181  ldd r20,Z+1
 28 0004 942F  mov r25,r20
 29 0006 70E0  ldi r23,lo8(0)
 30 0008 3281  ldd r19,Z+2
 31 000a 832F  mov r24,r19
 32   .LVL1:
 33 000c 2481  ldd r18,Z+4
 34 000e 622F  mov r22,r18
 35 0010 2381  ldd r18,Z+3
 36 0012 722B  or r23,r18
 37   /* epilogue start */
 38   .LM3:
 39 0014 0895  ret

No difference apart from OR Rx,0 removal. (I expected that)


(5) And just for the hell of it 2 passes of fwprop before local-alloc.
No NOP splitter.

 NEXT_PASS (pass_rtl_fwprop);
 NEXT_PASS (pass_rtl_fwprop);
 NEXT_PASS (pass_local_alloc);


 23   /* prologue: function */
 24   /* frame size = 0 */
 25  FC01  movw r30,r24
 26   .LM2:
 27 0002 9181  ldd r25,Z+1
 28 0004 8281  ldd r24,Z+2
 29   .LVL1:
 30 0006 6481  ldd r22,Z+4
 31 0008 7381  ldd r23,Z+3
 32   /* epilogue start */
 33   .LM3:
 34 000a 0895  ret

Which is optimal. TADA!

This would indicate that simplify-rtx inside fwprop is removing OR Rx,0
but not picking up the additionally revealed forward propagation 
opportunities.

This would seem to be an avoidable limitation.
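
The effect can be reproduced with a toy model in C. This is not GCC's fwprop: the instruction representation, the snapshot of definitions taken at the start of each sweep, and all names are assumptions chosen to mimic a pass that does not revisit uses after a simplification. Each register has a single definition. A sweep substitutes the source of a move into its uses and rewrites x|0 as a move; repeating sweeps until nothing changes picks up the newly exposed moves.

```c
#include <stdbool.h>
#include <string.h>

enum { MAX_INSNS = 16 };
enum opcode { OP_INPUT, OP_MOVE, OP_IOR };

/* val is a constant when is_const is set, else a register number.  */
struct operand { bool is_const; int val; };
/* b is only used by OP_IOR; OP_INPUT uses neither operand.  */
struct insn { enum opcode op; int dst; struct operand a, b; };

/* Index of the single insn defining `reg`, or -1.  */
static int
find_def (const struct insn *code, int n, int reg)
{
  for (int i = 0; i < n; i++)
    if (code[i].dst == reg)
      return i;
  return -1;
}

/* Replace a register operand by the source of its defining move,
   judged by the snapshot taken at the start of the sweep.  */
static bool
propagate (const struct insn *snapshot, int n, struct operand *o)
{
  if (o->is_const)
    return false;
  int d = find_def (snapshot, n, o->val);
  if (d < 0 || snapshot[d].op != OP_MOVE)
    return false;
  *o = snapshot[d].a;
  return true;
}

static bool
is_zero (struct operand o)
{
  return o.is_const && o.val == 0;
}

/* One sweep over at most MAX_INSNS insns; true if anything changed.  */
bool
fwprop_sweep (struct insn *code, int n)
{
  struct insn snapshot[MAX_INSNS];
  bool changed = false;

  memcpy (snapshot, code, n * sizeof *code);
  for (int i = 0; i < n; i++)
    {
      struct insn *in = &code[i];
      if (in->op == OP_INPUT)
        continue;
      changed |= propagate (snapshot, n, &in->a);
      if (in->op == OP_IOR)
        {
          changed |= propagate (snapshot, n, &in->b);
          if (is_zero (in->a))          /* 0 | x  =>  move x */
            { in->a = in->b; in->op = OP_MOVE; changed = true; }
          else if (is_zero (in->b))     /* x | 0  =>  move x */
            { in->op = OP_MOVE; changed = true; }
        }
    }
  return changed;
}

/* Repeat sweeps until a fixed point; returns the number of sweeps
   that changed something.  */
int
fwprop_to_fixed_point (struct insn *code, int n)
{
  int sweeps = 0;
  while (fwprop_sweep (code, n))
    sweeps++;
  return sweeps;
}
```

For the sequence r1 = 0; r2 = input; r3 = r2 | r1; r4 = 0; r5 = r3 | r4, the first sweep leaves r5 = r3 and only the second reduces it to r5 = r2, which matches the observation that two passes were needed.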

Andy




Re: Redundant logical operations left after early splitting

2008-02-21 Thread Andy H

I am very grateful for your help and wisdom.

Testcase and MD patch attached.


unsigned long f (unsigned char  *P)
{
 unsigned long C;
 C  = ((unsigned long)P[1] << 24)
| ((unsigned long)P[2] << 16)
| ((unsigned long)P[3] <<  8)
| ((unsigned long)P[4] <<  0);
 return C;
}




Index: avr.md
===
--- avr.md  (revision 132380)
+++ avr.md  (working copy)
@@ -251,8 +251,8 @@
(set_attr "cc" "none")])
 
 (define_insn "*movhi"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,m,d,*r,q,r")
-(match_operand:HI 1 "general_operand"   "r,m,rL,i,i,r,q"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,Qm,d,*r,q,r")
+(match_operand:HI 1 "general_operand"   "r,Qm,rL,i,i,r,q"))]
   "(register_operand (operands[0],HImode)
 || register_operand (operands[1],HImode) || const0_rtx == operands[1])"
   "* return output_movhi (insn, operands, NULL);"
@@ -1126,73 +1126,310 @@
   [(set_attr "length" "1,1")
(set_attr "cc" "set_zn,set_zn")])
 
-(define_insn "andhi3"
-  [(set (match_operand:HI 0 "register_operand" "=r,d,r")
- (and:HI (match_operand:HI 1 "register_operand" "%0,0,0")
- (match_operand:HI 2 "nonmemory_operand" "r,i,M")))
-   (clobber (match_scratch:QI 3 "=X,X,&d"))]
+(define_mode_iterator HIDI [(HI "") (SI "") (DI "")])
+(define_mode_iterator SIDI [(SI "") (DI "")])
+(define_mode_iterator DIDI [(DI "")])
+
+(define_insn_and_split "and3"
+[(set (match_operand:HIDI 0 "register_operand" "=r,d")
+   (and:HIDI (match_operand:HIDI 1 "register_operand" "%0,0")
+   (match_operand:HIDI 2 "nonmemory_operand" "r,i")))
+]
   ""
-  "*{
-  if (which_alternative==0)
-return (AS2 (and,%A0,%A2) CR_TAB
-   AS2 (and,%B0,%B2));
-  else if (which_alternative==1)
+  "#"
+  ""
+  [(const_int 0)]
 {
+   int i;
+   enum machine_mode mode;
+   mode = GET_MODE(operands[0]);
+   int size = GET_MODE_SIZE(mode);
+   
   if (GET_CODE (operands[2]) == CONST_INT)
 {
- int mask = INTVAL (operands[2]);
- if ((mask & 0xff) != 0xff)
-   output_asm_insn (AS2 (andi,%A0,lo8(%2)), operands);
- if ((mask & 0xff00) != 0xff00)
-   output_asm_insn (AS2 (andi,%B0,hi8(%2)), operands);
- return \"\";
+   HOST_WIDE_INT x = INTVAL (operands[2]);
+  
+   for (i = 0; i < size; i++)
+   {
+   rtx dest = simplify_gen_subreg (QImode, operands[0], mode, i);
+   rtx src1 = simplify_gen_subreg (QImode, operands[1], mode, i);
+   int byte = (x & 0xff);
+   rtx src2 = gen_int_mode (byte, QImode);
+   if (byte == 0x00)
+   {
+   emit_move_insn (dest, const0_rtx);
 }
-return (AS2 (andi,%A0,lo8(%2)) CR_TAB
-   AS2 (andi,%B0,hi8(%2)));
+   else if (byte == 0xff)
+   {
+   emit_move_insn (dest, src1);
  }
-  return (AS2 (ldi,%3,lo8(%2)) CR_TAB
-  AS2 (and,%A0,%3) CR_TAB
-  AS1 (clr,%B0));
-}"
-  [(set_attr "length" "2,2,3")
-   (set_attr "cc" "set_n,clobber,set_n")])
+   else
+   {
+   emit_move_insn (dest, gen_rtx_AND (QImode, src1, src2));
+   }
+   x= x >> 8;
+   }
+   }
+   else
+   {
+   for (i = 0; i < size; i++)
+   {
+   rtx dest = simplify_gen_subreg (QImode, operands[0], mode, i);
+   rtx src1 = simplify_gen_subreg (QImode, operands[1], mode, i);
+   rtx src2 = simplify_gen_subreg (QImode, operands[2], mode, i);
+   emit_move_insn (dest, gen_rtx_AND (QImode, src1, src2));
+   }
+   }
+   DONE;
+  }
+  )
+  
+;(define_insn_and_split "ziorQI3"
+;[(set (match_operand:QI 0 "nonimmediate_operand" "=rm")
+;  (ior:QI (match_operand:QI 1 "general_operand" "0")
+;  (const_int 0)))
+;]
+;  "0"
+;  "#"
+;  ""
+;  [(const_int 0)]
+;  ""
+;  )
 
-(define_insn "andsi3"
-  [(set (match_operand:SI 0 "register_operand" "=r,d")
-   (and:SI (match_operand:SI 1 "register_operand" "%0,0")
-   (match_operand:SI 2 "nonmemory_operand" "r,i")))]
+  
+(define_insn_and_split "ior3"
+[(set (match_operand:HIDI 0 "register_operand" "=r,d")
+   (ior:HIDI (match_operand:HIDI 1 "register_operand" "%0,0")
+   (match_operand:HIDI 2 "nonmemory_operand" "r,i")))
+]
   ""
-  "*{
-  if (which_alternative==0)
-return (AS2 (and, %0,%2)   CR_TAB
-   AS2 (and, %B0,%B2) CR_TAB
-   AS2 (and, %C0,%C2) CR_TAB
-   AS2 (and, %D0,%D2));
-  else if (which_alternative==1)
+  "#"
+  ""
+  [(const_int 0)]
 {
+   int i;
+   enum machine_mode mode;
+   mode = GET_MODE(operands[0]);
+   int size = GET_MODE_SIZE(mode);
+   
   if (GET_CODE (operands[2]) == CONST_INT)
 {
- HOST_WIDE_INT mask = INTVAL (operands[2]);
- if ((mask & 0xff) != 0xff)
-   output_asm_insn (AS2 (andi,%A0,lo8(%2)), operands);
- if ((mask & 0xff00) != 0xff00)
-   output_asm_insn (AS2 (andi,%B0,hi8(%2)), operands);
- if ((mask &

Re: Redundant logical operations left after early splitting

2008-02-21 Thread Andy H

Paolo,

As you suggested, I moved the extra fwprop nearer combine, just after 
split - but it failed to propagate anything.


The reason is that immediately post split the data flow is reflecting 
cross dependencies between word and subreg use/def. So the USE of just 1 
QImode subreg of an SImode register is linked to 4 QImode DEFs - and fwprop 
gives up.


Putting fwprop after the subreg pass removes this problem - as the subregs 
have then been converted to QImode pseudo regs and we get a single DEF.


Andy


Paolo Bonzini wrote:



This would indicate that simplify-rtx inside fwprop is removing OR Rx,0
but not picking up the additionally revealed forward propagation 
opportunities.

This would seem to be an avoidable limitation.


Yes, can you send me your MD patch and a simple testcase?  fwprop is 
supposed to be "cascading", and some bugs in cascading were already 
revealed by the AVR port.


It might be even more worthwhile to try *moving* fwprop2 after 
combine, then.


Paolo


Combine repeats matching on insn pairs and will ICE on 3.

2008-03-08 Thread Andy H

Hi,

I have a problem with data flow and combine that is causing an ICE with an 
experimental build. Despite all efforts to blame my own target changes,
I have reached the conclusion that this is a gcc COMBINE bug, but seek 
your advice before filing a bug report.


The problem seems to be that the LOG_LINKS that combine creates and uses 
can include multiple references between  instruction pairs.


The information is derived from DF. That will produce multiple 
references to the same instructions if  the register in question is a 
hard register that decomposes into several smaller registers.


The RTL that triggered the problem is:

(insn 45 42 46 4 920625-1.c:55 (set (reg:SI 22 r22 [ temp.24 ])
   (mem:SI (reg/v/f:HI 71 [ alpha ]) [2 S4 A8])) 19 {*movsi} (nil))

(insn 46 45 47 4 920625-1.c:55 (set (reg:SI 18 r18)
   (mem:SI (plus:HI (reg:HI 68 [ ivtmp.18 ])
   (const_int 4 [0x4])) [2 S4 A8])) 19 {*movsi} (nil))

(insn 47 46 48 4 920625-1.c:55 (parallel [
   (set (reg:SI 22 r22)
   (mult:SI (reg:SI 22 r22)
   (reg:SI 18 r18)))
   (clobber (reg:HI 26 r26))
   (clobber (reg:HI 30 r30))
   ]) 43 {*mulsi3_call} (expr_list:REG_DEAD (reg:SI 18 r18)
   (expr_list:REG_UNUSED (reg:HI 30 r30)
   (expr_list:REG_UNUSED (reg:HI 26 r26)
   (nil)


This is a call to a library function, and the parameters for instruction 47 
are hard registers, for example SI:R22, which is physically 
R22, R23, R24 and R25.

DF marks all 4 in def/use chains (which seems entirely correct)

When DF information is transferred into LOG_LINKS we still have 4 
references back to each of the definitions in instructions 45 and 46. From gdb 
this was:


(gdb) print uid_log_links[47]
$8 = (rtx) 0x7ff140d0
(gdb) pr
(insn_list:REG_DEP_TRUE 45 (insn_list:REG_DEP_TRUE 45
(insn_list:REG_DEP_TRUE 45 (insn_list:REG_DEP_TRUE 45
(insn_list:REG_DEP_TRUE 46 (insn_list:REG_DEP_TRUE 46
(insn_list:REG_DEP_TRUE 46 (insn_list:REG_DEP_TRUE 46 (nil)

These multiple references cause COMBINE to try the same combinations 
multiple times (it thinks they are different instructions). Apart from 
burning CPU time, this appears to cause no obvious problem for 
instruction pairs (i.e. 2 only).


However, when 3 are combined, we end up trying to combine i3=47 with 
instruction i2=46 and instruction i1=46 (that's right, two copies of 46). 
Mostly this is OK, except when we get a new pattern for i2 and then 
delete i1, not realizing that i2 is thereby also deleted.


This ICE occurred when it tried to copy the REG_DEAD notes back to the 
source of R22 - instruction 46 - which, of course, was no longer there!


I'm thinking that create_log_links needs to distill the links down to 
avoid duplicates, but I'm really not sure what to blame.
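
The kind of distilling meant here can be sketched in C. This is only an illustration: GCC keeps LOG_LINKS as RTL insn lists, whereas the array of insn uids and the helper name add_link_unique below are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Record a link to the insn `uid` unless one is already present.
   Returns true if a new link was added.  */
bool
add_link_unique (int *links, size_t *count, size_t capacity, int uid)
{
  size_t i;

  for (i = 0; i < *count; i++)
    if (links[i] == uid)
      return false;          /* already linked to this insn */

  if (*count == capacity)
    return false;            /* no room; caller must size the array */

  links[(*count)++] = uid;
  return true;
}
```

Feeding it the eight references seen in the gdb output (four to insn 45, four to insn 46) leaves just two links, so combine would see each defining insn once.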


best regards

Andy








Forward propagation before register allocation

2008-03-16 Thread Andy H
I have been working on AVR port and have come across many instances 
where poor code is produced due to the absence of effective forward 
propagation of operands before register allocation.


The AVR target in particular benefits from register lowering pass as 
many physical registers and instructions are only 8 bits. However, most  
of the opportunities this creates for register and instruction 
elimination  are not realized as the only following  pass that can help 
is register allocation which has minimal propagation capabilities - and 
no instruction simplifications.


Similarly, instruction splitting created at the combine stage and 
before reload does not realise the expected benefits from splitting into 8-bit 
operations. In fact, if any new pseudos are created by a split, these 
often turn into new hard registers.


Targets other than the AVR which have small sized registers (i86 and 
68HC11 for example) are likely to have the same issues.


There have been suggestions in the past to split all instructions by RTL 
expanders. Indeed, I and others have explored this. However, this has 
proved fruitless, as it also requires a change to a non-CC0 target. At 
this time this cannot be realized due to problems reloading addresses 
and reloading arithmetic carry operations.


Hybrid approaches that split some instructions using RTL expanders have 
also been tried. But the resultant complexity (some are split, some are 
not) then defeats early RTL optimizations, so we lose more than we gain.


With  an additional forward propagation pass prior to register 
allocation the full benefits of  Subreg lowering and splitting are obtained.


I realise that nobody is keen on new passes. Yet I can see no other way 
in which this problem can be addressed. 

From a target viewpoint, moving fwprop2 later would be fine. Adding a 
target-dependent pass (before reload) would also be fine. Indeed, any 
way we could get better forward propagation/simplification after 
combine/lowering and before register allocation would be great.


Can I humbly ask that the maintainers of gcc seriously consider this 
request and provide some means by which to solve the issue.


regards

Andy











Re: How to avoid stack calling for trapoline code?

2008-04-04 Thread Andy H

The no_trampolines DejaGnu switch will omit many but not all trampoline-dependent 
tests.

Nested functions are OK, but anything that takes the address of a nested function will use a trampoline. 
They can be hard to find, as testcases are devilish at hiding that part!



For example: 


gcc.c-torture/compile/nested-1.c

will fail.


There are quite a few like this. I am hoping to get round to providing patches to correct these test cases. You will find a few more listed in this valiant attempt of 2005:


http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01919.html



Andy






RFC Test suite fix testing of no_trampolines

2008-04-05 Thread Andy H
There are several tests in the testsuite that use trampolines and are still 
run with the DejaGnu switch set to no_trampolines.


It's on my TODO list for the AVR target, but a recent email reminded me that 
it affects testing of other targets that can't or won't support 
trampolines.


There's an old patch by Björn Haase, approved but not committed 
in 2005, that addressed many of these:


http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01919.html

Essentially excluding them with  dg-require-effective-target

I am going to create an updated version of this. I realize that some of 
the target-supports changes are no longer needed.


Any comments?




Re: RFC Test suite fix testing of no_trampolines

2008-04-07 Thread Andy H

Thank  you so much.

I can test it easily and will let you know of any divergence from 
original other than the those you mention.


Andy


Janis Johnson wrote:

On Sat, 2008-04-05 at 06:57 -0400, Andy H wrote:
  
There are several test in testsuite that use trampolines that are still 
run with dejagnu switch set to  no_trampolines.


Its on my TODO list for AVR target but a recent email reminded me that 
it  affects testing of other targets than can't or won't support 
trampolines.


Theres an  old patch by Björn Haase that was approved but not committed 
in 2005 that addressed many of these


http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01919.html

Essentially excluding them with  dg-require-effective-target

I am going to create an updated version of this. I realize that some of 
the target-supports changes are no longer needed.


Any comments?



Except for check_effective_target_int_larger_than_16_bits, that
patch still looks OK to me.  In the meantime there are new
effective-target checks for int16 and int32plus that could
be used instead.

I wouldn't want to check in a three-year-old patch without
testing it, so if you can update it and test on avr, that would
be great.  If it works, it's preapproved.  Post it to gcc-patches
and let me know if you need me to check it in.

Janis
  


Re: Problem with reloading in a new backend...

2008-04-08 Thread Andy H

Take a look at the AVR target, which is very similar.

There, only "d"-constrained registers accept constants (they are call-used 
registers too).


The AVR move pattern (*reloadinqi) has multiple constraint options, "d" 
against "i" being the relevant one. So check you have all the combinations 
defined. You omitted the "i" constraint, but I don't know if that is relevant.


When "costing" is done, it walks through the constraints. You don't seem to 
have many constraint combinations (since costs are lower than I typically 
see for AVR). It will score on both EVEN and EIGHT and likely pick EVEN 
because it's the bigger class. So I think that is perhaps the reason. I think 
the order of classes may need to be changed, or something else done, to prevent 
the problem with overlap (LOWER_EVEN? UPPER_EVEN?)


But I could be completely wrong!

Andy






How do I add target specific tests?

2008-05-10 Thread Andy H

I want to add target specific tests for AVR.

These would be testcases for PRs that fail due to AVR back end 
problems, rather than testcases for generic PRs.


Do I just add them to directory testsuite/gcc.target/avr? Or are there 
some other configuration steps needed?


Andy



Whats going on with the conversion warning?

2008-05-19 Thread Andy H


I came across this odd issue with testsuite test Wconversion-5.c and the AVR 
target.
I should get a warning going from an unsigned value that is wider than 
the signed result.


As I am not skilled in the art of all the conversion rules, I would 
appreciate some guidance before I report this as a bug.


FYI AVR has 16 bit int, 16 bit short  int and 32 bit long.

I extracted the problematic line with a few variants and compiled -O0 
and -Wconversion,


void foo(void)
{
  signed char sc;
  signed char xi;

  /* testcase: NO WARNING - think this is a bug */
  xi = (int) (unsigned short int) sc;

  /* NO WARNING - think this is a bug */
  xi = (unsigned short int) sc;

  /* warning: conversion to 'signed char' from 'short unsigned int'
     may alter its value - correct */
  xi = (long) (unsigned short int) sc;

  /* NO WARNING - correct */
  xi = (long) (short int) sc;
}

It would seem Wconversion wants to see a 32-bit result before it gives a warning.
That can't be right - can it?

best regards





Re: Whats going on with the conversion warning?

2008-05-20 Thread Andy H

Thanks for explanation and help

But this leaves me with the conclusion that one of the following must be 
wrong:


signed char xi;

/* testcase: NO WARNING - think this is a bug */
xi =  (int) (unsigned short int) sc;
/* warning: conversion to 'signed char' from 'short unsigned int'
   may alter its value - correct */
xi =  (long) (unsigned short int) sc;


Following your logic, the (long) case appears to be wrong. Yet for i686 the 
first case, (int), generates a PASS with the expected warning, and that 
would seem similar.


So I'm still completely stuck on how I can patch the testcase 
correctly for AVR or post a bug.


best regards
Andy


Manuel López-Ibáñez wrote:

2008/5/20 Andy H <[EMAIL PROTECTED]>:
  

I came across this odd issue with testsuite test Wconversion-5.c and AVR
target.
I should get warning going from a unsigned value that is wider than signed
result.




Yes. You should also get a warning from a unsigned value converted to
a same-width signed type.

  

void foo(void)
{
 signed char sc;
 signed char xi;

 xi =  (int) (unsigned short int) sc;/* testcase NO WARNING - think this
is bug*/



I may be wrong but I think (unsigned short int) sc is zero-extended,
then (int)(unsigned short int) sc is again zero extended. This means
that the complete conversion results in an integer value that when
converted to signed char gives back the original signed char. So the
assignment is actually equivalent to xi = sc. Ergo, no conversion
warning.

  

 xi =  (unsigned short int) sc;   /* NO WARNING - think this is bug*/



The same applies here. Zero-extending to a wider type and then
conversion to the original type does not change the value. So no
warning. (That is, Wconversion can see whether the casts actually
affect the result or not.)
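
The value-preservation argument can be checked with a small C sketch. The function name is made up for the illustration, and the narrowing conversions back to signed char are implementation-defined in ISO C; the sketch assumes GCC's documented behaviour of reducing the value modulo 2^N.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if widening a signed char through unsigned short
   (and optionally int) and narrowing it again always gives back
   the original value.  */
bool
casts_preserve_value (void)
{
  int v;

  for (v = SCHAR_MIN; v <= SCHAR_MAX; v++)
    {
      signed char sc = (signed char) v;
      signed char via_int = (signed char) (int) (unsigned short int) sc;
      signed char direct = (signed char) (unsigned short int) sc;

      if (via_int != sc || direct != sc)
        return false;
    }
  return true;
}
```

The intermediate value differs from sc for negative inputs, but the final narrowing discards exactly the bits that differ.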

So I think this is not a bug. There are bugs in Wconversion, nonetheless.

http://gcc.gnu.org/PR35635
http://gcc.gnu.org/PR35701
http://gcc.gnu.org/PR34389
http://gcc.gnu.org/PR35852

Cheers,

Manuel.
  


Re: Where is setup for "goto" in nested function created?

2008-05-22 Thread Andy H

Thanks Ian!

I found it in function.c (expand_function_start)

 emit_move_insn (r_save, virtual_stack_vars_rtx);

Whereas it should be

 emit_move_insn (r_save, targetm.builtin_setjmp_frame_value ());

to match same construction used for setjmp.

thanks for help!



Ian Lance Taylor wrote:

[EMAIL PROTECTED] writes:

  

expand_builtin_nonlocal_goto is fine. This perform stack restore,
extracts frame pointer value and does jump.

receiver is fine - this jump destination does restore of frame pointer.

The problem I have is with the frame pointer value that is saved by
"setup" prior to all this.

For goto it does not use expand_builtin_setjmp_setup - (pathetically)
I can't find what it is using.



I'm not really sure just what you are after (and I'm not sure that I
would know the answer if I did).  Most of the relevant code should be
somewhere in tree-nested.c.  Also look at expand_function_start, and
in general any use of cfun->nonlocal_goto_save_area.

Ian
  


Re: Help with reload and naked constant sum causing ICE

2008-05-27 Thread Andy H

Thank you very much for reply. reload is such a lonely place!

TBH, it sounds like the opposite: LEGITIMIZE_RELOAD_ADDRESS should
not be handling this address at all. 
Yes but reload will not do anything before call to L_R_A. So in practice 
that would mean L_R_A has
to check reg_equiv_constant[regno] to see if register were a constant 
and do nothing if it is.



 If L_R_A does nothing with it,
the normal reload handling will first try:

  (const:HI (plus:HI (symbol_ref:HI ("chk_fail_buf") (const_int 2
  


Are you sure? I think it will try

(plus:HI (symbol_ref:HI ("chk_fail_buf") (const_int 2)))

I could be wrong so I will rerun to check this. If const is missing it 
will  be  rejected as valid address.

Though I note parts of reload do not impose this check.


If that's legitimate, that's the address you'll get.  If it isn't,
the normal reload handling will reload the symbol_ref into a
base register of class:

  MODE_CODE_BASE_REG_CLASS (mem_mode, PLUS, CONST_INT)

And it sounds from your message like that's exactly what you want.
  

Not really - though that might be what happens if the const wrapper is missing.
We can take any relocatable expressions. So first choice is a good one.


Richard
  


Re: Help with reload and naked constant sum causing ICE

2008-05-27 Thread Andy H



If L_R_A does nothing with it,
the normal reload handling will first try:

  (const:HI (plus:HI (symbol_ref:HI ("chk_fail_buf") (const_int 2
  


This worked just as you described after I added a test of 
reg_equiv_constant[] inside L_R_A.


So I guess that looks like  the fix for bug I posted.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34641

To summarize

LEGITIMIZE_RELOAD_ADDRESS should now always check reg_equiv_constant
before trying to do any push_reload of a register.

Thanks for help!







Bugs in dg/struct profilr test - but what was intended?

2008-06-10 Thread Andy H

Hi,

In the process of fixing tests for AVR and other small targets I have 
come across issues with the profile 
tests in gcc/dg/struct that affect all targets, and I would like them reviewed 
so I can raise patches to rectify them correctly.


Both involve creating a random number of structures, then setting and checking values.

The first is a BUG in wo_prof_malloc_size_var.c

A random number (<32768) of structures are "mallocated", but then the code 
always sets and checks 1000 of them.

This will be undefined (aka crash) if the random amount is less than 1000.
The obvious fix is to create a random number <1000 and check only those 
created (but see the next test first before saying OK).


#include <stdlib.h>
typedef struct
{
 int a;
 float b;
}str_t;

#define N 1000

int
main ()
{
 int i, num;

 num = rand();
 str_t * p = malloc (num * sizeof (str_t));

 if (p == 0)
   return 0;

 for (i = 0; i < N; i++)
   p[i].b = i;

 for (i = 0; i < N; i++)
   p[i].a = p[i].b + 1;

 for (i = 0; i < N; i++)
   if (p[i].a != p[i].b + 1)
 abort ();

 return 0;
}
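
One way the obvious fix could look is sketched below. Whether bounding the count this way keeps the test's purpose intact is exactly the open question, so treat it as an illustration only; the helper name check_structs and its return convention are assumptions.

```c
#include <stdlib.h>

typedef struct
{
  int a;
  float b;
} str_t;

#define N 1000

/* Allocate between 1 and N structures, derived from a raw random
   value, then set and check only those allocated.  Returns 0 on
   success (or if malloc fails) and 1 on a mismatch.  */
int
check_structs (int raw)
{
  int i;
  int num = raw % N + 1;   /* assumes raw >= 0, as rand () guarantees */
  str_t *p = malloc (num * sizeof (str_t));

  if (p == 0)
    return 0;

  for (i = 0; i < num; i++)
    p[i].b = i;

  for (i = 0; i < num; i++)
    p[i].a = p[i].b + 1;

  for (i = 0; i < num; i++)
    if (p[i].a != p[i].b + 1)
      {
        free (p);
        return 1;
      }

  free (p);
  return 0;
}
```

A test would then call check_structs (rand ()) and abort on a non-zero result.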


The other test that has similar behavior is w_prof_two_strs.c

This does check the random amount before malloc, but oddly it leaves 
num at zero unless the number exceeds 1000, in which case it uses 1000. 
This will work, but it seems likely the intent was to limit the random 
amount to 1000. As written, num is only ever 0 or 1000, and with a 
15-bit rand() about 1 time in 33 it will not create any structures.


#define N 1000

str_t1 *p1;
str_t2 *p2;
int num;

void
foo (void)
{
 int i;

 for (i=0; i < num; i++)
   p2[i].c = 2;
}

int
main ()
{
 int i, r;

 r = rand ();
 num = r > N ? N : num;
 p1 = malloc (num * sizeof (str_t1));
 p2 = malloc (num * sizeof (str_t2));

 if (p1 == NULL || p2 == NULL)
   return 0;

 for (i = 0; i < num; i++)
   p1[i].a = 1;

 foo ();

 for (i = 0; i < num; i++)
   if (p1[i].a != 1 || p2[i].c != 2)
 abort ();

 return 0;
}
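
If the intent was indeed to limit the random amount to N, the assignment to num would presumably keep small values of r rather than the stale num. A minimal sketch, with a hypothetical helper name:

```c
/* Limit a non-negative random value to at most `limit`, keeping
   smaller values unchanged.  */
int
clamp_count (int r, int limit)
{
  return r > limit ? limit : r;
}
```

With this, the line in main would read num = clamp_count (r, N), which is the same as writing r in place of num in the last arm of the conditional.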



So what should these two tests do, without messing up the test purposes?
I also note that some of the struct tests don't check the returned pointer 
from malloc - but some do.





Why does loop-35.c store motion testcase fail for AVR?

2008-06-16 Thread Andy H


Help !

gcc.dg/tree-ssa/loop-35.c is a test that looks for "Executing store motion" in 
dump-tree-lim-details, as the load and store of the memory location should be 
pulled out of the loop.


This works for 3 out of 4 testcases. But on the AVR target test3() will fail.

The only difference between this and test1() is that the index is unsigned long 
rather than int.
An index of char or int works fine.

Why?

Even considering that pointers and int are only HImode, I can't see why a long 
(SImode) should affect this.



void test3(unsigned long b)
{
  unsigned i;

  /* And here.  */
  for (i = 0; i < 100; i++)
    {
      arr[b+8].X += i;
      arr[b+9].X += i;
    }
}
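
For reference, this is roughly what store motion is expected to do to the loop, written out by hand in C. The declaration of arr is an assumption, since the excerpt does not show it, and test3_hoisted is a made-up name.

```c
struct a
{
  int X, Y;
};

struct a arr[100];   /* assumed declaration; not shown in the excerpt */

/* The loop as written in the testcase.  */
void
test3 (unsigned long b)
{
  unsigned i;

  for (i = 0; i < 100; i++)
    {
      arr[b+8].X += i;
      arr[b+9].X += i;
    }
}

/* Hand-written equivalent after store motion: the two memory
   locations are loaded once before the loop, updated in locals,
   and stored once after it.  */
void
test3_hoisted (unsigned long b)
{
  unsigned i;
  int x8 = arr[b+8].X;
  int x9 = arr[b+9].X;

  for (i = 0; i < 100; i++)
    {
      x8 += i;
      x9 += i;
    }

  arr[b+8].X = x8;
  arr[b+9].X = x9;
}
```

Both versions add the same total to each element; the second keeps the memory traffic out of the loop body.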







How should I disable vectorization test for AVR?

2008-06-22 Thread Andy H

For AVR I get failures of:

FAIL: gcc.dg/tree-ssa/gen-vect-11.c scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-11a.c scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-2.c scan-tree-dump-times vect "vectorized 
1 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-25.c scan-tree-dump-times vect 
"vectorized 2 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment 
of access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect "Alignment 
of access forced using peeling" 1
FAIL: gcc.dg/tree-ssa/gen-vect-32.c scan-tree-dump-times vect 
"vectorized 1 loops" 1




For AVR there are no vectypes so it fails message test.

Q. Should I skip this test with target selector avr-*-* - OR use the 
effective keyword vect_cmdline_needed


Either will work. The latter is easy and more likely to work with future 
additions, but as AVR does not have any vectors enabled by command line 
or otherwise, the semantics are misleading.


Andy




Re: a small C (naive) program faster with clang than with gcc

2023-04-25 Thread Andy via Gcc
I see it in godbolt
GCC compiles to:
movsx eax, BYTE PTR [rdi+2]
cmp al, 9
ja .L42
Clang:
movzx edx, byte ptr [rdi + 2]
cmp edx, 9
ja .LBB0_40


GCC extends with sign, Clang with zero.
cmp with a 32-bit register is apparently faster than with an 8-bit one.

On Mon, 24 Apr 2023 at 17:34, Basile Starynkevitch wrote:
>
> Hello all,
>
>
> Consider the naive program (GPLv3+) to solve the cryptaddition
>
> `NEUF` + `DEUX` = `ONZE`
>
> on https://github.com/bstarynk/misc-basile/blob/master/CryptArithm/neuf%2Bdeux%3Donze/naive0.c
> (commit 0d1bd0e)
>
>
> On Linux/x86-64 that source code compiled with gcc-12 -O3 is twice as
> slow as with clang -O3
>
> (Debian/Sid or Ubuntu/22/10)
>
> Feel free to add it to some testsuite!
>
>
> Thanks
>
>
> --
> Basile Starynkevitch
> (only mine opinions / les opinions sont miennes uniquement)
> 92340 Bourg-la-Reine, France
> web page: starynkevitch.net/Basile/ & refpersys.org


Re: typeof and operands in named address spaces

2020-11-05 Thread Andy Lutomirski via Gcc
> On Nov 5, 2020, at 4:26 AM, Uros Bizjak  wrote:
>
> On Thu, Nov 5, 2020 at 1:14 PM Alexander Monakov  wrote:
>
>>> I was also thinking of introducing of operand modifier, but Richi
>>> advises the following:
>>>
>>> --cut here--
>>> typedef __UINTPTR_TYPE__ uintptr_t;
>>>
>>> __seg_fs int x;
>>>
>>> uintptr_t test (void)
>>> {
>>> uintptr_t *p = (uintptr_t *)(uintptr_t) &x;
>>> uintptr_t addr;
>>>
>>> asm volatile ("lea %1, %0" : "=r"(addr) : "m"(*p));
>>>
>>> return addr;
>>> }
>>
>> This is even worse undefined behavior compared to my solution above:
>> this code references memory in uintptr_t type, while mine preserves the
>> original type via __typeof. So this can visibly break with TBAA (though
>> the kernel uses -fno-strict-aliasing, so this particular concern wouldn't
>> apply there).
>
> Agreed, but I was trying to solve this lone use case in the kernel. It
> fits this particular usage, so I found a bit of overkill to implement
> the otherwise useless operand modifier in gcc. As discussed
> previously, these hacks are needed exclusively in asm templates, they
> are not needed in "normal" C code.
>>
>> If you don't care about preserving sizeof and type you can use a cast to 
>> char:
>>
>> #define strip_as(mem) (*(char *)(intptr_t)&(mem))
>
> I hope that a developer from kernel can chime in and express their
> opinion on the proposed approaches.
>

I haven’t looked all that closely at precisely what the kernel needs,
but I’ve had bad experiences with passing imprecise things into asm
“m” and “=m” operands. GCC seems to assume, quite reasonably, that if
I pass a value via “m” or “=m”, then I read or write *that value*.
So, if we use type hackery to produce an lvalue or rvalue that has the
address space stripped, then I would imagine I get UB — GCC will try
to understand what value I’m reading or writing, and this will only
match what I’m actually doing by luck.

It’s kind of like doing this (sorry for whitespace damage):

int read_int(int *ptr)
{
    int ret; uintptr_t tmp;
    asm (
        "lea %[val], %[tmp]\n\t"
        "mov 4(%[tmp]), %[ret]"
        : [ret] "=r" (ret), [tmp] "+r" (tmp)
        : [val] "m" (*(ptr - 1)));
    return ret;
}

That code is obviously rather contrived, but I think it's
fundamentally the same type of hack as all these typeofs.  I haven't
tested precisely what GCC does, but I suspect we have:

int foo;
read_int(&foo);  // UB

int foo[2];
read_int(&foo[1]);  // Maybe UB, but maybe non-UB that returns garbage

So I think a better constraint type would be an improvement.  Or maybe
a more general "pointer" constraint could be invented for this and
other use cases:

[name] "p" (ptr)

With this constraint, ptr must be uintptr_t or intptr_t.  %[name]
refers to ptr, formatted as a dereference operation.  So the generated
asm is identical to [name] "m" (*(char *)ptr), but the semantics are
different.  The problem is that I don't know how to specify the
semantics, but at least the instant UB of building and dereferencing a
garbage pointer would be avoided.
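
For contrast with the contrived read_int above, here is a minimal sketch of
an asm whose "m" operand names exactly the object the instruction reads.
The name read_int_precise is hypothetical, the asm is x86-only GNU inline
asm in AT&T syntax, and a plain C fallback is used on other targets. It
does not address the named-address-space problem discussed in this thread;
it only illustrates the "read *that value*" expectation.

```c
/* Sketch only: reads *ptr through an asm whose memory operand is *ptr.  */
static inline int read_int_precise(const int *ptr)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int ret;

    /* The "m" operand is *ptr itself, so the compiler knows precisely
       which object the asm reads and can order other accesses to it.  */
    __asm__ ("movl %[val], %[ret]"
             : [ret] "=r" (ret)
             : [val] "m" (*ptr));
    return ret;
#else
    /* Fallback for other compilers and targets: an ordinary load.  */
    return *ptr;
#endif
}
```

Here the operand and the access coincide, so no pointer outside the object
is ever formed, unlike the (ptr - 1) operand in the contrived version.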

--Andy


Bugs in GCC 14.2 …/fixed-include/ files

2024-08-31 Thread Andy Miller via Gcc
Hello! 

After apparently easily installing GCC 14.2 into macOS 12.7.4, 
directly from https://gcc.gnu.org/gcc-14/ 
I find that it is unusable, because… 

The files in gcc14/…/14.2.0/include-fixed/, which have been auto-edited by the
fixincludes script, contain the undefined (and deprecated; see
<https://www.gnu.org/software/libtool/manual/html_node/C-header-files.html>)
symbols __BEGIN_DECLS and __END_DECLS!
Also, stdio.h fails to include the missing _stdio.h, which in turn fails to
include the missing secure/_common.h!
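
For reference, a sketch of how these two macros are conventionally defined,
as in BSD-derived <sys/cdefs.h> headers. This is an assumption about the
usual definitions, not a copy of the macOS SDK header, and demo_add is a
hypothetical declaration used only to show the macros in use. It also
defines reserved identifiers, so it is a diagnostic aid rather than a
recommended fix.

```c
/* Conventional definitions, guarded so that a system header that already
   provides them takes precedence.  */
#ifndef __BEGIN_DECLS
# ifdef __cplusplus
#  define __BEGIN_DECLS extern "C" {
#  define __END_DECLS   }
# else
#  define __BEGIN_DECLS
#  define __END_DECLS
# endif
#endif

__BEGIN_DECLS
int demo_add(int a, int b);   /* hypothetical declaration */
__END_DECLS

int demo_add(int a, int b)
{
    return a + b;
}
```

In C both macros expand to nothing, so a compiler that reports a syntax
error at __BEGIN_DECLS has most likely not seen any definition of it at
that point.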

Shocking !!  ;-)

Thanks

Andy Miller





Re: Bugs in GCC 14.2 …/fixed-include/ files

2024-09-01 Thread Andy Miller via Gcc
Thanks for your prompt reply (on a weekend!), Iain.
But I think we’re missing the point (see red text)…

> On Aug 31, 2024, at 11:01 PM, Iain Sandoe  wrote:
> 
> Hello Andy,
> 
>> On 31 Aug 2024, at 23:14, Andy Miller via Gcc  wrote:
> 
>> After apparently easily installing GCC 14.2 into macOS 12.7.4, 
>> directly from https://gcc.gnu.org/gcc-14/ 
> 
> Please identify:
> 
> 0. Which macOS architecture you are installing on (Arm64 is not supported by 
> upstream yet, but there is a development branch)

MacBook Pro (Retina, 15-inch, Mid 2015)
2.8 GHz Quad-Core Intel Core i7


> 1.  the version of XCode command line tools (or xcode app) you are using to 
> build this.

Xcode-14.0.1

> 2.  the configuration line you used.  

Well, you’re not likely to find that helpful, as such “lines” are generated 
by a script or process in a very complex installation process for ATLAS 
<https://sourceforge.net/projects/math-atlas/>. 

Here’s a snippet of one of hundreds of such errors (after 3 small successes)… 

gcc -I/tmp/Atlas/src/..//CONFIG/include  -g -w -o xisgcc 
/tmp/Atlas/src/..//CONFIG/src/IsGcc.c atlconf_misc.o 
gcc -I/tmp/Atlas/src/..//CONFIG/include  -g -w -c 
/tmp/Atlas/src/..//CONFIG/src/probe_comp.c
gcc -I/tmp/Atlas/src/..//CONFIG/include  -g -w -o xprobe_comp probe_comp.o 
atlconf_misc.o 
rm -f config1.out
/Applications/Xcode.app/Contents/Developer/usr/bin/make atlas_run 
atldir=/tmp/Atlas/build exe=xprobe_comp redir=config1.out \
args="-v 0 -o atlconf.txt -O 12 -A 43 -Si nof77 0 -V 976   -b 
64 -d b /tmp/Atlas/build"
cd /tmp/Atlas/build ; ./xprobe_comp -v 0 -o atlconf.txt -O 12 -A 43 -Si nof77 0 
-V 976   -b 64 -d b /tmp/Atlas/build > config1.out
In file included from /tmp/Atlas/src/..//CONFIG/src/backend/comptestC.c:1:
/opt/local/lib/gcc14/gcc/x86_64-apple-darwin21/14.2.0/include-fixed/stdio.h:80:14:
 error: expected ';' before 'extern'
   80 | __BEGIN_DECLS
  |  ^
  |  ;
   81 | extern FILE *__stdinp;


> 
>> I find that it is unusable, because… 
>> 
>> The files in gcc14/…/14.2.0/include-fixed/ , which have been auto-edited by 
>> script fixincludes, contain the undefined (& deprecated 
>> <https://www.gnu.org/software/libtool/manual/html_node/C-header-files.html>) 
>> symbols: 
>> __BEGIN_DECLS and __END_DECLS !
> 
> We do not have any control over what the OS headers use - you can file a 
> Feedback with Apple to make queries / observations to them 
> (https://feedbackassistant.apple.com).

But you do have such control over the (very few) headers in  
gcc14/…/14.2.0/include-fixed/ ! 
(That was the whole point of my message — see above.) 
One such Gnu-modified <https://www.gnu.org/software/autogen/fixinc.html> header 
begins thusly… 

/*  DO NOT EDIT THIS FILE.

It has been auto-edited by fixincludes from:


"/Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk/usr/include/stdio.h"

This had to be done to correct non-standard usages in the
original, manufacturer supplied header file.  */

#ifndef FIXINC_WRAP_STDIO_H_STDIO_STDARG_H
#define FIXINC_WRAP_STDIO_H_STDIO_STDARG_H 1
.
.

So I guess someone may have fed unfortunate choices into fixincludes.  
———

I installed gnu compilers instead of using clang because some 
ATLAS documentation warned that clang was very inefficient 
for this very compute-intensive system.  (Their installer even does 
detailed cpu timing tests to determine how to configure their codes 
to optimize performance!)  

But now I see a possibly more recent comment that modern clang 
is not so bad.  So how can I prevent any install process from finding 
and using gnu compilers (without deleting them from /opt/local)?  

I ask also because I have another different problem porting pgplot 
<https://sites.astro.caltech.edu/~tjp/pgplot/>.  
Is MacPorts mixing in clang stuff while using your gfortran? … 

:debug:build build phase started at Sat Aug 31 20:30:32 PDT 2024
:notice:build --->  Building pgplot
:debug:build Executing org.macports.build (pgplot)
|
|
:info:build make: Entering directory 
`/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_graphics_pgplot/pgplot/work/build'
:info:build /opt/local/bin/gfortran-mp-14 -c -Os -fno-backslash 
-fallow-argument-mismatch 
/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_graphics_pgplot/pgplot/work/pgplot/src/pgarro.f
:info:build clang (LLVM option parsing): Unknown command line argument 
'-x86-pad-for-align=false'.  Try: 'clang (LLVM option parsing) --help'
:info:build clang (LLVM option parsing): Did you mean '--x86-slh-loads=false'?
:info:build make: *** [pgarro.o] Error 1

Thanks!

Andy


