Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-25 Thread Ondřej Bílka
On Thu, Jul 25, 2013 at 05:06:55AM +0200, Jakub Jelinek wrote:
> On Wed, Jul 24, 2013 at 07:36:31PM +0200, Richard Biener wrote:
> > >Make them callee saved means we need to change ld.so to
> > >preserve them and we need to change unwind library to
> > >support them.  It is certainly doable.
> > 
> > IMHO it was a mistake to not have any callee saved xmm register in the
> > original abi - we should fix this at this opportunity.  Loops with
> > function calls are not that uncommon.
> 
> I've raised that earlier already.  One issue with that beyond having to
> teach unwinders about this (dynamic linker if you mean only for the lazy PLT
> resolving is only a matter of whether the dynamic linker itself has been
> built with a compiler that would clobber those registers anywhere) is that
> as history shows, the vector registers keep growing over time.
> So if we reserve now either 8 or all 16 zmm16 to zmm31 registers as call
> saved, do we save them as 512 bit registers, or say 1024 bit already?

We shouldn't save them all, as we would often need to unnecessarily save
registers in leaf functions. I am fine with 8. In practice 4 should be
enough for most use cases.

> If just 512 bit, then when next time the vector registers grow in size (will
> they?), would we have just low parts of the 1024 bits registers call saved
> and upper half call clobbered (I guess that is the case for M$Win 64-bit ABI
> now, just with 128 bit vs. more).
>
I do not think that 1024-bit registers will come in the next ten years.
If they did, then call-clobbered would be better. The full 1024 bits
would be used rarely, given that in most cases we would use them just to
store 64-bit doubles.
 
> But yeah, it would be nice to have some call saved ones.
> 
>   Jakub



Re: fatal error: gnu/stubs-32.h: No such file

2013-07-25 Thread Andrew Haley
On 07/24/2013 11:51 PM, David Starner wrote:
> On Wed, Jul 24, 2013 at 8:50 AM, Andrew Haley  wrote:
>> Not at all: we're just disagreeing about what a real system with
>> a real workload looks like.
> 
> No, we aren't. We're disagreeing about whether it's acceptable to
> enable a feature by default that breaks the compiler build half way
> through with an obscure error message.

No we aren't.  I want that error message fixed too.  A configure-
time warning would be good.

> Real systems need features that aren't enabled by default sometimes.

I *totally* agree.

>> It's a stupid thing to say anyway, because who is to say their
>> system is more real than mine or yours?
> 
> By that logic, you've already said that any system needing GNAT is
> less real then others, because it's not enabled by default.

Absolutely not: you're the one making claims about "real systems and
real workloads".  I made no such claims.

Andrew.


Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-25 Thread Ilya Enkovich
2013/7/25 Ian Lance Taylor :
> On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath  wrote:
>>
>> Will an MPX-using binary require an MPX-supporting dynamic linker to run
>> correctly?
>>
>> * An old dynamic linker won't clobber %bndN directly, so that's not a
>>   problem.
>
> These are my answers and likely incorrect.

Hi,

I want to add some comments to your answers.

>
> It will clobber the registers indirectly, though, as soon as it
> executes a branching instruction.  The effect will be that calls from
> bnd-checked code to bnd-checked code through the dynamic linker will
> not succeed.

I would not say that the call will fail. Some bound info will just be
lost. MPX binaries should still work correctly with an old dynamic
linker. The problem here is that when you decrease the level of MPX
support (use a legacy dynamic linker and legacy libraries), you decrease
the quality of bound-violation detection. BTW, if the new PLT section is
used, then the table fixup after the first call will lead to correct
bounds transfer in subsequent calls.

>
> I have not yet seen the changes this will require to the ABI, but I'm
> making the natural assumptions: the first four pointer arguments to a
> function will be associated with a pair of bound registers, and
> similarly for a returned pointer.  I don't know what the proposal is
> for struct parameters and return values.

The general idea is to use bound registers for pointers passed in
registers; it does not matter whether the pointer is part of a
structure. BND0 is used to return bounds for a returned pointer.

Of course, there are some more details (e.g. when more than 4 pointers
are passed in registers or when vararg call is made).

>
>
>> * Does having the bounds registers set have any effect on regular/legacy
>>   code, or only when bndc[lun] instructions are used?
>
> As far as I can tell, only when the bndXX instructions are used,
> though I'd be happy to hear otherwise.

As usual, new registers affect the context save/restore instructions.

>
>
>>   If it doesn't affect normal instructions, then I don't entirely
>>   understand why it would matter to clear %bnd* when entering or leaving
>>   legacy code.  Is it solely for the case of legacy code returning a
>>   pointer value, so that the new code would expect the new ABI wherein
>>   %bnd0 has been set to correspond to the pointer returned in %rax?
>
> There is no problem with clearing the bnd registers when calling in or
> out of legacy code.  The issue is avoiding clearing the pointers when
> calling from bnd-enabled code to bnd-enabled code.

When legacy code returns a pointer, we need to clear at least BND0 to
avoid wrong bounds for the returned pointer.
We may also have a call sequence MPX code -> legacy code -> MPX code.
In that case we have to clear all bound registers before calling MPX
code from legacy code; otherwise the nested MPX code gets wrong bounds.

Thanks,
Ilya

>
>
>> * What's the effect of entering the dynamic linker via "bnd jmp"
>>   (i.e. new MPX-using binary with new PLT, old dynamic linker)?  The old
>>   dynamic linker will leave %bndN et al exactly as they are, until its
>>   first unadorned branching instruction implicitly clears them.  So the
>>   only problem would be if the work _dl_runtime_{resolve,profile} does
>>   before its first branch/call were affected by the %bndN state.
>
> "It's not a problem."
>
>> In a related vein, what's the effect of entering some legacy code via
>> "bnd jmp" (i.e. new binary using PLT call into legacy DSO)?
>>
>> * If the state of %bndN et al does not affect legacy code directly, then
>>   it's not a problem.  The legacy code will eventually use an unadorned
>>   branch instruction, and that will implicitly clear %bnd*.  (Even if
>>   it's a leaf function that's entirely branch-free, its return will
>>   count as such an unadorned branch instruction.)
>
> Yes.
>
>> * If that's not the case, 
>
> It is the case.
>
>> I can't tell if you are proposing that a single object might contain
>> both 16-byte and 32-byte PLT slots next to each other in the same .plt
>> section.  That seems like a bad idea.  I can think of two things off
>> hand that expect PLT entries to be of uniform size, and there may well
>> be more.
>>
>> * The foo@plt pseudo-symbols that e.g. objdump will display are based on
>>   the BFD backend knowing the size of PLT entries.  Arguably this ought
>>   to look at sh_entsize of .plt instead of using baked-in knowledge, but
>>   it doesn't.
>
> This seems fixable.  Of course, we could also keep the PLT the same
> length by changing it.  The current PLT entries are
>
> jmpq *GOT(sym)
> pushq offset
> jmpq plt0
>
> The linker or dynamic linker initializes *GOT(sym) to point to the
> second instruction in this sequence.  So we can keep the PLT at 16
> bytes by simply changing it to jump somewhere else.
>
> bnd jmpq *GOT(sym)
> .skip 9
>
> We have the linker or dynamic linker fill in *GOT(sym) to point to the
> second PLT table.  When the dynamic linker i

Re: Intel® Memory Protection Extensions support in the GCC

2013-07-25 Thread Florian Weimer

On 07/24/2013 05:58 PM, Zamyatin, Igor wrote:

Hi All!

This is to let you know that enabling of Intel® MPX technology (see details in 
http://download-software.intel.com/sites/default/files/319433-015.pdf) in GCC 
has been started. (Corresponding changes in binutils are here - 
http://sourceware.org/ml/binutils/2013-07/msg00233.html)


Thanks, this is interesting.

Can userspace update the translation tables for bounds?  Are the bounds 
stored in Bound Table Entries relative to the starting linear address of 
pointer (LAp) or absolute?  The former would allow sharing bound table 
pages for different pages having memory objects of the same size (which 
happens with some malloc implementations).



--
Florian Weimer / Red Hat Product Security Team


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-25 Thread Janne Blomqvist
On Wed, Jul 24, 2013 at 9:52 PM, Ondřej Bílka  wrote:
> On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote:
>> On 07/24/2013 05:23 AM, Richard Biener wrote:
>> > "H.J. Lu"  wrote:
>> >
>> >> Hi,
>> >>
>> >> Here is a patch to extend x86-64 psABI to support AVX-512:
>> >
>> > Afaik avx 512 doubles the amount of xmm registers. Can we get them callee 
>> > saved please?
>>
>> Having them callee saved pre-supposes that one knows the width of the 
>> register.
>>
>> There's room in the instruction set for avx1024.  Does anyone believe that is
>> not going to appear in the next few years?
>>
> It would be a mistake for Intel to focus on avx1024. You hit diminishing
> returns, and only a few workloads would utilize loading 128 bytes at once.
> The problem with vectorization is that it becomes memory bound, so you
> will not gain much because performance is dominated by cache throughput.
>
> You would get a bigger speedup from more effective pipelining and more
> fusion...

ISTR that one of the main reason "long" vector ISA's did so well on
some workloads was not that the vector length was big, per se, but
rather that the scatter/gather instructions these ISA's typically have
allowed them to extract much more parallelism from the memory
subsystem. The typical example being sparse matrix style problems, but
I suppose other types of problems with indirect accesses could benefit
as well. Deeper OoO buffers would in principle allow the same memory
level parallelism extraction, but those apparently have quite steep
power and silicon area cost scaling (O(n**2) or maybe even O(n**3)),
making really deep buffers impractical.

And, IIRC scatter/gather instructions are featured as of some
recent-ish AVX-something version. That being said, maybe current
cache-based memory subsystems are different enough from the vector
supercomputers of yore that the above doesn't hold to the same extent
anymore..
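The indirect-access pattern that gather instructions help with can be sketched in plain C. This is a minimal, illustrative CSR-style sparse dot product (names are made up for this sketch); the `x[col[j]]` loads are exactly the gathers discussed above:

```c
#include <stddef.h>

/* Sketch of one row of a sparse matrix-vector product.  The x[col[j]]
 * loads are indirect (gather) accesses: with hardware gather support,
 * several of them can be in flight at once instead of serializing
 * through one scalar load per element. */
double sparse_row_dot(const double *val, const size_t *col,
                      size_t nnz, const double *x)
{
    double sum = 0.0;
    for (size_t j = 0; j < nnz; j++)
        sum += val[j] * x[col[j]];   /* indirect load: the gather */
    return sum;
}
```

A deep OoO window can extract the same memory-level parallelism from this loop, which is the trade-off the paragraph above describes.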


--
Janne Blomqvist


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-25 Thread Ondřej Bílka
On Thu, Jul 25, 2013 at 03:17:43PM +0300, Janne Blomqvist wrote:
> On Wed, Jul 24, 2013 at 9:52 PM, Ondřej Bílka  wrote:
> > On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote:
> >> On 07/24/2013 05:23 AM, Richard Biener wrote:
> >> > "H.J. Lu"  wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Here is a patch to extend x86-64 psABI to support AVX-512:
> >> >
> >> > Afaik avx 512 doubles the amount of xmm registers. Can we get them 
> >> > callee saved please?
> >>
> >> Having them callee saved pre-supposes that one knows the width of the 
> >> register.
> >>
> >> There's room in the instruction set for avx1024.  Does anyone believe that 
> >> is
> >> not going to appear in the next few years?
> >>
> > It would be a mistake for Intel to focus on avx1024. You hit diminishing
> > returns, and only a few workloads would utilize loading 128 bytes at once.
> > The problem with vectorization is that it becomes memory bound, so you
> > will not gain much because performance is dominated by cache throughput.
> >
> > You would get a bigger speedup from more effective pipelining and more
> > fusion...
> 
> ISTR that one of the main reason "long" vector ISA's did so well on
> some workloads was not that the vector length was big, per se, but
> rather that the scatter/gather instructions these ISA's typically have
> allowed them to extract much more parallelism from the memory
> subsystem. The typical example being sparse matrix style problems, but
> I suppose other types of problems with indirect accesses could benefit
> as well. Deeper OoO buffers would in principle allow the same memory
> level parallelism extraction, but those apparently have quite steep
> power and silicon area cost scaling (O(n**2) or maybe even O(n**3)),
> making really deep buffers impractical.
> 
> And, IIRC scatter/gather instructions are featured as of some
> recent-ish AVX-something version. That being said, maybe current
> cache-based memory subsystems are different enough from the vector
> supercomputers of yore that the above doesn't hold to the same extent
> anymore..
>
Also, this depends on how many details Intel got right. One example is
the pmovmsk instruction: it is trivial to implement in silicon and gives
an advantage over other architectures.

When the problem is 'find the elements in an array that satisfy some
expression', then without pmovmsk or an equivalent, finding which
elements matched is relatively expensive.
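The pmovmskb idiom can be sketched with SSE2 intrinsics (a minimal sketch, assuming an x86-64 target with SSE2; the function name is illustrative):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Compare 16 bytes at once, then use one pmovmskb (_mm_movemask_epi8)
 * to turn the per-byte comparison results into a scalar bitmask, so
 * "which element matched?" becomes a cheap bit scan instead of 16
 * scalar tests. */
long find_byte16(const unsigned char *buf, size_t len, unsigned char c)
{
    __m128i needle = _mm_set1_epi8((char)c);
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        if (mask)                          /* some byte matched */
            return (long)i + __builtin_ctz(mask);
    }
    for (; i < len; i++)                   /* scalar tail */
        if (buf[i] == c)
            return (long)i;
    return -1;
}
```

Without the movemask step, extracting the matching lane from a vector comparison result requires a store-and-scan or a chain of extracts, which is exactly the cost described above.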

One problem is that, depending on the profile, you may spend the
majority of the time on small sizes. So you need effective branches for
these sizes (gcc does not handle that well yet), but then you get the
problem that this increases icache pressure.

Another problem is that you could often benefit from vector instructions
if you could read/write more memory than requested. Over-reading can be
done inexpensively by checking whether the access crosses a page
boundary; over-writing is a problem, so we take a suboptimal path and
write only the data that changed.

This could also be solved in hardware, if a masked move instruction
issued memory accesses only for the changed lanes and thus avoided
possible race conditions in the unchanged parts.
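The page-crossing check for safe over-reads can be written as simple pointer arithmetic (a sketch, assuming 4 KiB pages; the helper name is made up):

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* A full 16-byte vector load starting at p cannot fault as long as it
 * does not cross into the next page: the page containing p is known to
 * be mapped, while the next page may not be -- hence the check. */
static bool overread16_is_safe(const void *p)
{
    return ((uintptr_t)p & (PAGE_SIZE - 1)) <= PAGE_SIZE - 16;
}
```

When the check fails, the code falls back to a slower path that stays strictly within the requested bytes.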
> 
> --
> Janne Blomqvist



Re: Intel(R) Memory Protection Extensions support in the GCC

2013-07-25 Thread Ilya Enkovich
2013/7/25 Florian Weimer :
> On 07/24/2013 05:58 PM, Zamyatin, Igor wrote:
>>
>> Hi All!
>>
>> This is to let you know that enabling of Intel® MPX technology (see
>> details in
>> http://download-software.intel.com/sites/default/files/319433-015.pdf) in
>> GCC has been started. (Corresponding changes in binutils are here -
>> http://sourceware.org/ml/binutils/2013-07/msg00233.html)
>
>
> Thanks, this is interesting.
>
> Can userspace update the translation tables for bounds?  Are the bounds
> stored in Bound Table Entries relative to the starting linear address of
> pointer (LAp) or absolute?  The former would allow sharing bound table pages
> for different pages having memory objects of the same size (which happens
> with some malloc implementations).

Hi Florian,

Do you mean the 'Bounds Directory' when you say 'translation tables'? If
so, then you should be able to access it by getting its address from the
BNDCFGU register.
It is not clear how Bound Tables could be shared. Bound Tables are used
to hold bounds for pointers stored in memory, not for objects
allocated in memory.

Thanks,
Ilya

>
>
> --
> Florian Weimer / Red Hat Product Security Team


RE: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-25 Thread Gopalasubramanian, Ganesh
Hi,

This got lost in our site-consolidation efforts.
We are working to make it active again.
Will update the community soon.

Regards
Ganesh

From: Joseph Myers [jos...@codesourcery.com]
Sent: Tuesday, July 23, 2013 2:57 PM
To: H.J. Lu
Cc: GNU C Library; GCC Development; Binutils; Girkar, Milind; Kreitzer, David 
L; Gopalasubramanian, Ganesh
Subject: Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

On Tue, 23 Jul 2013, H.J. Lu wrote:

> Here is a patch to extend x86-64 psABI to support AVX-512:

I have no comments on this patch for now - but where is the version
control repository we should use for the ABI source code, since x86-64.org
has been down for some time?

(I've also CC:ed the last person from AMD to post to gcc-patches, in the
hope that they have the right contacts to get x86-64.org - website,
mailing lists, version control - brought back up again.)

--
Joseph S. Myers
jos...@codesourcery.com




Re: Intel(R) Memory Protection Extensions support in the GCC

2013-07-25 Thread Florian Weimer

On 07/25/2013 03:50 PM, Ilya Enkovich wrote:


Do you mean the 'Bounds Directory' when you say 'translation tables'? If
so, then you should be able to access it by getting its address from the
BNDCFGU register.


Good to know.


It is not clear how Bound Tables may be shared. Bound Tables are used
to hold bounds for pointers stored in memory, not for objects
allocated in memory.


Oh.  I think I misread the specification then.  Obviously, this supports 
more precise checking, covering pointer provenance and intra-object 
overflow checks.  I'm worried that this adds quite a bit of memory 
overhead, but I guess I'll have to wait and see.


--
Florian Weimer / Red Hat Product Security Team


Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-25 Thread H.J. Lu
On Thu, Jul 25, 2013 at 4:08 AM, Ilya Enkovich  wrote:
> 2013/7/25 Ian Lance Taylor :
>> On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath  wrote:
>>>
>>> Will an MPX-using binary require an MPX-supporting dynamic linker to run
>>> correctly?
>>>
>>> * An old dynamic linker won't clobber %bndN directly, so that's not a
>>>   problem.
>>
>> These are my answers and likely incorrect.
>
> Hi,
>
> I want to add some comments to your answers.
>
>>
>> It will clobber the registers indirectly, though, as soon as it
>> executes a branching instruction.  The effect will be that calls from
>> bnd-checked code to bnd-checked code through the dynamic linker will
>> not succeed.
>
> I would not say that the call will fail. Some bound info will just be
> lost. MPX binaries should still work correctly with an old dynamic
> linker. The problem here is that when you decrease the level of MPX
> support (use a legacy dynamic linker and legacy libraries), you decrease
> the quality of bound-violation detection. BTW, if the new PLT section is
> used, then the table fixup after the first call will lead to correct
> bounds transfer in subsequent calls.

To make it clear, the sequence is

MPX code -> PLT -> ld.so -> PLT -> MPX library

If ld.so doesn't preserve the bound registers, they will be cleared,
which means the lower bound is 0 and the upper bound is -1 (MAX) when
the MPX library is reached.  The MPX library will still work correctly,
but without MPX protection on pointers passed in registers.
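The "cleared means [0, MAX]" semantics can be illustrated with a small software model (a conceptual sketch only, not the hardware encoding; all names are made up):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conceptual model of an MPX bound register.  A cleared register has
 * lower bound 0 and upper bound UINTPTR_MAX, so every access passes the
 * check: the code still runs correctly, it just gets no protection. */
struct bnd { uintptr_t lb, ub; };

static const struct bnd BND_CLEARED = { 0, UINTPTR_MAX };

/* True if [addr, addr + size) lies within the bounds. */
static bool bnd_check(struct bnd b, uintptr_t addr, size_t size)
{
    return addr >= b.lb && addr + size - 1 <= b.ub;
}
```

With `BND_CLEARED`, `bnd_check` accepts any access, which is exactly the "works correctly but unprotected" behavior described above.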


--
H.J.


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-25 Thread Rich Felker
On Thu, Jul 25, 2013 at 08:55:38AM +0200, Ondřej Bílka wrote:
> On Thu, Jul 25, 2013 at 05:06:55AM +0200, Jakub Jelinek wrote:
> > On Wed, Jul 24, 2013 at 07:36:31PM +0200, Richard Biener wrote:
> > > >Make them callee saved means we need to change ld.so to
> > > >preserve them and we need to change unwind library to
> > > >support them.  It is certainly doable.
> > > 
> > > IMHO it was a mistake to not have any callee saved xmm register in the
> > > original abi - we should fix this at this opportunity.  Loops with
> > > function calls are not that uncommon.
> > 
> > I've raised that earlier already.  One issue with that beyond having to
> > teach unwinders about this (dynamic linker if you mean only for the lazy PLT
> > resolving is only a matter of whether the dynamic linker itself has been
> > built with a compiler that would clobber those registers anywhere) is that
> > as history shows, the vector registers keep growing over time.
> > So if we reserve now either 8 or all 16 zmm16 to zmm31 registers as call
> > saved, do we save them as 512 bit registers, or say 1024 bit already?
> 
> We shouldn't save them all, as we would often need to unnecessarily save
> registers in leaf functions. I am fine with 8. In practice 4 should be
> enough for most use cases.

You can't add call-saved registers without breaking the ABI, because
they need to be saved in the jmp_buf, which does not have space for
them.
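The jmp_buf constraint can be demonstrated directly (a sketch; the exact jmp_buf layout is an internal libc detail, but its size is fixed by the ABI):

```c
#include <setjmp.h>
#include <stdio.h>

/* setjmp must snapshot every call-saved register into jmp_buf, whose
 * size is baked into existing binaries.  No space was reserved for any
 * vector registers, so retroactively making zmm registers call-saved
 * would require a bigger jmp_buf and break the ABI. */
int demo(void)
{
    jmp_buf env;
    volatile int reached = 0;
    if (setjmp(env) == 0) {       /* first return: save state */
        reached = 1;
        longjmp(env, 42);         /* restore the saved register state */
    }
    printf("jmp_buf is %zu bytes on this target\n", sizeof(jmp_buf));
    return reached;
}
```

Any register added to the call-saved set after the fact would be silently clobbered across a `longjmp`, since old jmp_buf layouts have nowhere to store it.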

Also, unless you add them at the same time the registers are added to
the machine (so there's no existing code using those registers),
you'll have ABI problems like this: function using the new call-saved
registers calls qsort, which calls application code, which assumes the
registers are call-clobbered and clobbers them; after qsort returns,
the original caller's state is gone.

Adding call-saved registers to an existing psABI is just fundamentally
misguided.

Rich


Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-25 Thread H.J. Lu
On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath  wrote:
> I've read through the MPX spec once, but most of it is still not very
> clear to me.  So please correct any misconceptions.  (HJ, if you answer
> any or all of these questions in your usual style with just, "It's not a
> problem," I will find you and I will kill you.  Explain!)
>
> Will an MPX-using binary require an MPX-supporting dynamic linker to run
> correctly?

Yes.  But you may lose MPX protection in the MPX library, since bound
registers are cleared on the first call with lazy binding:

MPX code -> PLT -> ld.so -> PLT -> MPX library

>
> Those are the background questions to help me understand better.
> Now, to your specific questions.
>
> Now, assuming we are talking about a uniform PLT in each object, there
> is the question of whether to use a new PLT layout everywhere, or only
> when linking an object with some input files that use MPX.

I am proposing the uniform PLT in each object.  That was my first
question.

> * My initial reaction was to say that we should just change it
>   unconditionally to keep things simple: use new linker, get new format,
>   end of story.  Simplicity is good.

This is my thinking also.

> * But, doubling the size of PLT entries means more i-cache pressure.  If
>   cache lines are 64 bytes, then today you fit four entries into a cache
>   line.  Assuming PLT entries are more used than unused, this is a good
>   thing.  Reducing that to two entries per cache line means twice as
>   many i-cache misses if you hit a given PLT frequently (with even
>   distribution of which entries you actually use--at any rate, it's
>   "more" even if it's not "twice as many").  Perhaps this is enough cost
>   in real-world situations to be worried about.  I really don't know.
>
> * As I mentioned before, there are things floating around that think
>   they know the size of PLT entries.  Realistically, there will be
>   plenty of people using new tools to build binaries but not using MPX
>   at all, and these people will give those binaries to people who have
>   old tools.  In the case of someone running an old objdump on a new
>   binary, they would see bogus foo@plt pseudo-symbols and be misled and
>   confused.  Not to mention the unknown unknowns, i.e. other things that
>   "know" the size of PLT entries that we don't know about or haven't
>   thought of here.  It's just basic conservatism not to perturb things
>   for these people who don't care about or need anything related to MPX
>   at all.

We can investigate whether the old objdump can deal with the PLT entry
size change.

> How a relocatable object is marked so that the linker knows whether its
> code is MPX-compatible at link time and how a DSO/executable is marked
> so that the dynamic linker knows at runtime are two separate subjects.
>
> For relocatable objects, I don't think there is really any precedent for
> using ELF notes to tell the linker things.  It seems much nicer if the

We have been using .note.GNU-stack section at link-time for a long time.

> linker continues to treat notes completely normally, i.e. appending
> input files' same-named note sections together like with any other named
> section rather than magically recognizing and swallowing certain notes.
> OTOH, the SHT_GNU_ATTRIBUTES mechanism exists for exactly this sort of
> purpose and is used on other machines for very similar sorts of issues.
> There is both precedent and existing code in binutils to have the linker
> merge attribute sections from many input files together in a fashion
> aware of the semantics of those sections, and to have those attributes
> affect the linker's behavior in machine-specific ways.  I think you have
> to make a very strong case to use anything other than SHT_GNU_ATTRIBUTES
> for this sort of purpose in relocatable objects.
>
> For linked objects, there a couple of obvious choices.  They all require
> that the linker have special knowledge to create the markings.  One
> option is a note.  We use .note.ABI-tag for a similar purpose in libc,
> but I don't know of any precedent for the linker synthesizing notes.
> The most obvious choice is e_flags bits.  That's what other machines use
> to mark ABI variants.  There are no bits assigned for x86 yet.  There
> are obvious limitations to using e_flags, in that it's part of the
> universal ELF psABI rather than something with vendor extensibility
> built in like notes have, and in that there are only 32 bits available
> to assign rather than being a wholly open-ended format like notes.  But
> using e_flags is certainly simpler to synthesize in the linker and
> simpler to recognize in the dynamic linker than a note format.  I think
> you have to make at least a reasonable (objective) case to use a note
> rather than e_flags, though I'm certainly not firmly against a note.

My main concerns are that e_flags isn't very extensible and that
old tools may not be able to handle it properly.  A note
section is backward compatible. Given that MP

Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-25 Thread H.J. Lu
On Wed, Jul 24, 2013 at 5:23 PM, Ian Lance Taylor  wrote:
>> * The foo@plt pseudo-symbols that e.g. objdump will display are based on
>>   the BFD backend knowing the size of PLT entries.  Arguably this ought
>>   to look at sh_entsize of .plt instead of using baked-in knowledge, but
>>   it doesn't.
>
> This seems fixable.  Of course, we could also keep the PLT the same
> length by changing it.  The current PLT entries are
>
> jmpq *GOT(sym)
> pushq offset
> jmpq plt0
>
> The linker or dynamic linker initializes *GOT(sym) to point to the
> second instruction in this sequence.  So we can keep the PLT at 16
> bytes by simply changing it to jump somewhere else.
>
> bnd jmpq *GOT(sym)
> .skip 9
>
> We have the linker or dynamic linker fill in *GOT(sym) to point to the
> second PLT table.  When the dynamic linker is involved, we use another
> DT tag to point to the second PLT.  The offsets are consistent: there
> is one entry in each PLT table, so the dynamic linker can compute the
> right value.  Then in the second PLT we have the sequence
>
> pushq offset
> bnd jmpq plt0
>
> That gives the dynamic linker the offset that it needs to update
> *GOT(sym) to point to the runtime symbol value.  So we get slightly
> worse instruction cache handling the first time a function is called,
> but after that we are the same as before.  And PLT entries are the
> same size as always so everything is simpler.
>
> The special DT tag will tell the dynamic linker to apply the special
> processing.  No attribute is needed to change behaviour.  The issue
> then is: a program linked in this way will not work with an old
> dynamic linker, because the old dynamic linker will not initialize
> GOT(sym) to the right value.  That is a problem for any scheme, so I
> think that is OK.  But if that is a concern, we could actually handle
> by generating two PLTs.  One conventional PLT, and another as I just
> outlined.  The linker branches to the new PLT, and initializes
> GOT(sym) to point to the old PLT.  The dynamic linker spots this
> because it recognizes the new DT tags, and cunningly rewrites the GOT
> to point to the new PLT.  Cost is an extra jump the first time a
> function is called when using the old dynamic linker.
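The offset correspondence between the two tables reduces to index arithmetic (a sketch; the entry sizes and base addresses are assumptions for illustration):

```c
#include <stdint.h>

/* With one entry per symbol in each table, entry i of the first PLT
 * pairs with entry i of the second PLT, so an address inside one table
 * can be translated into the matching entry of the other. */
#define PLT1_ENTSIZE 16u
#define PLT2_ENTSIZE 16u

static uintptr_t plt2_entry_for(uintptr_t plt1_base, uintptr_t plt2_base,
                                uintptr_t plt1_addr)
{
    uintptr_t index = (plt1_addr - plt1_base) / PLT1_ENTSIZE;
    return plt2_base + index * PLT2_ENTSIZE;
}
```

This is the computation the dynamic linker would perform when it "cunningly rewrites the GOT" to redirect between the old and new PLT.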
>

I don't like the complexity.  I believe extending the PLT entry to
32 bytes works with the old ld.so.  If we are willing to have
mixed PLT entries, we can merge two 16-byte PLT entries into one
super 32-byte PLT entry so that we have

jmpq   *name@GOTPCREL(%rip)
pushq  $index
jmpq   PLT0
bnd jmpq   *name@GOTPCREL(%rip)
pushq  $index
bnd jmpq   PLT0
nop paddings
jmpq   *name@GOTPCREL(%rip)
pushq  $index
jmpq   PLT0

We can also have new link-time relocations for branches with the BND
prefix and only create the super PLT entries when needed. Of course,
unwind info may be incorrect for both approaches if we don't find a way
to fix it.

--
H.J.


gcc-4.8-20130725 is now available

2013-07-25 Thread gccadmin
Snapshot gcc-4.8-20130725 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20130725/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch 
revision 201255

You'll find:

 gcc-4.8-20130725.tar.bz2 Complete GCC

  MD5=e21f259bc4c44e61e19a780ad5badfeb
  SHA1=d6f611012ae432b0a7c4c1ab6472d854ed2ba5cc

Diffs from 4.8-20130718 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


INCOMING_RETURN_ADDR_RTX

2013-07-25 Thread Hendrik Greving
I am getting a crash in my backend when compiling arbitrary code with
-g. Apparently this is because the compiler aborts at dwarf2cfi.c:2852
(GCC 4.8.1-release) in

initial_return_save (INCOMING_RETURN_ADDR_RTX);

because INCOMING_RETURN_ADDR_RTX is undefined.

The documentation states "You only need to define this macro if you
want to support call frame debugging information like that provided by
DWARF 2.".

We can't support frame debugging right now (at least I think we
can't), I need to investigate that. In any case the documentation
sounds more like that you don't need to define this macro for your
target. In order to disable this feature, do I also need to disable
some frame unwind info macros?
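For reference, ports that do support DWARF CFI typically define the macro in their target header along these lines (a sketch only; `LINK_REGNUM` is a made-up name, and a real port uses its actual link register, or a stack slot such as `gen_rtx_MEM (Pmode, stack_pointer_rtx)`):

```c
/* Hypothetical target-header fragment: tells dwarf2cfi where the return
 * address lives on entry to a function.  LINK_REGNUM is illustrative. */
#define INCOMING_RETURN_ADDR_RTX \
  gen_rtx_REG (Pmode, LINK_REGNUM)
```

Ports that cannot emit DWARF CFI instead arrange for `debug_unwind_info` not to return UI_DWARF2, so this macro is never consulted.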

Thanks,
Regards,
Hendrik Greving


Re: INCOMING_RETURN_ADDR_RTX

2013-07-25 Thread Hendrik Greving
I am reaching this code like this:

(gdb) p targetm.debug_unwind_info ()
$1 = UI_DWARF2
(gdb) p targetm_common.except_unwind_info (&global_options)
$2 = UI_SJLJ


On Thu, Jul 25, 2013 at 3:57 PM, Hendrik Greving
 wrote:
> I am getting a crash in my backend when compiling arbitrary code with
> -g. Apparently this is because the compiler aborts at dwarf2cfi.c:2852
> (GCC 4.8.1-release) in
>
> initial_return_save (INCOMING_RETURN_ADDR_RTX);
>
> because INCOMING_RETURN_ADDR_RTX is undefined.
>
> The documentation states "You only need to define this macro if you
> want to support call frame debugging information like that provided by
> DWARF 2.".
>
> We can't support frame debugging right now (at least I think we
> can't), I need to investigate that. In any case the documentation
> sounds more like that you don't need to define this macro for your
> target. In order to disable this feature, do I also need to disable
> some frame unwind info macros?
>
> Thanks,
> Regards,
> Hendrik Greving


Re: INCOMING_RETURN_ADDR_RTX

2013-07-25 Thread Hendrik Greving
I found this email thread

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48459

It sounds like I should define TARGET_DEBUG_UNWIND_INFO and return
UI_NONE for now?

On Thu, Jul 25, 2013 at 3:57 PM, Hendrik Greving
 wrote:
> I am getting a crash in my backend when compiling arbitrary code with
> -g. Apparently this is because the compiler aborts at dwarf2cfi.c:2852
> (GCC 4.8.1-release) in
>
> initial_return_save (INCOMING_RETURN_ADDR_RTX);
>
> because INCOMING_RETURN_ADDR_RTX is undefined.
>
> The documentation states "You only need to define this macro if you
> want to support call frame debugging information like that provided by
> DWARF 2.".
>
> We can't support frame debugging right now (at least I think we
> can't); I need to investigate that. In any case, the documentation
> makes it sound like you don't need to define this macro for your
> target. To disable this feature, do I also need to disable
> some frame unwind info macros?
>
> Thanks,
> Regards,
> Hendrik Greving


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-25 Thread David Starner
On Thu, Jul 25, 2013 at 1:17 AM, Andrew Haley  wrote:
> On 07/24/2013 11:51 PM, David Starner wrote:
>> On Wed, Jul 24, 2013 at 8:50 AM, Andrew Haley  wrote:
>>> Not at all: we're just disagreeing about what a real system with
>>> a real workload looks like.
>>
>> No, we aren't. We're disagreeing about whether it's acceptable to
>> enable a feature by default that breaks the compiler build halfway
>> through with an obscure error message.
>
> No we aren't.  I want that error message fixed too.  A configure-
> time warning would be good.

The obscurity of the error message is only part of the problem; the
fact that it errors out halfway through a multi-hour build is also an
issue. The question is: if it can't be detected at compile time that
this will fail, should GCC disable multilibs?

-- 
Kie ekzistas vivo, ekzistas espero. (Where there is life, there is hope.)


DejaGnu and toolchain testing

2013-07-25 Thread Joseph S. Myers
I was interested to watch the video of the DejaGnu BOF at the Cauldron.  A 
few issues with DejaGnu for toolchain testing that I've noted but I don't 
think were covered there include:

* DejaGnu has a lot of hardcoded logic to try to find various files in a 
toolchain build directory.  A lot of it is actually for very old toolchain 
versions (using GCC version 2 or older, for example).  The first issue 
with this is that it doesn't belong in DejaGnu: the toolchain should be 
free to rearrange its build directories without needing changes to DejaGnu 
itself (which in practice means there's lots of such logic in the 
toolchain's own testsuites *as well*, duplicating the DejaGnu code to a 
greater or lesser extent).  The second issue is that "make install" 
already knows where to find files in the build directory, and it would be 
better to move towards build-tree testing that installs the toolchain into a 
staging directory and runs tools from there, rather than needing any 
logic in the testsuites at all to enable bits of uninstalled tools to find 
other bits of uninstalled tools.  (There might still be a few bits like 
setting LD_LIBRARY_PATH required.  But the compiler command lines would be 
much simpler and much closer to how users actually use the compiler in 
practice.)

* Similarly, DejaGnu has hardcoded prune_warnings - and again GCC adds 
lots of its own prunes; it's not clear hardcoding this in DejaGnu is a 
particularly good idea either.

* Another piece of unfortunate hardcoding in DejaGnu is how remote-host 
testing uses "-o a.out" when running tools on the remote host - such a 
difference from how they are run on a local host results in lots of issues 
where a tool cares about the output file name in some way (e.g. to 
generate other output files).

* A key feature of QMTest that I like but I don't think got mentioned is 
that you can *statically enumerate the set of tests* without running them.  
That is, a testsuite has a well-defined set of tests, and that set does 
not depend on what the results of the tests are - whereas it's very easy 
and common for a DejaGnu test to have test names (the text after PASS: or 
FAIL:) depending on whether the test passed or failed, or how the test 
passed or failed (no doubt the testsuite authors had reasons for doing 
this, but it conflicts with any automatic comparison of results).  The 
QMTest model isn't wonderfully well-matched to toolchain testing - in 
toolchain testing, you can typically do a single indivisible test 
execution (e.g. compiling a file), which produces results for a large 
number of test assertions (tests for warnings on particular lines of that 
file), and QMTest expects one indivisible test execution to produce one 
result.  But a model where a test can contain multiple assertions, and 
both tests and their assertions can be statically enumerated independent 
of their result, and where the results can be annotated by the testsuite 
(to deal with the purposes for which testsuites stick extra text on the 
PASS/FAIL line) certainly seems better than one that makes it likely the 
set of test assertions will vary in unpredictable ways.

* People in the BOF seemed happy with expect.  I think expect has caused 
quite a few problems for toolchain testing.  In particular, there are or 
have been too many places where expect likes to throw away input whose 
size exceeds some arbitrary limit and you need to hack around those by 
increasing the limits in some way.  GCC tests can generate and test for 
very large numbers of diagnostics from a single test, and some binutils 
tests can generate megabytes of output from a tool (that are then matched 
against regular expressions etc.).

-- 
Joseph S. Myers
jos...@codesourcery.com