Re: Could we start accepting rich-text postings on the gcc lists?

2012-11-26 Thread Martin Jambor
Hi,

On Fri, Nov 23, 2012 at 12:12:17PM -0800, Andrew Pinski wrote:
> On Fri, Nov 23, 2012 at 11:53 AM, Diego Novillo  wrote:
> > In this day and age of rich-text capable mailers, restricting postings
> > to be text-only seems quaint and antiquated.  Are there any hard
> > requirements that force us to only accept plain text messages?
> 
> I think it is a bad idea to accept non-plain-text messages (except for
> attachments).
> 

Just for the record, I'm also against rich text reaching gcc mailing
lists.  I use mutt for work email and consider such messages
irritating in one or (usually) more ways whenever I get them.

Martin


The Linux binutils 2.23.51.0.6 is released

2012-11-26 Thread H.J. Lu
This is the beta release of binutils 2.23.51.0.6 for Linux, which is
based on binutils 20121123 in CVS on sourceware.org plus various
changes.  It is purely for Linux.

All relevant patches in the patches/ directory have been applied to the
source tree.  You can take a look at patches/README to see what has been
applied and in what order.

Starting from the 2.21.51.0.3 release, you must remove .ctors/.dtors
section sentinels when building glibc or other C run-time libraries.
Otherwise, you will run into:

http://sourceware.org/bugzilla/show_bug.cgi?id=12343

Starting from the 2.21.51.0.2 release, the BFD linker has working LTO
plugin support.  It can be used with GCC 4.5 and above.  For GCC 4.5, you
need to configure GCC with --enable-gold to enable LTO plugin support.

Starting from the 2.21.51.0.2 release, binutils fully supports compressed
debug sections.  However, compressed debug sections aren't turned on by
default in the assembler.  I am planning to turn them on for the x86
assembler in a future release, which may lead to Linux kernel build
messages like

WARNING: lib/ts_kmp.o (.zdebug_aranges): unexpected non-allocatable section.

But the resulting kernel works fine.

Starting from the 2.20.51.0.4 release, no diffs against the previous
release will be provided.

You can enable both gold and bfd ld with --enable-gold=both.  Gold will
be installed as ld.gold and bfd ld will be installed as ld.bfd.  By
default, ld.bfd will be installed as ld.  You can use the configure
option --enable-gold=both/gold to choose gold as the default linker,
ld.  IA-32 and x86-64 binary tarballs are configured with
--enable-gold=both/ld --enable-plugins --enable-threads.

Starting from the 2.18.50.0.4 release, the x86 assembler no longer
accepts

fnstsw %eax

fnstsw stores 16 bits into %ax and leaves the upper 16 bits of %eax
unchanged.
Please use

fnstsw %ax

Starting from the 2.17.50.0.4 release, the default output section LMA
(load memory address) has changed for allocatable sections from being
equal to VMA (virtual memory address), to keeping the difference between
LMA and VMA the same as the previous output section in the same region.

For

.data.init_task : { *(.data.init_task) }

The LMA of the .data.init_task section is equal to its VMA with the old
linker.  With the new linker, it depends on the previous output section.
You can use

.data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) }

to ensure that the LMA of the .data.init_task section is always equal to
its VMA.  The linker script in the older 2.6 x86-64 kernel depends on the
old behavior.  Adding AT (ADDR(section)) to force the LMA equal to the
VMA works with both old and new linkers.  The x86-64 kernel linker script
in kernel 2.6.13 and above is OK.

The new x86_64 assembler no longer accepts

monitor %eax,%ecx,%edx

You should use

monitor %rax,%ecx,%edx

or
monitor

both of which work with old and new x86_64 assemblers.  They should
generate the same opcode.

The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32-bit memory location, i.e.,

movl (%eax),%ds
movl %ds,(%eax)

To generate instructions for moving between a segment register and a
16-bit memory location without the 16-bit operand size prefix, 0x66,

mov (%eax),%ds
mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support

movw (%eax),%ds
movw %ds,(%eax)

without the 0x66 prefix. Patches for 2.4 and 2.6 Linux kernels are
available at

http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch
http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch

The ia64 assembler now defaults to tuning for Itanium 2 processors.
To build a kernel for Itanium 1 processors, you will need to add

ifeq ($(CONFIG_ITANIUM),y)
CFLAGS += -Wa,-mtune=itanium1
AFLAGS += -Wa,-mtune=itanium1
endif

to arch/ia64/Makefile in your kernel source tree.

Please report any bugs related to binutils 2.23.51.0.6 to
hjl.to...@gmail.com

and

http://www.sourceware.org/bugzilla/

Changes from binutils 2.23.51.0.5:

1. Update from binutils 20121123.
2. Fix 64-bit jecxz encoding regression in x86 assembler.  PR 14859.
3. Revert an accidental linker change.  PR 14862.
4. Fix x32 TLS LD to LE optimization in gold.  PR 14858.
5. Add a "-z global" option to ld to set DF_1_GLOBAL.
6. Improve ld plugin error handling.
7. Port ld lib32 arrangement from Debian.
8. Properly set the output maxpagesize when rewriting the program header.  PR 14493.
9. Add additional DF_1_XXX support to readelf.
10. Improve NaCl support with separate code segments.
11. Improve Mac OS support.
12. Improve ARM support.
13. Improve MicroBlaze support.
14. Improve MIPS support.
15. Improve PPC support.
16. Improve SPARC support.

Changes from binutils 2.23.51.0.4:

1. Update from binutils 20121110.

Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-26 Thread Richard Biener
On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck  wrote:
>
> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>
>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>  wrote:
>>>
>>> Kenneth Zadeck  writes:

 I would like you to respond to at least point 1 of this email.   In it
 there is code from the rtl level that was written twice, once for the
 case when the size of the mode is less than the size of a HWI and once
 for the case where the size of the mode is less than 2 HWIs.

 my patch changes this to one instance of the code that works no matter
 how large the data passed to it is.

 you have made a specific requirement for wide int to be a template that
 can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
 would like to know how this particular fragment is to be rewritten in
 this model?   It seems that I would have to retain the structure where
 there is one version of the code for each size that the template is
 instantiated.
>>>
>>> I think richi's argument was that wide_int should be split into two.
>>> There should be a "bare-metal" class that just has a length and HWIs,
>>> and the main wide_int class should be an extension on top of that
>>> that does things to a bit precision instead.  Presumably with some
>>> template magic so that the length (number of HWIs) is a constant for:
>>>
>>>typedef foo<2> double_int;
>>>
>>> and a variable for wide_int (because in wide_int the length would be
>>> the number of significant HWIs rather than the size of the underlying
>>> array).  wide_int would also record the precision and apply it after
>>> the full HWI operation.
>>>
>>> So the wide_int class would still provide "as wide as we need"
>>> arithmetic,
>>> as in your rtl patch.  I don't think he was objecting to that.
>>
>> That summarizes one part of my complaints / suggestions correctly.  In other
>> mails I suggested to not make it a template but a constant-over-object-lifetime
>> 'bitsize' (or maxlen) field.  Both suggestions likely require more thought than
>> I put into them.  The main reason is that with C++ you can abstract from where
>> wide-int information pieces are stored and thus use the arithmetic / operation
>> workers without copying the (source) "wide-int" objects.  Thus you should
>> be able to write adaptors for double-int storage, tree or RTX storage.
>
> We had considered something along these lines and rejected it.   I am not
> really opposed to doing something like this, but it is not an obvious
> winning idea and is likely not to be a good idea.   Here was our thought
> process:
>
> if you abstract away the storage inside a wide int, then you should be able
> to copy a pointer to the block of data from either the rtl level integer
> constant or the tree level one into the wide int.   It is certainly true
> that making a wide_int from one of these is an extremely common operation
> and doing this would avoid those copies.
>
> However, this causes two problems:
> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make
> the object.   it created the base object and then it allocated the array.
> Richard S noticed that we could just allocate one CONST_WIDE_INT that had
> the array in it.   Doing it this way saves one ggc allocation and one
> indirection when accessing the data within the CONST_WIDE_INT.   Our plan is
> to use the same trick at the tree level.   So to avoid the copying, you seem
> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.

I did not propose having a pointer to the data in the RTX or tree int.  Just
the short-lived wide-ints (which are on the stack) would have a pointer to
the data - which can then obviously point into the RTX and tree data.

> 2) You are now stuck either ggcing the storage inside a wide_int when they
> are created as part of an expression or you have to play some game to
> represent the two different storage plans inside of wide_int.

Hm?  wide-ints are short-lived and thus never live across a garbage collection
point.  We create non-GCed objects pointing to GCed objects all the time
and everywhere this way.

>   Clearly this
> is where you think that we should be going by suggesting that we abstract
> away the internal storage.   However, this comes at a price:   what is
> currently an array access in my patches would (i believe) become a function
> call.

No, the workers (that perform the array accesses) will simply get
a pointer to the first data element.  Then whether it's embedded or
external is of no interest to them.
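The storage-abstraction idea under discussion can be sketched in a few lines of C++.  All names below (add_worker, stack_wide_int, wide_int_ref) are hypothetical illustrations, not GCC's actual wide_int interface: the worker takes a plain pointer and a length, so embedded stack storage and pointers into externally owned (e.g. GCed) data look identical to it, and element access stays an array access rather than becoming a function call.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned long long hwi;   // stand-in for HOST_WIDE_INT

// The worker sees only a pointer to the first data element plus a length;
// whether the HWIs are embedded in a stack object or live inside GCed
// RTX/tree storage is invisible to it.
static void add_worker (hwi *result, const hwi *a, const hwi *b, size_t len)
{
  hwi carry = 0;
  for (size_t i = 0; i < len; i++)
    {
      hwi t = a[i] + carry;
      hwi c = t < carry;            // carry out of (a[i] + carry)
      result[i] = t + b[i];
      carry = c + (result[i] < t);  // plus carry out of (t + b[i])
    }
}

// Short-lived value with embedded storage.
struct stack_wide_int
{
  hwi val[2];
  size_t len;
  const hwi *elements () const { return val; }
};

// Adaptor that merely points into externally owned storage, avoiding a
// copy when reading an existing constant.
struct wide_int_ref
{
  const hwi *ptr;
  size_t len;
  const hwi *elements () const { return ptr; }
};
```

Both representations feed the same worker loop, which is the point being made here: the array access inside the worker stays an array access regardless of where the storage lives.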

>  From a performance point of view, i believe that this is a non
> starter. If you can figure out how to design this so that it is not a
> function call, i would consider this a viable option.
>
> On the other side of this you are clearly correct that we are copying the
> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.  But
> this is why we represent data inside of

Re: Unused field in graphds.h struct graph?

2012-11-26 Thread Richard Biener
On Tue, Nov 20, 2012 at 11:29 PM, Lawrence Crowl  wrote:
> In graphds.h, struct graph has a field "htab_t indices".
> As near as I can tell, it is completely unused.  It builds
> and tests fine with the field #if'd out.
>
> Shall I remove the field?

Sure.  Please make sure to have graphite enabled when building, it is one of
the graphds users.

Thanks,
Richard.

> --
> Lawrence Crowl


Re: Dependences for call-preserved regs on exposed pipeline target?

2012-11-26 Thread Greg McGary
On 11/25/12 23:33, Maxim Kuvyrkov wrote:
> You essentially need a fix-up pass just before the end of compilation 
> (machine-dependent reorg, if memory serves me right) to space instructions 
> consuming values from CPRs from the CALL_INSNS that set those CPRs.  I.e., 
> for the 99% of compilation you don't care about this restriction, it's only 
> the very last VLIW bundling and delay slot passes that need to know about it.
>
> You, probably, want to make the 2nd scheduler pass run as machine-dependent 
> reorg (as ia64 does) and enable an additional constraint (through scheduling 
> bypass) for the scheduler DFA to space CALL_INSNs from their consumers for at 
> least 2 cycles.  One challenge here is that the scheduler operates on basic 
> blocks, and it is difficult to track dependencies across basic block 
> boundaries.  To work around the basic-block scope of the scheduler you could emit 
> dummy instructions at the beginning of basic blocks that have predecessors 
> that end with CALL_INSNs.  These dummy instructions would set the appropriate 
> registers (probably just assign the register to itself), and you will have a 
> bypass (see define_bypass) between these dummy instructions and consumers to 
> guarantee the 2-cycle delay.

Thanks for the advice.  We're already on the same page--I have most of what you
recommend: I only schedule once from machine_dependent_reorg, after splitting
loads/stores, calls/branches into "init" and "fini" phases bound at fixed clock
offsets by record_delay_slot_pair().  I already have a fixup pass to handle
inter-EBB hazards.  (The selective scheduler would handle interblock
scheduling automatically, but I had trouble with it initially with split
loads/stores.  I want to revisit that.)  Regarding CPRs, I strongly
desire to avoid kludgy fixups for schedules created with an incomplete
dependence graph when the generic scheduler can do the job perfectly
with a complete dependence graph.

G


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-26 Thread Kenneth Zadeck

On 11/26/2012 10:03 AM, Richard Biener wrote:

On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck  wrote:

On 11/04/2012 11:54 AM, Richard Biener wrote:

On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
 wrote:

Kenneth Zadeck  writes:

I would like you to respond to at least point 1 of this email.   In it
there is code from the rtl level that was written twice, once for the
case when the size of the mode is less than the size of a HWI and once
for the case where the size of the mode is less than 2 HWIs.

my patch changes this to one instance of the code that works no matter
how large the data passed to it is.

you have made a specific requirement for wide int to be a template that
can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
would like to know how this particular fragment is to be rewritten in
this model?   It seems that I would have to retain the structure where
there is one version of the code for each size that the template is
instantiated.

I think richi's argument was that wide_int should be split into two.
There should be a "bare-metal" class that just has a length and HWIs,
and the main wide_int class should be an extension on top of that
that does things to a bit precision instead.  Presumably with some
template magic so that the length (number of HWIs) is a constant for:

typedef foo<2> double_int;

and a variable for wide_int (because in wide_int the length would be
the number of significant HWIs rather than the size of the underlying
array).  wide_int would also record the precision and apply it after
the full HWI operation.

So the wide_int class would still provide "as wide as we need"
arithmetic,
as in your rtl patch.  I don't think he was objecting to that.

That summarizes one part of my complaints / suggestions correctly.  In other
mails I suggested to not make it a template but a constant-over-object-lifetime
'bitsize' (or maxlen) field.  Both suggestions likely require more thought than
I put into them.  The main reason is that with C++ you can abstract from where
wide-int information pieces are stored and thus use the arithmetic / operation
workers without copying the (source) "wide-int" objects.  Thus you should
be able to write adaptors for double-int storage, tree or RTX storage.

We had considered something along these lines and rejected it.   I am not
really opposed to doing something like this, but it is not an obvious
winning idea and is likely not to be a good idea.   Here was our thought
process:

if you abstract away the storage inside a wide int, then you should be able
to copy a pointer to the block of data from either the rtl level integer
constant or the tree level one into the wide int.   It is certainly true
that making a wide_int from one of these is an extremely common operation
and doing this would avoid those copies.

However, this causes two problems:
1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make
the object.   it created the base object and then it allocated the array.
Richard S noticed that we could just allocate one CONST_WIDE_INT that had
the array in it.   Doing it this way saves one ggc allocation and one
indirection when accessing the data within the CONST_WIDE_INT.   Our plan is
to use the same trick at the tree level.   So to avoid the copying, you seem
to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.

I did not propose having a pointer to the data in the RTX or tree int.  Just
the short-lived wide-ints (which are on the stack) would have a pointer to
the data - which can then obviously point into the RTX and tree data.
There is the issue then of what happens if some wide-ints are not short
lived.  It makes me nervous to create internal pointers to GCed memory.

2) You are now stuck either ggcing the storage inside a wide_int when they
are created as part of an expression or you have to play some game to
represent the two different storage plans inside of wide_int.

Hm?  wide-ints are short-lived and thus never live across a garbage collection
point.  We create non-GCed objects pointing to GCed objects all the time
and everywhere this way.
Again, this makes me nervous but it could be done.  However, it does
mean that the wide ints that are not created from rtxes or trees will be
more expensive because they are not going to get their storage "for
free"; they are going to alloca it.


however, it still is not clear, given that 99% of the wide ints are
going to fit in a single HWI, that this would be a noticeable win.



   Clearly this
is where you think that we should be going by suggesting that we abstract
away the internal storage.   However, this comes at a price:   what is
currently an array access in my patches would (i believe) become a function
call.

No, the workers (that perform the array accesses) will simply get
a pointer to the first data element.  Then whether it's embedded or
external is of no interest to them.
so is your plan that the wide int constructors from rtx or 

Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-26 Thread Richard Biener
On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
 wrote:
> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>
>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck 
>> wrote:
>>>
>>> On 11/04/2012 11:54 AM, Richard Biener wrote:

 On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
  wrote:
>
> Kenneth Zadeck  writes:
>>
>> I would like you to respond to at least point 1 of this email.   In it
>> there is code from the rtl level that was written twice, once for the
>> case when the size of the mode is less than the size of a HWI and once
>> for the case where the size of the mode is less than 2 HWIs.
>>
>> my patch changes this to one instance of the code that works no matter
>> how large the data passed to it is.
>>
>> you have made a specific requirement for wide int to be a template that
>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.   I
>> would like to know how this particular fragment is to be rewritten in
>> this model?   It seems that I would have to retain the structure where
>> there is one version of the code for each size that the template is
>> instantiated.
>
> I think richi's argument was that wide_int should be split into two.
> There should be a "bare-metal" class that just has a length and HWIs,
> and the main wide_int class should be an extension on top of that
> that does things to a bit precision instead.  Presumably with some
> template magic so that the length (number of HWIs) is a constant for:
>
> typedef foo<2> double_int;
>
> and a variable for wide_int (because in wide_int the length would be
> the number of significant HWIs rather than the size of the underlying
> array).  wide_int would also record the precision and apply it after
> the full HWI operation.
>
> So the wide_int class would still provide "as wide as we need"
> arithmetic,
> as in your rtl patch.  I don't think he was objecting to that.

 That summarizes one part of my complaints / suggestions correctly.  In other
 mails I suggested to not make it a template but a constant-over-object-lifetime
 'bitsize' (or maxlen) field.  Both suggestions likely require more thought than
 I put into them.  The main reason is that with C++ you can abstract from where
 wide-int information pieces are stored and thus use the arithmetic / operation
 workers without copying the (source) "wide-int" objects.  Thus you should
 be able to write adaptors for double-int storage, tree or RTX storage.
>>>
>>> We had considered something along these lines and rejected it.   I am not
>>> really opposed to doing something like this, but it is not an obvious
>>> winning idea and is likely not to be a good idea.   Here was our thought
>>> process:
>>>
>>> if you abstract away the storage inside a wide int, then you should be able
>>> to copy a pointer to the block of data from either the rtl level integer
>>> constant or the tree level one into the wide int.   It is certainly true
>>> that making a wide_int from one of these is an extremely common operation
>>> and doing this would avoid those copies.
>>>
>>> However, this causes two problems:
>>> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make
>>> the object.   it created the base object and then it allocated the array.
>>> Richard S noticed that we could just allocate one CONST_WIDE_INT that had
>>> the array in it.   Doing it this way saves one ggc allocation and one
>>> indirection when accessing the data within the CONST_WIDE_INT.   Our plan is
>>> to use the same trick at the tree level.   So to avoid the copying, you seem
>>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>>
>> I did not propose having a pointer to the data in the RTX or tree int.  Just
>> the short-lived wide-ints (which are on the stack) would have a pointer to
>> the data - which can then obviously point into the RTX and tree data.
>
> There is the issue then of what happens if some wide-ints are not short
> lived.  It makes me nervous to create internal pointers to GCed memory.

I thought they were all short-lived.

>>> 2) You are now stuck either ggcing the storage inside a wide_int when they
>>> are created as part of an expression or you have to play some game to
>>> represent the two different storage plans inside of wide_int.
>>
>> Hm?  wide-ints are short-lived and thus never live across a garbage collection
>> point.  We create non-GCed objects pointing to GCed objects all the time
>> and everywhere this way.
>
> Again, this makes me nervous but it could be done.  However, it does mean
> that now the wide ints that are not created from rtxes or trees will be more
> expensive because they are not going to get their storage "for free", they
> are going to alloca it.

No, those would simply use the embedded stor

Re: RFC - Alternatives to gengtype

2012-11-26 Thread Diego Novillo
On Sun, Nov 25, 2012 at 10:45 AM, Richard Biener
 wrote:
> On Sun, Nov 25, 2012 at 4:21 PM, Diego Novillo  wrote:
>> On Sun, Nov 25, 2012 at 10:09 AM, Richard Biener
>>  wrote:
>>
>>> I'd say the most pragmatic solution is to stick with gengtype but
>>> make it more dependent on annotations (thus, explicit).  That is,
>>
>> Yes.  That is the direction in which I've been leaning towards.  My
>> preference is to transitionally move to manual markers
>> (http://gcc.gnu.org/wiki/cxx-conversion/gc-alternatives#Do_GC_marking_manually)
>> and over time transition to memory pool management.
>
> Note that the most GCed thing is a 'tree' and the solution is not
> to move trees to memory pools but to use fewer trees in the first place!

True, but you are describing an orthogonal problem.  There are other
data structures in GC.  Long term, I would like to move all of them
out of GC.

> Improving things wrt tree usage also means to isolate C/C++ frontend
> IL from the middle-end.  I once proposed to cp tree.[ch] and at gimplification
> time re-allocate and copy from the frontend tree "kind" to the gimple
> tree "kind".
> Of course our FE / middle-end separation enemy (debug info) makes this not
> so viable at the moment.

Right.

>
>>> I suppose I agree that garbage collection is not technically
>>> required for writing a compiler, but getting rid of GC in GCC
>>> entirely will be a hard and error-prone task (even if you
>>> factor out PCH which is an entirely different mess).
>>
>> Agreed.  As far as PCH is concerned, my preferred long term approach
>> is to move to streamable types.  We have an almost working
>> implementation in the PPH branch and we already have a streaming
>> framework in LTO.
>
> Of course that's not all we preserve in PCH ... (look for "interesting" global
> data marked as GC root just for the sake of PCH).

That's fine.  We can stream that data as well.  Identifying all that
is also helpful to realize just how much loose global state we have.
Coalescing that global state would be a good cleanup too.


Diego.


Re: Time for GCC 5.0? (TIC)

2012-11-26 Thread DJ Delorie

> Marketing loves high numbers after all!

If you truly think this way, we're going to have to revoke your hacker's 
license ;-)


Re: Dependences for call-preserved regs on exposed pipeline target?

2012-11-26 Thread Maxim Kuvyrkov
On 27/11/2012, at 4:34 AM, Greg McGary wrote:

> On 11/25/12 23:33, Maxim Kuvyrkov wrote:
>> You essentially need a fix-up pass just before the end of compilation 
>> (machine-dependent reorg, if memory serves me right) to space instructions 
>> consuming values from CPRs from the CALL_INSNS that set those CPRs.  I.e., 
>> for the 99% of compilation you don't care about this restriction, it's only 
>> the very last VLIW bundling and delay slot passes that need to know about it.
>> 
>> You, probably, want to make the 2nd scheduler pass run as machine-dependent 
>> reorg (as ia64 does) and enable an additional constraint (through scheduling 
>> bypass) for the scheduler DFA to space CALL_INSNs from their consumers for 
>> at least 2 cycles.  One challenge here is that the scheduler operates on 
>> basic blocks, and it is difficult to track dependencies across basic block 
>> boundaries.  To work around the basic-block scope of the scheduler you could emit 
>> dummy instructions at the beginning of basic blocks that have predecessors 
>> that end with CALL_INSNs.  These dummy instructions would set the 
>> appropriate registers (probably just assign the register to itself), and you 
>> will have a bypass (see define_bypass) between these dummy instructions and 
>> consumers to guarantee the 2-cycle delay.
> 
> Thanks for the advice.  We're already on the same page--I have most of what you
> recommend: I only schedule once from machine_dependent_reorg, after splitting
> loads/stores, calls/branches into "init" and "fini" phases bound at fixed clock
> offsets by record_delay_slot_pair().  I already have a fixup pass to handle
> inter-EBB hazards.  (The selective scheduler would handle interblock
> automatically, but I had trouble with it initially with split load/stores.  I want
> to revisit that.)  Regarding CPRs, I strongly desire to avoid kludgy fixups for
> schedules created with an incomplete dependence graph when the generic scheduler
> can do the job perfectly with a complete dependence graph.


I wonder if "kludgy fixups" refers to the dummy-instruction solution I
mentioned above.  The complete dependence graph is a myth.  You cannot
have a complete dependence graph for a function -- the scheduler works on
DAG regions (and I doubt it will ever support anything more complex), so
you would have to do something to account for inter-region dependencies
anyway.

It is simpler to have a unified solution that would handle both inter- and 
intra-region dependencies, rather than implementing two different approaches.

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




Re: Hash table iterators.

2012-11-26 Thread Lawrence Crowl
On 11/23/12, Andrew MacLeod  wrote:
> On 11/22/2012 01:18 PM, Lawrence Crowl wrote:
> > I have found that tree-flow.h implements iteration over htab_t,
> > while there is no current facility to do that with hash_table.
> > Unfortunately, the specific form does not match the standard C++
> > approach to iterators.  We have several choices.
> >
> > (1) Ignore the problem and leave all such tables as htab_t.
> >
> > (2) Write new hash_table iteration functions to match the form of
> > the existing GCC macro/function approach.
> >
> > (3) Write new hash_table iteration functions to match the form used
> > by the C++ standard.  This approach would entail modifying the loops.
> >
> > Diego and I have a preference for (3).  What do you prefer?
>
> I don't like (1) for sure.
>
> Before deciding a preference between (2) and (3), what are the
> actual differences?  ie, is (2) doing something practical that
> (3) has to bend over for, or is (3)'s format better but wasn't
> practical before?  is (2) otherwise useful going forward?

For iterating over a hash table containing elements of type T,

(2) The for statement is parameterized by an iterator variable and a
variable of type T.  The loop copies the element into the T variable,
and that variable is used in the body.

(3) The for statement is parameterized only by an iterator variable.
The loop uses "*iterator_variable" to obtain a reference to the
element.

With (3), we have well-established practice for writing generic
algorithms.  With (2), we seem to have just for loops.
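As a rough illustration of the two styles (using std::unordered_set as a stand-in, since GCC's hash_table has its own interface; FOR_EACH_ELEMENT and the sum_* helpers are made-up names, not proposed APIs):

```cpp
#include <cassert>
#include <unordered_set>

// Style (2): the loop macro binds both an iterator and a copy of the
// element, mirroring the existing htab_t-style iteration macros.
#define FOR_EACH_ELEMENT(TABLE, ITER, VAR) \
  for ((ITER) = (TABLE).begin (); \
       (ITER) != (TABLE).end () && ((VAR) = *(ITER), true); \
       ++(ITER))

static int sum_style2 (const std::unordered_set<int> &table)
{
  std::unordered_set<int>::const_iterator it;
  int elt = 0, sum = 0;
  FOR_EACH_ELEMENT (table, it, elt)
    sum += elt;
  return sum;
}

// Style (3): the standard C++ idiom; the body dereferences the iterator
// itself, so the same loop shape composes with generic algorithms.
static int sum_style3 (const std::unordered_set<int> &table)
{
  int sum = 0;
  for (std::unordered_set<int>::const_iterator it = table.begin ();
       it != table.end (); ++it)
    sum += *it;
  return sum;
}
```

Both loops visit the same elements; the difference is only whether the element copy is part of the loop header (2) or obtained by dereferencing inside the body (3).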

-- 
Lawrence Crowl


GCC 4.7.2 error handling type short

2012-11-26 Thread Bill Beech (NJ7P)
I have run into a problem with both 4.6.1 and 4.7.2 of the gcc compiler
handling type short.  sizeof(unsigned short) returns a length of 2 as
expected, but when I use a union of a character buffer and some fields
including an unsigned short, the value returned is 2 bytes but the buffer
pointer is moved 4 bytes.
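The usual explanation for this behavior is alignment padding rather than a compiler bug.  A minimal sketch (with hypothetical struct and field names, not the actual filsys layout): an unsigned short member occupies 2 bytes, but if the next member requires 4-byte alignment the compiler inserts 2 padding bytes after the short, so on typical ABIs the next field's offset inside an overlaid buffer jumps by 4.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: a 2-byte short followed by a member needing 4-byte
// alignment.  On typical ABIs the compiler pads 2 bytes after 'len', so
// 'addr' sits at offset 4 even though sizeof (unsigned short) is still 2.
struct record
{
  unsigned short len;   // offset 0, size 2
  unsigned int   addr;  // offset 4, after 2 bytes of padding
};

// Overlaying a raw byte buffer with the struct (as the filsys union does)
// therefore sees the 4-byte step, not a 2-byte one.
union overlay
{
  unsigned char buf[sizeof (struct record)];
  struct record rec;
};
```

If the on-disk format really packs its fields with no padding, the portable fix is to read the buffer field by field; with GCC specifically, declaring the struct with __attribute__((packed)) also removes the padding.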


Here is the code for the union of the fs structure with the buffer (the 
super block structure and union are at the bottom of the listing):


/* taken from filsys.h for Intel Xenix */
#define u_short  unsigned short
#define daddr_t  unsigned int
#define ino_t    unsigned short
#define time_t   unsigned int

#define FS_CLEAN    106
#define BMAPSIZE    994     /* Max size of CG bit map */
                            /* Equals BSIZE-sizeof(struct cylinder) */

#define MAXCGS      80      /* Max CG's per filsys */
#define MAXEXTSIZE  32      /* Max extent size */
#define FNEWCG      64      /* When a file grows beyond FNEWCG KB,
                               allocate blocks from a new
                               cylinder group */
#define SNEWCG      512     /* Move to a new cylinder group after
                               every subsequent SNEWCG KB */

/*
 * Cylinder group header
 */
struct cylinder {
    daddr_t cg_doffset;         /* offset to first data block from start of filsys */
    daddr_t cg_ioffset;         /* offset to first inode block from start of filsys */

    u_short cg_dblocks;         /* number of data blocks in cg */
    ino_t   cg_ifirst;          /* next free inode in linked list */
    char    cg_number;          /* cg sequence number in filsys */
    char    cg_currextent;      /* current extent size */
    u_short cg_lowat;           /* if free blocks drop below cg_lowat, recompute cg_currextent */
    u_short cg_hiwat;           /* if free blocks increase beyond cg_hiwat, recompute cg_currextent */
    u_short cg_erotor;          /* position of next candidate block for allocation */

    char    cg_ilock;           /* inode manipulation lock */
    char    cg_reserved[9];     /* reserved field. (9 to align on word boundary) */
    char    cg_bits[BMAPSIZE];  /* bit map. 0 = allocated. 1 = free */
};

/*
 * Contains global policy information.
 * Stored in the superblock.
 */
struct cginfo {
    u_short fs_cgincore;        /* points to buf structure containing cg header.  Null if not in core */

    daddr_t fs_cgblk;           /* disk address of cg header */
    u_short fs_cgffree;         /* number of free data blocks in cg */

    ino_t   fs_cgifree;         /* number of free inodes in cg */
    ino_t   fs_cgdirs;          /* number of directories in cg */
};

/*
 * Super block
 */
struct filsys {
charfs_fname[6];/* file system name */
charfs_fpack[6];/* pack name */
daddr_tfs_fsize;/* number of data blocks in fs */
u_shortfs_cgblocks;/* number of blocks per cg */
daddr_tfs_maxblock;/* max disk block in fs */
ino_tfs_cginodes;/* number of inodes per cg */
ino_tfs_maxino;/* max inumber in fs */
time_tfs_time;/* time last modified */
charfs_fmod;/* modified flag */
charfs_ronly;/* read-only fs */
charfs_clean;/* fs was cleanly unmounted */
charfs_type;/* fs type and version */
u_shortfs_fnewcg;/* contains FNEWCG */
u_shortfs_snewcg;/* contains SNEWCG */
daddr_tfs_ffree;/* number of free data blocks 
in fs */

ino_tfs_ifree;/* number of free inodes in fs */
ino_tfs_dirs;/* number of directories in fs */
charfs_extentsize;/* native extent size */
charfs_cgnum;/* number of cg's in fs */
charfs_cgrotor;/* next cg to be searched */
charfs_reserved[15];/* reserved. (15 to align on word 
boundary) */
structcginfo fs_cylinder[MAXCGS];/* contains global policy 
information

per cylinder group */
};

I use this routine to dump the info from the superblock (note the second
test is on fs_fpack; the original listing mistakenly re-tested fs_fname):

void dumpsuper(void)
{
    if (*super.fs.fs_fname)
        printf("fs_fname = %s\n", super.fs.fs_fname);
    if (*super.fs.fs_fpack)
        printf("fs_fpack = %s\n", super.fs.fs_fpack);
    printf("fs_fsize = %d\n", super.fs.fs_fsize);
    printf("fs_cgblocks = %d\n", super.fs.fs_cgblocks);
    printf("fs_maxblock = %d\n", super.fs.fs_maxblock);
    printf("fs_cginodes = %d\n", super.fs.fs_cginodes);
    printf("fs_maxino = %d\n", super.fs.fs_maxino);
    printf("len = %d\n", (int) sizeof(unsigned short));
    dumphex(1024, 256);
}

When run, I get this result:

MAKI

Re: GCC 4.7.2 error handling type short

2012-11-26 Thread Paul_Koning

On Nov 26, 2012, at 3:57 PM, Bill Beech (NJ7P) wrote:

> I have run into a problem with both 4.6.1 and 4.7.2 of the gcc compiler 
> handling type short.  Sizeof(unsigned short) returns a length of 2 as 
> expected, but when I use a union of a character buffer and some fields 
> including an unsigned short the value returned is 2 bytes but the buffer 
> pointer is moved 4 bytes.
> ...
> As you can see the value at 0410 in the file, 6601 is returned as 358, which 
> is correct.  The 4-byte
> value following 67 01 00 00 is not returned for the unsigned int but rather 
> 00 00 30 00 is returned next (which equals 3145728 decimal).  While a 
> sizeof(unsigned short) returns 2 bytes, in this case the pointer into the 
> unioned buffer is moved 4 bytes.
> 
> This bug makes it hell to use any of your products to build emulators for 
> 16-bit processors.
> 
> Is there a definition for a 16-bit quantity that will work in a union?
> 
> Thanks!
> 
> Bill Beech
> NJ7P

You meant struct, right, not union?

Every field has a size as well as an alignment.  The starting address of each 
field is forced to be a multiple of its alignment.  In many cases, for 
primitive data types (like the various size integers) the alignment equals the 
size; for example, a 4-byte int has alignment 4.

So if you have a struct of short then int, the compiler has to insert 2 bytes 
of padding before the int to obey the alignment.

In some cases, there are types that don't have alignment == sizeof, for example 
long long int on Intel is size 8 but (by default) alignment 4.

Since you mentioned 16-bit processors -- are you talking about a port for a 
16-bit processor, where you want int (size 4) to be aligned 2?  (For example, 
that would be sensible on a PDP-11.)  If so, you'd want to tell the compiler 
how to do that; I'm not sure of the details, presumably they are in the GCC 
Internals manual.

Or are you talking about an existing port which has defined the alignment of 
int to be 4?  If so, that might be because unaligned accesses would cause 
exceptions.  Or it may just be a convention.  In either case, you can use the 
"packed" attribute to override the normal alignment of fields.  See the GCC 
documentation for details.

paul



Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-26 Thread Kenneth Zadeck

Richard,

I spent a good part of the afternoon talking to Mike about this.  He is 
on the c++ standards committee and is a much more seasoned c++ 
programmer than I am.


He convinced me that with a large amount of engineering and c++ 
"foolishness" that it was indeed possible to get your proposal to 
POSSIBLY work as well as what we did.


But now the question is: why would anyone want to do this?

At the very least you are talking about instantiating two instances of 
wide-ints, one for the stack allocated uses and one for the places where 
we just move a pointer from the tree or the rtx. Then you are talking 
about creating connectors so that the stack-allocated functions can take 
parameters of the pointer version and vice versa.


Then there is the issue that rather than just saying that something is a 
wide int, the programmer is going to have to track its origin.   
In particular, where in the code right now i say:


wide_int foo = wide_int::from_rtx (r1);
wide_int bar = wide_int::from_rtx (r2) + foo;

now i would have to say

wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;

then when i want to call some function using a wide_int ref that 
function now must be either overloaded to take both or i have to choose 
one of the two instantiations (presumably based on which is going to be 
more common) and just have the compiler fix up everything (which it is 
likely to do).


And so what is the payoff:
1) No one except the c++ elite is going to understand the code. The rest 
of the community will hate me and curse the ground that i walk on.
2) I will end up with a version of wide-int that can be used as a medium 
life container (where i define medium life as not allowed to survive a 
gc since they will contain pointers into rtxes and trees.)
3) And no clients that actually want to do this!!  I could use as an 
example one of your favorite passes, tree-vrp.   The current double-int 
could have been a medium lifetime container since it has a smaller 
footprint, but in fact tree-vrp converts those double-ints back into 
trees for medium storage.   Why, because it needs the other fields of a 
tree-cst to store the entire state.  Wide-ints also "suffer" this 
problem: their only state is the data and the three length fields.   
They have no type and none of the other tree info so the most obvious 
client for a medium lifetime object is really not going to be a good 
match even if you "solve the storage problem".


The fact is that wide-ints are an excellent short term storage class 
that can be very quickly converted into our two long term storage 
classes.  Your proposal requires a lot of work, will not be easy to 
use, and as far as i can see has no payoff on the horizon.   It could be 
that there could be future clients for a medium lifetime value, but 
asking for this with no clients in hand is really beyond the scope of a 
reasonable review.


I remind you that the purpose of these patches is to solve problems that 
exist in the current compiler that we have papered over for years.   If 
someone needs wide-ints in some way that is not foreseen then they can 
change it.


kenny

On 11/26/2012 11:30 AM, Richard Biener wrote:

On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
 wrote:

On 11/26/2012 10:03 AM, Richard Biener wrote:

On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck 
wrote:

On 11/04/2012 11:54 AM, Richard Biener wrote:

On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
 wrote:

Kenneth Zadeck  writes:

I would like you to respond to at least point 1 of this email.   In it
there is code from the rtl level that was written twice, once for the
case when the size of the mode is less than the size of a HWI and once
for the case where the size of the mode is less that 2 HWIs.

my patch changes this to one instance of the code that works no matter
how large the data passed to it is.

you have made a specific requirement for wide int to be a template
that
can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.
I
would like to know how this particular fragment is to be rewritten in
this model?   It seems that I would have to retain the structure where
there is one version of the code for each size that the template is
instantiated.

I think richi's argument was that wide_int should be split into two.
There should be a "bare-metal" class that just has a length and HWIs,
and the main wide_int class should be an extension on top of that
that does things to a bit precision instead.  Presumably with some
template magic so that the length (number of HWIs) is a constant for:

 typedef foo<2> double_int;

and a variable for wide_int (because in wide_int the length would be
the number of significant HWIs rather than the size of the underlying
array).  wide_int would also record the precision and apply it after
the full HWI operation.

So the wide_int class would still provide "as wide as we need"
arithmetic,
as in your rtl patch.  I don'

Re: embedded Linux: improvement issues

2012-11-26 Thread Maxim Kuvyrkov
On 27/11/2012, at 4:51 PM, ETANI NORIKO wrote:

> Dear Sirs,
> 
> 
> I am researching the status quo of embedded Linux and find out your website 
> of "Embedded Linux Conference 2013". We are looking for the engineer at a 
> distributor side in order to consult our implementation issues and improve 
> embedded Linux for our system. We have developed high-level API for many-core 
> system based on OpenCL sponsored by NEDO in Japan.
> 
> Our development environments are as follows.
> PC: Sony VAIO
> OS: Windows 7 Professional Service Pack 1 
> VM: VMware Player 4.0.3
> HOST: 32-bit Fedora 16
> TARGET: MIPS typed Linux created with GNU Linux GCC and uClibc
> 
> We found out the following 3 vital implementation issues in our development.
> 1. MPFR and GMP should be available for "LD" to link some object files and 
> create a binary file.
> The MPFR library is a C library for multiple-precision floating-point 
> computations with correct rounding. GMP is a free library for arbitrary 
> precision arithmetic, operating on signed integers, rational numbers, and 
> floating point numbers. These libraries are installed into GCC compiler. So, 
> a binary file executed on device core for computing in many-core system 
> cannot use them because it is created with "LD".

It sounds like you want to create a cross-compiler toolchain (binutils, GCC) 
and use it to generate Linux/uClibc rootfs for your MIPS target.  I.e., the 
compiler will run on x86 and generate MIPS code.

Building a cross-toolchain is a difficult task, weeks of work if you don't know 
exactly what you are doing.  Get one of the precompiled packages if you can 
(google "cross toolchain for MIPS").

The MPFR and GMP libraries are used by the compiler, which is an x86 program, 
so you can simply install these libraries from your Fedora distribution: "yum 
install gmp-devel mpfr-devel libmpc-devel".  Read 
http://gcc.gnu.org/wiki/InstallingGCC for additional details.  The main point 
is that there are libraries used by the target (e.g., uClibc) and by host 
(e.g., GMP, MPFR, MPC).

> 2. About generation of uClibc, it should be available for a developer to 
> select some functions among Linux standard library and create uClibc.

I don't quite understand what you mean here.

> 3. Please tell us how to create our Linux for C++ because we have no 
> information about it.

For this you want to specify "--enable-languages=c,c++" when configuring the 
compiler.
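For what it's worth, the configure step would look roughly like this -- purely illustrative; the target triplet, prefix, and source path are assumptions, and a real build also needs cross binutils and a target sysroot in place first:

```shell
# Illustrative only: configure a MIPS/uClibc cross-GCC with
# C and C++ enabled.  Adjust --target and --prefix for your setup.
mkdir build-gcc && cd build-gcc
../gcc-4.7.2/configure \
    --target=mips-linux-uclibc \
    --prefix=/opt/cross \
    --enable-languages=c,c++ \
    --disable-multilib
make && make install
```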

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



Re: Dependences for call-preserved regs on exposed pipeline target?

2012-11-26 Thread Greg McGary
On 11/26/12 12:46, Maxim Kuvyrkov wrote:

> I wonder if "kludgy fixups" refers to the dummy-instruction solution I 
> mentioned above.  The complete dependence graph is a myth.  You cannot have a 
> complete dependence graph for a function -- scheduler works on DAG regions 
> (and I doubt it will ever support anything more complex), so you would have 
> to do something to account for inter-region dependencies anyway.
>
> It is simpler to have a unified solution that would handle both inter- and 
> intra-region dependencies, rather than implementing two different approaches.

I retract any implication that your bypass proposal is a kludge.  I found using
bypasses to be very compact and effective.  Thanks for the extra nudge.

G