Re: gimple type system

2008-07-19 Thread Richard Guenther
On Sat, Jul 19, 2008 at 12:45 AM, Kenneth Zadeck
<[EMAIL PROTECTED]> wrote:
> Richard Guenther wrote:
>>
>> On Fri, Jul 18, 2008 at 11:25 PM, Kenneth Zadeck
>> <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Diego has asked me to look into what would be needed in a gimple type
>>> system.  This is an issue that has been brought to a head because now
>>> it is time to merge types for LTO.
>>>
>>> There are a lot of questions that need to be answered before designing
>>> such a system, and I would like to handle them one by one rather than
>>> deal with a thousand threads that go off in a lot of directions.  So
>>> for now, I would like to limit the discussion to a single question:
>>> "what do we want to do in the middle end of a compiler with a middle
>>> end type system?"
>>>
>>> I have a couple of positive answers and one negative answer.  The
>>> point of this mail is to get a more refined list.  The two positive
>>> answers are:
>>>
>>> 1) Type narrowing.  In an object-oriented system, it is generally a
>>> big win to be able to narrow a type as much as possible.  This can
>>> then be used to inline method calls, as well as to remove runtime
>>> casts and type checks (this is useless for C).
>>> 2) Inter-file type checking.  While this is not an optimization,
>>> there are reasons why it would be useful to discover types that are
>>> mismatched across compilation units.
>>>
>>> The thing that MAY not be useful anymore is the use of a type system
>>> for alias analysis.  I would have hoped that danny and richi and all
>>> of the other people hacking on the alias analysis would have subsumed
>>> anything that one could have gathered from a type-based alias
>>> analysis.  If I am wrong, please correct me.
>>>
>>
>> Hah.  You are definitely wrong.
>>
>>
>
> I stand corrected.  I could hope.

Low-hanging improvements are still possible.  That said, I'm still trying to
figure out whether we can have some precision at all in the virtual FUD chains.

>>> Anyway, there must be other uses of types, either in the existing
>>> middle end or that people have dreams of adding to the middle end in
>>> the future.  Now is the time to raise your hand, before the design
>>> has been started.
>>>
>>
>> We already have a middle-end type-system.  It is unfortunately mostly
>> implicitly defined by the assumptions we make and the information we
>> extract from the types.
>>
>> I don't think you can just go and define a type-system - after all,
>> what would be the result of such a definition?
>>
>>
>
> Here, I think that you are wrong.  You can start with a series of
> requirements and only keep the information that is necessary to satisfy
> those requirements.  We currently have a type system that is defined by
> "here is what the C front end used to provide for us", onto which we
> grafted what the other front ends provide for us, and now we have a mess.
>
> It is my hope to define what we really need, then define an API to get
> that information, and then engineer the underlying representation to
> make this all work.  I would also like to know why we want to merge
> types and what it means to merge them (beyond the obvious space reason)
> before I go proposing algorithms and data structures to support this.
>>
>> Instead there are some goals we want to reach:
>> 1) reduce the amount of data related to types (and declarations)
>> during optimization
>> 2) canonicalize "types" if their difference does not matter for the
>> current and further stages of compilation (like we do now: say, all
>> integral types with the same precision and signedness are "the same")
>>
>> For both of the above, a prerequisite to actually "unify" types (not
>> just treating them the same, but actually getting rid of one in favor
>> of another) is to handle debug information properly.  The plan is to
>> emit debug information for the source-level state of types early and
>> refer to that from declarations via a unique identifier.
>>
>> One question would be whether we want to gradually lower types (for
>> example, flatten structures, unify bit-precision integer types to
>> modes, etc.) or whether we just want to do one step after/during
>> gimplification (we have a second step at RTL expansion, of course).
>>
>>
>
> I would like to settle this lowering issue after I get an inventory of
> the uses we want to make of the type information.

Ok, I see two^Wthree classes of uses:

 1) debug information - this eventually needs frontend-specific details
not yet available in the current middle-end "interface"

 2) type-based alias disambiguation - likewise this either has
frontend-specific parts or touches the type-merging problem

 3) information necessary for optimization purposes.  Trivial stuff like
signedness, precision and kind, but also possibly value-range information.
For aggregates I don't see that we need more than a flat representation
containing (offset, size) pairs.
It gets interesting if we w

What are the functions that i can use?

2008-07-19 Thread Mohamed Shafi
Hello all,

I am involved in porting gcc 4.1.2.
For some processing I need to know whether a register is being defined
or used in a particular instruction.  Till now I have been using
'refers_to_regno_p()' to find out whether a register is used in an
instruction and 'modified_in_p()' to find out whether a register is
defined in the instruction.  But 'refers_to_regno_p()' also looks into
the expr_list and/or notes attached to an instruction.  So sometimes
refers_to_regno_p() returns 1 when the register is only referred to in
the expr_list of the instruction, even though it is not among the
instruction's actual uses.
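
In other words, I am after something like the following sketch, which
restricts the search to PATTERN (insn) so that the notes are never
visited (assuming the rtlanal.c helpers reg_referenced_p() and
reg_set_p(); the wrapper names below are just placeholders):

/* Use: scan only the insn pattern, so REG_NOTES/expr_list entries
   attached to the insn are not looked at.  reg_referenced_p counts
   reads only, not sets.  */
static int
reg_used_in_insn_p (rtx reg, rtx insn)
{
  return reg_referenced_p (reg, PATTERN (insn));
}

/* Definition: reg_set_p reports whether REG is set or clobbered
   by INSN.  */
static int
reg_defined_in_insn_p (rtx reg, rtx insn)
{
  return reg_set_p (reg, insn);
}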

Could anyone tell me which functions I can use to find out whether a
register is being used and/or defined in a particular instruction?

Regards,
Shafi


compressed pointer type implementation in gcc

2008-07-19 Thread Yair Lifshitz
Hi,

I hope I'm not flooding the list with this topic. I've done some research
and couldn't find anything relevant on it.
My team is developing a large-scale CAD application that has a large
memory footprint, requiring powerful machines to run.

The application uses pointers massively.
During one of our optimization cycles I noticed that since most
objects are aligned on 8-byte boundaries, it's possible to drop the
lower 3 bits of the address and reconstruct the full address later.

Basically, as long as the application is in the 32G range (2^32*2^3),
it's possible to represent aligned pointers using an unsigned int - 4
bytes.
Seeing as obtaining the address from the compressed representation
only costs a left shift (which is very cheap), this trick will have an
insignificant impact in highly polymorphic code (and perhaps even in
general - I am not sure how well the compiler will optimize multiple
calls to the same address using this mechanism).

After giving this some thought, I realized a few points:

1. The virtual function table pointer cannot be compressed without
changes to the compiler.
2. Spreading this (if the idea actually makes sense ;)) will also be
easier through the compiler.
3. The 32G of addressable range may not start at 0 and may be dispersed
(although in reality sbrk usually starts a bit above 0 and grows
continuously), so this may require some base address shifting/loader
changes (??)

Would be nice to get your opinions on it.
If it makes any sense and you can give me some basic hints, I may be
able to tailor it into the gcc development branch.

In case I was unable to make sense above I'm attaching a simple
implementation I did just to check the concept. :)

Thanks for your time,

Yair

-



#ifndef __ALIGNED_PTR_H__
#define __ALIGNED_PTR_H__

// A memory-efficient pointer implementation.
//
// Generally, pointers enable access at byte-level granularity.
// 64-bit pointers enable access to 2^64 unique byte addresses,
// which is useful for applications with a large memory footprint.
//
// When byte-level granularity is not needed (example: some allocators return
// addresses aligned to sizeof(void*)), it is possible to address >4GB using
// a 32-bit value.
//
// The implementation below assumes the allocator's alignment is
// 2^ALIGNMENT_BITS, and thus obtaining the full pointer only requires
// shifting the stored unsigned integer left, which is faster than the
// multiplication which would otherwise be necessary.
//
// For example, if ALIGNMENT_BITS = 3, the actual alignment is 2^3 (8),
// which provides access to addresses up to 2^35 (32GB).
//
// Note that if unaligned addresses, or addresses beyond the allowed
// limit, are passed to aligned_ptr it will assert, and so while the user
// will have to rerun with full 64-bit pointers, there is no risk of
// memory corruption.
//
// Yair Lifshitz, June 2008 :)

#include "assertions.h"

extern unsigned long __aligned_ptr_malloc_base__;

template <typename T, int ALIGNMENT_BITS>
class aligned_ptr
{
 public:

  typedef aligned_ptr self_type;

  aligned_ptr(): m_ptr(0) {}

  aligned_ptr(T val)
  {
assert(is_aligned_ptr(val));
m_ptr = remove_base(val) >> ALIGNMENT_BITS;
  }

  T operator-> () {
  return ptr();
  }

  const T operator->() const {
  return ptr();
  }

  T ptr() {
  // Widen before shifting (a 32-bit shift would lose the high bits),
  // then add back the base that remove_base() subtracted.
  unsigned long ptr = ((unsigned long)m_ptr << ALIGNMENT_BITS)
                      + __aligned_ptr_malloc_base__;
  return reinterpret_cast<T>(ptr);
  }

  T ptr() const {
  unsigned long ptr = ((unsigned long)m_ptr << ALIGNMENT_BITS)
                      + __aligned_ptr_malloc_base__;
  return reinterpret_cast<T>(ptr);
  }

  operator T () const {return ptr();}

  self_type& operator= (T val)
  {
  *this = self_type(val);
  return *this;
  }

  static bool is_aligned_ptr(T val)
  {
  unsigned long val_reffed_to_base = remove_base(val);
  unsigned long ALIGNMENT_MASK = ((unsigned long)1 << ALIGNMENT_BITS) - 1;
  if ((val_reffed_to_base & ALIGNMENT_MASK) != 0) return false;
  // The stored value must fit in an unsigned int after the shift, so the
  // offset from the base may be as large as 2^(32 + ALIGNMENT_BITS).
  if (val_reffed_to_base >=
      ((unsigned long)1 << (sizeof(unsigned int)*8 + ALIGNMENT_BITS)))
    return false;
  return true;
  }

private:

static unsigned long remove_base(const T val)
{
return (unsigned long)val - __aligned_ptr_malloc_base__;
}

   unsigned int m_ptr;
};

#endif
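
A quick usage sketch (Node and the base initialization below are just
placeholders, assuming an allocator that returns 8-byte-aligned
addresses within 32GB of __aligned_ptr_malloc_base__):

#include "aligned_ptr.h"

// The header only declares this; the application must define it and set
// it to the heap base at startup.
unsigned long __aligned_ptr_malloc_base__ = 0;

struct Node {
  aligned_ptr<Node*, 3> next;   // 4 bytes instead of 8 on LP64
  int value;
};

int follow(Node* head)
{
  aligned_ptr<Node*, 3> p(head);   // asserts alignment and range
  return p->value;                 // decompressed on the fly
}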


Re: compressed pointer type implementation in gcc

2008-07-19 Thread Robert Dewar

Yair Lifshitz wrote:
> Hi,
>
> I hope I'm not flooding the list with this topic. I've done some research
> and couldn't find anything relevant on it.
> My team is developing a large-scale CAD application that has a large
> memory footprint, requiring powerful machines to run.
>
> The application uses pointers massively.
> During one of our optimization cycles I noticed that since most
> objects are aligned on 8-byte boundaries, it's possible to drop the
> lower 3 bits of the address and reconstruct the full address later.
>
> Basically, as long as the application is in the 32G range (2^32*2^3),
> it's possible to represent aligned pointers using an unsigned int - 4
> bytes.


why not just run in 32 bit mode in this case?
What's the point of using 64-bit addresses if you
don't need them?


Re: compressed pointer type implementation in gcc

2008-07-19 Thread Yair Lifshitz
The application does go well beyond the 4GB range.
By using this implementation I can use 32-bit pointers for most
objects (those that are at least 8-byte aligned; chars etc. are the
exception) while still having a 32GB address range.


On 7/19/08, Robert Dewar <[EMAIL PROTECTED]> wrote:
> Yair Lifshitz wrote:
> > Hi,
> >
> > I hope I'm not flooding the list with this topic. I've done some research
> > and couldn't find anything relevant on it.
> > My team is developing a large-scale CAD application that has a large
> > memory footprint, requiring powerful machines to run.
> >
> > The application uses pointers massively.
> > During one of our optimization cycles I noticed that since most
> > objects are aligned on 8-byte boundaries, it's possible to drop the
> > lower 3 bits of the address and reconstruct the full address later.
> >
> > Basically, as long as the application is in the 32G range (2^32*2^3),
> > it's possible to represent aligned pointers using an unsigned int - 4
> > bytes.
> >
>
> why not just run in 32 bit mode in this case?
> What's the point of using 64-bit addresses if you
> don't need them?
>


Re: compressed pointer type implementation in gcc

2008-07-19 Thread Andi Kleen
"Yair Lifshitz" <[EMAIL PROTECTED]> writes:
>
> Basically, as long as the application is in the 32G range (2^32*2^3),

Seems like a strange assumption. If the application can use 32GB
what stops it from using 40GB or 64GB? Systems with that much 
memory are readily available these days.

> Would be nice to get your opinions on it.

It's a common optimization in the JVM world, where the just-in-time
compilers do it based on the configured heap sizes and because
there's no fixed ABI.

But on the C compiler level the problem is that you'll have a
completely different ABI of your own and won't be able to use any
standard libraries unless you recompile them.  And no system calls
without a translation layer, or some way to tag all system interfaces
with the standard ABI.  Java avoids that by having clearly defined
interfaces to the outside world, but that's not the case in C.  Or you
annotate all structures where this translation should happen.

It is essentially the same problem as the common "32bit ABI
in long mode" proposals have.

-Andi


Re: compressed pointer type implementation in gcc

2008-07-19 Thread Robert Dewar

Andi Kleen wrote:
> "Yair Lifshitz" <[EMAIL PROTECTED]> writes:
> > Basically, as long as the application is in the 32G range (2^32*2^3),
>
> Seems like a strange assumption. If the application can use 32GB
> what stops it from using 40GB or 64GB? Systems with that much
> memory are readily available these days.


Well the idea would be specifically for applications that can live
in 32 GB. If you compile in this mode, obviously you can't play
this trick, and you have to be sure that malloc cooperates in
limiting memory to 32G. On a system with allocate on use, you
could perhaps allocate 32GB in a chunk, and then use an allocator
within this 32GB which deals with compressed pointers.
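
Something along these lines (a rough sketch; arena_init/arena_alloc are
made-up names, and this assumes a Linux-style mmap where untouched pages
are not actually backed by memory until first use):

#include <sys/mman.h>
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE (32UL << 30)   /* reserve 32 GB of address space */

static char *arena_base;
static size_t arena_used;

/* Reserve the whole range up front; with allocate-on-use, pages only
   cost real memory once they are touched.  */
static int arena_init (void)
{
  void *p = mmap (NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (p == MAP_FAILED)
    return 0;
  arena_base = (char *) p;
  return 1;
}

/* Hand out 8-byte-aligned chunks and return a compressed 32-bit handle:
   the offset from the arena base, shifted right by 3.  */
static unsigned int arena_alloc (size_t bytes)
{
  size_t off = (arena_used + 7) & ~(size_t) 7;
  assert (off + bytes <= ARENA_SIZE);
  arena_used = off + bytes;
  return (unsigned int) (off >> 3);
}

static void *arena_ptr (unsigned int handle)
{
  return arena_base + ((size_t) handle << 3);
}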



> > Would be nice to get your opinions on it.
>
> But on the C compiler level the problem is that you'll have a
> completely different ABI of your own and won't be able to use any
> standard libraries unless you recompile them.  And no system calls
> without a translation layer, or some way to tag all system interfaces
> with the standard ABI.  Java avoids that by having clearly defined
> interfaces to the outside world, but that's not the case in C.  Or you
> annotate all structures where this translation should happen.


Not trivial, but seems addressable, either by recompiling the library
specially for this mode, or by providing an interface layer, or by
providing some kind of mechanism for indicating that certain
routines are compiled in normal mode.


Re: compressed pointer type implementation in gcc

2008-07-19 Thread Joe Buck
On Sat, Jul 19, 2008 at 06:53:41PM -0400, Robert Dewar wrote:
> Andi Kleen wrote:
> >"Yair Lifshitz" <[EMAIL PROTECTED]> writes:
> >>Basically, as long as the application is in the 32G range (2^32*2^3),
> >
> >Seems like a strange assumption. If the application can use 32GB
> >what stops it from using 40GB or 64GB? Systems with that much 
> >memory are readily available these days.
> 
> Well the idea would be specifically for applications that can live
> in 32 GB. If you compile in this mode, obviously you can't play
> this trick, and you have to be sure that malloc cooperates in
> limiting memory to 32G. On a system with allocate on use, you
> could perhaps allocate 32GB in a chunk, and then use an allocator
> within this 32GB which deals with compressed pointers.

An alternative would be to design the program to use integers in
the role of pointers, and allocate objects of a given type in pools
that are contiguous in virtual address space.  Then instead of a
pointer to class Foo, you have an unsigned 32-bit integer, and the 
true address of the object is Foo_pool_base + offset.  The offset
is automatically scaled by the compiler, multiplied by sizeof(Foo).

The application could then scale beyond 32GB in many cases (especially
if different kinds of objects are in different pools).  The conversion
from integers to pointers for dereferencing can be done in inline
functions.  Real pointers can be used as well where the cost is
small.  The application can then remain portable C or C++, and no
compiler modification is necessary.
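
A rough sketch of what I have in mind (Foo, foo_pool, foo_alloc and
foo_deref are placeholder names):

#include <vector>
#include <assert.h>

struct Foo {
  unsigned int next;   // "pointer" to another Foo, stored as a pool index
  int payload;
};

// One contiguous pool per type.  Indices stay valid when the pool grows,
// unlike raw pointers into the vector.
static std::vector<Foo> foo_pool;

static inline unsigned int foo_alloc()
{
  foo_pool.push_back(Foo());
  return (unsigned int) (foo_pool.size() - 1);
}

// Dereference: Foo_pool_base + offset; the scaling by sizeof(Foo) is
// done by operator[].
static inline Foo& foo_deref(unsigned int ref)
{
  assert(ref < foo_pool.size());
  return foo_pool[ref];
}

With 32-bit indices a single pool can hold up to 2^32 objects, and
separate pools per type can take the application well beyond 32GB.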


working machine

2008-07-19 Thread Jason mclaughlin
> > Anyone familiar with the idea of trying to describe a machine the way
> > it works?
> > like where it's being a machine working the way of having a loop with
> > the problem of being in
> > the middle and then to the outside as how the machine can move? so
> > like if you were to make it a machine
> > that does math the way it works it has machine parts that actually
> > move like the way the calculation is done?
> > so it moves like if this were to try and move as a real machine:
> > for (i = 0; i < 10; i++) {
> >   if (i == 3) continue;
> > }
> > so as a real machine though, that works the way that would have to as
> > a machine? a machine that can't be anything really working like gears
> > because of how to be in the middle of the loop is to go outside but
> > sometimes not is a problem the way something has to move.
> > so if that's to look like a real machine, it's not a machine that
> > turns around and around mechanically though, because in the middle is
> > back to the beginning. but sometimes through and back around. But it
> > actually has to move like a real machine though.
> > I think I know a way there is to describe a machine that works this
> > way...
> > say on a checkers board you have checker pieces, and say each checker
> > piece is paired with another.
> > now all checker pieces are pairs.
> > the way said, try to make one piece able to move... but you have to
> > move the other it is paired with at the same time.
> > the board is full, there's no free spaces to move to.
> > so to make a piece move with its paired piece, find where it can go
> > where there's another pair that can move, that pair can move where
> > another pair can move, and so on... where the last pair to move goes
> > where the first pair left.
> > each time you move a pair, they are not the same pair anymore once
> > they've moved, each of the pair is now a pair with the piece that
> > left
> > where they went to make a new pair of them. this is key in figuring
> > out the only way it can work so a piece can move at all.
> > so knowing no first move you can make because there isn't any
> > specific
> > move to know, find the pair to be able to move the way where they
> > move
> > to another pair where each of the pair is now another pair with the
> > one they move to, they move to a pair that's together. now first pair
> > to move to another pair, is now not the same pair, but each a pair
> > with each of the other pair.
> > so move and do that, but at the same time when you get a piece of the
> > first pair where it goes and the other pair moved, as a new pair now
> > it can't stay there because it has to move again because of how at
> > the
> > same time something is making the other of the pair move. right?
> > maybe
> > that part is hard to see. It's the only way to figure it can move in
> > any way at all.
> > so it's like the last move has to be known before the first move can
> > be made, because the first move that can be made is where something
> > can move next, but what can move next is what carries on to the last
> > move that can move where the first pair moved from. it's a recursive
> > type of problem to figure out how to move a pair.
> > where a pair can move is where it goes to another pair that at the
> > same time is moving away making an occupancy, but when you get there
> > and you're a new pair with the piece that moves from where you get
> > to,
> > it's not to think staying can work because now something needs the
> > pair you are now to move.
> > it's like so where you can never have it so you stop moving until all
> > the move is complete. but nowhere in the middle, or at first or last,
> > can you know how to make it work to move because the beginning is
> > like
> > knowing the end of how to move.
> > what's interesting though is how pairs that figure themselves to be
> > able to move are not all the pairs but some, but that some other
> > pairs
> > that figure out a way to move are the same pairs as others that
> > figure
> > out another way.
> > and each time a pair makes a move it's all the way to where the place
> > they leave has something come, but that had to be before you can move
> > in the first place in idea of the problem it is because that's where
> > something can move to finally get around to the last move where you
> > leave, but leave where you can because you find what comes where you
> > leave at first. each time a pair is moved depending on how moving
> > pairs are said together is to reorganize how pairs are together, but
> > to keep moving the same pairs is to find the same place they were in
> > to begin with but a lot of other ways too depending on which of the
> > pairs together you try to move, if you say the pairs together are the
> > pairs that given one pair are the ones that move at the same time
> > too.
> > isn't it fair to call how they behave any machine?
> > there's a few things you can say like given how they're arranged pick
> > a few pairs for how they