RFC - Introducing different VEC types

Diego Novillo Fri, 24 Aug 2012 15:42:27 -0700

Now that we have a slightly cleaner implementation of vectors,
I've been thinking of ways to fix some of the API warts:


1- Pointers vs references when handling vector elements.

Elements are sometimes passed by value (vectors of pointers) and
others they are passed by reference (vectors of objects).  This
forces us to have functions accepting 'T' and 'T *', which
uglifies the interface when T is a pointer type.

It forces the caller to use a explicit cast when passing a 0
value and we need to provide two overloads for the function.

This is not hard to fix by making every function handle
references.  But it requires changes in the callers.  I will be
fixing this in the next few days.


2- All vectors are now pointers.

This has the advantage that vectors occupy a single word when
they are part of structure fields.  However, it also means that
most accessors need a couple of loads to reference the vector.

Moreover, we use the convention that a NULL vector indicates an
empty vector.  This forces a test in every accessor and causes
problems for member functions (which need to become static
members).


3- The structure uses the trailing array idiom for storage.

This makes it possible to implement embedded vectors, but it also
means that an allocation or reallocation operation need to
potentially reallocate the instance (returning a different VEC
pointer) and they all need to have the allocation strategy passed
as an argument to the call.


Lawrence and I discussed ways of addressing these last 2 issues.
We think we can provide three distinct VEC types to support
different scenarios:

- Embedded vectors of fixed size.

  These vectors are fields of a parent structure and are
  allocated by the parent using the trailing array idiom (e.g.,
  tree_binfo::base_binfos). They are not allowed to grow and
  cannot be allocated (they have to be laid out on top of
  pre-allocated storage).

  We can support this with something very similar to the current
  structure:

  template<typename T>
  struct VEC_embedded_fixed {
    unsigned num_;
    unsigned alloc_;
    T vec_[1];
  };

  This structure will not support a memory allocation strategy.
  It will implement most of the vector API, except functions that
  reallocate memory.

  Accessors to this type are quick.  For instance, v.length() is
  { return v.num_; }


- Embedded vectors of variable size.

  These vectors are fields of a parent structure, but they need
  to grow/shrink and support an allocation strategy.
  Additionally, they should occupy a single word in the structure
  when they are not allocated.

  Also, it would be convenient to make them regular instances
  (instead of pointers), so that accessor functions do not have
  to bother testing the value of the vector itself before
  accessing its fields:

  template<typename T, enum vec_allocation_t A>
  struct GTY(()) VEC_embedded_variable {
    struct VEC_embedded_fixed *vec_;
  };

  Instances of this type only need a word of storage until they
  are actually allocated.  This will cause no growth in the
  structures where we use VECs.

  Accessors to this type are similar to the current
  implementation (slow).  In this case v.length() is
  { return (v->vec_) ? v->vec_.num_ : 0 }


- Standalone vectors.

  These vectors have a larger footprint, but accessors can be
  much quicker since they would not have to jump through many
  hoops:

  template<typename T, enum vec_allocation_t A>
  struct VEC_fast {
    unsigned num_;
    unsigned alloc_;
    T *vec_;
  };

  These are typically global or function-local variables.  Access
  to these vectors is quick: v.length() is { return v.num_; }


All three types will implement the current API.  We will be able
to simplify the user code in the following ways:

- The type of vector need only be specified at the declaration
  site.  In particular, the memory allocation strategy does not
  need to be specified in all the function calls that may
  reallocate the vector (it becomes a property of the type).

- We can restrict the API for embedded vectors at compile time.
  Currently, there are runtime checks.  We can make them compile
  time checks.

- The standalone vectors are going to be quicker to access than
  the current VECs.  This may provide some runtime improvements.

Lawrence, I hope I didn't miss anything.  Please correct me as
you see fit.

If this works for everyone, I will be implementing this in a
sequence of patches following
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01636.html.  The
names of the VECs will need to change (I don't like them, but I'm
not good at names).


Thanks.  Diego.

RFC - Introducing different VEC types

Reply via email to