[dmlc/tvm] [RFC] Unifying Object Protocol in the Stack (#4116)

Tianqi Chen Sat, 12 Oct 2019 22:25:34 -0700

We start to introduce many useful reference-counted objects in the stack, 
including:


- ```runtime::Object``` used by virtual machine, needed in runtime.
- ```Node```  used to store ASTs.

These objects have quite a lot in common, as a matter of fact, the 
implementation
of Object and Node system is roughly the same as Node.
(or tag if we use the terminology tagged-union objects). Besides these two 
objects ```runtime::NDArray``` is also quite similar, and could potentially be 
unified.

This is an RFC to discuss for us to unify some of these(at least Node and 
Object) implementations into a single one. Doing so would give us a more 
unified runtime system that can help us to clearly define an extensive runtime 
for deep learning models.

It also allows us to have reusable containers such as Array and Map that can 
contain both runtime objects and Node.

## Features and High-level Considerations

Features that benefits all objects

- F1: reference counted object.
- F2: self-managing (have a customized deleter) that can be called upon release.
- F3: being future compatible to new allocation strategy(e.g. allocate object 
from a pool)
- F4: runtime type casting: we want to be able to check the object type during 
runtime and cast them.
```
// Example of runtime type casting.
void Process(Expr ast) {
  if (const AddNode* ptr = ast.as<AddNode>()) {
    // Logic for add
  } else if (const SubNode* ptr = ast.as<SubNode>()) {
    // Logic for sub
  }
}
```

Features that needed by Node(AST), they are implemented by additional
virtual functions in the Node base class.

- F5: Support dynamic type index allocation.
  For AST nodes, we want to add new ones as we go, we support that in Node
  by dynamically allocate the type index during runtime keyed by an unique 
type_key string.
- F6: Support for recursive serialization and front-end access.
  We support serialization of arbitrary AST node via a visitor pattern that 
every node implements.

Features that needed by runtime Object. Note that one key requirement for 
runtime is minimum code size and dependency.

- F7: static type index if possible. For a few core data structures, we want to 
statically
  decide their type_index as constant, to make sure we have the most efficient
  and minimum code.

Finally, we want to make objects close to POD-C types if possible. That will 
open path for more native language integrations in other languages(e.g. rust) 
that goes beyond a simple frontend API based integration. We can also start to 
think about introducing a DSL that allows us to generate the
implementation and code in different languages. Unifying the object protocol 
will open path towards
that goal, but automatic code generation for objects is beyond the scope of 
this RFC.


## Object Protocol Proposal

This section summarizes the base object protocol proposal.
It contains three fields:
- A type_index that can be used for runtime type checking(see more details 
about type checking in next section).
- A 32 bit reference counter.
- A deleter function

```
class Object {
 public:
  /*!
   * \brief Object deleter
   * \param self pointer to the Object.
   */
  typedef void (*FDeleter)(Object* self);
  /*!
   * Check if the object is an instance of TargetType.
   * \tparam TargetType The target type to be checked.
   * \return Whether the target type is true.
   */
  template<typename TargetType>
  inline bool IsInstance() const;

 protected:
  // The fields of the base object cell.
  /*! \brief Type index(tag) that indicate the type of the object. */
  uint32_t type_index_{0};
  /*! \brief The internal reference counter */
  RefCounterType ref_counter_{0};
  /*!
   * \brief deleter of this object to enable customized allocation.
   * If the deleter is nullptr, no deletion will be performed.
   * The creator of the object must always set the deleter field properly.
   */
  FDeleter deleter_ = nullptr;
};

```

The object header is in total 16 bytes. It is an intrusive pointer object with 
an additional deleter and ```type_index```.

The existence of ```deleter_``` gives more flexibility about where the object 
comes from. For example, if the object has a POD-C ABI signature, in theory we 
could allocate it from one DLL(or another language e.g. rust) pass to another 
dll, while still allows us to call the correct deleter.

## Future Compatibility for Special Allocators

Although we do not do it now, we want to think about the future possibility of
introducing customized allocators. The following code gives an example of an 
object pool.

```c++
class ObjectPool {
 public:
   // make an new object.
   ObjectPtr<A> make();
   // release object to the pool.
   void Release(A* obj);
};

void Example(ObjectPool* pool) {
  // allocate from the pool
  ObjectPtr<A> ptr1 = pool->make();

  // Triggers ptr1.reset() when ptr1 goes out of scope.
  // instead of calling delete, we send the object goes back to obj_pool
  // Equivalent to calling: pool->Release(ptr1.get());
}
```

If there is a global singleton reference to the pool, we can simply set the 
deleter_ function to call ```CurrentPool()->Release```. If the object pool is 
created during runtime, and we don't have a global singleton, we need one 
additional field in addition to the object, and make the object pool allocate 
the chunk instead.
```
template<typename A>
class ObjectChunk {
 public:
   A data;
   ObjectPool* pool_;
}

template<typename A>
void Deleter(Object* obj) {
  auto* ptr = static_cast<ObjectChunk<A> >(obj);
  ptr->pool_->Release(ptr);
}
```

Note that we only need this if we want the object to go back to a specific 
pool. That is why the ```pool_``` field is not introduced in the Object 
protocol, but can be added by a subsequent implementation of object pool if 
necessary.

## Type Hierachy and Runtime Type Checking

One of the design choices that can be discussed is how to implement the type 
hierachy and type checking. There are several ways to do so and each of them 
has pros and cons.
Specifically, how can we implement the IsInstance API.

```c++
// Example of usage of IsInstance

class A : public Object {
};
class BaseB : public Object {
};
class C: public BaseB {
};

void Example() {
  // type-erased object
  ObjectPtr<Object> aptr = make_object<A>();
  ObjectPtr<Object> cptr = make_object<C>();
  // easy, check if aptr->type_index_ == A::type_index()
  CHECK(aptr->IsInstance<A>());
  CHECK(cptr->IsInstance<C>());
  // harder cannot simply do equivalence checking.
  CHECK(cptr->IsInstance<BaseB>());
}
```
See the above code example, things are easy if we just care about
whether an object is a terminal(final) type.
We can just check the type index.
It becomes harder if we want to support checking of an intermediate
base class (BaseB) in the above case.

There are three choices:

- A1: Introduce ```_type_child_slots``` attributes for BaseB, make sure all its 
children's type are in a fixed range.
  We can assign ```A::type_index = 1, BaseB::type_index=2, 
BaseB::type_child_slots=1, C::type_index=3```
  Make sure the type indices in ```[2, 4)``` only contain sub-class of BaseB
  - Fast to type check, but has to declare the maximum number of children 
before-hand and pre-allocate the type index slots.
- A2: Create a global type hierachy table, that has information about each 
type's parent type
  - Works for all cases, no need to pre-declare number of child classes.
  - Might need locking the global type table if it can be dynamically updated 
during runtime.
- A3: Create a chain of vtable during type construction.
  - Used by current Node
  - Need to create vtable, or introduce virtual function to the Node.

A1 gives the fatest code, but requires additional information(pre-reserve the 
child slots). Which could be fine for a project with a fixed type hierachy, but 
might be painful to allow new plugins that contains new types. A2 is the most 
flexible one, but could be slow(due to access to global type table and lock). 
A3 needs virtual function and vtable support(means additional fields in the 
Object),
which may or may not be what we want(if we want to move toward pure C ABI).

My current take is to support a mix of A1 and A2: allow each type declare 
```_type_child_slots```, but also allows
additional children that overflows the slots. During type checking, we first 
check if the type_index is
within the range(fast path), if not, we go to the global type table(slower path 
for rare cases).
It would be great to get more thoughts on this matter.

## Unifying NDArray as an Object

Another debatable point is whether we want to unify NDArray as an Object. The 
advantage is clear, we could reuse the containers(Array<NDArray>) and remove 
runtime VM wrapper. This however, will mean we need to change the signature of 
NArray, it also brings an interesting question about how to handle sub-classes 
of NDArray and type checks them. We could defer the decision to a later point.

## Path for Upgrade

- [ ] Introduce the new object protocol, use it for the old runtime::Object.
- [ ] Redirect Node to be sub-class of the new Object, redirect the functions.
- [ ] Finish the tests and remove redundant API if any.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/4116

[dmlc/tvm] [RFC] Unifying Object Protocol in the Stack (#4116)

Reply via email to