We start to introduce many useful reference-counted objects in the stack, including:
- ```runtime::Object``` used by virtual machine, needed in runtime. - ```Node``` used to store ASTs. These objects have quite a lot in common, as a matter of fact, the implementation of Object and Node system is roughly the same as Node. (or tag if we use the terminology tagged-union objects). Besides these two objects ```runtime::NDArray``` is also quite similar, and could potentially be unified. This is an RFC to discuss for us to unify some of these(at least Node and Object) implementations into a single one. Doing so would give us a more unified runtime system that can help us to clearly define an extensive runtime for deep learning models. It also allows us to have reusable containers such as Array and Map that can contain both runtime objects and Node. ## Features and High-level Considerations Features that benefits all objects - F1: reference counted object. - F2: self-managing (have a customized deleter) that can be called upon release. - F3: being future compatible to new allocation strategy(e.g. allocate object from a pool) - F4: runtime type casting: we want to be able to check the object type during runtime and cast them. ``` // Example of runtime type casting. void Process(Expr ast) { if (const AddNode* ptr = ast.as<AddNode>()) { // Logic for add } else if (const SubNode* ptr = ast.as<SubNode>()) { // Logic for sub } } ``` Features that needed by Node(AST), they are implemented by additional virtual functions in the Node base class. - F5: Support dynamic type index allocation. For AST nodes, we want to add new ones as we go, we support that in Node by dynamically allocate the type index during runtime keyed by an unique type_key string. - F6: Support for recursive serialization and front-end access. We support serialization of arbitrary AST node via a visitor pattern that every node implements. Features that needed by runtime Object. Note that one key requirement for runtime is minimum code size and dependency. - F7: static type index if possible. For a few core data structures, we want to statically decide their type_index as constant, to make sure we have the most efficient and minimum code. Finally, we want to make objects close to POD-C types if possible. That will open path for more native language integrations in other languages(e.g. rust) that goes beyond a simple frontend API based integration. We can also start to think about introducing a DSL that allows us to generate the implementation and code in different languages. Unifying the object protocol will open path towards that goal, but automatic code generation for objects is beyond the scope of this RFC. ## Object Protocol Proposal This section summarizes the base object protocol proposal. It contains three fields: - A type_index that can be used for runtime type checking(see more details about type checking in next section). - A 32 bit reference counter. - A deleter function ``` class Object { public: /*! * \brief Object deleter * \param self pointer to the Object. */ typedef void (*FDeleter)(Object* self); /*! * Check if the object is an instance of TargetType. * \tparam TargetType The target type to be checked. * \return Whether the target type is true. */ template<typename TargetType> inline bool IsInstance() const; protected: // The fields of the base object cell. /*! \brief Type index(tag) that indicate the type of the object. */ uint32_t type_index_{0}; /*! \brief The internal reference counter */ RefCounterType ref_counter_{0}; /*! * \brief deleter of this object to enable customized allocation. * If the deleter is nullptr, no deletion will be performed. * The creator of the object must always set the deleter field properly. */ FDeleter deleter_ = nullptr; }; ``` The object header is in total 16 bytes. It is an intrusive pointer object with an additional deleter and ```type_index```. The existence of ```deleter_``` gives more flexibility about where the object comes from. For example, if the object has a POD-C ABI signature, in theory we could allocate it from one DLL(or another language e.g. rust) pass to another dll, while still allows us to call the correct deleter. ## Future Compatibility for Special Allocators Although we do not do it now, we want to think about the future possibility of introducing customized allocators. The following code gives an example of an object pool. ```c++ class ObjectPool { public: // make an new object. ObjectPtr<A> make(); // release object to the pool. void Release(A* obj); }; void Example(ObjectPool* pool) { // allocate from the pool ObjectPtr<A> ptr1 = pool->make(); // Triggers ptr1.reset() when ptr1 goes out of scope. // instead of calling delete, we send the object goes back to obj_pool // Equivalent to calling: pool->Release(ptr1.get()); } ``` If there is a global singleton reference to the pool, we can simply set the deleter_ function to call ```CurrentPool()->Release```. If the object pool is created during runtime, and we don't have a global singleton, we need one additional field in addition to the object, and make the object pool allocate the chunk instead. ``` template<typename A> class ObjectChunk { public: A data; ObjectPool* pool_; } template<typename A> void Deleter(Object* obj) { auto* ptr = static_cast<ObjectChunk<A> >(obj); ptr->pool_->Release(ptr); } ``` Note that we only need this if we want the object to go back to a specific pool. That is why the ```pool_``` field is not introduced in the Object protocol, but can be added by a subsequent implementation of object pool if necessary. ## Type Hierachy and Runtime Type Checking One of the design choices that can be discussed is how to implement the type hierachy and type checking. There are several ways to do so and each of them has pros and cons. Specifically, how can we implement the IsInstance API. ```c++ // Example of usage of IsInstance class A : public Object { }; class BaseB : public Object { }; class C: public BaseB { }; void Example() { // type-erased object ObjectPtr<Object> aptr = make_object<A>(); ObjectPtr<Object> cptr = make_object<C>(); // easy, check if aptr->type_index_ == A::type_index() CHECK(aptr->IsInstance<A>()); CHECK(cptr->IsInstance<C>()); // harder cannot simply do equivalence checking. CHECK(cptr->IsInstance<BaseB>()); } ``` See the above code example, things are easy if we just care about whether an object is a terminal(final) type. We can just check the type index. It becomes harder if we want to support checking of an intermediate base class (BaseB) in the above case. There are three choices: - A1: Introduce ```_type_child_slots``` attributes for BaseB, make sure all its children's type are in a fixed range. We can assign ```A::type_index = 1, BaseB::type_index=2, BaseB::type_child_slots=1, C::type_index=3``` Make sure the type indices in ```[2, 4)``` only contain sub-class of BaseB - Fast to type check, but has to declare the maximum number of children before-hand and pre-allocate the type index slots. - A2: Create a global type hierachy table, that has information about each type's parent type - Works for all cases, no need to pre-declare number of child classes. - Might need locking the global type table if it can be dynamically updated during runtime. - A3: Create a chain of vtable during type construction. - Used by current Node - Need to create vtable, or introduce virtual function to the Node. A1 gives the fatest code, but requires additional information(pre-reserve the child slots). Which could be fine for a project with a fixed type hierachy, but might be painful to allow new plugins that contains new types. A2 is the most flexible one, but could be slow(due to access to global type table and lock). A3 needs virtual function and vtable support(means additional fields in the Object), which may or may not be what we want(if we want to move toward pure C ABI). My current take is to support a mix of A1 and A2: allow each type declare ```_type_child_slots```, but also allows additional children that overflows the slots. During type checking, we first check if the type_index is within the range(fast path), if not, we go to the global type table(slower path for rare cases). It would be great to get more thoughts on this matter. ## Unifying NDArray as an Object Another debatable point is whether we want to unify NDArray as an Object. The advantage is clear, we could reuse the containers(Array<NDArray>) and remove runtime VM wrapper. This however, will mean we need to change the signature of NArray, it also brings an interesting question about how to handle sub-classes of NDArray and type checks them. We could defer the decision to a later point. ## Path for Upgrade - [ ] Introduce the new object protocol, use it for the old runtime::Object. - [ ] Redirect Node to be sub-class of the new Object, redirect the functions. - [ ] Finish the tests and remove redundant API if any. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/dmlc/tvm/issues/4116