Hi all,

PoC (rough): 
https://github.com/apache/tvm/compare/main...areusch:move-backend-runtime?expand=1

As we work to merge AOT, define the µTVM firmware-facing API, and merge support for existing embedded frameworks such as [STM32](https://discuss.tvm.apache.org/t/rfc-standalone-code-generation-and-c-runtime-for-stm32-bare-metal-devices/9562), some discrepancies in the organization of TVM's low-level API are becoming clear. This RFC addresses them by proposing to:
- create a new header file, `include/tvm/runtime/c_packed_func.h`, to document the C-facing PackedFunc calling convention;
- move the typedefs related to calling a C PackedFunc from `c_runtime_api.h` into `c_packed_func.h`;
- redefine the split between `c_backend_api` and `c_runtime_api` as follows: `c_backend_api` contains all of the functions and types that generated TVM code depends on (though the runtime may use them too), while `c_runtime_api` contains functions and types used only by the TVM runtime.

### Motivation

The code present in a typical TVM model deployment can be logically split into 
pieces as follows:

```
                       graph executor ----- c_runtime_api
                               |                 ^
                  compiled operators [c,so]      |
                                  |              |
                      c_backend_api.[h,c]  <-----+
                              |
                     platform-specific [platform.h, platform-specific implementation location]
```

In this split, the TVM codebase directly contributes these pieces:
- graph_executor, responsible for driving model inference end-to-end
- c_runtime_api, which contains infrastructure supporting graph_executor plus user-facing functions
- c_backend_api, which contains the functions called by the generated operators
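To make this layering concrete, here is a minimal, self-contained sketch of a generated operator that depends only on `c_backend_api`. Everything in it is a simplified stand-in: `TVMValue` is trimmed down, `TVMBackendAllocWorkspace`/`TVMBackendFreeWorkspace` are local stubs whose signatures mirror TVM's to the best of our knowledge, the `platform_*` helpers stand in for the platform layer, and `fused_double` (which takes raw `float*` buffers rather than `DLTensor*`) is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for TVM's TVMValue union. */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
  const char* v_str;
} TVMValue;

/* Platform layer (platform.h in the diagram): plain malloc/free in this sketch. */
static void* platform_alloc(uint64_t nbytes) { return malloc((size_t)nbytes); }
static void platform_free(void* ptr) { free(ptr); }

/* Local stubs standing in for two c_backend_api functions. */
void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes,
                               int dtype_code_hint, int dtype_bits_hint) {
  (void)device_type;
  (void)device_id;
  (void)dtype_code_hint;
  (void)dtype_bits_hint;
  return platform_alloc(nbytes);
}

int TVMBackendFreeWorkspace(int device_type, int device_id, void* ptr) {
  (void)device_type;
  (void)device_id;
  platform_free(ptr);
  return 0;
}

/* Hypothetical generated operator using the TVMBackendPackedCFunc signature.
 * args[0]: input buffer, args[1]: output buffer, args[2]: element count. */
int fused_double(TVMValue* args, int* type_codes, int num_args, TVMValue* out_ret_value,
                 int* out_ret_tcode, void* resource_handle) {
  (void)type_codes;
  (void)out_ret_value;
  (void)out_ret_tcode;
  (void)resource_handle;
  if (num_args != 3) return -1;
  const float* in = (const float*)args[0].v_handle;
  float* out = (float*)args[1].v_handle;
  int64_t n = args[2].v_int64;
  if (n <= 0) return 0; /* nothing to do */
  /* Scratch memory comes from c_backend_api; nothing here touches c_runtime_api. */
  float* tmp = (float*)TVMBackendAllocWorkspace(1 /* kDLCPU */, 0, (uint64_t)n * sizeof(float),
                                                2 /* kDLFloat */, 32);
  if (tmp == NULL) return -1;
  for (int64_t i = 0; i < n; ++i) tmp[i] = in[i] * 2.0f;
  for (int64_t i = 0; i < n; ++i) out[i] = tmp[i];
  return TVMBackendFreeWorkspace(1 /* kDLCPU */, 0, tmp);
}
```

The point of the sketch is the dependency graph: the operator links against the two backend functions, which in turn only need the platform layer.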

We are currently implementing two features which, taken together, allow users to run model inference with almost no runtime requirements in certain use cases (CPU-only workloads, static models only):
1. an [Ahead-of-Time compilation flow](https://discuss.tvm.apache.org/t/implementing-aot-in-tvm/9206), which removes the need for an Executor at inference time (or replaces it with a generated AOT executor that relies only on c_backend_api)
2. an ["unpacked"](https://discuss.tvm.apache.org/t/rfc-utvm-aot-optimisations-for-embedded-targets/9849) calling convention, which removes type metadata from all model calls where it is not needed.
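The difference between the two calling conventions can be sketched as follows. This is an illustration under assumptions, not the exact form proposed in the linked RFC: `add_unpacked` and `add_packed` are hypothetical operators, `TVMValue` is simplified, and the `kSketch*` type codes are placeholders rather than TVM's actual values. The packed entry point must carry and check a type code for every argument; the unpacked one receives plain typed C parameters.

```c
#include <stdint.h>

/* Simplified stand-in for TVM's TVMValue union. */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
  const char* v_str;
} TVMValue;

/* Placeholder type codes for this sketch only. */
enum { kSketchInt = 0, kSketchHandle = 3 };

/* Unpacked: typed C parameters, no per-argument metadata. */
int add_unpacked(const float* a, const float* b, float* out, int32_t n) {
  for (int32_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
  return 0;
}

/* Packed: each argument is a TVMValue plus a type code that the callee validates
 * at runtime and then unpacks before doing the actual work. */
int add_packed(TVMValue* args, int* type_codes, int num_args, TVMValue* out_ret_value,
               int* out_ret_tcode, void* resource_handle) {
  (void)out_ret_value;
  (void)out_ret_tcode;
  (void)resource_handle;
  if (num_args != 4) return -1;
  for (int i = 0; i < 3; ++i) {
    if (type_codes[i] != kSketchHandle) return -1;
  }
  if (type_codes[3] != kSketchInt) return -1;
  return add_unpacked((const float*)args[0].v_handle, (const float*)args[1].v_handle,
                      (float*)args[2].v_handle, (int32_t)args[3].v_int64);
}
```

In an AOT-generated model, calls between operators could use the unpacked form directly, keeping the packed form only where a caller genuinely needs type metadata.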

These features are creating a parallel path to model execution:
```
               AOT [c]          graph executor ----- c_runtime_api
                   |                    |                 ^
                  compiled operators [c,so]               |
                                  |                       |
                      c_backend_api.[h,c]  <--------------+
                              |
                     platform-specific [platform.h, platform-specific implementation location]
```

Given these new features, it can be confusing for implementers to determine which functions from the TVM codebase are required for model inference. Previously, graph_executor was always required, which meant that all of the above pieces were needed. With AOT, users may no longer want to include the entire c_runtime_api in their deployed code. However, [attempts](https://github.com/apache/tvm/pull/7742#issuecomment-855391490) to get rid of `c_runtime_api.h` have exposed these problems with the internal organization:
1. The calling convention for `TVMBackendPackedCFunc` (the typedef describing the signature of generated model functions) states:
    ```
    /*!
     * \brief Signature for backend functions exported as DLL.
     *
     * \param args The arguments
     * \param type_codes The type codes of the arguments
     * \param num_args Number of arguments.
     * \param out_ret_value The output value of the the return value.
     * \param out_ret_tcode The output type code of the return value.
     * \param resource_handle Pointer to associated resource.
     *
     * \return 0 if success, -1 if failure happens, set error via TVMAPISetLastError.
     */
    ```

    However, `TVMAPISetLastError` resides in `c_runtime_api.h`. In practice, it is only used when schedules offload the implementation to third-party libraries by calling a `PackedFunc` at inference time.
2. The docs for the `PackedFunc` calling convention are hard to discover: they're buried in `c_runtime_api` even though generated model functions rely on them, and `c_runtime_api` actually contains two different `PackedFunc` typedefs (see below). Some interactions between the runtime and PackedFunc are not documented at all (e.g. memory management of complex types returned from a PackedFunc).
3. PackedFunc implementations fall into two distinct usage patterns:
    1. generated model functions, which mainly take DLTensor handles as arguments and return nothing
    2. usage in the TVM runtime (e.g. GraphExecutor), where a function may return complex objects whose memory management the caller must take ownership of

    To address the challenges of calling PackedFunc in category (2), an 
additional type `TVMPackedCFunc` was defined in `c_runtime_api.h`:

    ```
    /*!
     * \brief C type of packed function.
     *
     * \param args The arguments
     * \param type_codes The type codes of the arguments
     * \param num_args Number of arguments.
     * \param ret The return value handle.
     * \param resource_handle The handle additional resouce handle from fron-end.
     * \return 0 if success, -1 if failure happens, set error via TVMAPISetLastError.
     * \sa TVMCFuncSetReturn
     */
    typedef int (*TVMPackedCFunc)(TVMValue* args, int* type_codes, int num_args, TVMRetValueHandle ret,
                                  void* resource_handle);
    ```

    You'd be forgiven for confusing this with `TVMBackendPackedCFunc`, which is defined in the same file (and pasted above) and is the actual typedef of the PackedFunc generated for model inference. The difference is the `TVMRetValueHandle ret` argument, which replaces the `out_ret_value`/`out_ret_tcode` pair and allows the runtime to take ownership of returned complex types such as `string`, `bytes`, and `ObjectHandle`.
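To make points 1 and 3 concrete, here is a self-contained sketch with the two typedefs side by side, in the shape quoted above, plus a generated-style function that reports an error. Assumptions: `TVMValue` is simplified, `TVMRetValueHandle` is treated as an opaque `void*`, `TVMAPISetLastError`/`TVMGetLastError` are local stand-ins for the `c_runtime_api.h` functions, and `checked_identity` is hypothetical. Note that the error path is the only place this function reaches into the runtime API.

```c
#include <stdint.h>

/* Simplified stand-in for TVM's TVMValue union. */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
  const char* v_str;
} TVMValue;

/* Opaque handle to a runtime-owned return value (treated as void* here). */
typedef void* TVMRetValueHandle;

/* Signature of generated model functions: the result comes back as a raw
 * (value, type code) pair written through out-pointers. */
typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                     TVMValue* out_ret_value, int* out_ret_tcode,
                                     void* resource_handle);

/* Runtime-facing signature: the result goes through an opaque handle, so the
 * runtime can take ownership of strings, bytes and objects. */
typedef int (*TVMPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                              TVMRetValueHandle ret, void* resource_handle);

/* Local stand-ins for the c_runtime_api.h error functions. Only string
 * literals (static storage) are passed in this sketch, so keeping the
 * pointer is safe. */
static const char* g_last_error = "";
void TVMAPISetLastError(const char* msg) { g_last_error = msg; }
const char* TVMGetLastError(void) { return g_last_error; }

/* Hypothetical generated-style function: returns its single argument unchanged. */
int checked_identity(TVMValue* args, int* type_codes, int num_args, TVMValue* out_ret_value,
                     int* out_ret_tcode, void* resource_handle) {
  (void)resource_handle;
  if (num_args != 1) {
    /* The \return doc asks for this call, which pulls in c_runtime_api. */
    TVMAPISetLastError("checked_identity: expected exactly 1 argument");
    return -1;
  }
  *out_ret_value = args[0];
  *out_ret_tcode = type_codes[0];
  return 0;
}
```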

### Additional motivation: splitting `src/runtime/crt/common` library

At present, the C runtime places the implementations of both `c_runtime_api` 
and `c_backend_api` into the same logical C library (`.a`). As we move to slim 
down the runtime required for standalone deployment on embedded platforms, it 
makes sense to split the `common` library into two pieces:
1. `c_backend_api` implementations, required at deploy time with AOT
2. `c_runtime_api` implementations, required at deploy time with Graph Executor 
and for host-driven inference

Making the split between these two usages explicit in the header files will 
help this effort.

### Proposals

This RFC proposes to clean up these discrepancies as follows:

#### Create `include/tvm/runtime/c_packed_func.h`
Create a new header file to document **the** `PackedFunc` used in model inference. This is the one that people care about anyway; they shouldn't have to tease apart `TVMBackendPackedCFunc` from `TVMPackedCFunc`.

In this file, do the following:
1. Place the `TVMBackendPackedCFunc` typedef here, plus all dependent typedefs (e.g. `TVMArgTypeCode`, `TVMByteArray`, `TVMDeviceExtType`, `TVMValue`). Anything involved in the type signature or documentation of `TVMBackendPackedCFunc` belongs in this file.
2. Rename `TVMBackendPackedCFunc`, since `PackedCFunc` merges two names together into a confusing amalgamation. Options:
   * R1. `TVMCPackedFunc`  (conflicts with `tvmc` the command-line tool...)
   * R2. `CTVMPackedFunc`
   * R3. `TVMBackendCPackedFunc` (readable but not `Backend`-only)

3. Move `TVMAPISetLastError` to this file and rename it to `TVMPackedFuncSetLastError`. This function is referenced in the `\return` doc of `TVMBackendPackedCFunc`.
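A rough sketch of what the proposed `c_packed_func.h` might contain, written as a single self-contained file. Assumptions: `CTVMPackedFunc` uses option R2 purely as an example, the `TVMArgTypeCode` enum shows only a subset of its members, `TVMValue` omits its `DLDataType`/`DLDevice` members, the implementation of `TVMPackedFuncSetLastError` and the `SketchGetLastError` accessor are stand-ins, and `half_of` is a hypothetical function written against the convention.

```c
/* ---- hypothetical include/tvm/runtime/c_packed_func.h ---- */
#ifndef TVM_RUNTIME_C_PACKED_FUNC_H_
#define TVM_RUNTIME_C_PACKED_FUNC_H_

#include <stdint.h>

/* Subset of TVMArgTypeCode; the real enum has more members. */
typedef enum {
  kTVMArgInt = 0,
  kTVMArgFloat = 2,
  kTVMOpaqueHandle = 3,
  kTVMNullptr = 4,
} TVMArgTypeCode;

/* TVMValue without its DLDataType/DLDevice members. */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
  const char* v_str;
} TVMValue;

/*!
 * \brief The C PackedFunc calling convention used by generated model code.
 * \param args The arguments.
 * \param type_codes The TVMArgTypeCode of each argument.
 * \param num_args Number of arguments.
 * \param out_ret_value The return value.
 * \param out_ret_tcode The type code of the return value.
 * \param resource_handle Pointer to associated resource.
 * \return 0 on success; on failure, -1 with the error set via TVMPackedFuncSetLastError.
 */
typedef int (*CTVMPackedFunc)(TVMValue* args, int* type_codes, int num_args,
                              TVMValue* out_ret_value, int* out_ret_tcode,
                              void* resource_handle);

/*! \brief Proposed name for TVMAPISetLastError. */
void TVMPackedFuncSetLastError(const char* msg);

#endif /* TVM_RUNTIME_C_PACKED_FUNC_H_ */

/* ---- stand-in implementation for this sketch ---- */
static const char* g_last_error = "";
void TVMPackedFuncSetLastError(const char* msg) { g_last_error = msg; }
const char* SketchGetLastError(void) { return g_last_error; }

/* Hypothetical function following the convention: returns args[0] / 2 as a float. */
int half_of(TVMValue* args, int* type_codes, int num_args, TVMValue* out_ret_value,
            int* out_ret_tcode, void* resource_handle) {
  (void)resource_handle;
  if (num_args != 1 || type_codes[0] != kTVMArgFloat) {
    TVMPackedFuncSetLastError("half_of: expected one float argument");
    return -1;
  }
  out_ret_value->v_float64 = args[0].v_float64 / 2.0;
  *out_ret_tcode = kTVMArgFloat;
  return 0;
}
```

The intent is that everything a reader needs to call or implement a generated PackedFunc (types, type codes, doc comment, error hook) lives in one header.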

#### Rename `TVMPackedCFunc`

This typedef is confined to frontend use and exists to help with memory management. It effectively wraps `TVMBackendPackedCFunc`. From a frontend perspective, it is the `PackedFunc` you'd like users to interact with, but it doesn't document the calling convention, so it shouldn't be named as though it were the definition of C PackedFunc.

Options:
* F1. `TVMFrontendCPackedFunc` -- to match usage with the frontend only
* F2. `TVMRuntimeCPackedFunc` -- formalizes the notion that the runtime is a 
client of the backend
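One way to picture the wrapping relationship described above is an adapter with the frontend-facing signature that calls a backend function and hands its result to the runtime. This is a sketch under assumptions: it borrows option F1's name only as an example, `SketchRetValue` and `SketchSetReturn` are hypothetical stand-ins for the runtime's return-value object and `TVMCFuncSetReturn`, the `kSketch*` type codes are placeholders, and `add_one` is a hypothetical backend function.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TVM's TVMValue union. */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
  const char* v_str;
} TVMValue;

/* Placeholder type codes for this sketch only. */
enum { kSketchInt = 0, kSketchNull = 4 };

typedef void* TVMRetValueHandle;

typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                     TVMValue* out_ret_value, int* out_ret_tcode,
                                     void* resource_handle);

/* Example name from option F1: the frontend/runtime-facing signature. */
typedef int (*TVMFrontendCPackedFunc)(TVMValue* args, int* type_codes, int num_args,
                                      TVMRetValueHandle ret, void* resource_handle);

/* Hypothetical runtime-owned return slot. A real runtime would copy or adopt
 * strings, bytes and objects here instead of storing a raw TVMValue. */
typedef struct {
  TVMValue value;
  int type_code;
} SketchRetValue;

/* Hypothetical stand-in for TVMCFuncSetReturn. */
static int SketchSetReturn(TVMRetValueHandle ret, TVMValue value, int type_code) {
  SketchRetValue* slot = (SketchRetValue*)ret;
  slot->value = value;
  slot->type_code = type_code;
  return 0;
}

/* resource_handle carries the wrapped backend function. */
typedef struct {
  TVMBackendPackedCFunc fn;
} BackendFuncResource;

/* Adapter with the frontend signature wrapping a backend function. */
int frontend_adapter(TVMValue* args, int* type_codes, int num_args, TVMRetValueHandle ret,
                     void* resource_handle) {
  const BackendFuncResource* res = (const BackendFuncResource*)resource_handle;
  TVMValue out_value;
  out_value.v_int64 = 0;
  int out_tcode = kSketchNull;
  int rc = res->fn(args, type_codes, num_args, &out_value, &out_tcode, NULL);
  if (rc != 0) return rc;
  return SketchSetReturn(ret, out_value, out_tcode);
}

/* Hypothetical backend function: returns args[0] + 1 as an integer. */
int add_one(TVMValue* args, int* type_codes, int num_args, TVMValue* out_ret_value,
            int* out_ret_tcode, void* resource_handle) {
  (void)resource_handle;
  if (num_args != 1 || type_codes[0] != kSketchInt) return -1;
  out_ret_value->v_int64 = args[0].v_int64 + 1;
  *out_ret_tcode = kSketchInt;
  return 0;
}
```

A caller would bind `add_one` into a `BackendFuncResource` and invoke `frontend_adapter` through a `TVMFrontendCPackedFunc` pointer. The adapter is the natural point where a real runtime would take ownership of returned complex types, which is why this typedef is about memory management rather than the calling convention itself.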

### Drawbacks of these changes

This change is mostly organizational. The main drawback is breakage for downstream users due to the renames and the changed include paths. We will mitigate this by publicizing the change, along with a migration guide, on the forums.

### For discussion

1. Do you support or oppose this change?
2. Which R/F naming options do you prefer?
3. Are there things in particular missing from the PackedFunc docs?

cc @stoa @manupa-arm @giuseros @mousius @tqchen @jroesch @tkonolige @mehrdadh 
@junrushao1994





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/pre-rfc-api-change-formalizing-c-backend-api/10380/1)
 to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click 
here](https://discuss.tvm.apache.org/email/unsubscribe/d3eee7f8d03446f585b5704c3e0375265c16c4fc8597ee651e7c318c899edeb9).
  • [Apache TVM Discuss] [Develo... Andrew Reusch via Apache TVM Discuss
