It is worth pointing out that:

* Most of the existing tests are CPU-bound. This includes the end-to-end (e2e) tests that execute on GPU, which still rely heavily on the CPU for code generation.
* Every e2e test can be decoupled into host-side compilation on CPU plus execution on a device (e.g. a GPU).
* A brute-force split between fast and slow tests is less efficient, because even the slow tests leave most of the GPU's resources unused.

Therefore, my proposal is to build on the TVM RPC infra: instead of separating fast and slow tests, we should split host-side logic from device execution.

Details:

* Run all tests on CPU with a single thread or a limited number of threads
* Provide an API via TVM RPC that allows compiled code to be executed on an isolated GPU/Hexagon/ARM instance

The advantages of my proposal:

* Concurrency: a CPU instance can run multiple CI pipelines in parallel.
* Device utilization: the RPC infra makes sure only minimal logic is executed on the device. It routes and manages execution efficiently, which greatly improves device utilization and lowers cost.
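
To make the host/device split concrete, here is a toy model in plain Python. It does not use TVM's RPC API: `Artifact`, `compile_on_host`, `DeviceWorker`, and `run_pipeline` are hypothetical names invented for illustration, and the "device" is just a background thread that processes one job at a time. It only sketches the shape of the idea, assuming that compilation can run concurrently on the host while the device sees nothing but the already-compiled artifact.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Artifact:
    """Stand-in for a compiled module produced on the host CPU."""
    test_name: str
    payload: int


def compile_on_host(test_name: str) -> Artifact:
    # Hypothetical host-side step. In a real pipeline this would be the
    # CPU-heavy code generation; here it just derives a number from the name.
    return Artifact(test_name, payload=len(test_name))


class DeviceWorker:
    """Toy stand-in for an isolated device behind a remote endpoint.

    Jobs are queued and executed strictly one at a time.
    """

    def __init__(self) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self.executed: list[str] = []
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self) -> None:
        while True:
            item = self._jobs.get()
            if item is None:  # shutdown sentinel
                break
            artifact, reply = item
            self.executed.append(artifact.test_name)
            reply.put(artifact.payload * 2)  # pretend device computation

    def run(self, artifact: Artifact) -> int:
        # Submit one artifact and block until the device returns its result.
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._jobs.put((artifact, reply))
        return reply.get()

    def shutdown(self) -> None:
        self._jobs.put(None)
        self._thread.join()


def run_test(device: DeviceWorker, test_name: str) -> tuple[str, int]:
    artifact = compile_on_host(test_name)   # host-side logic only
    return test_name, device.run(artifact)  # the only step touching the device


def run_pipeline(test_names, host_threads: int = 4):
    # Threads keep the sketch short; because of the GIL, a real CPU-bound
    # setup would more likely use separate processes or CI workers.
    device = DeviceWorker()
    try:
        with ThreadPoolExecutor(max_workers=host_threads) as pool:
            results = dict(pool.map(lambda n: run_test(device, n), test_names))
    finally:
        device.shutdown()
    return results, device.executed


if __name__ == "__main__":
    results, executed = run_pipeline(["test_matmul", "test_conv2d", "test_softmax"])
    print(results)
    print(executed)
```

In this model the device queue only ever holds small execution jobs, while everything else stays on the host. With the actual RPC infra, the `DeviceWorker` role would be played by a remote session on the GPU/Hexagon/ARM instance.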