It is worth pointing out that:
* Most of the existing tests are CPU-bound, including those that use a GPU for 
execution (end-to-end tests), which also rely heavily on the CPU for code 
generation
* All e2e tests can be decoupled into host-side compilation on CPU + execution on 
device (e.g. GPUs)
* A brute-force split between fast and slow tests is less efficient because even 
slow tests do not utilize most of the GPU resources

Therefore, my proposal is to build on the TVM RPC infra: instead of separating 
fast/slow tests, we should split host-side logic from device execution. Details:
* Run all tests on CPU with a single thread or a limited number of threads
* Provide an API via TVM RPC that allows execution of compiled code on an 
isolated GPU/Hexagon/ARM instance
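
To make the split concrete, here is a rough sketch in plain Python. It is not TVM's API: `compile_on_host`, `DeviceSession` and `run_suite` are hypothetical names used only to model the scheduling, and the "compilation" is a placeholder. In a real setup I would expect the device side to be reached through a TVM RPC session instead of the in-process stand-in used here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def compile_on_host(test_name):
    """Hypothetical stand-in for host-side code generation (CPU-bound in a real test)."""
    return {"test": test_name, "artifact": f"{test_name}.so"}


class DeviceSession:
    """Hypothetical stand-in for one isolated device reachable over RPC."""

    def __init__(self):
        self._lock = threading.Lock()  # the device runs one artifact at a time
        self.executed = []

    def run(self, compiled):
        with self._lock:
            # Only the already-compiled artifact is handled here; no compilation.
            self.executed.append(compiled["artifact"])
            return f"ran {compiled['artifact']}"


def run_suite(test_names, device, host_workers=4):
    """Compile tests in parallel on the host, then execute each on the shared device."""

    def one_test(name):
        compiled = compile_on_host(name)  # host side, parallel across workers
        return device.run(compiled)       # device side, serialized by the lock

    with ThreadPoolExecutor(max_workers=host_workers) as pool:
        # pool.map returns results in the same order as test_names
        return list(pool.map(one_test, test_names))


device = DeviceSession()
results = run_suite(["test_matmul", "test_conv2d", "test_softmax"], device)
print(results)
```

The point of the sketch is that the device is held only for the `run` call, so several host workers (or several CI pipelines) can share one device while they spend most of their time compiling.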

The advantages of my proposal:
* Concurrency: a CPU instance could run multiple CI pipelines in parallel
* Device utilization: the RPC infra makes sure only minimal logic is executed 
on device. It routes and manages execution efficiently, which should greatly 
improve device utilization and lower the cost





---
[Visit 
Topic](https://discuss.tvm.apache.org/t/modularize-and-modernize-tensorir-tests/15237/2)
 to respond.
