It is worth pointing out that:
* Most of the existing tests are CPU-bound, including those that use GPUs for
execution (end-to-end tests), which also rely heavily on the CPU for code
generation
* All e2e tests can be decoupled into host-side compilation on the CPU and
execution on the device (e.g. GPUs)
* Brute
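
The split between host-side compilation and device-side execution can be sketched as two independently testable steps. This is a minimal illustration, not the project's actual API: `Artifact`, `compile_on_host`, and `run_on_device` are hypothetical names, and the "device" is simulated by running the generated code in plain Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Artifact:
    """Hypothetical result of host-side compilation."""
    source: str  # generated code, inspectable without a device
    target: str  # e.g. "cpu" or "gpu"


def compile_on_host(expr: str, target: str) -> Artifact:
    """Host-side step: pure CPU work (a toy code generator, for illustration)."""
    source = f"def kernel(x):\n    return {expr}\n"
    return Artifact(source=source, target=target)


def run_on_device(artifact: Artifact, inputs: List[float]) -> List[float]:
    """Device-side step: simulated here by executing the generated code."""
    namespace = {}
    exec(artifact.source, namespace)
    kernel = namespace["kernel"]
    return [kernel(x) for x in inputs]


def test_compile_only():
    # Needs only a CPU: inspects the generated code and never executes it.
    artifact = compile_on_host("x * 2 + 1", target="gpu")
    assert "return x * 2 + 1" in artifact.source
    assert artifact.target == "gpu"


def test_execute_only():
    # Needs a device (simulated): consumes a prebuilt artifact as input.
    artifact = Artifact(source="def kernel(x):\n    return x * 2 + 1\n", target="gpu")
    assert run_on_device(artifact, [0.0, 1.0, 2.0]) == [1.0, 3.0, 5.0]
```

Under this split, the compile-only test can run on any CPU worker, while only the execute-only test would need to be scheduled on a machine with the target device.
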
As we start to build multiple modules, it is useful to start modularizing the
unit tests, with the goal of reducing the number of actual integration tests.
Previously, quite a few tests were written in a way that directly invokes
end-to-end compilation; we also have tests that are coupled with legacy