Hi hjiang,

Thank you very much for your reply! I will try to clarify the two questions you 
mentioned:

> “any OpenCL-compatible devices” and “vendor-specific optimization” are 
> conflict, could you give more detail about what the plan here to balance this 
> 2 parts and how to reduce related complexity to minus developer efforts?

The point we wish to emphasize is that the changes we propose are not 
restricted to the Intel OpenCL platform; they should also support other 
OpenCL-enabled (FPGA) devices with minimal code modifications. To achieve 
this, we use standard OpenCL interfaces and terminology in our code.

Vendor-specific optimization refers to the process of translating OpenCL kernel 
code into HDL. Each FPGA vendor embeds its own in-house optimization tactics 
into its compiler/synthesis tools.

> for “Limitations” , about “all instructions are running sequentially”, this 
> may cause big performance problem because memory hiding by pipe line TLPP.

Yes, there is a performance penalty here. The original VTA design takes 
advantage of the dual-port property of Block RAMs in FPGAs: the load unit can 
occupy one port while the compute unit uses the other. However, such an 
arrangement is not possible with OpenCL, as local memories cannot be shared 
between kernels.
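
To give a rough feel for the penalty, here is a minimal sketch of a two-stage timing model in Python. It is not taken from the VTA code base: the function names and the per-tile cycle counts are hypothetical, and the model assumes the load of tile i+1 can fully overlap with the compute of tile i when both units can access the buffer at the same time.

```python
def sequential_cycles(load, compute):
    """Every load and compute step runs back to back, with no overlap."""
    return sum(load) + sum(compute)


def overlapped_cycles(load, compute):
    """Load of tile i+1 overlaps with compute of tile i (two-stage pipeline)."""
    assert len(load) == len(compute) and load
    total = load[0]  # the first load has nothing to overlap with
    for i in range(len(load) - 1):
        # The next stage starts once both the compute and the next load finish.
        total += max(compute[i], load[i + 1])
    return total + compute[-1]  # the last compute has nothing to overlap with


# Hypothetical per-tile cycle counts, chosen only for illustration.
load = [100, 100, 100, 100]
compute = [150, 150, 150, 150]

seq = sequential_cycles(load, compute)
ovl = overlapped_cycles(load, compute)
print(seq, ovl)
```

With these made-up numbers the sequential schedule takes 1000 cycles and the overlapped one 700, so the gap depends on how much of the load time could have been hidden behind compute.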

We have some optimizations/workarounds in mind to mitigate this issue, and we 
plan to explore those options soon.




