Hi hjiang,
Thank you very much for your reply! I will try to clarify the two questions you raised:

> “any OpenCL-compatible devices” and “vendor-specific optimization” are
> conflict, could you give more detail about what the plan here to balance this
> 2 parts and how to reduce related complexity to minus developer efforts?

The point we wish to emphasize is that the changes we propose are not restricted to the Intel OpenCL platform; they should support other OpenCL-enabled (FPGA) devices with minimal code modifications. To achieve this, we use standard OpenCL interfaces and terminology in our code. "Vendor-specific optimization" refers to the process of translating OpenCL kernel code into HDL. Every FPGA vendor embeds its own in-house optimization tactics in its compiler/synthesis tools, so these optimizations are applied by the vendor toolchain rather than by our code.

> for “Limitations” , about “all instructions are running sequentially”, this
> may cause big performance problem because memory hiding by pipe line TLPP.

Yes, there is a performance penalty here. The original VTA design takes advantage of the dual-port property of block RAMs in the FPGA: the load unit can occupy one port while the compute unit uses the other. Such an arrangement is not possible with OpenCL, because local memory cannot be shared between kernels. We have some optimizations/workarounds in mind to mitigate this issue, and we plan to explore those options soon.
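To give a rough feel for the sequential-execution penalty discussed above, here is a toy cycle-count model in Python. It is only a sketch under simplifying assumptions, not a description of VTA's actual timing: the function names, the per-tile cycle counts, and the lock-step double-buffering model (load of tile i+1 overlaps compute of tile i) are all hypothetical and chosen for illustration.

```python
def sequential_cycles(load, compute):
    """Every tile is loaded and then computed, with no overlap."""
    return sum(l + c for l, c in zip(load, compute))


def overlapped_cycles(load, compute):
    """Lock-step double buffering: while tile i is computed, tile i+1 is loaded.

    This is an assumed, simplified model of what a dual-port buffer allows.
    """
    if not load:
        return 0
    total = load[0]  # the first load cannot be hidden behind any compute
    for i in range(len(load) - 1):
        # Compute of tile i and load of tile i+1 run side by side;
        # the slower of the two determines the length of the step.
        total += max(compute[i], load[i + 1])
    total += compute[-1]  # the last compute has no load left to overlap with
    return total


if __name__ == "__main__":
    # Hypothetical per-tile costs in cycles.
    load = [4, 4, 4]
    compute = [6, 6, 6]
    print("sequential:", sequential_cycles(load, compute))
    print("overlapped:", overlapped_cycles(load, compute))
```

In this model the gap between the two numbers is the load time that is no longer hidden once instructions run strictly one after another; how large that gap is in practice depends on the real load/compute ratio of the workload.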