Thank you @zhanghaohit, @remotego, @liangfu, @hjiang for the discussion.
This is a great step forward for VTA. A story for PCI-E type FPGAs is much needed and has been somewhat overlooked lately, so I appreciate the solid RFC and the hard work. The TVM community looks forward to your PRs!

Before addressing the low-level engineering details, I wanted to take a step back and look at VTA today. Currently VTA is a collection of sources that follow an accelerator design defined by its low-level (microcode) and task-level ISA. These sources are maintained separately and need to stay functionally aligned:

* The Xilinx-centric HLS source code and compilation scripts that target Pynq-type SoCs. They rely on the low-level Pynq software drivers, which are not completely open source, so this design is difficult to adapt to other vendors (Intel) or other FPGA types (PCI-E cards). This was the first implementation of VTA.
* A VTA functional simulator specified in C. It gives us a non-cycle-accurate but behaviorally correct simulation of VTA, so the whole TVM-VTA stack can be tested from the comfort of your laptop or desktop machine.
* A more recent Chisel-based VTA implementation that is vendor-agnostic, and even FPGA-agnostic. The Chisel design has the benefit that it can be ported to ASICs, for instance. Another benefit is that we can achieve cycle-accurate simulation with Verilator and simulate full workloads (e.g. mobilenet). That would let us stop maintaining separate hardware sources and simulator sources, as we do today with the HLS design and the functional simulator, and it ensures we don't have feature drift between simulation and hardware.

Finally, we're proposing a fourth design entry method that would leverage the OpenCL programming language. In terms of pros, OpenCL is adopted by both Intel and Xilinx as a programming language for their FPGAs (minus several vendor-specific pragmas), and it can target both PCI-E based and SoC type designs.
On the downside, it is difficult to expose virtual threads in an OpenCL design, so we may lose the benefit of virtual threading in those designs. In exchange, the compilation story becomes a little cleaner and easier to maintain.

So the high-level question on VTA is: given that we're introducing more design entries for VTA, how are we going to make sure that they follow the same spec and don't bitrot or drift in features over time? And if they don't follow the same spec, how will we handle the diversity of designs, and how will this inform the design and testing of TVM?

I see us going one of two ways:

(1) We try to adopt a single design entry language for all variants of VTA, e.g. Chisel. Since it is the most hardware-vendor-agnostic option and is friendly to ASIC development, it's a safe bet moving forward. However, it means we'll end up with more complex code to maintain, and we may not achieve as high performance as we would with the high-level synthesis design languages provided by the vendors (Intel, Xilinx), which map more seamlessly onto the FPGA hardware.

(2) We embrace the diversity of needs from TVM/VTA users and continue to maintain HLS, OpenCL, C, and Chisel sources. To keep this challenge tractable, and to make sure these sources are well tested and don't bitrot, each of them needs to be checked against a VTA spec through regular CI testing, which can cover different variants of VTA (e.g. different sets of supported ALU instructions, virtual threading supported or not, etc.).

I'd be curious to hear your thoughts on (1) or (2), or on a possible third option. This is not an RFC or a vote, but I'd like your input on this matter since it may affect how we prioritize open source work around VTA.
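
To make the CI idea in option (2) a bit more concrete, here is a minimal sketch of what a spec-conformance check could look like. Everything in it is hypothetical: the `Backend` class, the `ALU_REFERENCE` table, the capability flags, and the stand-in backends are invented for illustration and are not existing TVM/VTA APIs. In a real setup, `run_alu` would call into the functional simulator, a Verilator build, or an OpenCL device instead of a Python function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical reference semantics standing in for "the VTA spec".
# The op names are illustrative; a real spec would define the full set.
ALU_REFERENCE: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "max": lambda a, b: max(a, b),
    "min": lambda a, b: min(a, b),
    "shr": lambda a, b: a >> b,
}


def reference_alu(op: str, a: int, b: int) -> int:
    """Golden model: what the spec says an ALU op should return."""
    return ALU_REFERENCE[op](a, b)


@dataclass(frozen=True)
class Backend:
    """A design entry (HLS, C simulator, Chisel, OpenCL, ...) and what it claims to support."""
    name: str
    alu_ops: FrozenSet[str]
    virtual_threading: bool
    run_alu: Callable[[str, int, int], int]  # assumed hook into the real design


def plan(backend: Backend) -> List[str]:
    """List the checks CI would run for this variant, based on declared capabilities."""
    checks = [f"alu:{op}" for op in sorted(backend.alu_ops)]
    if backend.virtual_threading:
        checks.append("virtual_threading")
    return checks


def check_backend(backend: Backend, vectors: List[Tuple[int, int]]) -> List[str]:
    """Compare every declared ALU op against the reference; return failure messages."""
    failures = []
    for op in sorted(backend.alu_ops):
        if op not in ALU_REFERENCE:
            failures.append(f"{backend.name}: op '{op}' is not in the spec")
            continue
        for a, b in vectors:
            got, want = backend.run_alu(op, a, b), reference_alu(op, a, b)
            if got != want:
                failures.append(f"{backend.name}: {op}({a}, {b}) = {got}, spec says {want}")
    return failures


def drifted_alu(op: str, a: int, b: int) -> int:
    """Deliberately wrong 'add' to show what feature drift would look like."""
    return a - b if op == "add" else reference_alu(op, a, b)


VECTORS = [(0, 0), (7, 3), (-8, 2), (127, 1)]

# Stand-in backends; the capability sets are made up for the example.
fsim_like = Backend("fsim_like", frozenset({"add", "max", "min", "shr"}), True, reference_alu)
opencl_like = Backend("opencl_like", frozenset({"add", "max"}), False, reference_alu)
drifted = Backend("drifted", frozenset({"add", "shr"}), False, drifted_alu)

for backend in (fsim_like, opencl_like, drifted):
    problems = check_backend(backend, VECTORS)
    print(backend.name, plan(backend), "OK" if not problems else problems)
```

The point of the sketch is only that each design entry declares which variant of the spec it implements, and CI checks exactly that subset against one shared reference, so a design without virtual threading is skipped for those tests rather than failing them.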