Hi all! While experimenting with different TensorRT versions, I noticed that compatibility is tied closely to CUDA releases (e.g., TensorRT 8.x → CUDA 11.x, TensorRT 10.0.1 → CUDA 12.x, TensorRT 10.13 → CUDA 13.x).
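For context, here is a minimal way to check which TensorRT build is installed at runtime; the version-based dispatch at the end is a hypothetical illustration of the design question, not existing Beam code:

```python
import tensorrt as trt

# Print the installed TensorRT version string, e.g. "10.13.0.35".
print(trt.__version__)

# Hypothetical dispatch sketch: pick a handler path by major version.
major = int(trt.__version__.split(".")[0])
handler = "trt10_handler" if major >= 10 else "legacy_trt8_handler"
print(f"TensorRT {trt.__version__} -> {handler}")
```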
I’m looking for feedback on design direction:

- Should we maintain separate handlers for different TensorRT versions, or evolve the current handler to target only the latest TensorRT (10.x)?
- In the existing code, `load_onnx` only parses ONNX into an engine but isn’t used downstream. In my prototype, I added `_load_onnx_build_engine`, which builds an engine directly from ONNX and then runs inference (a rough sketch follows below). Should this live in the same handler, or be split into an ONNX-specific handler separate from the TensorRT one?

This is my first open source contribution, so I’d greatly appreciate any guidance on what would make sense long term for Beam.
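For concreteness, here is roughly what my `_load_onnx_build_engine` prototype does, sketched against the TensorRT 10.x Python API. Names like `onnx_path` and the 1 GiB workspace limit are placeholders, and this is not the exact code in my branch:

```python
import tensorrt as trt

def _load_onnx_build_engine(onnx_path: str) -> trt.ICudaEngine:
    """Parse an ONNX file and build a TensorRT engine from it (TRT 10.x API)."""
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # TRT 10.x networks are always explicit-batch; no creation flags needed.
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError(f"ONNX parse failed: {errors}")

    config = builder.create_builder_config()
    # Cap builder workspace memory at 1 GiB (placeholder value).
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed")

    runtime = trt.Runtime(logger)
    return runtime.deserialize_cuda_engine(serialized)

# Inference then creates an execution context and binds device buffers,
# e.g. via context.set_tensor_address(...) and context.execute_async_v3(...).
```

The split question comes down to whether this ONNX-parsing path is a TensorRT implementation detail or a separate entry point worth its own handler.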
