featzhang opened a new pull request, #27385:
URL: https://github.com/apache/flink/pull/27385
### What is the purpose of the change

This PR introduces a new optional Triton inference module under `flink-models`, enabling Flink to invoke an external NVIDIA Triton Inference Server for batch-oriented model inference. The module implements a reusable runtime-level integration based on the existing model provider SPI, allowing users to define Triton-backed models via `CREATE MODEL` and execute inference through `ML_PREDICT` without modifying the Flink planner or SQL execution semantics. (Usage sketches follow at the end of this description.)

---

### Brief change log

- Added a new `flink-model-triton` module under `flink-models`
- Implemented a Triton model provider based on the existing model inference framework
- Added support for asynchronous and batched inference via Triton's HTTP/REST API
- Added documentation for Triton model usage and configuration
- Extended the SQL documentation to list Triton as a supported model provider

---

### Verifying this change

- Verified module compilation and packaging
- Added unit tests for the Triton model provider factory
- Manually validated the model invocation logic against a local Triton server

---

### Does this pull request potentially affect one of the following parts?

- API changes: **No**
- Planner changes: **No**
- Runtime changes: **No**
- SQL semantics changes: **No**

---

### Documentation

- Added dedicated documentation under `docs/connectors/models/triton.md`
- Updated the SQL model inference documentation to include Triton as a supported provider

---

### Related issues

- FLINK-38857
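For illustration, here is a minimal sketch of how a Triton-backed model could be defined and queried via the Table API. The `CREATE MODEL` / `ML_PREDICT` statements follow the existing Flink SQL syntax; the provider identifier and option keys (`'provider'`, `'endpoint'`, `'model-name'`) are placeholder assumptions, not necessarily the keys defined by this PR's factory:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TritonModelSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Some input to score; in practice this would be a real source table.
        tEnv.executeSql(
                "CREATE TEMPORARY VIEW reviews AS "
                        + "SELECT * FROM (VALUES ('great product'), ('poor quality')) AS t(review)");

        // The option keys below are placeholders; the actual keys are defined
        // by the flink-model-triton provider factory introduced in this PR.
        tEnv.executeSql(
                "CREATE MODEL sentiment "
                        + "INPUT (review STRING) "
                        + "OUTPUT (label STRING, score DOUBLE) "
                        + "WITH ("
                        + "  'provider' = 'triton',"
                        + "  'endpoint' = 'http://localhost:8000',"
                        + "  'model-name' = 'sentiment'"
                        + ")");

        // ML_PREDICT evaluates the model per input row; the provider batches
        // the underlying calls to the Triton server.
        tEnv.executeSql(
                "SELECT * FROM ML_PREDICT("
                        + "TABLE reviews, MODEL sentiment, DESCRIPTOR(review))")
                .print();
    }
}
```

Running this end to end requires the `flink-model-triton` jar on the classpath and a reachable Triton server; the snippet is meant only to show the user-facing flow.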

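On the wire, Triton speaks the KServe v2 HTTP protocol, so a provider like this one would POST JSON inference requests to `/v2/models/<name>/infer`. A standalone sketch of such a call (tensor name, shape, payload, and the local endpoint are illustrative, and the batch is encoded in the leading shape dimension):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TritonRestSketch {
    public static void main(String[] args) {
        // A batch of two 4-element FP32 inputs, encoded per the KServe v2
        // inference protocol that Triton's HTTP endpoint implements.
        String body =
                "{"
                        + "\"inputs\": [{"
                        + "  \"name\": \"INPUT0\","
                        + "  \"shape\": [2, 4],"
                        + "  \"datatype\": \"FP32\","
                        + "  \"data\": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]"
                        + "}]"
                        + "}";

        HttpRequest request =
                HttpRequest.newBuilder()
                        .uri(URI.create("http://localhost:8000/v2/models/sentiment/infer"))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();

        // sendAsync mirrors the non-blocking style used for asynchronous
        // inference: the response is handled when the server replies.
        HttpClient.newHttpClient()
                .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenAccept(resp -> System.out.println(resp.body()))
                .join();
    }
}
```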