featzhang opened a new pull request, #27385:
URL: https://github.com/apache/flink/pull/27385
### What is the purpose of the change

This PR introduces a new optional Triton inference module under `flink-models`, enabling Flink to invoke an external NVIDIA Triton Inference Server for batch-oriented model inference. The module implements a reusable runtime-level integration based on the existing model provider SPI, allowing users to define Triton-backed models via `CREATE MODEL` and execute inference through `ML_PREDICT` without modifying the Flink planner or SQL execution semantics. (Usage sketches follow at the end of this description.)

---

### Brief change log

- Added a new `flink-model-triton` module under `flink-models`
- Implemented a Triton model provider based on the existing model inference framework
- Added support for asynchronous and batched inference via Triton's HTTP/REST API
- Added documentation for Triton model usage and configuration
- Extended the SQL documentation to list Triton as a supported model provider

---

### Verifying this change

- Verified module compilation and packaging
- Added unit tests for the Triton model provider factory
- Manually validated the model invocation logic against a local Triton server

---

### Does this pull request potentially affect one of the following parts?

- API changes: **No**
- Planner changes: **No**
- Runtime changes: **No**
- SQL semantics changes: **No**

---

### Documentation

- Added dedicated documentation under `docs/connectors/models/triton.md`
- Updated the SQL model inference documentation to include Triton as a supported provider

---

### Related issues

- FLINK-38857
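For illustration, here is a minimal sketch of how a Triton-backed model could be defined and queried via the Table API. The `CREATE MODEL` / `ML_PREDICT` statements follow the existing Flink SQL syntax; the provider identifier and option keys (`'provider'`, `'endpoint'`, `'model-name'`) are placeholder assumptions, not necessarily the keys defined by this PR's factory:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TritonModelSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Some input to score; in practice this would be a real source table.
        tEnv.executeSql(
                "CREATE TEMPORARY VIEW reviews AS "
                        + "SELECT * FROM (VALUES ('great product'), ('poor quality')) AS t(review)");

        // The option keys below are placeholders; the actual keys are defined
        // by the flink-model-triton provider factory introduced in this PR.
        tEnv.executeSql(
                "CREATE MODEL sentiment "
                        + "INPUT (review STRING) "
                        + "OUTPUT (label STRING, score DOUBLE) "
                        + "WITH ("
                        + "  'provider' = 'triton',"
                        + "  'endpoint' = 'http://localhost:8000',"
                        + "  'model-name' = 'sentiment'"
                        + ")");

        // ML_PREDICT evaluates the model per input row; the provider batches
        // the underlying calls to the Triton server.
        tEnv.executeSql(
                "SELECT * FROM ML_PREDICT("
                        + "TABLE reviews, MODEL sentiment, DESCRIPTOR(review))")
                .print();
    }
}
```

Running this end to end requires the `flink-model-triton` jar on the classpath and a reachable Triton server; the snippet is meant only to show the user-facing flow.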

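On the wire, Triton speaks the KServe v2 HTTP protocol, so a provider like this one would POST JSON inference requests to `/v2/models/<name>/infer`. A standalone sketch of such a call (tensor name, shape, payload, and the local endpoint are illustrative, and the batch is encoded in the leading shape dimension):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TritonRestSketch {
    public static void main(String[] args) {
        // A batch of two 4-element FP32 inputs, encoded per the KServe v2
        // inference protocol that Triton's HTTP endpoint implements.
        String body =
                "{"
                        + "\"inputs\": [{"
                        + "  \"name\": \"INPUT0\","
                        + "  \"shape\": [2, 4],"
                        + "  \"datatype\": \"FP32\","
                        + "  \"data\": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]"
                        + "}]"
                        + "}";

        HttpRequest request =
                HttpRequest.newBuilder()
                        .uri(URI.create("http://localhost:8000/v2/models/sentiment/infer"))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();

        // sendAsync mirrors the non-blocking style used for asynchronous
        // inference: the response is handled when the server replies.
        HttpClient.newHttpClient()
                .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenAccept(resp -> System.out.println(resp.body()))
                .join();
    }
}
```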