featzhang commented on code in PR #27385: URL: https://github.com/apache/flink/pull/27385#discussion_r2702540654
##########
flink-models/flink-model-triton/README.md:
##########
@@ -0,0 +1,174 @@
+# Flink Triton Model Integration
+
+This module provides integration between Apache Flink and NVIDIA Triton Inference Server, enabling real-time model inference within Flink streaming applications.
+
+## Features
+
+- **REST API Integration**: Communicates with Triton Inference Server via HTTP/REST API
+- **Asynchronous Processing**: Non-blocking inference requests for high throughput
+- **Flexible Configuration**: Comprehensive configuration options for various use cases
+- **Error Handling**: Built-in retry mechanisms and error handling
+- **Resource Management**: Efficient HTTP client pooling and resource management
+
+## Configuration Options
+
+### Required Options
+
+| Option | Type | Description |
+|--------|------|-------------|
+| `endpoint` | String | Full URL of the Triton Inference Server endpoint (e.g., `http://localhost:8000/v2/models`) |
+| `model-name` | String | Name of the model to invoke on Triton server |
+| `model-version` | String | Version of the model to use (defaults to "latest") |
+
+### Optional Options
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `timeout` | Long | 30000 | Request timeout in milliseconds |
+| `max-retries` | Integer | 3 | Maximum number of retries for failed requests |
+| `batch-size` | Integer | 1 | Batch size for inference requests |
+| `priority` | Integer | - | Request priority level (0-255, higher values = higher priority) |
+| `sequence-id` | String | - | Sequence ID for stateful models |
+| `sequence-start` | Boolean | false | Whether this is the start of a sequence for stateful models |
+| `sequence-end` | Boolean | false | Whether this is the end of a sequence for stateful models |
+| `binary-data` | Boolean | false | Whether to use binary data transfer (defaults to JSON) |
+| `compression` | String | - | Compression algorithm to use (e.g., 'gzip') |
+| `auth-token` | String | - | Authentication token for secured Triton servers |
+| `custom-headers` | String | - | Custom HTTP headers in JSON format |
+
+## Usage Example
+
+### SQL DDL
+
+```sql
+CREATE MODEL my_triton_model (
+  input STRING,
+  output STRING
+) WITH (
+  'provider' = 'triton',
+  'endpoint' = 'http://localhost:8000/v2/models',
+  'model-name' = 'text-classification',

Review Comment:
   The valid `model-name` values are not defined or constrained by Flink. They are entirely determined by the models deployed in the target Triton Inference Server instance. In other words, any model name that is available under the configured Triton model repository (and exposed by the server) can be referenced here.

   To discover valid model names, users can:

   * Inspect the Triton model repository directly, or
   * Query the Triton Server model metadata / repository APIs (e.g. `/v2/models`).

   I agree that this is worth clarifying in the documentation, and I will update it to explicitly state that `model-name` must match a model deployed on the Triton server and point users to the appropriate Triton discovery mechanisms.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
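The discovery mechanisms the reviewer points to can be sketched as follows. This is a minimal illustration, not part of the PR: it assumes Triton's standard HTTP endpoints (`POST /v2/repository/index` for the repository index, `GET /v2/models/{name}` for per-model metadata), and the sample JSON response is fabricated for demonstration rather than captured from a live server.

```python
import json

# Hypothetical server address; in practice this is the host portion of the
# README's `endpoint` option (e.g. http://localhost:8000).
TRITON_BASE = "http://localhost:8000"


def repository_index_url(base: str) -> str:
    # Triton exposes the model repository index at POST /v2/repository/index.
    return f"{base}/v2/repository/index"


def model_metadata_url(base: str, model_name: str) -> str:
    # Per-model metadata is served at GET /v2/models/{model_name}.
    return f"{base}/v2/models/{model_name}"


def ready_model_names(index_response: str) -> list:
    # The index response is a JSON array of objects with "name", "version",
    # and "state" fields; only models in state READY are usable, so only
    # those names are valid candidates for the `model-name` option.
    models = json.loads(index_response)
    return sorted({m["name"] for m in models if m.get("state") == "READY"})


# Fabricated sample response, shaped like a repository index reply:
sample = json.dumps([
    {"name": "text-classification", "version": "1", "state": "READY"},
    {"name": "embedding-model", "version": "2", "state": "UNAVAILABLE"},
])

print(repository_index_url(TRITON_BASE))
print(ready_model_names(sample))
```

Against a real deployment one would POST to `repository_index_url(...)` with an HTTP client and feed the response body to `ready_model_names`; any name it returns is a value that should work for `'model-name'` in the DDL above.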
