featzhang commented on code in PR #27385:
URL: https://github.com/apache/flink/pull/27385#discussion_r2702542718
##########
flink-models/flink-model-triton/README.md:
##########
@@ -0,0 +1,174 @@
+# Flink Triton Model Integration
+
+This module provides integration between Apache Flink and NVIDIA Triton Inference Server, enabling real-time model inference within Flink streaming applications.
+
+## Features
+
+- **REST API Integration**: Communicates with Triton Inference Server via HTTP/REST API
+- **Asynchronous Processing**: Non-blocking inference requests for high throughput
+- **Flexible Configuration**: Comprehensive configuration options for various use cases
+- **Error Handling**: Built-in retry mechanisms and error handling
+- **Resource Management**: Efficient HTTP client pooling and resource management
+
+## Configuration Options
+
+### Required Options
+
+| Option | Type | Description |
+|--------|------|-------------|
+| `endpoint` | String | Full URL of the Triton Inference Server endpoint (e.g., `http://localhost:8000/v2/models`) |
+| `model-name` | String | Name of the model to invoke on Triton server |
+| `model-version` | String | Version of the model to use (defaults to "latest") |
+
+### Optional Options
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `timeout` | Long | 30000 | Request timeout in milliseconds |
+| `max-retries` | Integer | 3 | Maximum number of retries for failed requests |
+| `batch-size` | Integer | 1 | Batch size for inference requests |
+| `priority` | Integer | - | Request priority level (0-255, higher values = higher priority) |
+| `sequence-id` | String | - | Sequence ID for stateful models |
+| `sequence-start` | Boolean | false | Whether this is the start of a sequence for stateful models |
+| `sequence-end` | Boolean | false | Whether this is the end of a sequence for stateful models |
+| `binary-data` | Boolean | false | Whether to use binary data transfer (defaults to JSON) |
+| `compression` | String | - | Compression algorithm to use (e.g., 'gzip') |
+| `auth-token` | String | - | Authentication token for secured Triton servers |
+| `custom-headers` | String | - | Custom HTTP headers in JSON format |

Review Comment:
`security-token` and `auth-token` are mutually exclusive and serve different purposes.

* `auth-token` is a convenience option that is mapped to a standard `Authorization` HTTP header (e.g. `Authorization: Bearer <token>`).
* `security-token` is treated as a generic security credential and is passed through as-is, typically via custom headers.

If both are specified, this is considered a configuration error. Flink does not define a precedence rule between them, as sending multiple authentication tokens in the same request would be ambiguous.

I will update the documentation to explicitly state that only one of these options should be configured at a time and clarify their respective usage.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
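
For illustration, the mutual-exclusion rule described in the review comment above could be enforced at configuration time roughly as follows. This is a minimal sketch, not the code in the PR: the class `TritonAuthSketch`, the method `resolveAuthHeaders`, and the header name `X-Security-Token` are all hypothetical, since the comment only says that `security-token` is passed through "via custom headers" without naming one. Only the option keys `auth-token` and `security-token` and the `Authorization: Bearer <token>` mapping come from the comment itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Flink Triton module. */
final class TritonAuthSketch {

    // Option keys as named in the review comment.
    static final String AUTH_TOKEN = "auth-token";
    static final String SECURITY_TOKEN = "security-token";

    // ASSUMPTION: the comment does not name the header used for the
    // pass-through credential, so this name is a placeholder.
    static final String SECURITY_HEADER = "X-Security-Token";

    private TritonAuthSketch() {}

    /**
     * Builds the authentication headers for a request from raw option values.
     * Rejects configurations that set both options instead of picking one,
     * because no precedence rule is defined between them.
     */
    static Map<String, String> resolveAuthHeaders(Map<String, String> options) {
        String authToken = options.get(AUTH_TOKEN);
        String securityToken = options.get(SECURITY_TOKEN);

        if (authToken != null && securityToken != null) {
            throw new IllegalArgumentException(
                    "'" + AUTH_TOKEN + "' and '" + SECURITY_TOKEN
                            + "' are mutually exclusive; configure only one of them.");
        }

        Map<String, String> headers = new LinkedHashMap<>();
        if (authToken != null) {
            // Convenience mapping to the standard Authorization header.
            headers.put("Authorization", "Bearer " + authToken);
        } else if (securityToken != null) {
            // Generic credential, forwarded unchanged.
            headers.put(SECURITY_HEADER, securityToken);
        }
        return headers;
    }
}
```

Failing fast with an exception, rather than silently preferring one token, matches the stated intent that specifying both options is a configuration error.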
