nicusX commented on code in PR #1:
URL: https://github.com/apache/flink-connector-prometheus/pull/1#discussion_r1755176342


##########
flink-connector-prometheus/README.md:
##########
@@ -0,0 +1,260 @@
+## Flink Prometheus connector (sink)
+
+Implementation of the Prometheus sink connector for the DataStream API.
+
+The sink writes to Prometheus using the Remote-Write interface, based
+on [Remote-Write specifications version 1.0](https://prometheus.io/docs/concepts/remote_write_spec/).
+
+### Guarantees and input restrictions
+
+Due to the strict [ordering](https://prometheus.io/docs/concepts/remote_write_spec/#ordering)
+and [format](https://prometheus.io/docs/concepts/remote_write_spec/#labels) requirements
+of Prometheus Remote-Write, the sink guarantees that input data are written to Prometheus only if they are
+well-formed and in order.
+
+For efficiency, the connector does not do any validation.
+If the input is out of order or malformed, **the write-request is rejected by Prometheus**.
+When a write-request is rejected, the configured [error handling behaviour](#error-handling-behavior) for
+"Prometheus non-retriable errors" determines whether the sink throws an exception (`FAIL`, the default behaviour) or
+discards the offending request and continues (`DISCARD_AND_CONTINUE`).
+See [error handling behaviour](#error-handling-behavior), below, for further details.
+
+The sink receives time-series as input, each containing one or more samples.
+To optimise the write throughput, input time-series are batched, in the order they are received, and written with a
+single write-request.
+
+If a write-request contains any out-of-order or malformed data, **the entire request is rejected** and all its
+time-series are discarded.
+The reason is that the Remote-Write specification
+[explicitly forbids retrying](https://prometheus.io/docs/concepts/remote_write_spec/#retries-backoff)
+rejected write-requests (4xx responses), and the Prometheus response does not contain enough information to
+efficiently retry the write partially, discarding only the offending data.
+
+### Responsibilities of the application
+
+It is the responsibility of the application to send the data to the sink in the correct order and format.
+
+1. Input time-series must be well-formed, e.g. only valid and non-duplicated labels,
+   samples in timestamp order (see [Labels and Ordering](https://prometheus.io/docs/concepts/remote_write_spec/#labels)
+   in the Prometheus Remote-Write specifications).
+2. Input time-series with identical labels must be sent to the sink in timestamp order.
+3. If sink parallelism > 1 is used, the input stream must be partitioned so that all time-series with identical labels
+   go to the same sink subtask. A `KeySelector` is provided to partition input correctly (
+   see [Partitioning](#partitioning), below, and the sketch after this list).
+
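+For illustration only, the following sketch shows one way to key the input stream so that all time-series with
+identical labels land on the same sink subtask. The `labelsKey` helper is hypothetical (any stable key built from the
+metric name plus all labels would do); in a real job, prefer the `KeySelector` shipped with the connector
+(see [Partitioning](#partitioning)).
+
+```java
+// DataStream<PrometheusTimeSeries> input = ...
+// PrometheusSink sink = ...
+
+// labelsKey(ts): hypothetical helper returning a stable String built from
+// the metric name plus all label key/value pairs
+input.keyBy(ts -> labelsKey(ts))
+        .sinkTo(sink);
+```
+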
+#### Sink input objects
+
+To help send well-formed data to the sink, the connector
+expects [`PrometheusTimeSeries`](./src/main/java/org/apache/flink/connector/prometheus/sink/PrometheusTimeSeries.java)
+POJOs as input.
+
+Each `PrometheusTimeSeries` instance maps 1-to-1 to
+a [remote-write `TimeSeries`](https://prometheus.io/docs/concepts/remote_write_spec/#protocol). Each object contains:
+
+* exactly one `metricName`, mapped to the special `__name__` label
+* optionally, any number of additional labels { k: String, v: String } - MUST BE IN ORDER BY KEY
+* one or more `Samples` { value: double, timestamp: long } - MUST BE IN TIMESTAMP ORDER
+
+`PrometheusTimeSeries` provides a builder interface.
+
+```java
+// List<Tuple2<Double, Long>> samples = ...
+
+PrometheusTimeSeries.Builder tsBuilder = PrometheusTimeSeries.builder()
+        .withMetricName("CPU") // mapped to the `__name__` label
+        .addLabel("InstanceID", instanceId)
+        .addLabel("AccountID", accountId);
+
+for (Tuple2<Double, Long> sample : samples) {
+    tsBuilder.addSample(sample.f0, sample.f1);
+}
+
+PrometheusTimeSeries ts = tsBuilder.build();
+```
+
+#### Input data validation
+
+Prometheus imposes strict constraints on the content sent to remote-write, including label format and ordering, sample
+time ordering, etc.
+
+For efficiency, the sink **does not do any validation or reordering** of the input. It is the responsibility
+of the application to ensure that the input is well-formed.
+
+Any malformed data will be rejected on write to Prometheus. Depending on
+the [error handling behaviour](#error-handling-behavior)
+configured, the sink will throw an exception stopping the job (default), or drop the entire write-request, log the
+fact, and continue.
+
+For complete details about these constraints, refer to
+the [remote-write specifications](https://prometheus.io/docs/concepts/remote_write_spec/).
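+
+For illustration only, a minimal sketch of such pre-processing on the application side, assuming hypothetical
+`RawSample` input records with `getValue()`/`getTimestamp()` accessors (not part of the connector):
+
+```java
+// List<RawSample> rawSamples = ...      // hypothetical input records
+// Map<String, String> rawLabels = ...   // labels, in arbitrary order
+
+// Sort samples by timestamp, as required by Remote-Write
+rawSamples.sort(Comparator.comparingLong(RawSample::getTimestamp));
+
+// A TreeMap iterates keys in lexicographic order, matching the label ordering requirement
+SortedMap<String, String> labels = new TreeMap<>(rawLabels);
+
+PrometheusTimeSeries.Builder builder = PrometheusTimeSeries.builder()
+        .withMetricName(metricName);
+labels.forEach(builder::addLabel);
+rawSamples.forEach(s -> builder.addSample(s.getValue(), s.getTimestamp()));
+PrometheusTimeSeries ts = builder.build();
+```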
+
+### Batching, blocking writes and retry
+
+The sink batches multiple time-series into a single write-request, retaining the order.
+
+Batching is based on the number of samples. Each write-request contains up to 500 samples, with a max buffering time of
+5 seconds
+(both configurable). The number of time-series doesn't matter: for example, a single write-request may carry 500
+time-series with one sample each, or a single time-series with 500 samples.
+
+As per the [Prometheus Remote-Write specifications](https://prometheus.io/docs/concepts/remote_write_spec/#retries-backoff),
+the sink retries 5xx and 429 responses. Retrying is blocking, to retain sample ordering, and uses an exponential
+backoff.
+
+The exponential backoff starts with an initial delay (default 30 ms) and increases it exponentially up to a max retry
+delay (default 5 sec). It continues retrying until the max number of retries is reached (default: retry forever), as
+illustrated by the sketch below.
+
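+For illustration only, the retry behaviour is roughly equivalent to the following loop. The doubling growth factor and
+the `writeRequestFails()`/`maxRetriesReached()` predicates are assumptions of this sketch, not the connector's actual
+code:
+
+```java
+// Sketch: blocking retry with capped exponential backoff
+long delayMs = 30;               // initial retry delay (default: 30 ms)
+final long maxDelayMs = 5_000;   // max retry delay (default: 5 sec)
+while (writeRequestFails() && !maxRetriesReached()) {
+    Thread.sleep(delayMs);       // blocking, to retain sample ordering
+    delayMs = Math.min(delayMs * 2, maxDelayMs);
+}
+```
+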
+On a non-retriable error response (4xx, except 429), a non-retryable exception, or on reaching the retry limit,
+depending on the configured [error handling behaviour](#error-handling-behavior) for "Max retries exceeded", the sink
+will either throw an exception (`FAIL`, default behaviour), or **discard the entire write-request**, log a warning and
+continue. See [error handling behaviour](#error-handling-behavior), below, for further details.
+
+### Initializing the sink
+
+Example of sink initialisation (for documentation purposes, we are setting all parameters to their default values):
+
+```java
+PrometheusSink sink = PrometheusSink.builder()
+        .setMaxBatchSizeInSamples(500)              // Batch size (write-request size), in samples (default: 500)
+        .setMaxRecordSizeInSamples(500)             // Max sink input record size, in samples (default: 500), must be <= maxBatchSizeInSamples

Review Comment:
   Good point.
   I will explain this in the actual docs, in the other PR


