hubcio commented on code in PR #41:
URL: https://github.com/apache/iggy-website/pull/41#discussion_r3180779548


##########
content/docs/connectors/introduction.mdx:
##########
@@ -27,7 +27,7 @@ The connector runtime operates in two directions - **source** (ingest) and **sin
 | Type   | Connectors                                                     |
 |--------|----------------------------------------------------------------|
 | Source | PostgreSQL, Elasticsearch, Random                              |
-| Sink   | PostgreSQL, MongoDB, Elasticsearch, Quickwit, Apache Iceberg, Stdout |
+| Sink   | PostgreSQL, MongoDB, Elasticsearch, Quickwit, Apache Iceberg, Generic HTTP, Stdout |

Review Comment:
   the table cell here says "Generic HTTP" but the page it links to has 
frontmatter title "HTTP Sink" and never uses the word "Generic" anywhere in the 
body. clicking through gives the reader two different names for the same thing. 
either rename the new page's frontmatter to something like "HTTP Sink 
(Generic)" and add a one-liner in the opening paragraph noting it's the 
transport-level connector (vs Elasticsearch/Quickwit which speak HTTP under the 
hood), or drop "Generic" from this row.



##########
content/docs/connectors/sinks/http.mdx:
##########
@@ -0,0 +1,179 @@
+---
+title: HTTP Sink
+---
+
+<Callout type="info">
+  This page is a curated subset of the HTTP Sink documentation. The canonical 
reference — including deployment patterns, performance tuning, and the full set 
of configuration options and limitations — is the upstream 
[`http_sink/README.md`](https://github.com/apache/iggy/tree/master/core/connectors/sinks/http_sink)
 in the `apache/iggy` repository.
+</Callout>
+
+The HTTP Sink connector consumes messages from Iggy streams and delivers them 
to any HTTP endpoint — webhooks, REST APIs, serverless functions, or SaaS 
integrations. It is generic by design: you bring the URL, headers, and batching 
strategy, and the sink handles transport, retries, and metadata wrapping.
+
+## Configuration
+
+```toml
+[[streams]]
+stream = "events"
+topics = ["notifications"]
+schema = "json"
+batch_length = 50
+poll_interval = "100ms"
+consumer_group = "http_sink"
+
+[plugin_config]
+url = "https://api.example.com/ingest"
+batch_mode = "ndjson"
+```
+
+### Configuration Options
+
+| Option | Type | Default | Description |
+| ------ | ---- | ------- | ----------- |
+| `url` | string | **required** | Target URL for HTTP requests |
+| `method` | string | `POST` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, 
`PATCH`, `DELETE` |
+| `timeout` | string | `30s` | Request timeout (e.g. `10s`, `500ms`) |
+| `max_payload_size_bytes` | u64 | `10485760` | Max body size in bytes (10MB). 
`0` to disable |
+| `batch_mode` | string | `individual` | `individual`, `ndjson`, `json_array`, 
or `raw` |
+| `include_metadata` | bool | `true` | Wrap payload in metadata envelope |
+| `include_checksum` | bool | `false` | Add message checksum to metadata |
+| `include_origin_timestamp` | bool | `false` | Add origin timestamp to 
metadata |
+| `health_check_enabled` | bool | `false` | Send health check request in 
`open()` |
+| `health_check_method` | string | `HEAD` | HTTP method for health check |
+| `max_retries` | u32 | `3` | Retry attempts for transient errors |
+| `retry_delay` | string | `1s` | Base delay between retries |
+| `retry_backoff_multiplier` | u32 | `2` | Exponential backoff multiplier (min 
1) |
+| `max_retry_delay` | string | `30s` | Maximum retry delay cap |
+| `success_status_codes` | [u16] | `[200, 201, 202, 204]` | Status codes 
considered successful |
+| `tls_danger_accept_invalid_certs` | bool | `false` | Skip TLS certificate 
validation |
+| `max_connections` | usize | `10` | Max idle connections per host |
+| `verbose_logging` | bool | `false` | Log request/response details at debug 
level |
+| `headers` | table | `{}` | Custom HTTP headers (e.g. `Authorization`) |
+
+## Batch Modes
+
+The `batch_mode` option controls how messages from one poll cycle are 
delivered to the endpoint.
+
+- **`individual`** (default): one HTTP request per message. Best for webhooks 
and endpoints that accept single events. With `batch_length = 50`, this 
produces 50 sequential round trips per poll cycle.
+- **`ndjson`**: all messages in one request, [newline-delimited 
JSON](https://github.com/ndjson/ndjson-spec). Best for bulk-ingestion 
endpoints. `Content-Type: application/x-ndjson`.
+- **`json_array`**: all messages as a single JSON array. Best for APIs that 
expect array payloads. `Content-Type: application/json`.
+- **`raw`**: raw bytes, one request per message. For non-JSON payloads 
(Protobuf, FlatBuffers, binary). The metadata envelope is not applied. 
`Content-Type: application/octet-stream`.
+
+For production throughput, prefer `ndjson` or `json_array` over `individual` — 
they collapse N round trips per poll cycle into one.
+
+## Metadata Envelope
+
+When `include_metadata = true` (default), the JSON-mode payload is wrapped:
+
+```json
+{
+  "metadata": {
+    "iggy_id": "0123456789abcdef0123456789abcdef",
+    "iggy_offset": 42,
+    "iggy_timestamp": 1710064800000000,
+    "iggy_stream": "my_stream",
+    "iggy_topic": "my_topic",
+    "iggy_partition_id": 0
+  },
+  "payload": { ... }
+}
+```
+
+- `iggy_id` is a 32-character lowercase hex string (no dashes).
+- For non-JSON payloads (`raw`, `flatbuffer`, `proto` schemas), the payload is 
base64-encoded and an `iggy_payload_encoding: "base64"` field is added.
+- Set `include_metadata = false` to send the raw payload without wrapping 
(useful when the downstream service expects bare JSON, e.g. Slack webhooks).
+
+The connector does not require any particular message structure on input. The 
envelope is applied on the way out, not expected on the way in — your producers 
can publish whatever they like.
+
+## Authentication
+
+The HTTP sink supports authentication via custom headers under 
`[plugin_config.headers]`. All headers are sent with every request, including 
health checks.
+
+```toml
+# Bearer token
+[plugin_config.headers]
+Authorization = "Bearer eyJhbGciOiJSUzI1NiIs..."
+
+# API key
+[plugin_config.headers]
+x-api-key = "my-secret-api-key"
+
+# Basic auth (base64 of "username:password")
+[plugin_config.headers]
+Authorization = "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
+```
+
+Multiple headers are supported and combined per request. For secrets, prefer 
environment variable overrides at the process level (see the upstream README) 
to keep tokens out of `config.toml`.
+
+## Retry & Delivery Semantics
+
+Failed requests use `reqwest-middleware` with exponential backoff: 
`retry_delay`, then `retry_delay * retry_backoff_multiplier`, capped at 
`max_retry_delay`.
+
+- **Transient errors** (retried): network errors, HTTP 429, 500, 502, 503, 504.
+- **Non-transient errors** (fail immediately): HTTP 400, 401, 403, 404, 405, 
etc.
+- **HTTP 429 `Retry-After`**: the header is logged but not honored; the 
middleware uses computed backoff.
+- **Partial delivery** (`individual`/`raw` modes): after 3 consecutive HTTP 
failures, the remainder of the batch is aborted to avoid hammering a dead 
endpoint.
+
+The connector runtime commits consumer-group offsets *before* `consume()` is 
called and does not inspect its return value, so the effective delivery 
guarantee is **at-most-once** at the runtime level. The sink's internal retry 
loop provides best-effort delivery within each `consume()` call.
+
+## Example Configurations
+
+### Webhook (Slack)
+
+```toml
+[plugin_config]
+url = "https://hooks.slack.com/services/T00/B00/xxx"
+batch_mode = "individual"
+include_metadata = false   # Slack expects bare JSON payload

Review Comment:
   the comment "Slack expects bare JSON payload" is true but easy to misread. 
flipping `include_metadata = false` doesn't transform arbitrary payloads into 
Slack's `{"text": "..."}` shape - the sink does no payload transformation on 
outbound. add a one-liner: "your producer must publish Slack-compatible JSON; 
the sink does not transform payloads."
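
   for reference, a sketch of what the producer side has to publish when `include_metadata = false` - this is Slack's documented incoming-webhook shape, not something the sink generates or validates (illustrative text):

   ```json
   {
     "text": "deploy finished for build 1234"
   }
   ```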



##########
content/docs/connectors/sinks/http.mdx:
##########
@@ -0,0 +1,179 @@
+---
+title: HTTP Sink
+---
+
+<Callout type="info">
+  This page is a curated subset of the HTTP Sink documentation. The canonical 
reference — including deployment patterns, performance tuning, and the full set 
of configuration options and limitations — is the upstream 
[`http_sink/README.md`](https://github.com/apache/iggy/tree/master/core/connectors/sinks/http_sink)
 in the `apache/iggy` repository.
+</Callout>
+
+The HTTP Sink connector consumes messages from Iggy streams and delivers them 
to any HTTP endpoint — webhooks, REST APIs, serverless functions, or SaaS 
integrations. It is generic by design: you bring the URL, headers, and batching 
strategy, and the sink handles transport, retries, and metadata wrapping.
+
+## Configuration
+
+```toml
+[[streams]]
+stream = "events"
+topics = ["notifications"]
+schema = "json"
+batch_length = 50
+poll_interval = "100ms"
+consumer_group = "http_sink"
+
+[plugin_config]
+url = "https://api.example.com/ingest"
+batch_mode = "ndjson"
+```
+
+### Configuration Options
+
+| Option | Type | Default | Description |
+| ------ | ---- | ------- | ----------- |
+| `url` | string | **required** | Target URL for HTTP requests |
+| `method` | string | `POST` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, 
`PATCH`, `DELETE` |

Review Comment:
   the method list includes `GET` and `HEAD` without flagging that those 
methods with non-`individual` batch modes produce a warning at runtime ("may be 
rejected by the server") since GET/HEAD don't conventionally carry bodies. 
either drop `GET`/`HEAD` from this list (rarely useful for a sink) or add a 
one-liner about the batch-mode interaction.
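
   if you go the one-liner route, the combination in question looks like this (illustrative config; per the runtime warning, GET/HEAD with a body-carrying batch mode may be rejected by the server):

   ```toml
   [plugin_config]
   url = "https://api.example.com/ingest"
   method = "GET"        # no conventional request body
   batch_mode = "ndjson" # batched modes send a body -> runtime warning
   ```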



##########
content/docs/connectors/sinks/http.mdx:
##########
@@ -0,0 +1,179 @@
+---
+title: HTTP Sink
+---
+
+<Callout type="info">
+  This page is a curated subset of the HTTP Sink documentation. The canonical 
reference — including deployment patterns, performance tuning, and the full set 
of configuration options and limitations — is the upstream 
[`http_sink/README.md`](https://github.com/apache/iggy/tree/master/core/connectors/sinks/http_sink)
 in the `apache/iggy` repository.
+</Callout>
+
+The HTTP Sink connector consumes messages from Iggy streams and delivers them 
to any HTTP endpoint — webhooks, REST APIs, serverless functions, or SaaS 
integrations. It is generic by design: you bring the URL, headers, and batching 
strategy, and the sink handles transport, retries, and metadata wrapping.
+
+## Configuration
+
+```toml
+[[streams]]
+stream = "events"
+topics = ["notifications"]
+schema = "json"
+batch_length = 50
+poll_interval = "100ms"
+consumer_group = "http_sink"
+
+[plugin_config]
+url = "https://api.example.com/ingest"
+batch_mode = "ndjson"
+```
+
+### Configuration Options
+
+| Option | Type | Default | Description |
+| ------ | ---- | ------- | ----------- |
+| `url` | string | **required** | Target URL for HTTP requests |
+| `method` | string | `POST` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, 
`PATCH`, `DELETE` |
+| `timeout` | string | `30s` | Request timeout (e.g. `10s`, `500ms`) |
+| `max_payload_size_bytes` | u64 | `10485760` | Max body size in bytes (10MB). 
`0` to disable |
+| `batch_mode` | string | `individual` | `individual`, `ndjson`, `json_array`, 
or `raw` |
+| `include_metadata` | bool | `true` | Wrap payload in metadata envelope |
+| `include_checksum` | bool | `false` | Add message checksum to metadata |
+| `include_origin_timestamp` | bool | `false` | Add origin timestamp to 
metadata |
+| `health_check_enabled` | bool | `false` | Send health check request in 
`open()` |
+| `health_check_method` | string | `HEAD` | HTTP method for health check |
+| `max_retries` | u32 | `3` | Retry attempts for transient errors |
+| `retry_delay` | string | `1s` | Base delay between retries |
+| `retry_backoff_multiplier` | u32 | `2` | Exponential backoff multiplier (min 
1) |
+| `max_retry_delay` | string | `30s` | Maximum retry delay cap |
+| `success_status_codes` | [u16] | `[200, 201, 202, 204]` | Status codes 
considered successful |
+| `tls_danger_accept_invalid_certs` | bool | `false` | Skip TLS certificate 
validation |
+| `max_connections` | usize | `10` | Max idle connections per host |
+| `verbose_logging` | bool | `false` | Log request/response details at debug 
level |
+| `headers` | table | `{}` | Custom HTTP headers (e.g. `Authorization`) |
+
+## Batch Modes
+
+The `batch_mode` option controls how messages from one poll cycle are 
delivered to the endpoint.
+
+- **`individual`** (default): one HTTP request per message. Best for webhooks 
and endpoints that accept single events. With `batch_length = 50`, this 
produces 50 sequential round trips per poll cycle.
+- **`ndjson`**: all messages in one request, [newline-delimited 
JSON](https://github.com/ndjson/ndjson-spec). Best for bulk-ingestion 
endpoints. `Content-Type: application/x-ndjson`.
+- **`json_array`**: all messages as a single JSON array. Best for APIs that 
expect array payloads. `Content-Type: application/json`.
+- **`raw`**: raw bytes, one request per message. For non-JSON payloads 
(Protobuf, FlatBuffers, binary). The metadata envelope is not applied. 
`Content-Type: application/octet-stream`.
+
+For production throughput, prefer `ndjson` or `json_array` over `individual` — 
they collapse N round trips per poll cycle into one.

Review Comment:
   `raw` has the same N-round-trips problem as `individual` (one HTTP request 
per message - confirmed at `send_raw` in `lib.rs`) but isn't called out 
alongside it in this throughput note. either include `raw` here or be explicit 
that `raw` shares the per-message cost.



##########
content/docs/connectors/sinks/http.mdx:
##########
@@ -0,0 +1,179 @@
+---
+title: HTTP Sink
+---
+
+<Callout type="info">
+  This page is a curated subset of the HTTP Sink documentation. The canonical 
reference — including deployment patterns, performance tuning, and the full set 
of configuration options and limitations — is the upstream 
[`http_sink/README.md`](https://github.com/apache/iggy/tree/master/core/connectors/sinks/http_sink)
 in the `apache/iggy` repository.
+</Callout>
+
+The HTTP Sink connector consumes messages from Iggy streams and delivers them 
to any HTTP endpoint — webhooks, REST APIs, serverless functions, or SaaS 
integrations. It is generic by design: you bring the URL, headers, and batching 
strategy, and the sink handles transport, retries, and metadata wrapping.
+
+## Configuration
+
+```toml
+[[streams]]
+stream = "events"
+topics = ["notifications"]
+schema = "json"
+batch_length = 50
+poll_interval = "100ms"
+consumer_group = "http_sink"
+
+[plugin_config]
+url = "https://api.example.com/ingest"
+batch_mode = "ndjson"
+```
+
+### Configuration Options
+
+| Option | Type | Default | Description |
+| ------ | ---- | ------- | ----------- |
+| `url` | string | **required** | Target URL for HTTP requests |
+| `method` | string | `POST` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, 
`PATCH`, `DELETE` |
+| `timeout` | string | `30s` | Request timeout (e.g. `10s`, `500ms`) |
+| `max_payload_size_bytes` | u64 | `10485760` | Max body size in bytes (10MB). 
`0` to disable |
+| `batch_mode` | string | `individual` | `individual`, `ndjson`, `json_array`, 
or `raw` |
+| `include_metadata` | bool | `true` | Wrap payload in metadata envelope |
+| `include_checksum` | bool | `false` | Add message checksum to metadata |
+| `include_origin_timestamp` | bool | `false` | Add origin timestamp to 
metadata |
+| `health_check_enabled` | bool | `false` | Send health check request in 
`open()` |
+| `health_check_method` | string | `HEAD` | HTTP method for health check |
+| `max_retries` | u32 | `3` | Retry attempts for transient errors |
+| `retry_delay` | string | `1s` | Base delay between retries |
+| `retry_backoff_multiplier` | u32 | `2` | Exponential backoff multiplier (min 
1) |
+| `max_retry_delay` | string | `30s` | Maximum retry delay cap |
+| `success_status_codes` | [u16] | `[200, 201, 202, 204]` | Status codes 
considered successful |
+| `tls_danger_accept_invalid_certs` | bool | `false` | Skip TLS certificate 
validation |
+| `max_connections` | usize | `10` | Max idle connections per host |
+| `verbose_logging` | bool | `false` | Log request/response details at debug 
level |
+| `headers` | table | `{}` | Custom HTTP headers (e.g. `Authorization`) |
+
+## Batch Modes
+
+The `batch_mode` option controls how messages from one poll cycle are 
delivered to the endpoint.
+
+- **`individual`** (default): one HTTP request per message. Best for webhooks 
and endpoints that accept single events. With `batch_length = 50`, this 
produces 50 sequential round trips per poll cycle.
+- **`ndjson`**: all messages in one request, [newline-delimited 
JSON](https://github.com/ndjson/ndjson-spec). Best for bulk-ingestion 
endpoints. `Content-Type: application/x-ndjson`.
+- **`json_array`**: all messages as a single JSON array. Best for APIs that 
expect array payloads. `Content-Type: application/json`.
+- **`raw`**: raw bytes, one request per message. For non-JSON payloads 
(Protobuf, FlatBuffers, binary). The metadata envelope is not applied. 
`Content-Type: application/octet-stream`.
+
+For production throughput, prefer `ndjson` or `json_array` over `individual` — 
they collapse N round trips per poll cycle into one.
+
+## Metadata Envelope
+
+When `include_metadata = true` (default), the JSON-mode payload is wrapped:
+
+```json
+{
+  "metadata": {
+    "iggy_id": "0123456789abcdef0123456789abcdef",
+    "iggy_offset": 42,
+    "iggy_timestamp": 1710064800000000,
+    "iggy_stream": "my_stream",
+    "iggy_topic": "my_topic",
+    "iggy_partition_id": 0
+  },
+  "payload": { ... }
+}
+```
+
+- `iggy_id` is a 32-character lowercase hex string (no dashes).
+- For non-JSON payloads (`raw`, `flatbuffer`, `proto` schemas), the payload is 
base64-encoded and an `iggy_payload_encoding: "base64"` field is added.

Review Comment:
   the prose says "an `iggy_payload_encoding: "base64"` field is added" but 
never shows the actual JSON shape. for `raw`/`flatbuffer`/`proto` schemas the 
sink emits the payload as `{"data": "<base64>", "iggy_payload_encoding": 
"base64"}` (see `EncodedPayload` struct). worth a 4-line example block - 
non-obvious that the bytes live in a `data` field.
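
   something like this, based on the `EncodedPayload` shape described above (metadata fields abbreviated, values illustrative):

   ```json
   {
     "metadata": {
       "iggy_offset": 42,
       "iggy_stream": "my_stream"
     },
     "payload": {
       "data": "CgZoZWxsbw==",
       "iggy_payload_encoding": "base64"
     }
   }
   ```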



##########
content/docs/connectors/sinks/http.mdx:
##########
@@ -0,0 +1,179 @@
+---
+title: HTTP Sink
+---
+
+<Callout type="info">
+  This page is a curated subset of the HTTP Sink documentation. The canonical 
reference — including deployment patterns, performance tuning, and the full set 
of configuration options and limitations — is the upstream 
[`http_sink/README.md`](https://github.com/apache/iggy/tree/master/core/connectors/sinks/http_sink)
 in the `apache/iggy` repository.
+</Callout>
+
+The HTTP Sink connector consumes messages from Iggy streams and delivers them 
to any HTTP endpoint — webhooks, REST APIs, serverless functions, or SaaS 
integrations. It is generic by design: you bring the URL, headers, and batching 
strategy, and the sink handles transport, retries, and metadata wrapping.
+
+## Configuration
+
+```toml
+[[streams]]
+stream = "events"
+topics = ["notifications"]
+schema = "json"
+batch_length = 50
+poll_interval = "100ms"
+consumer_group = "http_sink"
+
+[plugin_config]
+url = "https://api.example.com/ingest"
+batch_mode = "ndjson"
+```
+
+### Configuration Options
+
+| Option | Type | Default | Description |

Review Comment:
   this whole config table is a verbatim copy of the upstream 
`core/connectors/sinks/http_sink/README.md`. defaults, types, and the transient 
retry codes (429/500/502/503/504 a few sections down) are all hardcoded 
constants in `lib.rs` and will silently drift the moment that file changes. 
options: trim this to the 5-6 most-used knobs plus a clear pointer to the 
upstream README for the full list, or add a CI step that diffs this table 
against the upstream README on each build. note `MAX_CONSECUTIVE_FAILURES = 3` 
is also a const but only surfaced in prose - if it changes upstream nothing 
flags it here.
   
   personally, i prefer linking upstream README.
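
   fwiw, the CI-diff option could start as something like this - a rough sketch using inline sample tables; a real check would fetch the upstream README and point at the actual files:

   ```shell
   # toy version of the drift check: pull backticked option names out of each
   # table's first column and report anything upstream documents that the
   # website table is missing
   cat > docs_table.md <<'EOF'
   | `url` | string |
   | `method` | string |
   EOF
   cat > upstream_table.md <<'EOF'
   | `url` | string |
   | `method` | string |
   | `timeout` | string |
   EOF
   grep -o '`[a-z_]*`' docs_table.md | sort -u > docs_opts.txt
   grep -o '`[a-z_]*`' upstream_table.md | sort -u > upstream_opts.txt
   # lines only in upstream_opts.txt = options missing from the website table
   comm -13 docs_opts.txt upstream_opts.txt
   ```

   in CI you'd fail the build whenever `comm` produces any output.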



##########
content/docs/connectors/sinks/http.mdx:
##########
@@ -0,0 +1,179 @@
+---
+title: HTTP Sink
+---
+
+<Callout type="info">
+  This page is a curated subset of the HTTP Sink documentation. The canonical 
reference — including deployment patterns, performance tuning, and the full set 
of configuration options and limitations — is the upstream 
[`http_sink/README.md`](https://github.com/apache/iggy/tree/master/core/connectors/sinks/http_sink)
 in the `apache/iggy` repository.
+</Callout>
+
+The HTTP Sink connector consumes messages from Iggy streams and delivers them 
to any HTTP endpoint — webhooks, REST APIs, serverless functions, or SaaS 
integrations. It is generic by design: you bring the URL, headers, and batching 
strategy, and the sink handles transport, retries, and metadata wrapping.
+
+## Configuration
+
+```toml
+[[streams]]
+stream = "events"
+topics = ["notifications"]
+schema = "json"
+batch_length = 50
+poll_interval = "100ms"
+consumer_group = "http_sink"
+
+[plugin_config]
+url = "https://api.example.com/ingest"
+batch_mode = "ndjson"
+```
+
+### Configuration Options
+
+| Option | Type | Default | Description |
+| ------ | ---- | ------- | ----------- |
+| `url` | string | **required** | Target URL for HTTP requests |
+| `method` | string | `POST` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, 
`PATCH`, `DELETE` |
+| `timeout` | string | `30s` | Request timeout (e.g. `10s`, `500ms`) |
+| `max_payload_size_bytes` | u64 | `10485760` | Max body size in bytes (10MB). 
`0` to disable |
+| `batch_mode` | string | `individual` | `individual`, `ndjson`, `json_array`, 
or `raw` |
+| `include_metadata` | bool | `true` | Wrap payload in metadata envelope |
+| `include_checksum` | bool | `false` | Add message checksum to metadata |
+| `include_origin_timestamp` | bool | `false` | Add origin timestamp to 
metadata |
+| `health_check_enabled` | bool | `false` | Send health check request in 
`open()` |
+| `health_check_method` | string | `HEAD` | HTTP method for health check |
+| `max_retries` | u32 | `3` | Retry attempts for transient errors |
+| `retry_delay` | string | `1s` | Base delay between retries |
+| `retry_backoff_multiplier` | u32 | `2` | Exponential backoff multiplier (min 
1) |
+| `max_retry_delay` | string | `30s` | Maximum retry delay cap |
+| `success_status_codes` | [u16] | `[200, 201, 202, 204]` | Status codes 
considered successful |

Review Comment:
   the description is fine but misses a useful behavior: codes in this set are 
also never retried, even normally-transient ones like 429. so users who want to 
treat 429 as "queued/accepted" can put 429 here and it will short-circuit 
retries. worth a sentence on this - it's a non-obvious knob.
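
   e.g. (per the behavior described above - treating 429 as accepted and skipping retries):

   ```toml
   [plugin_config]
   # 429 listed as a success code: the response is treated as accepted
   # and the retry loop is skipped entirely
   success_status_codes = [200, 201, 202, 204, 429]
   ```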



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
