Christophe,
very good initiative!

I support it.
Some comments inline below.


Enrico

On Mon, 19 Sep 2022 at 19:10, Christophe Bornet
<bornet.ch...@gmail.com> wrote:
>
> Hi all,
>
> I have drafted PIP-208: HTTP Sink
>
> PIP link:
> https://github.com/apache/pulsar/issues/17719
>
> Here's a copy of the contents of the GH issue for your reference:
>
> ### Motivation
>
> Currently, when you want to consume from Pulsar topics in applications
> written in languages that don't have a Pulsar driver supported, you need to
> run some type of proxy like the WebSocket Proxy or Pulsar Beam. In
> production this needs additional effort to deploy, scale, load balance,
> monitor, and so on...
> Pulsar IO is a framework that deals with all these operational subjects and
> can be leveraged to provide a way to push messages to external systems
> using HTTP, a protocol supported by every existing language and OS.
>
> ### Goal
>
> This proposal defines an HTTP Sink that sends the messages to a configured
> URL.
> It takes inspiration from [Pulsar Beam](
> https://github.com/kafkaesque-io/pulsar-beam) and the [Confluent HTTP Sink
> connector](
> https://docs.confluent.io/kafka-connectors/http/current/overview.html).
>
>
> ### Implementation
>
> A `pulsar-io-http` module will be added to `pulsar-io`.
> On building the project `pulsar-io-http-{version}.nar` will be built and
> added to the `pulsar-all` distribution.
> The name of the Sink will be `http`.
>
> The HTTP Sink pushes records to any HTTP server with the record value in
> the body of a POST method.
> The body of the HTTP request is the JSON representation of the record value.

What do you mean?
I think this should depend on the Schema:

BYTES schema -> I would push the raw message payload
Primitive values (long, integer, string) -> I would push the JSON
representation
JSON schema -> push the raw message payload
AVRO -> ? convert to JSON?
PROTOBUF -> ? convert to JSON?
KEY-VALUE -> ?

We probably need a flag to define the behaviour for the non-trivial cases.
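
To make the mapping above concrete, here is a rough sketch of what I have in mind. It is not tied to the Pulsar API: `SchemaKind`, `StructuredMode` and `encodeBody` are names I made up for illustration, and the conversion of AVRO/PROTOBUF/KEY-VALUE to JSON is deliberately left open because that is exactly the part we still need to decide.

```java
import java.nio.charset.StandardCharsets;

final class BodyEncodingSketch {
    // Hypothetical stand-in for the schema type of the incoming record.
    enum SchemaKind { BYTES, STRING, INT32, INT64, JSON, AVRO, PROTOBUF, KEY_VALUE }

    // Hypothetical flag selecting the behaviour for the non-trivial cases.
    enum StructuredMode { CONVERT_TO_JSON, RAW }

    static byte[] encodeBody(SchemaKind kind, Object value, byte[] rawPayload, StructuredMode mode) {
        switch (kind) {
            case BYTES:
            case JSON:
                // Payload is already what the receiver expects: forward it untouched.
                return rawPayload;
            case STRING:
                return jsonString((String) value).getBytes(StandardCharsets.UTF_8);
            case INT32:
            case INT64:
                // A JSON number is just its decimal representation.
                return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
            default:
                if (mode == StructuredMode.RAW) {
                    return rawPayload;
                }
                // AVRO / PROTOBUF / KEY_VALUE to JSON is the open question of this thread.
                throw new UnsupportedOperationException("JSON conversion not defined for " + kind);
        }
    }

    // Minimal JSON string escaping (quotes, backslashes, control characters).
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

With a flag like this the default could stay "convert to JSON" as in the proposal, while users who want the bytes as they are on the topic can opt out.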


>
> Some headers are added to the HTTP request:
> * `PulsarTopic`: the topic of the record
> * `PulsarKey`: the key of the record
> * `PulsarEventTime`: the event time of the record
> * `PulsarPublishTime`: the publish time of the record
> * `PulsarMessageId`: the ID of the message contained in the record
> * `PulsarProperties-*`: each record property is passed with the property
> name prefixed by `PulsarProperties-`
>

Can we make the "Content-Type" configurable?
Can we make the HTTP method configurable?
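
For illustration, building the request with the JDK 11+ `java.net.http` client could look roughly like the sketch below. The header names are the ones from the proposal; `buildRequest` and its parameters are hypothetical, and I am assuming the timestamps are sent as epoch milliseconds and that headers for a missing key or event time are simply omitted (the PIP does not say).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

final class HttpSinkRequestSketch {
    // method and contentType would come from the (hypothetical) sink configuration.
    static HttpRequest buildRequest(URI url, String method, String contentType,
                                    String topic, String key, Long eventTime, long publishTime,
                                    String messageId, Map<String, String> properties, byte[] body) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(url)
                .header("Content-Type", contentType)
                .header("PulsarTopic", topic)
                .header("PulsarPublishTime", Long.toString(publishTime))
                .header("PulsarMessageId", messageId);
        // Assumption: optional fields are left out rather than sent empty.
        if (key != null) {
            builder.header("PulsarKey", key);
        }
        if (eventTime != null) {
            builder.header("PulsarEventTime", Long.toString(eventTime));
        }
        // One header per record property, prefixed as described in the proposal.
        properties.forEach((name, value) -> builder.header("PulsarProperties-" + name, value));
        return builder.method(method, HttpRequest.BodyPublishers.ofByteArray(body)).build();
    }
}
```

The resulting request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())`. Defaulting to POST and `application/json` would keep the behaviour described in the proposal.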


> ### Alternatives
>
> Creating a separate project for this Sink is rejected since:
> * this Sink is very useful for developers to test the Pulsar IO framework,
> transform functions, and to make demos.
> * the code has a very small footprint with no external dependencies.
> * it should be visible at the same level as other sinks

100% agreed!

>
> I'm looking forward to the discussion.
>
> Best regards,
>
> Christophe Bornet
