Hi Christophe, I think this is a very good idea!
I agree with Enrico that the body should depend on the record schema, but it could also be done as a follow-up task.

Another thing to think about could be an optional batching mechanism that would take a batch of records and send them as a list of JSON objects in a single HTTP request.

Best,
Alex

On Tue, Sep 20, 2022 at 2:16 PM Enrico Olivelli <eolive...@gmail.com> wrote:

> Christophe,
> very good initiative!
>
> I support it
> Some comments inline below
>
> Enrico
>
> On Mon, Sep 19, 2022 at 7:10 PM Christophe Bornet
> <bornet.ch...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I have drafted PIP-208: HTTP Sink
> >
> > PIP link:
> > https://github.com/apache/pulsar/issues/17719
> >
> > Here's a copy of the contents of the GH issue for your reference:
> >
> > ### Motivation
> >
> > Currently, when you want to consume from Pulsar topics in applications
> > written in languages that don't have a supported Pulsar driver, you need to
> > run some type of proxy like the WebSocket Proxy or Pulsar Beam. In
> > production this needs additional effort to deploy, scale, load balance,
> > monitor, and so on...
> > Pulsar IO is a framework that deals with all these operational subjects and
> > can be leveraged to provide a way to push messages to external systems
> > using HTTP, a protocol supported by every existing language and OS.
> >
> > ### Goal
> >
> > This proposal defines an HTTP Sink that sends the messages to a configured
> > URL.
> > It takes inspiration from
> > [Pulsar Beam](https://github.com/kafkaesque-io/pulsar-beam) and the
> > [Confluent HTTP Sink connector](https://docs.confluent.io/kafka-connectors/http/current/overview.html).
> >
> > ### Implementation
> >
> > A `pulsar-io-http` module will be added to `pulsar-io`.
> > On building the project `pulsar-io-http-{version}.nar` will be built and
> > added to the `pulsar-all` distribution.
> > The name of the Sink will be `http`.
> >
> > The HTTP Sink pushes records to any HTTP server with the record value in
> > the body of a POST method.
> > The body of the HTTP request is the JSON representation of the record value.
>
> What do you mean?
> I think that this should depend on the Schema.
>
> BYTES SCHEMA -> I would push the raw message payload
> PRIMITIVE VALUES (long, integer, string) -> I would push the JSON representation
> JSON SCHEMA -> push the raw message payload
> AVRO -> ? convert to JSON ?
> PROTOBUF -> ? convert to JSON ?
> KEY-VALUE ?
>
> Probably we need some flag to define the behaviour for the non-trivial
> cases.
>
> >
> > Some headers are added to the HTTP request:
> > * `PulsarTopic`: the topic of the record
> > * `PulsarKey`: the key of the record
> > * `PulsarEventTime`: the event time of the record
> > * `PulsarPublishTime`: the publish time of the record
> > * `PulsarMessageId`: the ID of the message contained in the record
> > * `PulsarProperties-*`: each record property is passed with the property
> >   name prefixed by `PulsarProperties-`
> >
>
> Can we make the "Content-Type" configurable?
> Can we make the HTTP METHOD configurable?
>
> > ### Alternatives
> >
> > Creating a separate project for this Sink is rejected since:
> > * this Sink is very useful for developers to test the Pulsar IO framework,
> >   transform functions, and to make demos.
> > * the code has a very small footprint with no external dependencies.
> > * it should be visible at the same level as other sinks
>
> 100% agreed!
>
> >
> > I'm looking forward to the discussion.
> >
> > Best regards,
> >
> > Christophe Bornet
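
For illustration only, here is a minimal Python sketch of the request shape discussed in this thread: the `Pulsar*` headers listed in the PIP, a JSON body for a single record, and the optional batched body (a JSON list) suggested above. It is not the connector's implementation. The function names (`build_headers`, `build_body`, `build_batch_body`), the choice to omit `PulsarKey` and `PulsarEventTime` when they are absent, and the formatting of times as plain strings are all assumptions made for the example.

```python
import json
from typing import Any, Dict, List, Optional


def build_headers(
    topic: str,
    key: Optional[str],
    event_time: Optional[int],
    publish_time: int,
    message_id: str,
    properties: Dict[str, str],
) -> Dict[str, str]:
    """Build the HTTP headers named in the PIP for one record.

    Assumption: optional fields (key, event time) are omitted when missing,
    and times are sent as their string form.
    """
    headers = {
        "PulsarTopic": topic,
        "PulsarPublishTime": str(publish_time),
        "PulsarMessageId": message_id,
    }
    if key is not None:
        headers["PulsarKey"] = key
    if event_time is not None:
        headers["PulsarEventTime"] = str(event_time)
    # Each record property is passed with the "PulsarProperties-" prefix.
    for name, value in properties.items():
        headers["PulsarProperties-" + name] = value
    return headers


def build_body(value: Any) -> bytes:
    """JSON representation of a single record value."""
    return json.dumps(value).encode("utf-8")


def build_batch_body(values: List[Any]) -> bytes:
    """Hypothetical batched body: a JSON list with one entry per record."""
    return json.dumps(list(values)).encode("utf-8")
```

Note that in a batched request the per-record headers no longer map one-to-one onto the body, so a batching mode would need its own convention for carrying record metadata; that question is left open here.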