Christophe,
very good initiative! I support it.
Some comments inline below.
Enrico

On Mon, 19 Sep 2022 at 19:10, Christophe Bornet <bornet.ch...@gmail.com> wrote:
>
> Hi all,
>
> I have drafted PIP-208: HTTP Sink
>
> PIP link:
> https://github.com/apache/pulsar/issues/17719
>
> Here's a copy of the contents of the GH issue for your references:
>
> ### Motivation
>
> Currently, when you want to consume from Pulsar topics in applications
> written in languages that don't have a Pulsar driver supported, you need to
> run some type of proxy like the WebSocket Proxy or Pulsar Beam. In
> production this needs additional effort to deploy, scale, load balance,
> monitor, and so on...
> Pulsar IO is a framework that deals with all these operational subjects and
> can be leveraged to provide a way to push messages to external systems
> using HTTP, a protocol supported by every existing language and OS.
>
> ### Goal
>
> This proposal defines an HTTP Sink that sends the messages to a configured
> URL.
> It takes inspiration from [Pulsar Beam](
> https://github.com/kafkaesque-io/pulsar-beam) and the [Confluent HTTP Sink
> connector](
> https://docs.confluent.io/kafka-connectors/http/current/overview.html).
>
>
> ### Implementation
>
> A `pulsar-io-http` module will be added to `pulsar-io`.
> On building the project `pulsar-io-http-{version}.nar` will be built and
> added to the `pulsar-all` distribution.
> The name of the Sink will be `http`.
>
> The HTTP Sink pushes records to any HTTP server with the record value in
> the body of a POST method.
> The body of the HTTP request is the JSON representation of the record value.

What do you mean? I think that this should depend on the Schema.

BYTES SCHEMA -> I would push the raw message payload
PRIMITIVE VALUES (long, integer, string) -> I would push the JSON representation
JSON SCHEMA -> push the raw message payload
AVRO -> ? convert to JSON ?
PROTOBUF -> ? convert to JSON ?
KEY-VALUE ?

Probably we need some flag to define the behaviour for the non-trivial cases.
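
To make the schema cases above concrete, here is a rough Python sketch of a schema-dependent body encoder. Everything in it is hypothetical: the `encode_body` function, the schema-type strings, and the `convert_structs_to_json` flag are illustrative names, not part of PIP-208 or of any Pulsar API. It also assumes that AVRO and PROTOBUF values arrive already decoded into a dict. KEY-VALUE is deliberately left unsupported, since its behaviour is still an open question.

```python
import json

# Hypothetical schema-type names that mirror the cases listed above;
# they are not taken from an actual Pulsar enum.
RAW_TYPES = {"BYTES", "JSON"}
PRIMITIVE_TYPES = {"INT32", "INT64", "STRING"}
STRUCT_TYPES = {"AVRO", "PROTOBUF"}


def encode_body(schema_type, value, convert_structs_to_json=True):
    """Return the HTTP request body (bytes) for one record value.

    Assumptions of this sketch: `value` is bytes for BYTES/JSON, a Python
    primitive for primitive types, and an already-decoded dict for
    AVRO/PROTOBUF.
    """
    if schema_type in RAW_TYPES:
        # Push the raw message payload untouched.
        return bytes(value)
    if schema_type in PRIMITIVE_TYPES:
        # Primitives are sent as their JSON representation.
        return json.dumps(value).encode("utf-8")
    if schema_type in STRUCT_TYPES:
        if convert_structs_to_json:
            return json.dumps(value).encode("utf-8")
        # Without the (hypothetical) flag there is no agreed encoding.
        raise ValueError(f"no encoding configured for {schema_type}")
    # KEY-VALUE and anything else: behaviour still to be decided.
    raise ValueError(f"unsupported schema type: {schema_type}")
```

For example, `encode_body("INT64", 42)` yields the JSON text `42` as bytes, while `encode_body("BYTES", payload)` returns the payload unchanged.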
>
> Some headers are added to the HTTP request:
> * `PulsarTopic`: the topic of the record
> * `PulsarKey`: the key of the record
> * `PulsarEventTime`: the event time of the record
> * `PulsarPublishTime`: the publish time of the record
> * `PulsarMessageId`: the ID of the message contained in the record
> * `PulsarProperties-*`: each record property is passed with the property
> name prefixed by `PulsarProperties-`
>

Can we make the "Content-Type" configurable?
Can we make the HTTP method configurable?

> ### Alternatives
>
> Creating a separated project for this Sink is rejected since:
> * this Sink is very useful for developers to test the Pulsar IO framework,
> transform functions, and to make demos.
> * the code has a very small footprint with no external dependencies.
> * it should be visible at the same level as other sinks

100% agreed!

>
> I'm looking forward the discussion.
>
> Best regards,
>
> Christophe Bornet
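
P.S. As an illustration of the header list and the two configurability questions above, here is a small Python sketch of how a request could be assembled from a record. The header names come from the proposal. Everything else is an assumption made for illustration: the `build_request` function, its `method` and `content_type` parameters, and the choice to render timestamps and the message ID as plain strings.

```python
def build_request(url, body, topic, key, event_time, publish_time,
                  message_id, properties,
                  method="POST", content_type="application/json"):
    """Describe the HTTP request for one record as a plain dict.

    `method` and `content_type` are hypothetical configuration options;
    the defaults reflect the behaviour described in the proposal.
    """
    headers = {
        "Content-Type": content_type,
        "PulsarTopic": topic,
        # Header value formats are assumptions of this sketch.
        "PulsarEventTime": str(event_time),
        "PulsarPublishTime": str(publish_time),
        "PulsarMessageId": str(message_id),
    }
    if key is not None:
        # Only send the key header when the record actually has a key.
        headers["PulsarKey"] = key
    for name, value in properties.items():
        # Each record property gets the `PulsarProperties-` prefix.
        headers["PulsarProperties-" + name] = value
    return {"method": method, "url": url, "headers": headers, "body": body}
```

With the defaults, the sketch produces a POST with a JSON content type, and a deployment could override both per sink instance, for example `method="PUT"` with `content_type="text/plain"`.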