Thanks for your feedback, Enrico. My answers to your comments are below.

BR
Christophe

On Tue, Sep 20, 2022 at 14:16, Enrico Olivelli <eolive...@gmail.com> wrote:

> Christophe,
> very good initiative!
>
> I support it
> Some comments inline below
>
>
> Enrico
>
> On Mon, Sep 19, 2022 at 19:10, Christophe Bornet
> <bornet.ch...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I have drafted PIP-208: HTTP Sink
> >
> > PIP link:
> > https://github.com/apache/pulsar/issues/17719
> >
> > Here's a copy of the contents of the GH issue for your references:
> >
> > ### Motivation
> >
> > Currently, when you want to consume from Pulsar topics in applications
> > written in languages that don't have a Pulsar driver supported, you need to
> > run some type of proxy like the WebSocket Proxy or Pulsar Beam. In
> > production this needs additional effort to deploy, scale, load balance,
> > monitor, and so on...
> > Pulsar IO is a framework that deals with all these operational subjects and
> > can be leveraged to provide a way to push messages to external systems
> > using HTTP, a protocol supported by every existing language and OS.
> >
> > ### Goal
> >
> > This proposal defines an HTTP Sink that sends the messages to a configured
> > URL.
> > It takes inspiration from
> > [Pulsar Beam](https://github.com/kafkaesque-io/pulsar-beam) and the
> > [Confluent HTTP Sink connector](https://docs.confluent.io/kafka-connectors/http/current/overview.html).
> >
> >
> > ### Implementation
> >
> > A `pulsar-io-http` module will be added to `pulsar-io`.
> > On building the project `pulsar-io-http-{version}.nar` will be built and
> > added to the `pulsar-all` distribution.
> > The name of the Sink will be `http`.
> >
> > The HTTP Sink pushes records to any HTTP server with the record value in
> > the body of a POST method.
> > The body of the HTTP request is the JSON representation of the record value.
>
> What do you mean ?
> I think that this should depend on the Schema.
>
> BYTES SCHEMA -> I would push the raw message payload
> PRIMITIVE VALUES (long, integer, string) -> I would push the JSON
> representation
> JSON SCHEMA -> push the raw message payload
> AVRO -> ? convert to JSON ?
> PROTOBUF -> ? convert to JSON ?
> KEY-VALUE ?
>
> Probably we need some flag to define the behaviour for the non trivial
> cases.
>

The current implementation serializes as JSON because it is a well-supported
content type in server frameworks. It is also consistent with existing HTTP
sinks such as Pulsar Beam and the Confluent HTTP Sink Connector.
Adapting the content type to the schema is elegant and would probably result
in shorter (but less readable) payloads, and I think it could be done as a
follow-up option. It does have the problem of being difficult to do for the
KV schema.
For the content-type mappings I would do:
BYTES SCHEMA -> application/octet-stream (raw bytes)
PRIMITIVE VALUES (long, integer, string) -> text/plain
JSON -> application/json
AVRO -> avro/binary
PROTOBUF -> probably application/octet-stream ?
KEY-VALUE ?
We would also need to indicate the Schema-Type in the HTTP headers.

> >
> > Some headers are added to the HTTP request:
> > * `PulsarTopic`: the topic of the record
> > * `PulsarKey`: the key of the record
> > * `PulsarEventTime`: the event time of the record
> > * `PulsarPublishTime`: the publish time of the record
> > * `PulsarMessageId`: the ID of the message contained in the record
> > * `PulsarProperties-*`: each record property is passed with the property
> > name prefixed by `PulsarProperties-`
> >
>
> Can we make the "Content-Type" configurable ?
>

Yes, we can. But should we do it in the first iteration?
If we do, I would add an option to set some fixed headers, which the user
could use to override the content type.
If we go for a variable content type depending on the schema, then we could
have a map<SchemaType, content-type>.

> Can we make the HTTP METHOD configurable ?
>

Yes, we can.
But should we do it in the first iteration?

> >
> > ### Alternatives
> >
> > Creating a separate project for this Sink is rejected since:
> > * this Sink is very useful for developers to test the Pulsar IO framework,
> > transform functions, and to make demos.
> > * the code has a very small footprint with no external dependencies.
> > * it should be visible at the same level as other sinks
>
> 100% agreed !
>
> >
> > I'm looking forward to the discussion.
> >
> > Best regards,
> >
> > Christophe Bornet
>
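
To make the points above concrete, here is a minimal Python sketch of how the request described in the PIP could be assembled: the record value as the JSON body of a POST, the `Pulsar*` headers, and the schema-type to content-type map I suggested as a possible follow-up. This is not the connector code. The `Record` class, `build_request`, the `CONTENT_TYPES` table and its key names, and the choice to skip unset fields are all assumptions made for illustration. KEY-VALUE is left out of the map because it is still an open question.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Possible follow-up option from this thread: schema type -> content type.
# Key names are illustrative, not Pulsar's actual SchemaType values.
CONTENT_TYPES = {
    "BYTES": "application/octet-stream",     # raw bytes
    "LONG": "text/plain",
    "INTEGER": "text/plain",
    "STRING": "text/plain",
    "JSON": "application/json",
    "AVRO": "avro/binary",
    "PROTOBUF": "application/octet-stream",  # tentative ("probably")
}


@dataclass
class Record:
    """Hypothetical stand-in for a Pulsar IO record (not the real API)."""
    topic: str
    value: object
    message_id: str
    key: Optional[str] = None
    event_time: Optional[int] = None
    publish_time: Optional[int] = None
    properties: Dict[str, str] = field(default_factory=dict)


def build_request(record: Record) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for the POST described in the PIP."""
    headers = {
        "Content-Type": "application/json",  # first-iteration behaviour
        "PulsarTopic": record.topic,
        "PulsarMessageId": record.message_id,
    }
    optional = {
        "PulsarKey": record.key,
        "PulsarEventTime": record.event_time,
        "PulsarPublishTime": record.publish_time,
    }
    for name, value in optional.items():
        if value is not None:  # assumption: unset fields are not sent
            headers[name] = str(value)
    # Each record property is prefixed with `PulsarProperties-`.
    for name, value in record.properties.items():
        headers["PulsarProperties-" + name] = value
    body = json.dumps(record.value).encode("utf-8")
    return headers, body
```

The returned headers and body can be passed to any HTTP client. Making the content type configurable would amount to replacing the hard-coded `application/json` with a lookup in `CONTENT_TYPES` (or a user-supplied override).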