Sure, you can test with the Sink from my PR branch. Otherwise, I'll do the test after ApacheCon.

On Tue, 27 Sep 2022 at 12:57, tison <wander4...@gmail.com> wrote:

> Yes. It's a potential use case for validating the implementation. If you
> don't have time to try it out, I can schedule some time to demo it with a
> prototype HTTP sink or after the patch gets merged :)
>
> Best,
> tison.
>
>
> On Tue, 27 Sep 2022 at 18:51, Christophe Bornet <bornet.ch...@gmail.com> wrote:
>
> > Hi Tison,
> >
> > Very interesting and shows the value of such an HTTP Sink.
> > The Pulsar HTTP Sink should work OOTB with ClickHouse. I don't have time to
> > do the test right now, so would someone want to do it?
> >
> > Best regards.
> >
> > Christophe Bornet
> >
> > On Tue, 27 Sep 2022 at 12:31, tison <wander4...@gmail.com> wrote:
> >
> > > Hi Christophe,
> > >
> > > Thanks for starting this proposal. It looks cool.
> > >
> > > I'd suggest one real-world integration test you can make use of:
> > > https://clickhouse.com/docs/en/integrations/kafka/kafka-connect-http
> > > (replace source kafka with pulsar).
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > On Tue, 27 Sep 2022 at 18:04, Enrico Olivelli <eolive...@gmail.com> wrote:
> > >
> > > > Thanks for your answers.
> > > > I am fine with the current proposal.
> > > > We can enhance it as follow-up work.
> > > >
> > > > Enrico
> > > >
> > > > On Fri, 23 Sep 2022 at 19:20, Christophe Bornet
> > > > <bornet.ch...@gmail.com> wrote:
> > > > >
> > > > > Thanks for your feedback, Enrico.
> > > > > My answers to your comments are below.
> > > > >
> > > > > BR
> > > > >
> > > > > Christophe
> > > > >
> > > > > On Tue, 20 Sep 2022 at 14:16, Enrico Olivelli <eolive...@gmail.com> wrote:
> > > > >
> > > > > > Christophe,
> > > > > > very good initiative!
> > > > > >
> > > > > > I support it
> > > > > > Some comments inline below
> > > > > >
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > > On Mon, 19 Sep 2022 at 19:10, Christophe Bornet
> > > > > > <bornet.ch...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I have drafted PIP-208: HTTP Sink
> > > > > > >
> > > > > > > PIP link:
> > > > > > > https://github.com/apache/pulsar/issues/17719
> > > > > > >
> > > > > > > Here's a copy of the contents of the GH issue for your reference:
> > > > > > >
> > > > > > > ### Motivation
> > > > > > >
> > > > > > > Currently, when you want to consume from Pulsar topics in applications
> > > > > > > written in languages that don't have a supported Pulsar driver, you need to
> > > > > > > run some type of proxy like the WebSocket Proxy or Pulsar Beam. In
> > > > > > > production this needs additional effort to deploy, scale, load balance,
> > > > > > > monitor, and so on...
> > > > > > > Pulsar IO is a framework that deals with all these operational subjects and
> > > > > > > can be leveraged to provide a way to push messages to external systems
> > > > > > > using HTTP, a protocol supported by every existing language and OS.
> > > > > > >
> > > > > > > ### Goal
> > > > > > >
> > > > > > > This proposal defines an HTTP Sink that sends the messages to a configured
> > > > > > > URL.
> > > > > > > It takes inspiration from [Pulsar Beam](https://github.com/kafkaesque-io/pulsar-beam)
> > > > > > > and the [Confluent HTTP Sink connector](https://docs.confluent.io/kafka-connectors/http/current/overview.html).
> > > > > > >
> > > > > > >
> > > > > > > ### Implementation
> > > > > > >
> > > > > > > A `pulsar-io-http` module will be added to `pulsar-io`.
> > > > > > > On building the project, `pulsar-io-http-{version}.nar` will be built and
> > > > > > > added to the `pulsar-all` distribution.
> > > > > > > The name of the Sink will be `http`.
> > > > > > >
> > > > > > > The HTTP Sink pushes records to any HTTP server with the record value in
> > > > > > > the body of a POST method.
> > > > > > > The body of the HTTP request is the JSON representation of the record value.
> > > > > >
> > > > > > What do you mean?
> > > > > > I think that this should depend on the Schema.
> > > > > >
> > > > > > BYTES SCHEMA -> I would push the raw message payload
> > > > > > PRIMITIVE VALUES (long, integer, string) -> I would push the JSON
> > > > > > representation
> > > > > > JSON SCHEMA -> push the raw message payload
> > > > > > AVRO -> ? convert to JSON ?
> > > > > > PROTOBUF -> ? convert to JSON ?
> > > > > > KEY-VALUE ?
> > > > > >
> > > > > > Probably we need some flag to define the behaviour for the non-trivial
> > > > > > cases.
> > > > > >
> > > > > The current impl chooses to serialize as JSON because it's a
> > > > > well-supported content-type on the server frameworks.
> > > > > It's also to be consistent with existing HTTP Sinks such as Pulsar Beam and
> > > > > Confluent HTTP Sink Connector.
> > > > > The possibility to adapt the content-type to the schema is elegant and will
> > > > > probably result in shorter payloads (but less readable) and I think it
> > > > > could be done as a follow-up option.
> > > > > It does indeed have the problem of being difficult to do for KV schema.
> > > > > For the content-type mappings I would do:
> > > > > BYTES SCHEMA -> application/octet-stream (raw bytes)
> > > > > PRIMITIVE VALUES (long, integer, string) -> text/plain
> > > > > JSON -> application/json
> > > > > AVRO -> avro/binary
> > > > > PROTOBUF -> probably application/octet-stream ?
> > > > > KEY-VALUE ?
> > > > >
> > > > > Would also need to indicate the Schema-Type in the HTTP headers.
> > > > >
> > > > > > >
> > > > > > > Some headers are added to the HTTP request:
> > > > > > > * `PulsarTopic`: the topic of the record
> > > > > > > * `PulsarKey`: the key of the record
> > > > > > > * `PulsarEventTime`: the event time of the record
> > > > > > > * `PulsarPublishTime`: the publish time of the record
> > > > > > > * `PulsarMessageId`: the ID of the message contained in the record
> > > > > > > * `PulsarProperties-*`: each record property is passed with the property
> > > > > > > name prefixed by `PulsarProperties-`
> > > > > > >
> > > > > >
> > > > > > Can we make the "Content-Type" configurable?
> > > > > >
> > > > > Yes we can. But do we do it for the first iteration?
> > > > > If we do it, I would have an option to add some fixed headers and the user
> > > > > can override the content-type.
> > > > > If we go for a variable content-type depending on the schema, then we could
> > > > > have a map<SchemaType, content-type>
> > > > >
> > > > > > Can we make the HTTP METHOD configurable?
> > > > > >
> > > > > Yes we can. But do we do it for the first iteration?
> > > > >
> > > > > > >
> > > > > > > ### Alternatives
> > > > > > >
> > > > > > > Creating a separate project for this Sink is rejected since:
> > > > > > > * this Sink is very useful for developers to test the Pulsar IO framework,
> > > > > > > transform functions, and to make demos.
> > > > > > > * the code has a very small footprint with no external dependencies.
> > > > > > > * it should be visible at the same level as other sinks
> > > > > >
> > > > > > 100% agreed!
> > > > > >
> > > > > > >
> > > > > > > I'm looking forward to the discussion.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Christophe Bornet
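
For anyone writing a test receiver, here is a rough Python sketch of the request shape described in the quoted PIP: a POST whose body is the JSON form of the record value, plus the `Pulsar*` headers. It is not the connector code. The function name `build_http_sink_request`, its parameters, the string formatting of the timestamps and message ID, and the choice to omit a header when its field is absent are all assumptions made for illustration. The `CONTENT_TYPE_BY_SCHEMA` dict only restates the follow-up mapping floated in the thread (the `map<SchemaType, content-type>` idea); KEY-VALUE is left out because it is still an open question.

```python
import json

# Assumption: the follow-up content-type mapping discussed in the thread,
# keyed by the schema names used there. KEY-VALUE is still undecided.
CONTENT_TYPE_BY_SCHEMA = {
    "BYTES": "application/octet-stream",
    "PRIMITIVE": "text/plain",
    "JSON": "application/json",
    "AVRO": "avro/binary",
    "PROTOBUF": "application/octet-stream",
}


def build_http_sink_request(topic, value, key=None, event_time=None,
                            publish_time=None, message_id=None,
                            properties=None):
    """Hypothetical helper: return (headers, body) for one record."""
    # The first iteration always sends the JSON representation of the value.
    headers = {"Content-Type": "application/json", "PulsarTopic": topic}

    # Assumption: optional fields are skipped when absent and otherwise
    # sent as plain strings; the PIP text does not fix a format.
    optional = {
        "PulsarKey": key,
        "PulsarEventTime": event_time,
        "PulsarPublishTime": publish_time,
        "PulsarMessageId": message_id,
    }
    for name, field in optional.items():
        if field is not None:
            headers[name] = str(field)

    # Each record property becomes a header prefixed by PulsarProperties-.
    for name, prop in (properties or {}).items():
        headers["PulsarProperties-" + name] = str(prop)

    body = json.dumps(value).encode("utf-8")
    return headers, body
```

A receiver test could call `build_http_sink_request("my-topic", {"id": 1}, key="k1", properties={"region": "eu"})` and replay the result with any HTTP client to check how the endpoint (ClickHouse, for example) handles the headers and body.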