Re: Updates on PIP-62

2020-11-15 Thread Jia Zhai
+1

Best Regards.

Jia Zhai
Beijing, China
Mobile: +86 15810491983

On Fri, Nov 13, 2020 at 12:32 PM Sijie Guo  wrote:

> I have cloned the changes to pulsar-connectors, pulsar-adapters, and
> pulsar-presto. They can be built independently.
>
> However, there is one challenge: we release the `pulsar-all` image to
> include all connectors, offloaders, and presto, and we also use that image
> for integration tests. It is hard to remove connectors and presto without
> changing the release process too much. Hence, after discussing with Penghui,
> I am holding off on deleting those modules until after 2.7.0 is released,
> and will resume PIP-62 for the 2.8.0 release.
>
> Thanks,
> Sijie
>
> On Sat, Nov 7, 2020 at 12:36 AM Jia Zhai  wrote:
>
> > 👍
> >
> > On Sat, Nov 7, 2020 at 3:40 AM Enrico Olivelli wrote:
> >
> > > Great!
> > > We will make the repo much lighter!
> > >
> > > Enrico
> > >
> > > On Fri, Nov 6, 2020 at 18:45 Sijie Guo wrote:
> > >
> > > > Hi all,
> > > >
> > > > I am starting this thread to provide updates on the progress of PIP-62.
> > > >
> > > > I have now moved pulsar-flink / pulsar-spark / pulsar-storm /
> > > > pulsar-kafka-wrapper / pulsar-log4j2-appender to
> > > > https://github.com/apache/pulsar-adapters.
> > > >
> > > > I am going to delete those modules from the main repo.
> > > >
> > > > - Sijie
> > > >
> > >
> >
>


Re: Proposal for Consumer Filtering in Pulsar brokers

2020-11-15 Thread Jia Zhai
Hi Andre,
Thanks for this proposal. In addition to Sijie's comments, there is also PIP-70:
https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata
Do you think it could help with this proposal? We have discussed consumer
filtering before, and the performance penalty on the broker is also a big
concern.

Best Regards.
Jia Zhai

On Sat, Nov 14, 2020 at 3:08 AM Sijie Guo  wrote:

> Andre,
>
> Is it possible to put it in a Google Doc (or a similar collaboration tool)
> that allows other people to make comments? Also, it would be easier for the
> committers to copy the PIP to the Pulsar wiki pages.
>
> Thanks,
> Sijie
>
> On Fri, Nov 13, 2020 at 2:44 AM Kramer, Andre wrote:
>
> > Hi Sijie,
> >
> > I have added a PIP-style document to the pull request:
> > https://github.com/andrekramer1/pulsar/blob/consumer-filter2-7-0/PIP-XX%20-%20Consumer-filtering.pdf
> > Hopefully that can be used to start the discussion?
> >
> > Regards,
> > Andre
> >
> > -----Original Message-----
> > From: Sijie Guo 
> > Sent: 12 November 2020 18:32
> > To: Dev 
> > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> >
> > Hi Andre,
> >
> > I didn't see the attached write-up. Can you write a PIP for this feature?
> > Given that it is a big feature, it would be good to discuss it through a PIP.
> >
> > - Sijie
> >
> > On Thu, Nov 12, 2020 at 6:17 AM Kramer, Andre <andre.kra...@softwareag.com> wrote:
> >
> > > Hello everyone,
> > >
> > > We at Software AG have prototyped adding filtering on Consumer
> > > subscriptions in the Pulsar broker and are submitting our changes for
> > > consideration under the Apache 2.0 license. Please see pull request
> > > [Consumer Filtering #8544 https://github.com/apache/pulsar/pull/8544]
> > > and the attached write-up. Comments welcome!
> > >
> > > Thanks,
> > >
> > > Andre
> > >
> > > andre.kra...@softwareag.com
> > > This communication contains information which is confidential and may
> > > also be privileged. It is for the exclusive use of the intended
> > > recipient(s). If you are not the intended recipient(s), please note
> > > that any distribution, copying, or use of this communication or the
> > > information in it, is strictly prohibited. If you have received this
> > > communication in error please notify us by e-mail and then delete the
> > > e-mail and any copies of it.
> > > Software AG (UK) Limited Registered in England & Wales 1310740 -
> > > http://www.softwareag.com/uk
> > >
> >
>


[RESULT][VOTE] Apache BookKeeper 4.12.0 release candidate 0

2020-11-15 Thread Jia Zhai
I'm happy to announce that we have unanimously approved this release.

There are 3 approving votes, 3 of which are binding:

* Enrico Olivelli
* Sijie Guo
* Jia Zhai

1 non-binding +1 vote:
* Yong Zhang

There are no disapproving votes.

Thank you for the help!


Re: Updates on PIP-62

2020-11-15 Thread PengHui Li
+1
On Nov 15, 2020, 10:48 PM +0800, Jia Zhai wrote:
> +1


Re: Updates on PIP-62

2020-11-15 Thread Jinfeng Huang
+1

Best Regards,
Jennifer


On Mon, Nov 16, 2020 at 8:24 AM PengHui Li  wrote:

> +1


Re: [DISCUSS] PIP-70: Introduce lightweight raw Message metadata

2020-11-15 Thread Yunze Xu
I think protobuf already has the ability to check whether a field is
present, i.e. RAW_METADATA_MAGIC_NUMBER and RAW_METADATA_SIZE could be
included in the protobuf-ed struct. In Kafka, a magic number represents the
version of the protocol, not whether a feature is enabled. If we need a
*real* magic number, we must make its meaning clear.
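The presence check mentioned above can be made concrete. In the protobuf wire format, an `optional` field that was never set is simply absent from the encoded bytes, so its presence is only known after the bytes have been walked (or parsed into an object). Below is a minimal Python sketch, assuming the `RawMessageMetadata` definition from the quoted proposal (field 1, a varint) and handling only varint fields; `read_varint` and `has_field` are hypothetical helpers, not Pulsar or protobuf library APIs.

```python
from typing import Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a protobuf base-128 varint at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def has_field(buf: bytes, field_number: int) -> bool:
    """Walk an encoded message and report whether field_number appears in it.

    Only wire type 0 (varint) is handled, which is enough for
    RawMessageMetadata { optional uint64 broker_timestamp = 1; }.
    """
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError(f"unsupported wire type {wire_type}")
        _, pos = read_varint(buf, pos)  # skip over the field's value
        if number == field_number:
            return True
    return False


# 0x08 is the key for field 1 with wire type 0; 0x01 is the varint value 1.
print(has_field(b"\x08\x01", 1))  # True
print(has_field(b"", 1))          # False: an unset optional field is not encoded
```

Note that the check still requires decoding the bytes, which is the cost the magic-number approach in the proposal tries to avoid.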

On 2020/11/09 06:24:18, Aloys Zhang wrote:
> Hi all,
>
> We have drafted a proposal for supporting lightweight raw Message metadata,
> which can be found at
> https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata
> and
> https://docs.google.com/document/d/1IgnF9AJzL6JG6G4EL_xcoQxvOpd7bUXcgxFApBiPOFY
>
> I have also copied it into this email thread for easier viewing.
>
> Any suggestions or ideas are welcome; please join the discussion.
>
> ## PIP-70: Introduce lightweight raw Message metadata
>
> ### 1. Motivation
>
> For messages in Pulsar, if we want to add a new property, we always change
> the `MessageMetadata` in the protocol (PulsarApi.proto); this kind of
> property can be understood by both the broker side and the client side by
> deserializing the `MessageMetadata`. But in some cases, the property needs
> to be added on the broker side, or needs to be understood by the broker side
> at low cost. When the broker receives a message produced by the client, we
> could add the property in a new area that is not combined with
> `MessageMetadata`, so there is no need to deserialize the original
> `MessageMetadata` when reading the property back; and when the broker sends
> the message to the client, we could choose to filter out this part (or keep
> it, as the client needs). We call this kind of property “raw Message
> metadata”. In this way, consuming the “raw Message metadata” is independent
> of, and unrelated to, the original `MessageMetadata`.
>
> The benefit of this kind of “raw Message metadata” is that the broker does
> not need to serialize/deserialize the protobuf-ed `MessageMetadata`, which
> gives better performance. It could also enable many features that are not
> supported yet.
> 
> Here are some of the use cases for raw Message metadata:
> 1) Provide messages ordered by broker-side time, to make message seek by
> time more accurate.
> Currently, each message has a `publish_time`, which uses the client-side
> time. For producers on different clients, the clocks may not be aligned,
> which can cause the message order and the message time (`publish_time`)
> order to differ. But each topic-partition has only one owner broker, so if
> we append the broker-side time in the “raw Message metadata”, we can make
> sure the message order is aligned with the broker-side time. With this
> feature, we can handle message seek by time more accurately.
> 
> 2) Provide a continuous message sequence-Id for messages in one
> topic-partition.
> MessageId is a combination of ledgerId+entryId+batchIndex; for a partition
> that contains more than one ledger, the Ids inside are not continuous. With
> this solution, we could append a sequence-Id at the end of each Message,
> which will make message sequence management easier.
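The second use case can be illustrated with a minimal sketch (batching ignored, ledger and entry ids made up). `Position` and `SequenceAssigner` are hypothetical names, not Pulsar classes; the sketch only shows why a counter maintained by the partition's single owner broker stays gap-free across a ledger rollover while (ledgerId, entryId) pairs do not.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Position:
    """A stored entry's location: which ledger, and which entry inside it."""
    ledger_id: int
    entry_id: int


@dataclass
class SequenceAssigner:
    """Hypothetical per-partition counter a broker could append as raw metadata."""
    next_sequence_id: int = 0
    assigned: Dict[Position, int] = field(default_factory=dict)

    def on_persisted(self, pos: Position) -> int:
        seq = self.next_sequence_id
        self.assigned[pos] = seq
        self.next_sequence_id += 1
        return seq


# One partition that rolled over from ledger 10 to ledger 17 (made-up ids).
positions = [Position(10, 0), Position(10, 1), Position(10, 2),
             Position(17, 0), Position(17, 1)]

assigner = SequenceAssigner()
for p in positions:
    assigner.on_persisted(p)

# From (ledgerId, entryId) alone we cannot tell how far apart two positions in
# different ledgers are; with sequence-Ids it is a subtraction.
print(assigner.assigned[positions[4]] - assigner.assigned[positions[1]])  # 3
```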
> 
> In this proposal, we will focus on the first feature mentioned above,
> “provide messages ordered by broker-side time”, to make the proposal easier
> to go through.
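To see why broker-side timestamps help seek-by-time, here is a minimal sketch over one partition's entries in storage order. It assumes, as the proposal argues, that the partition's single owner broker stamps non-decreasing `broker_timestamp` values, while the two producers' `publish_time` values come from skewed clocks; the numbers are made up, and binary search over an in-memory list is a simplification, not how Pulsar actually implements seek.

```python
from bisect import bisect_left
from typing import List

# (publish_time from the client, broker_timestamp from the owner broker),
# in storage order. Producer B's clock runs behind producer A's.
entries = [
    (1000, 1001),  # producer A
    (960, 1005),   # producer B
    (1012, 1013),  # producer A
    (975, 1020),   # producer B
    (1030, 1031),  # producer A
]


def seek_index(timestamps: List[int], target: int) -> int:
    """Index of the first entry whose timestamp is >= target.

    Binary search is only valid when `timestamps` is sorted in storage order.
    """
    return bisect_left(timestamps, target)


broker_ts = [b for _, b in entries]
publish_ts = [p for p, _ in entries]

print(broker_ts == sorted(broker_ts))    # True: one broker, one clock
print(publish_ts == sorted(publish_ts))  # False: skewed client clocks
print(seek_index(broker_ts, 1010))       # 2: first entry stamped at or after 1010
```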
> 
> ### 2. Message and “raw Message metadata” structure changes
>
> As mentioned above, there are 2 main benefits in this proposal:
>
> 1. Almost all of the changes happen on the broker side.
> 2. It avoids serializing/deserializing the protobuf-ed `MessageMetadata`.
> 
> #### 2.1 Raw Message metadata structure in Protobuf
>
> Protobuf is used extensively in Pulsar, so we could use Protobuf to
> serialize/deserialize the raw Message metadata.
> In this example, we will save the broker-side timestamp when each message
> is sent from the broker to BookKeeper, so the definition is very simple:
>
> ```protobuf
> message RawMessageMetadata {
>     optional uint64 broker_timestamp = 1;
> }
> ```
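For a sense of how small this is on the wire, here is a minimal sketch of the standard protobuf encoding of a `RawMessageMetadata` with only `broker_timestamp` set: field number 1 with wire type 0 gives the key byte `0x08`, followed by the value as a base-128 varint. In practice the protobuf-generated classes would do this; `encode_varint` and `encode_raw_metadata` are hypothetical hand-written helpers used only to show the bytes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("uint64 fields must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_raw_metadata(broker_timestamp: int) -> bytes:
    """Hand-rolled encoding of RawMessageMetadata { broker_timestamp = 1 }."""
    key = (1 << 3) | 0  # field number 1, wire type 0 (varint)
    return bytes([key]) + encode_varint(broker_timestamp)


print(encode_raw_metadata(300).hex())               # 08ac02
print(len(encode_raw_metadata(1_605_484_800_000)))  # 7: 1 key byte + 6 varint bytes
```

A millisecond timestamp therefore costs only a handful of bytes per entry.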
> 
> #### 2.2 Message and “raw Message metadata” structure details
>
> Each message is sent from the producer client to the broker in this frame format:
>
> ```
> [TOTAL_SIZE] [CMD_SIZE] [CMD] [MAGIC_NUMBER] [CHECKSUM] [METADATA_SIZE]
> [METADATA] [PAYLOAD]
> ```
>
> The first 3 fields, “[TOTAL_SIZE] [CMD_SIZE] [CMD]”, are read in
> `LengthFieldBasedFrameDecoder` and `PulsarDecoder`, and the rest is
> handled in the method
> `org.apache.pulsar.broker.service.Producer.publishMessage`. The remaining
> part, “[MAGIC_NUMBER] [CHECKSUM] [METADATA_SIZE] [METADATA] [PAYLOAD]”, is
> usually treated as “headersAndPayload” in the code. As described above, we
> do not want this part to be changed at all, so we could treat it as a whole
> package.
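The split described above can be sketched as follows: the first three fields are consumed as the command, and everything after them is kept, byte for byte, as headersAndPayload. This minimal Python sketch assumes 4-byte big-endian TOTAL_SIZE and CMD_SIZE fields, with TOTAL_SIZE counting every byte after itself, as in the Pulsar binary protocol documentation; the CMD bytes are not decoded, no checksum is verified, and `split_frame`/`build_frame` are hypothetical helpers, not the broker's Netty decoders.

```python
import struct
from typing import Tuple


def split_frame(frame: bytes) -> Tuple[bytes, bytes]:
    """Split a frame into (CMD, headersAndPayload).

    Assumed layout: [TOTAL_SIZE:4][CMD_SIZE:4][CMD][rest...], big-endian,
    with TOTAL_SIZE counting every byte after the TOTAL_SIZE field itself.
    """
    (total_size,) = struct.unpack_from(">I", frame, 0)
    if total_size != len(frame) - 4:
        raise ValueError("TOTAL_SIZE does not match frame length")
    (cmd_size,) = struct.unpack_from(">I", frame, 4)
    cmd_end = 8 + cmd_size
    if cmd_end > len(frame):
        raise ValueError("CMD_SIZE exceeds frame length")
    cmd = frame[8:cmd_end]
    headers_and_payload = frame[cmd_end:]  # MAGIC_NUMBER onward, untouched
    return cmd, headers_and_payload


def build_frame(cmd: bytes, headers_and_payload: bytes) -> bytes:
    """Inverse of split_frame, used here only to build test input."""
    body = struct.pack(">I", len(cmd)) + cmd + headers_and_payload
    return struct.pack(">I", len(body)) + body


frame = build_frame(b"<cmd>", b"<magic+checksum+metadata+payload>")
cmd, rest = split_frame(frame)
print(cmd, rest)
```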
> 
> ```
> [MAGIC_NUMBER] [CHECKSUM] [METADATA_SIZ

Re: [DISCUSS] PIP-70: Introduce lightweight raw Message metadata

2020-11-15 Thread Aloys Zhang
Yunze,

Thanks for your suggestion. Protobuf does have the ability to check whether
a field exists, but to use this ability we first need to get a Protobuf
object.
In this proposal, RAW_METADATA_MAGIC_NUMBER is used to indicate what type of
object we can get from the ByteBuf of an Entry. If RAW_METADATA_MAGIC_NUMBER
exists, the head of the ByteBuf will be parsed as RAW_METADATA first;
otherwise, the ByteBuf will be parsed in the original data format without
RAW_METADATA.
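A minimal Python sketch of the dispatch described above: when a stored entry starts with the raw-metadata magic number, the raw metadata is read and stripped first; otherwise the entry is treated as the original format. The thread does not give the value or width of RAW_METADATA_MAGIC_NUMBER or RAW_METADATA_SIZE, so the 2-byte `0xABCD` and the 4-byte size here are placeholders, and `prepend_raw_metadata`/`split_raw_metadata` are hypothetical helpers. The placeholder just needs to differ from the `0x0e01` magic that Pulsar uses to mark a CRC32-C checksum at the start of headersAndPayload.

```python
import struct
from typing import Optional, Tuple

# Hypothetical values: the thread does not specify the magic number or widths.
RAW_METADATA_MAGIC_NUMBER = 0xABCD  # 2 bytes; must differ from Pulsar's 0x0e01
MAGIC_LEN = 2
SIZE_LEN = 4  # RAW_METADATA_SIZE as a 4-byte big-endian length


def prepend_raw_metadata(raw: bytes, headers_and_payload: bytes) -> bytes:
    """Broker side: put [MAGIC][SIZE][RAW_METADATA] in front of the entry."""
    header = struct.pack(">HI", RAW_METADATA_MAGIC_NUMBER, len(raw))
    return header + raw + headers_and_payload


def split_raw_metadata(entry: bytes) -> Tuple[Optional[bytes], bytes]:
    """Return (raw_metadata or None, headersAndPayload) for a stored entry."""
    if len(entry) >= MAGIC_LEN + SIZE_LEN:
        (magic,) = struct.unpack_from(">H", entry, 0)
        if magic == RAW_METADATA_MAGIC_NUMBER:
            (size,) = struct.unpack_from(">I", entry, MAGIC_LEN)
            start = MAGIC_LEN + SIZE_LEN
            if start + size > len(entry):
                raise ValueError("RAW_METADATA_SIZE exceeds entry length")
            return entry[start:start + size], entry[start + size:]
    # No raw-metadata magic number: the entry is in the original format.
    return None, entry


original = b"\x0e\x01" + b"<checksum, metadata, payload>"
print(split_raw_metadata(original))  # raw metadata is None; entry unchanged
wrapped = prepend_raw_metadata(b"\x08\x01", original)
print(split_raw_metadata(wrapped))   # raw metadata recovered; rest equals original
```

Because only the first two bytes are compared, the broker can decide which format it is reading without deserializing any protobuf object.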

On Mon, Nov 16, 2020 at 10:08 AM, Yunze Xu wrote:

> I think protobuf already has the ability to check whether a field is
> present, i.e. RAW_METADATA_MAGIC_NUMBER and RAW_METADATA_SIZE could be
> included in the protobuf-ed struct. In Kafka, a magic number represents the
> version of the protocol, not whether a feature is enabled. If we need a
> *real* magic number, we must make its meaning clear.