Re: [DISCUSS][doc][improve] Update Pulsar site landing page content #349

2023-01-15 Thread Asaf Mesika
I'm pasting the context described in the PR
 also here:

Pulsar’s landing page has a key role in getting new people “onboard” Apache
Pulsar - effectively getting people from “Pulsar? Never heard about it,
what is it good for?” to “Wow, that’s an amazing platform and seems like it
would fit us great with our needs."

I remember trying to understand what Pulsar is, for which the key source is
mostly the documentation. The latter is currently built more like an
encyclopedia rather than a book (which is another point of discussion).
Once I learned how strong Pulsar is and its features, I took the time to
think what are Pulsar’s strongest qualities, and I tried reflecting that in
the 1st 3 sections of the website landing page:

   - Pulsar catchphrase - 1st sentence
   - Pulsar elevator pitch - 1st paragraph
   - Pulsar features - what does it do?

The main line of thinking for a landing page, in my opinion, is what I call
“gradual timing”, and I’ll explain. A person can hear about Pulsar on
several occasions - it can be a lecture they see at a conference, a blog
post someone reposted, or a tweet from a friend in a WhatsApp group. In
today’s world, a lot of time, you find yourself allocating a very small
amount of learning about something new. You might be holding your phone on
the subway or the bus, on the couch, or taking a break from work, opening
your browser to learn what is this Apache Pulsar you just saw.

My assumption is that the time allocated is increasing in stages:

   - They open the browser, and they allocate 5-10 seconds to get the basic
   idea. This is the goal of the 1st sentence - the catchphrase
   - The catchphrase “hooked” them to read a bit more. Now they are willing
   to spend 30 seconds, this is the goal of the Pulsar elevator pitch - the
   1st paragraph.
   - It did the job, now they get the very basic idea of what Pulsar is,
   and they are intrigued to learn what it is capable of - its features.
   They’ll spend up to 5min reading through the feature list.
   - ...

The goal of this pull request is to take all my knowledge about Pulsar and
summarize in the most effective way into those 3 sections.

On Tue, Jan 10, 2023 at 7:01 PM Dave Fisher  wrote:

> I noticed last night this PR -
> https://github.com/apache/pulsar-site/pull/349
>
> I’m not sure about these changes and such changes to our project front
> page should have as many eyes on them as possible.
>
> Please discuss and approve or comment before merging this PR!
>
> Thank you,
> Dave


[DISCUSS] Add unified newTableView method in PulsarClient

2023-01-15 Thread Ruguo Yu
Hi Community,

Currently, we can use PulsarClient to create `Producer`, `Consumer`, `Reader` 
and `TableView`, releated method as below:

```

ProducerBuilder newProducer();

 ProducerBuilder newProducer(Schema schema);

ConsumerBuilder newConsumer();

 ConsumerBuilder newConsumer(Schema schema);

ReaderBuilder newReader();

 ReaderBuilder newReader(Schema schema);

 TableViewBuilder newTableViewBuilder(Schema schema);

```

However, it is obvious that the method of creating `TableView` is not 
consistent with other methods, and no method with default scheme is provided.

 

### Motivation

Add unified `newTableView(Schema)` method and replace 
`newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it) in 
PulsarClient, which could consistent with `newProducer(Schema)`, 
`newConsumer(Schema)`, `newReader(Schema)`.

In addition, we will provide `newTableView()` method which  has default schema.

 

### Modifications

1. Add `newTableView(Schema)` method and replace 
`newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it)  in 
PulsarClient

2. Add `newTableView()` method and Scheme default is `Schema.BYTES` in 
PulsarClient

 

Thanks

Ruguo Yu



Re: [DISCUSS] PIP-235: Add metric for subscription back size

2023-01-15 Thread Jiuming Tao
Hi Xiao,

Seems `backlogSize` already existed in the `SubscriptionStatsImpl`, but didn’t 
expose in Prometheus,

We just need to expose it in Prometheus.

It should exposed in Prometheus when the configuration named 
`exposeTopicLevelMetricsInPrometheus` and 
`exposeSubscriptionBacklogSizeInPrometheus` is enabled.

+1

Thanks,
Tao Jiuming

> 2022年12月30日 23:25,萧 易客  写道:
> 
> can



Re: [DISCUSS] PIP-240 A new API to unload subscriptions

2023-01-15 Thread Yubiao Feng
Hi @Enrico @Bo

> If there is a problem we should spend time on investigating the problem
and not in adding this kind of tools.

You are right. When a user encounters a problem, it often takes a while to
solve the root cause. It is important to provide a tool to recover services
quickly, and cmd 'unload topic' is often used to solve problems
temporarily.

But now a topic can be used by multiple teams, such as Business-team A to
produce messages, Business-team B and C to consume messages (each using a
different subscription name), and Data-team D to read messages using reader
API (using a random subscription name). When there is a consumption problem
in subscription B, if we do unload the topic, this will affect teams A, B,
C, D, and when we provide a new API to reset subscribers, the impact can be
controlled only to affect team B.

The new API can also be used in this scenario: Trigger consumer rebalance.
So I feel like we can add this API with relatively small changes.

Thanks
Yubiao Feng

On Wed, Jan 11, 2023 at 5:00 PM Yubiao Feng 
wrote:

> Hi community
>
> I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.
>
> PIP issue: https://github.com/apache/pulsar/issues/19187
>
> ### Motivation
>
> We sometimes try to unload the topic to resolve some consumption-stop
> issues. But the unloading topic will also impact the producer side.
>
> ### Goal
>
> Providing a new API to unload the subscription dimension triggers
> reconnection of all consumers on that subscription and reconnection is
> guaranteed by the client. The API will be used in these ways:
> - unload special subscription of one topic(or partitioned topic)
> - unload all subscriptions of one topic(or partitioned topic)
> - unload subscriptions of one topic(or partitioned topic) by regular
> expression
>   - If a reader's subscription name is not set, a random subscription name
> prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users can
> uninstall these subscriptions using regular expressions.
>
> In addition to triggering consumer disconnection, Unloading Subscribers
> will restart the Dispatcher, which resets the redeliver message queue and
> delayed message queue in the Broker's memory, which can help resolve issues
> caused by an abnormal dispatcher state. However, the execution flow of
> Unloading Subscribers does not include a restart of the Managed Cursor
> related to this dispatcher; if there is a problem with the cursor, we can
> only rely on the unload topic to solve it.
>
> Note: From the client's perspective, this connection may be shared by
> consumers, producers, and transactions, so Unloading Subscribers maybe
> impact the producer and transaction.
>
>  These scenarios are not supported
> - Functions `message-dedup`, `geo-replication,` and `shadow-topic` also
> read messages from the topic, but Unloading subscribers will not support
> triggering restarts of these three functions( because the cursor is used
> directly to read the data in these scenarios, not the consumer or reader ).
> - The Compression task(subscription name is `__compaction`) also use a
> reader to read data, but Unloading Subscribers does not support it because
> this task creates a new reader each time it starts.
> - Do not support all topics related to Transaction features.
>   - `__transaction_buffer_snapshot` works with the task TB recover,  and
> this task will create a new reader each time they start.
>   - `__transaction_pending_ack` works with the task Transaction Pending
> Ack Store replay,  and this task will use managed cursor directly to read
> data.
>   - `__transaction_log_xxx` works with the task Transaction Log, which
> will use managed cursor directly to read data.
>   - `transaction_coordinator_assign` No data will be written on this topic.
>
>  Special system topic supports
> The system topic `__change_events` is used to support topic-level
> policies, there may also be some message delivery issues in this scenario,
> so Unloading Subscribers will support this topic.
>
> ### API Changes
>
>  For persistent topic
> ```
> pulsar-admin persistent unload {topic_name} -s {sub_name}
> ```
>
>  For non-persistent topic
> ```
> pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> ```
>
>  Explain the param `-s`
> - set param `-s` to special sub name to unload special subscription
> - set param `-s` to `**` to unload all subscriptions under this topic
> - set param `-s` to `regexp` to unload a batch subscriptions under this
> topic
>
>
> Thanks
> Yubiao Feng
>


Re: [DISCUSS] PIP-240 A new API to unload subscriptions

2023-01-15 Thread Yubiao Feng
Hi Qiang

> 1. How do you handle the race condition when you are trying to unload the
subscription, and the new consumer wants to subscribe to this subscription
at the same time? I'm unsure if it has the race condition. I just want to
remind you about that.:)

These methods `addConsumer`, `removeConsumer` all have synchronized locks,
we also add synchronized lock when executing `reset subscription` can solve
the problem.

> 2. Would you like to add some restful API design to clarify the
implementation?

Already added the rest API design in the proposal
https://github.com/apache/pulsar/issues/19187

On Thu, Jan 12, 2023 at 3:22 PM  wrote:

> Hi, Yubiao
>
> I agree with this idea because some users care about the production rate.
> They don't want to unload the whole topic to fix the subscription problem.
>
> I've got some questions:
>
> 1. How do you handle the race condition when you are trying to unload the
> subscription, and the new consumer wants to subscribe to this subscription
> at the same time? I'm unsure if it has the race condition. I just want to
> remind you about that. :)
> 2. Would you like to add some restful API design to clarify the
> implementation?
> a. Request method
> b. Request path
> c. Response code
> d. etc.
>
>
> Thanks for your work.
> Mattison
> On Jan 11, 2023, 17:01 +0800, Yubiao Feng 
> ,
> wrote:
> > Hi community
> >
> > I am starting a DISCUSS for PIP-240: A new API to unload subscriptions.
> >
> > PIP issue: https://github.com/apache/pulsar/issues/19187
> >
> > ### Motivation
> >
> > We sometimes try to unload the topic to resolve some consumption-stop
> > issues. But the unloading topic will also impact the producer side.
> >
> > ### Goal
> >
> > Providing a new API to unload the subscription dimension triggers
> > reconnection of all consumers on that subscription and reconnection is
> > guaranteed by the client. The API will be used in these ways:
> > - unload special subscription of one topic(or partitioned topic)
> > - unload all subscriptions of one topic(or partitioned topic)
> > - unload subscriptions of one topic(or partitioned topic) by regular
> > expression
> > - If a reader's subscription name is not set, a random subscription name
> > prefixed with 'multiTopicsReader-' or 'reader-' will be used, and users
> can
> > uninstall these subscriptions using regular expressions.
> >
> > In addition to triggering consumer disconnection, Unloading Subscribers
> > will restart the Dispatcher, which resets the redeliver message queue and
> > delayed message queue in the Broker's memory, which can help resolve
> issues
> > caused by an abnormal dispatcher state. However, the execution flow of
> > Unloading Subscribers does not include a restart of the Managed Cursor
> > related to this dispatcher; if there is a problem with the cursor, we can
> > only rely on the unload topic to solve it.
> >
> > Note: From the client's perspective, this connection may be shared by
> > consumers, producers, and transactions, so Unloading Subscribers maybe
> > impact the producer and transaction.
> >
> >  These scenarios are not supported
> > - Functions `message-dedup`, `geo-replication,` and `shadow-topic` also
> > read messages from the topic, but Unloading subscribers will not support
> > triggering restarts of these three functions( because the cursor is used
> > directly to read the data in these scenarios, not the consumer or reader
> ).
> > - The Compression task(subscription name is `__compaction`) also use a
> > reader to read data, but Unloading Subscribers does not support it
> because
> > this task creates a new reader each time it starts.
> > - Do not support all topics related to Transaction features.
> > - `__transaction_buffer_snapshot` works with the task TB recover, and
> > this task will create a new reader each time they start.
> > - `__transaction_pending_ack` works with the task Transaction Pending Ack
> > Store replay, and this task will use managed cursor directly to read
> data.
> > - `__transaction_log_xxx` works with the task Transaction Log, which will
> > use managed cursor directly to read data.
> > - `transaction_coordinator_assign` No data will be written on this topic.
> >
> >  Special system topic supports
> > The system topic `__change_events` is used to support topic-level
> policies,
> > there may also be some message delivery issues in this scenario, so
> > Unloading Subscribers will support this topic.
> >
> > ### API Changes
> >
> >  For persistent topic
> > ```
> > pulsar-admin persistent unload {topic_name} -s {sub_name}
> > ```
> >
> >  For non-persistent topic
> > ```
> > pulsar-admin non-persistent unload {topic_name} -s {sub_name}
> > ```
> >
> >  Explain the param `-s`
> > - set param `-s` to special sub name to unload special subscription
> > - set param `-s` to `**` to unload all subscriptions under this topic
> > - set param `-s` to `regexp` to unload a batch subscriptions under this
> > topic
> >
> >
>

Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-15 Thread PengHui Li
> This design also has serious compatibility problems between old and new
pulsar clients and new and old brokers.

Could you please explain more details of the compatibility issue if we
leverage
the protocol version?

> We should not use a negative enum number in PulsarApi.proto. It's
unnatural. If we decide to carry the AUTO_CONSUME schema in a
CommandSubscribe, it should not be treated as a negative schema type.

IMO, the protocol is defined as Enum. Users are developing based on the
Enum, not the value of the Enum. We need to make sure the value
of the Enum is immutable. It is not required that he must be a positive
number.
Maybe it looks ugly.

And the protocol is just the API definition, not about which schema will be
persistent.
As I understand from the protocol definition, the Schema in the subscribe
command is
used to pass the used schema of the consumer. We just make it absent before
for
AUTO_CONSUME schema. We just thought we could make it absent if the consumer
is using AUTO_CONSUME schema. But apparently, this is a problem for now.

I think the easier-to-understand way is for the client to set the schema
used when
subscribing or creating the producer. Rather than which ones need to be set
and which
ones do not need to be set.

Thanks,
Penghui

On Mon, Jan 9, 2023 at 11:32 AM SiNan Liu  wrote:

> This design also has serious compatibility problems between old and new
> pulsar clients and new and old brokers.
>
>
> Thanks,
> Sinan
>
>
> PengHui Li  于 2023年1月9日周一 上午9:51写道:
>
> > Sorry for the late reply.
> >
> > We can leverage the `ProtocolVersion` [1] to handle the compatibility
> > issue.
> > It looks like only if the protocol_version >= 21, subscribe with the
> > auto_consume schema
> >
> > IMO, both the new key-value of the subscribe command, and a specific
> > representative are
> > API changes. It's just that some have modified the definition of the API,
> > and some have modified the behavior of the API
> >
> > I prefer the intuitive way. And from the perspective of API-based
> > developers, we should
> > try to provide a simple and clear API with no hidden rules. The client
> just
> > uploads the schema
> > that it has except the byte[] schema. The broker knows how to handle the
> > different schemas,
> > such as AUTO_PRODUCE, and AUTO_CONSUME. And this should not be the burden
> > of the
> > client developer to learn the details of the schema implementation. They
> > should work according
> > to the API spec.
> >
> > If we can resolve the compatibility issue with uploading the AUTO_CONSUME
> > schema with
> > subscribe command, do you see any apparent cons?
> >
> > Best,
> > Penghui
> >
> > [1]
> >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto#L241-L266
> >
> > On Fri, Jan 6, 2023 at 10:31 PM SiNan Liu 
> wrote:
> >
> > > Ok, I will update the PIP issue later.
> > >
> > > About my current design.
> > > When the consumer with AUTO_CONSUME schema subscribed to an "empty"
> > topic,
> > > the schemaInfo will be null.
> > > ```
> > > public SchemaInfo getSchemaInfo(byte[] schemaVersion) {
> > > SchemaVersion sv = getSchemaVersion(schemaVersion);
> > > if (schemaMap.containsKey(sv)) {
> > > return schemaMap.get(sv).getSchemaInfo();
> > > }
> > > return null;
> > >
> > > }
> > >
> > > ```
> > > And checkSchemaCompatibility must be set in
> > > `org.apache.pulsar.common.protocol.Commands#newSubscribe`,
> > > and we need to know that this is an AUTO_CONSUME consumer subscribing.
> So
> > > we should set a "*default*" schemaInfo(schemaType = AUTO_CONSUME) for
> > > AutoConsumeSchema,
> > > this is because schemaInfo is also null when `si.getType` is
> > > SchemaType.BYTES or SchemaType.NONE.
> > > And checkSchemaCompatibility can be set in
> > > `org.apache.pulsar.common.protocol.Commands#newSubscribe`. The most
> > > important thing is clearSchema, which does not carry the wrong schema
> to
> > > the broker.
> > >
> > >
> > > Yunze Xu  于2023年1月6日周五 12:57写道:
> > >
> > > > You only need to describe what's added to the PulsarApi.proto, i.e.
> > > > you don't need to paste all definitions of `CommandSubscribe` in the
> > > > proposal. Take PIP-54 [1] for example, it only pastes the new field
> > > > `ack_set` and does not paste the whole `MessageIdData` definition.
> > > >
> > > > The implementations section involves too much code and just looks
> like
> > > > an actual PR. Take PIP-194 [2] for example, you should only talk
> about
> > > > the implementations from a high level.
> > > >
> > > > Let's talk back to your current design, when the schema type is
> > > > AUTO_CONSUME, you clear the schema in CommandSubscribe. It seems that
> > > > adding a SchemaInfo to the AutoConsumeSchema is meaningless.
> > > >
> > > > [1]
> > > >
> > >
> >
> https://github.com/apache/pulsar/wiki/PIP-54:-Support-acknowledgment-at-batch-index-level#changes
> > > > [2] https://github.com/apache/pulsar/issues/16757
> > > >
> > > > On Fri, Jan 6, 202

Re: [DISCUSS] KeyShared routing additions/configuration

2023-01-15 Thread PengHui Li
Hi Tim,

> The proposal is two-fold: first, we would like to implement a new
KeyShared
routing mechanism which tracks outstanding messages and their consumers,
routing messages with outstanding keys to the current consumer handling
that key, and any messages with new keys to any arbitrary consumer with
available permits (or perhaps the consumer with the most permits).
Basically a "first available consumer" routing strategy.  As far as my
naive first attempt goes, I initially decided to modify
the StickyKeyConsumerSelector::select call to accept an mledger.Position in
addition to the hash.  I also added a release call to notify the selector
of positions to consider no longer outstanding.  I could see this being
implemented any number of other ways as well (such as an entirely new
dispatcher), and would appreciate guidance should this proposal move
forward.

If you are using the consistent hash consumer selector. You can try to
add more replica points. But it also depends on your keys, if the number
of messages for each key roughly the same and how many keys you have.

I just thought about it(available permits-based selector) roughly. The
available
permits are unstable. But after the key is assigned to a consumer, the
relationship will
not change, right?

> Second, I believe the ability to choose a KeyShared routing scheme and
perhaps the settings for that scheme should be configurable as a
namespace/topic policy and not just in the broker config.  I have not begun
work on implementing that at all yet, but would assume it is not too
complicated to do so (though the settings construct may be more freeform
than expected).

Yes, that looks good.

Thanks,
Penghui

On Fri, Jan 13, 2023 at 4:09 PM Enrico Olivelli  wrote:

> Tim,
>
> Il giorno gio 12 gen 2023 alle ore 04:31 Tim Corbett
>  ha scritto:
> >
> > Greetings,
> >
> > I was informed to ask this here but was unclear yet if this should be a
> PIP
> > or not.  Please advise if I should follow a different process.  Thanks in
> > advance.
> >
> > The motivation is as follows: today we have KeyShared subscriptions in
> > which the consumers are rapidly autoscaling up and down based on
> real-time
> > demand.  We've observed that for some data shapes/scaling conditions,
> some
> > consumers will end up receiving considerably more traffic than the rest,
> > slowing down consumption across the topic.  We require all messages with
> > any given routing key to only be outstanding to one consumer at a time,
> but
> > otherwise have no preference for any range of keys to remain sticky to
> any
> > given consumer.  I've also seen reference to KeyShared routing
> performance
> > concerns elsewhere, e.g.: https://github.com/apache/pulsar/issues/15705
> >
> > The proposal is two-fold: first, we would like to implement a new
> KeyShared
> > routing mechanism which tracks outstanding messages and their consumers,
> > routing messages with outstanding keys to the current consumer handling
> > that key, and any messages with new keys to any arbitrary consumer with
> > available permits (or perhaps the consumer with the most permits).
> > Basically a "first available consumer" routing strategy.  As far as my
> > naive first attempt goes, I initially decided to modify
> > the StickyKeyConsumerSelector::select call to accept an mledger.Position
> in
> > addition to the hash.  I also added a release call to notify the selector
> > of positions to consider no longer outstanding.  I could see this being
> > implemented any number of other ways as well (such as an entirely new
> > dispatcher), and would appreciate guidance should this proposal move
> > forward.
>
> Sounds good
>
> >
> > Second, I believe the ability to choose a KeyShared routing scheme and
> > perhaps the settings for that scheme should be configurable as a
> > namespace/topic policy and not just in the broker config.  I have not
> begun
> > work on implementing that at all yet, but would assume it is not too
> > complicated to do so (though the settings construct may be more freeform
> > than expected).
>
> This is a good idea as well
>
> >
> > In the referenced PR above, Penghui mentioned that the cost of tracking
> all
> > outstanding messages would be too costly as a reason for not implementing
> > this in the first place, as "the current implementation doesn't need to
> > maintain the state for each key since a topic might have a huge number of
> > keys."  I would like to counter that with most topics do not have in
> excess
> > of hundreds of thousands of messages outstanding simultaneously in the
> > first place, there is already a policy to limit the number of outstanding
> > messages by subscription, and you don't actually need an entry for each
> key
> > necessarily, if performance/memory constraints demanded, some sort of
> > reference count by (shortened) hash could be attempted.  Regardless, I
> have
> > not performance tested my idea just yet as if it will be rejected for
> other
> > reasons the effort 

Re: [DISCUSS] Add unified newTableView method in PulsarClient

2023-01-15 Thread PengHui Li
+1

And It might be a typo when adding the table view APIs
>From the example:

https://github.com/apache/pulsar/blob/246c2701e5c43e02e9783c82d4d107d06b019951/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClient.java#L232-L238

And from the proposal, the method name is newTableView()

https://github.com/apache/pulsar/issues/12356

Thanks,
Penghui

On Sun, Jan 15, 2023 at 6:31 PM Ruguo Yu  wrote:

> Hi Community,
>
> Currently, we can use PulsarClient to create `Producer`, `Consumer`,
> `Reader` and `TableView`, releated method as below:
>
> ```
>
> ProducerBuilder newProducer();
>
>  ProducerBuilder newProducer(Schema schema);
>
> ConsumerBuilder newConsumer();
>
>  ConsumerBuilder newConsumer(Schema schema);
>
> ReaderBuilder newReader();
>
>  ReaderBuilder newReader(Schema schema);
>
>  TableViewBuilder newTableViewBuilder(Schema schema);
>
> ```
>
> However, it is obvious that the method of creating `TableView` is not
> consistent with other methods, and no method with default scheme is
> provided.
>
>
>
> ### Motivation
>
> Add unified `newTableView(Schema)` method and replace
> `newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it) in
> PulsarClient, which could consistent with `newProducer(Schema)`,
> `newConsumer(Schema)`, `newReader(Schema)`.
>
> In addition, we will provide `newTableView()` method which  has default
> schema.
>
>
>
> ### Modifications
>
> 1. Add `newTableView(Schema)` method and replace
> `newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it)  in
> PulsarClient
>
> 2. Add `newTableView()` method and Scheme default is `Schema.BYTES` in
> PulsarClient
>
>
>
> Thanks
>
> Ruguo Yu
>
>


RE: Re: [DISCUSS] Add unified newTableView method in PulsarClient

2023-01-15 Thread Ruguo Yu
Sorry,I forgot to paste the corresponding PR: 
https://github.com/apache/pulsar/pull/19048

I have fixed the typo in above PR.

 

On 2023/01/16 02:10:07 PengHui Li wrote:

> +1

> 

> And It might be a typo when adding the table view APIs

> From the example:

> 

> https://github.com/apache/pulsar/blob/246c2701e5c43e02e9783c82d4d107d06b019951/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClient.java#L232-L238

> 

> And from the proposal, the method name is newTableView()

> 

> https://github.com/apache/pulsar/issues/12356

> 

> Thanks,

> Penghui

> 

> On Sun, Jan 15, 2023 at 6:31 PM Ruguo Yu  wrote:

> 

> > Hi Community,

> >

> > Currently, we can use PulsarClient to create `Producer`, `Consumer`,

> > `Reader` and `TableView`, releated method as below:

> >

> > ```

> >

> > ProducerBuilder newProducer();

> >

> >  ProducerBuilder newProducer(Schema schema);

> >

> > ConsumerBuilder newConsumer();

> >

> >  ConsumerBuilder newConsumer(Schema schema);

> >

> > ReaderBuilder newReader();

> >

> >  ReaderBuilder newReader(Schema schema);

> >

> >  TableViewBuilder newTableViewBuilder(Schema schema);

> >

> > ```

> >

> > However, it is obvious that the method of creating `TableView` is not

> > consistent with other methods, and no method with default scheme is

> > provided.

> >

> >

> >

> > ### Motivation

> >

> > Add unified `newTableView(Schema)` method and replace

> > `newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it) in

> > PulsarClient, which could consistent with `newProducer(Schema)`,

> > `newConsumer(Schema)`, `newReader(Schema)`.

> >

> > In addition, we will provide `newTableView()` method which  has default

> > schema.

> >

> >

> >

> > ### Modifications

> >

> > 1. Add `newTableView(Schema)` method and replace

> > `newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it)  in

> > PulsarClient

> >

> > 2. Add `newTableView()` method and Scheme default is `Schema.BYTES` in

> > PulsarClient

> >

> >

> >

> > Thanks

> >

> > Ruguo Yu

> >

> >

> 

 

 



Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-15 Thread Yunze Xu
Is there any problem with using a positive value for it? I think there
is no compatibility issue because the enum value is never used on the
broker side. Making it positive makes AUTO_CONSUME different with
other implicit schema types like BYTES, AUTO and AUTO_PUBLISH.

Thanks,
Yunze

On Mon, Jan 16, 2023 at 9:36 AM PengHui Li  wrote:
>
> > This design also has serious compatibility problems between old and new
> pulsar clients and new and old brokers.
>
> Could you please explain more details of the compatibility issue if we
> leverage
> the protocol version?
>
> > We should not use a negative enum number in PulsarApi.proto. It's
> unnatural. If we decide to carry the AUTO_CONSUME schema in a
> CommandSubscribe, it should not be treated as a negative schema type.
>
> IMO, the protocol is defined as Enum. Users are developing based on the
> Enum, not the value of the Enum. We need to make sure the value
> of the Enum is immutable. It is not required that he must be a positive
> number.
> Maybe it looks ugly.
>
> And the protocol is just the API definition, not about which schema will be
> persistent.
> As I understand from the protocol definition, the Schema in the subscribe
> command is
> used to pass the used schema of the consumer. We just make it absent before
> for
> AUTO_CONSUME schema. We just thought we could make it absent if the consumer
> is using AUTO_CONSUME schema. But apparently, this is a problem for now.
>
> I think the easier-to-understand way is for the client to set the schema
> used when
> subscribing or creating the producer. Rather than which ones need to be set
> and which
> ones do not need to be set.
>
> Thanks,
> Penghui
>
> On Mon, Jan 9, 2023 at 11:32 AM SiNan Liu  wrote:
>
> > This design also has serious compatibility problems between old and new
> > pulsar clients and new and old brokers.
> >
> >
> > Thanks,
> > Sinan
> >
> >
> > PengHui Li  于 2023年1月9日周一 上午9:51写道:
> >
> > > Sorry for the late reply.
> > >
> > > We can leverage the `ProtocolVersion` [1] to handle the compatibility
> > > issue.
> > > It looks like only if the protocol_version >= 21, subscribe with the
> > > auto_consume schema
> > >
> > > IMO, both the new key-value of the subscribe command, and a specific
> > > representative are
> > > API changes. It's just that some have modified the definition of the API,
> > > and some have modified the behavior of the API
> > >
> > > I prefer the intuitive way. And from the perspective of API-based
> > > developers, we should
> > > try to provide a simple and clear API with no hidden rules. The client
> > just
> > > uploads the schema
> > > that it has except the byte[] schema. The broker knows how to handle the
> > > different schemas,
> > > such as AUTO_PRODUCE, and AUTO_CONSUME. And this should not be the burden
> > > of the
> > > client developer to learn the details of the schema implementation. They
> > > should work according
> > > to the API spec.
> > >
> > > If we can resolve the compatibility issue with uploading the AUTO_CONSUME
> > > schema with
> > > subscribe command, do you see any apparent cons?
> > >
> > > Best,
> > > Penghui
> > >
> > > [1]
> > >
> > >
> > https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto#L241-L266
> > >
> > > On Fri, Jan 6, 2023 at 10:31 PM SiNan Liu 
> > wrote:
> > >
> > > > Ok, I will update the PIP issue later.
> > > >
> > > > About my current design.
> > > > When the consumer with AUTO_CONSUME schema subscribed to an "empty"
> > > topic,
> > > > the schemaInfo will be null.
> > > > ```
> > > > public SchemaInfo getSchemaInfo(byte[] schemaVersion) {
> > > > SchemaVersion sv = getSchemaVersion(schemaVersion);
> > > > if (schemaMap.containsKey(sv)) {
> > > > return schemaMap.get(sv).getSchemaInfo();
> > > > }
> > > > return null;
> > > >
> > > > }
> > > >
> > > > ```
> > > > And checkSchemaCompatibility must be set in
> > > > `org.apache.pulsar.common.protocol.Commands#newSubscribe`,
> > > > and we need to know that this is an AUTO_CONSUME consumer subscribing.
> > So
> > > > we should set a "*default*" schemaInfo(schemaType = AUTO_CONSUME) for
> > > > AutoConsumeSchema,
> > > > this is because schemaInfo is also null when `si.getType` is
> > > > SchemaType.BYTES or SchemaType.NONE.
> > > > And checkSchemaCompatibility can be set in
> > > > `org.apache.pulsar.common.protocol.Commands#newSubscribe`. The most
> > > > important thing is clearSchema, which does not carry the wrong schema
> > to
> > > > the broker.
> > > >
> > > >
> > > > Yunze Xu  于2023年1月6日周五 12:57写道:
> > > >
> > > > > You only need to describe what's added to the PulsarApi.proto, i.e.
> > > > > you don't need to paste all definitions of `CommandSubscribe` in the
> > > > > proposal. Take PIP-54 [1] for example, it only pastes the new field
> > > > > `ack_set` and does not paste the whole `MessageIdData` definition.
> > > > >
> > > > > The implementations section involves too much code and just looks
> > lik

Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-15 Thread PengHui Li
> Is there any problem with using a positive value for it? I think there
is no compatibility issue because the enum value is never used on the
broker side. Making it positive makes AUTO_CONSUME different with
other implicit schema types like BYTES, AUTO and AUTO_PUBLISH.

That sounds good to me to use a positive number for `AUTO_CONSUME`
in the protocol. Maybe 100 or something.

Thanks,
Penghui

On Mon, Jan 16, 2023 at 10:38 AM Yunze Xu 
wrote:

> Is there any problem with using a positive value for it? I think there
> is no compatibility issue because the enum value is never used on the
> broker side. Making it positive makes AUTO_CONSUME different with
> other implicit schema types like BYTES, AUTO and AUTO_PUBLISH.
>
> Thanks,
> Yunze
>
> On Mon, Jan 16, 2023 at 9:36 AM PengHui Li  wrote:
> >
> > > This design also has serious compatibility problems between old and new
> > pulsar clients and new and old brokers.
> >
> > Could you please explain more details of the compatibility issue if we
> > leverage
> > the protocol version?
> >
> > > We should not use a negative enum number in PulsarApi.proto. It's
> > unnatural. If we decide to carry the AUTO_CONSUME schema in a
> > CommandSubscribe, it should not be treated as a negative schema type.
> >
> > IMO, the protocol is defined as Enum. Users are developing based on the
> > Enum, not the value of the Enum. We need to make sure the value
> > of the Enum is immutable. It is not required that he must be a positive
> > number.
> > Maybe it looks ugly.
> >
> > And the protocol is just the API definition, not about which schema will
> be
> > persistent.
> > As I understand from the protocol definition, the Schema in the subscribe
> > command is
> > used to pass the used schema of the consumer. We just make it absent
> before
> > for
> > AUTO_CONSUME schema. We just thought we could make it absent if the
> consumer
> > is using AUTO_CONSUME schema. But apparently, this is a problem for now.
> >
> > I think the easier-to-understand way is for the client to set the schema
> > used when
> > subscribing or creating the producer. Rather than which ones need to be
> set
> > and which
> > ones do not need to be set.
> >
> > Thanks,
> > Penghui
> >
> > On Mon, Jan 9, 2023 at 11:32 AM SiNan Liu 
> wrote:
> >
> > > This design also has serious compatibility problems between old and new
> > > pulsar clients and new and old brokers.
> > >
> > >
> > > Thanks,
> > > Sinan
> > >
> > >
> > > PengHui Li  于 2023年1月9日周一 上午9:51写道:
> > >
> > > > Sorry for the late reply.
> > > >
> > > > We can leverage the `ProtocolVersion` [1] to handle the compatibility
> > > > issue.
> > > > It looks like only if the protocol_version >= 21, subscribe with the
> > > > auto_consume schema
> > > >
> > > > IMO, both the new key-value of the subscribe command, and a specific
> > > > representative are
> > > > API changes. It's just that some have modified the definition of the
> API,
> > > > and some have modified the behavior of the API
> > > >
> > > > I prefer the intuitive way. And from the perspective of API-based
> > > > developers, we should
> > > > try to provide a simple and clear API with no hidden rules. The
> client
> > > just
> > > > uploads the schema
> > > > that it has except the byte[] schema. The broker knows how to handle
> the
> > > > different schemas,
> > > > such as AUTO_PRODUCE, and AUTO_CONSUME. And this should not be the
> burden
> > > > of the
> > > > client developer to learn the details of the schema implementation.
> They
> > > > should work according
> > > > to the API spec.
> > > >
> > > > If we can resolve the compatibility issue with uploading the
> AUTO_CONSUME
> > > > schema with
> > > > subscribe command, do you see any apparent cons?
> > > >
> > > > Best,
> > > > Penghui
> > > >
> > > > [1]
> > > >
> > > >
> > >
> https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/proto/PulsarApi.proto#L241-L266
> > > >
> > > > On Fri, Jan 6, 2023 at 10:31 PM SiNan Liu 
> > > wrote:
> > > >
> > > > > Ok, I will update the PIP issue later.
> > > > >
> > > > > About my current design.
> > > > > When the consumer with AUTO_CONSUME schema subscribed to an "empty"
> > > > topic,
> > > > > the schemaInfo will be null.
> > > > > ```
> > > > > public SchemaInfo getSchemaInfo(byte[] schemaVersion) {
> > > > > SchemaVersion sv = getSchemaVersion(schemaVersion);
> > > > > if (schemaMap.containsKey(sv)) {
> > > > > return schemaMap.get(sv).getSchemaInfo();
> > > > > }
> > > > > return null;
> > > > >
> > > > > }
> > > > >
> > > > > ```
> > > > > And checkSchemaCompatibility must be set in
> > > > > `org.apache.pulsar.common.protocol.Commands#newSubscribe`,
> > > > > and we need to know that this is an AUTO_CONSUME consumer
> subscribing.
> > > So
> > > > > we should set a "*default*" schemaInfo(schemaType = AUTO_CONSUME)
> for
> > > > > AutoConsumeSchema,
> > > > > this is because schemaInfo is also null when `si.getType` is
> > > > > SchemaType.BYTES o

Re: Re: [DISCUSS] Add unified newTableView method in PulsarClient

2023-01-15 Thread Enrico Olivelli
+1



Enrico

Il Lun 16 Gen 2023, 03:38 Ruguo Yu  ha scritto:

> Sorry,I forgot to paste the corresponding PR:
> https://github.com/apache/pulsar/pull/19048
>
> I have fixed the typo in above PR.
>
>
>
> On 2023/01/16 02:10:07 PengHui Li wrote:
>
> > +1
>
> >
>
> > And It might be a typo when adding the table view APIs
>
> > From the example:
>
> >
>
> >
> https://github.com/apache/pulsar/blob/246c2701e5c43e02e9783c82d4d107d06b019951/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClient.java#L232-L238
>
> >
>
> > And from the proposal, the method name is newTableView()
>
> >
>
> > https://github.com/apache/pulsar/issues/12356
>
> >
>
> > Thanks,
>
> > Penghui
>
> >
>
> > On Sun, Jan 15, 2023 at 6:31 PM Ruguo Yu 
> wrote:
>
> >
>
> > > Hi Community,
>
> > >
>
> > > Currently, we can use PulsarClient to create `Producer`, `Consumer`,
>
> > > `Reader` and `TableView`, releated method as below:
>
> > >
>
> > > ```
>
> > >
>
> > > ProducerBuilder newProducer();
>
> > >
>
> > >  ProducerBuilder newProducer(Schema schema);
>
> > >
>
> > > ConsumerBuilder newConsumer();
>
> > >
>
> > >  ConsumerBuilder newConsumer(Schema schema);
>
> > >
>
> > > ReaderBuilder newReader();
>
> > >
>
> > >  ReaderBuilder newReader(Schema schema);
>
> > >
>
> > >  TableViewBuilder newTableViewBuilder(Schema schema);
>
> > >
>
> > > ```
>
> > >
>
> > > However, it is obvious that the method of creating `TableView` is not
>
> > > consistent with other methods, and no method with default scheme is
>
> > > provided.
>
> > >
>
> > >
>
> > >
>
> > > ### Motivation
>
> > >
>
> > > Add unified `newTableView(Schema)` method and replace
>
> > > `newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it) in
>
> > > PulsarClient, which could consistent with `newProducer(Schema)`,
>
> > > `newConsumer(Schema)`, `newReader(Schema)`.
>
> > >
>
> > > In addition, we will provide `newTableView()` method which  has default
>
> > > schema.
>
> > >
>
> > >
>
> > >
>
> > > ### Modifications
>
> > >
>
> > > 1. Add `newTableView(Schema)` method and replace
>
> > > `newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it)
> in
>
> > > PulsarClient
>
> > >
>
> > > 2. Add `newTableView()` method and Scheme default is `Schema.BYTES` in
>
> > > PulsarClient
>
> > >
>
> > >
>
> > >
>
> > > Thanks
>
> > >
>
> > > Ruguo Yu
>
> > >
>
> > >
>
> >
>
>
>
>
>
>


Re: [DISCUSS] Add unified newTableView method in PulsarClient

2023-01-15 Thread Enrico Olivelli
+1

Enrico

Il Lun 16 Gen 2023, 03:11 PengHui Li  ha scritto:

> +1
>
> And It might be a typo when adding the table view APIs
> From the example:
>
>
> https://github.com/apache/pulsar/blob/246c2701e5c43e02e9783c82d4d107d06b019951/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClient.java#L232-L238
>
> And from the proposal, the method name is newTableView()
>
> https://github.com/apache/pulsar/issues/12356
>
> Thanks,
> Penghui
>
> On Sun, Jan 15, 2023 at 6:31 PM Ruguo Yu  wrote:
>
> > Hi Community,
> >
> > Currently, we can use PulsarClient to create `Producer`, `Consumer`,
> > `Reader` and `TableView`, releated method as below:
> >
> > ```
> >
> > ProducerBuilder newProducer();
> >
> >  ProducerBuilder newProducer(Schema schema);
> >
> > ConsumerBuilder newConsumer();
> >
> >  ConsumerBuilder newConsumer(Schema schema);
> >
> > ReaderBuilder newReader();
> >
> >  ReaderBuilder newReader(Schema schema);
> >
> >  TableViewBuilder newTableViewBuilder(Schema schema);
> >
> > ```
> >
> > However, it is obvious that the method of creating `TableView` is not
> > consistent with other methods, and no method with default scheme is
> > provided.
> >
> >
> >
> > ### Motivation
> >
> > Add unified `newTableView(Schema)` method and replace
> > `newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it) in
> > PulsarClient, which could consistent with `newProducer(Schema)`,
> > `newConsumer(Schema)`, `newReader(Schema)`.
> >
> > In addition, we will provide `newTableView()` method which  has default
> > schema.
> >
> >
> >
> > ### Modifications
> >
> > 1. Add `newTableView(Schema)` method and replace
> > `newTableViewBuilder(Schema)`(attach `@Deprecated` annotation to it)  in
> > PulsarClient
> >
> > 2. Add `newTableView()` method and Scheme default is `Schema.BYTES` in
> > PulsarClient
> >
> >
> >
> > Thanks
> >
> > Ruguo Yu
> >
> >
>