Missing Pulsar entry in Apache Projects list page

2023-11-10 Thread Kiryl Valkovich
The project has an entry here:
https://projects.apache.org/projects.html?name

but it is missing here: https://projects.apache.org/projects.html?language
and here: https://projects.apache.org/projects.html?category

Maybe someone who knows where to fix it can fix it.


Best,
Kiryl


Re: [VOTE] Pulsar Release 3.0.2 Candidate 2

2023-11-10 Thread Yubiao Feng
Hi Lari

We were waiting for https://github.com/apache/pulsar/pull/21545; it was
merged today.

I will push Candidate 3 next week.

Thanks
Yubiao

On Thu, Nov 9, 2023 at 7:56 PM Lari Hotari  wrote:

> Any updates on the 3.0.2 release?
>
> -Lari
>
> On 2023/10/30 03:23:06 Yubiao Feng wrote:
> > Update:
> >  blocked by https://github.com/apache/pulsar/pull/21445
> >
> > I will do a new candidate later
> >
> > Thanks
> > Yubiao Feng
> >
> > On Fri, Oct 27, 2023 at 10:50 PM Yubiao Feng <
> yubiao.f...@streamnative.io>
> > wrote:
> >
> > > This is the first release candidate for Apache Pulsar version 3.0.2.
> > >
> > > It fixes the following issues:
> > >
> > >
> https://github.com/apache/pulsar/pulls?q=is%3Apr+is%3Amerged+label%3Arelease%2F3.0.2+label%3Acherry-picked%2Fbranch-3.0+
> > >
> > > *** Please download, test and vote on this release. This vote will
> > > stay open for at least 72 hours ***
> > >
> > > Note that we are voting upon the source (tag), binaries are provided
> > > for convenience.
> > >
> > > Source and binary files:
> > >
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-3.0.2-candidate-2/
> > >
> > > SHA-512 checksums:
> > >
> > >
> > >
> a4eaf3afabfe89f34d4cd29b2bc63ad2219a729319c159bae17940cede3afcf8aebd4d467b9f5226a17ab0b7e878300038364d4d122193aa3494f3b9bad0b3cc
> > >
> > > apache-pulsar-3.0.2-bin.tar.gz
> > >
> > >
> > >
> 9e0103f93e00c09c5db8a4cdf1b7d135bed5f0aa5f1c40a52d8caf4f3d269ca4972e25e87d8c0254212e7e089421ede1a92a608824fd1a240c05372b349ed095
> > >
> > > apache-pulsar-3.0.2-src.tar.gz
> > >
> > > Maven staging repo:
> > >
> https://repository.apache.org/content/repositories/orgapachepulsar-1245/
> > >
> > > The tag to verify:
> > > v3.0.2-candidate-2 (cd5d2bef8d65c0f6158b8eb4b7ca7fbbde7028c1)
> > > https://github.com/apache/pulsar/releases/tag/v3.0.2-candidate-2
> > >
> > > Pulsar's KEYS file containing PGP keys you use to sign the release:
> > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > >
> > > Docker images:
> > >
> > > pulsar images:
> > > https://hub.docker.com/repository/docker/9947090/pulsar
> > >
> > > pulsar-all images:
> > > https://hub.docker.com/repository/docker/9947090/pulsar-all
> > >
> > > Please download the source package, and follow the README to build
> > > and run the Pulsar standalone service.
> > >
> > >
> > > Regards
> > > Yubiao Feng(poorbarcode)
> > >
> >
>


Re: [VOTE] Pulsar Release 3.0.2 Candidate 2

2023-11-10 Thread Yubiao Feng
Hi Lari

I got a conflict when cherry-picking #21445 and #21545 into branch-3.0, so
Candidate 3 will take a bit longer.

Thanks
Yubiao Feng

On Thu, Nov 9, 2023 at 7:56 PM Lari Hotari  wrote:

> Any updates on the 3.0.2 release?
>
> -Lari
>
> On 2023/10/30 03:23:06 Yubiao Feng wrote:
> > Update:
> >  blocked by https://github.com/apache/pulsar/pull/21445
> >
> > I will do a new candidate later
> >
> > Thanks
> > Yubiao Feng
> >
> > On Fri, Oct 27, 2023 at 10:50 PM Yubiao Feng <
> yubiao.f...@streamnative.io>
> > wrote:
> >
> > > This is the first release candidate for Apache Pulsar version 3.0.2.
> > >
> > > It fixes the following issues:
> > >
> > >
> https://github.com/apache/pulsar/pulls?q=is%3Apr+is%3Amerged+label%3Arelease%2F3.0.2+label%3Acherry-picked%2Fbranch-3.0+
> > >
> > > *** Please download, test and vote on this release. This vote will
> > > stay open for at least 72 hours ***
> > >
> > > Note that we are voting upon the source (tag), binaries are provided
> > > for convenience.
> > >
> > > Source and binary files:
> > >
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-3.0.2-candidate-2/
> > >
> > > SHA-512 checksums:
> > >
> > >
> > >
> a4eaf3afabfe89f34d4cd29b2bc63ad2219a729319c159bae17940cede3afcf8aebd4d467b9f5226a17ab0b7e878300038364d4d122193aa3494f3b9bad0b3cc
> > >
> > > apache-pulsar-3.0.2-bin.tar.gz
> > >
> > >
> > >
> 9e0103f93e00c09c5db8a4cdf1b7d135bed5f0aa5f1c40a52d8caf4f3d269ca4972e25e87d8c0254212e7e089421ede1a92a608824fd1a240c05372b349ed095
> > >
> > > apache-pulsar-3.0.2-src.tar.gz
> > >
> > > Maven staging repo:
> > >
> https://repository.apache.org/content/repositories/orgapachepulsar-1245/
> > >
> > > The tag to verify:
> > > v3.0.2-candidate-2 (cd5d2bef8d65c0f6158b8eb4b7ca7fbbde7028c1)
> > > https://github.com/apache/pulsar/releases/tag/v3.0.2-candidate-2
> > >
> > > Pulsar's KEYS file containing PGP keys you use to sign the release:
> > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > >
> > > Docker images:
> > >
> > > pulsar images:
> > > https://hub.docker.com/repository/docker/9947090/pulsar
> > >
> > > pulsar-all images:
> > > https://hub.docker.com/repository/docker/9947090/pulsar-all
> > >
> > > Please download the source package, and follow the README to build
> > > and run the Pulsar standalone service.
> > >
> > >
> > > Regards
> > > Yubiao Feng(poorbarcode)
> > >
> >
>


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-10 Thread Lari Hotari
Hi Girish,

replies inline.

> > I'd pick the first bucket for handling the 10MB rate.
> > The capacity of the first bucket would be 15MB * 120=1800MB. The fill
> > would happen in special way. I'm not sure if Bucket4J has this at all.
> > So describing the way of adding tokens to the bucket: the tokens in
> > the bucket would remain the same when the rate is <10MB. As many
> >
>
> How is this special behavior (tokens in bucket remaining the same when rate
> is <10MB) achieved? I would assume that to even figure out that the rate is
> less than 10MB, there is some counter going around?

It's possible to be creative in implementing the token bucket algorithm.
New tokens are added at the configured rate. When the actual traffic
rate is less than the token bucket's rate, there will be left-over
tokens. One way to implement this is to calculate the number of new
tokens before consuming tokens: the tokens used immediately are
subtracted from the new tokens, and the left-over tokens are added to
the separate "filling bucket". As I explained earlier, this filling
bucket would be poured into the actual bucket every 10 minutes in the
example scenario.
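To make the scheme concrete, here is a minimal Python sketch of a token bucket with a separate "filling bucket" as described above. This is illustrative only - the class name, parameters, and the injectable clock are my own assumptions, not Pulsar's or Bucket4J's implementation:

```python
import time

class FillingTokenBucket:
    """Token bucket whose left-over tokens accumulate in a separate
    "filling bucket" that is poured into the main bucket periodically."""

    def __init__(self, rate_per_sec, capacity, pour_interval_sec,
                 clock=time.monotonic):
        self.rate = rate_per_sec              # steady token fill rate
        self.capacity = capacity              # cap of the main bucket
        self.pour_interval = pour_interval_sec
        self.clock = clock                    # injectable for testing
        self.tokens = 0.0                     # main bucket
        self.filling = 0.0                    # parked left-over tokens
        self.last_refill = clock()
        self.last_pour = clock()

    def try_consume(self, amount):
        now = self.clock()
        new_tokens = (now - self.last_refill) * self.rate
        self.last_refill = now
        if amount <= new_tokens:
            # Tokens used immediately come out of the new tokens;
            # the remainder is parked in the filling bucket.
            self.filling += new_tokens - amount
            consumed = True
        else:
            # Fall back to tokens already poured into the main bucket.
            deficit = amount - new_tokens
            if self.tokens >= deficit:
                self.tokens -= deficit
                consumed = True
            else:
                self.filling += new_tokens  # request rejected; park tokens
                consumed = False
        # Periodically pour the filling bucket into the main bucket.
        if now - self.last_pour >= self.pour_interval:
            self.tokens = min(self.capacity, self.tokens + self.filling)
            self.filling = 0.0
            self.last_pour = now
        return consumed
```

With rate_per_sec set to the 10MB rate and capacity to 1800MB, this reproduces the example: a topic producing below 10MB/s keeps parking unused tokens, and after each pour those tokens become available for bursting.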

>
>
> > tokens would be added to the bucket as are consumed by the actual
> > traffic. The left over tokens 10MB - actual rate would go to a
> > separate filling bucket that gets poured into the actual bucket every
> > 10 minutes.
> > This first bucket with this separate "filling bucket" would handle the
> > bursting up to 1800MB.
> >
>
> But this isn't the requirement? Let's assume that the actual traffic has
> been 5MB for a while and this 1800MB capacity bucket is all filled up now..
> What's the real use here for that at all?

How would the rate limiter know if 5MB traffic is degraded traffic
which would need to be allowed to burst?

> I think this approach of thinking about rate limiter - "earning the right
> to burst by letting tokens remain into the bucket, (by doing lower than
> 10MB for a while)" doesn't fit well in a messaging use case, in the real
> world or in theory.

It might feel like it doesn't fit well, but this is how most rate
limiters work. The model works very well in practice.
Without "earning the right to burst", there would have to be some
other way to detect whether there's a need to burst.
The need to burst could be detected by calculating the time that the
message to be sent has been in queues until it's about to be sent.
In other words, from the end-to-end latency.
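A hedged sketch of that idea (the class name and thresholds are hypothetical, not an existing Pulsar interface): run at the nominal rate and switch to the burst rate only while observed queueing delay exceeds a latency SLO.

```python
class LatencyAwareRateLimiter:
    """Hypothetical limiter that bursts only while producers are visibly
    backlogged, judged from observed end-to-end queueing delay."""

    def __init__(self, base_rate, burst_rate, latency_slo_ms):
        self.base_rate = base_rate          # nominal rate, e.g. 10 MB/s
        self.burst_rate = burst_rate        # burst ceiling, e.g. 15 MB/s
        self.latency_slo_ms = latency_slo_ms

    def current_rate(self, observed_queue_delay_ms):
        # Bursting is "needed" only when messages waited longer than the SLO.
        if observed_queue_delay_ms > self.latency_slo_ms:
            return self.burst_rate
        return self.base_rate
```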

> For a 10MB topic, if the actual produce has been , say, 5MB for a long
> while, this shouldn't give the right to that topic to burst to 15MB for as
> much as tokens are present.. This is purely due to the fact that this will
> then start stressing the network and bookie disks.

Well, there's always the option not to configure bursting this way, or to
limit the maximum rate of bursting, as is possible in a dual token
bucket implementation.
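As a sketch of what such a dual token bucket could look like (the names and the refund-based composition are my assumptions, not an existing Pulsar or Bucket4J API): a deep bucket enforces the long-term average rate and supplies the burst budget, while a shallow bucket caps the instantaneous burst rate.

```python
import time

class TokenBucket:
    """Plain token bucket: starts full, refills continuously at `rate`."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_consume(self, n):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

class DualTokenBucket:
    """Permit traffic only when BOTH buckets have tokens: the deep bucket
    enforces the long-term average rate (its depth is the burst budget),
    the shallow bucket caps the instantaneous burst rate."""
    def __init__(self, avg_rate, burst_rate, burst_capacity,
                 short_capacity, clock=time.monotonic):
        self.long = TokenBucket(avg_rate, burst_capacity, clock)
        self.short = TokenBucket(burst_rate, short_capacity, clock)

    def try_consume(self, n):
        if not self.long.try_consume(n):
            return False
        if not self.short.try_consume(n):
            self.long.tokens += n  # refund: the short bucket refused
            return False
        return True
```

For the running example: avg_rate=10MB/s with a deep burst_capacity, plus burst_rate=15MB/s with a shallow short_capacity, keeps the burst ceiling at 15MB/s no matter how many tokens were earned.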

> Imagine a 100 of such topics going around with similar configuration of
> fixed+burst limits and were doing way lower than the fixed rate for the
> past couple of hours. Now that they've earned enough tokens, if they all
> start bursting, this will bring down the system, which is probably not
> capable of supporting simultaneous peaks of all possible topics at all.

This is the challenge of capacity management and end-to-end flow
control and backpressure.
With proper system wide capacity management and end-to-end back
pressure, the system won't collapse.

As reference, let's take a look at the Confluent Kora paper [1]:
In 5.2.1 Back Pressure and Auto-Tuning:
"Backpressure is achieved via auto-tuning tenant quotas on the broker
such that the combined tenant usage remains below the broker-wide
limit. The tenant quotas are auto-tuned proportionally to their total
quota allocation on the broker. This mechanism ensures fair sharing of
resources among tenants during temporary overload and re-uses the
quota enforcement mechanism for backpressure."
"The broker-wide limits are generally defined by benchmarking brokers
across clouds. The CPU-related limit is unique because there is no
easy way to measure and attribute CPU usage to a tenant. Instead, the
quota is defined as the clock time the broker spends processing
requests and connections, and the safe limit is variable. So to
protect CPU, request backpressure is triggered when request queues
reach a certain threshold."
In 5.2.2 Dynamic Quota Management:
"A straightforward method for distributing tenant-level quotas among
the brokers hosting the tenant is to statically divide the quota
evenly across the brokers. This static approach, which was deployed
initially, worked reasonably well on lower subscribed clusters."...
"Kora addresses this issue by using a dynamic quota mechanism that
adjusts bandwidth distribution based on a tenant’s bandwidth
consumption. This is achieved through the use of a shared quota
service to manage quota distribution, a design similar to that used by
other 
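The proportional auto-tuning quoted from section 5.2.1 could be sketched roughly like this (my interpretation of the quoted text, not Kora's actual code):

```python
def autotune_quotas(tenant_quotas, broker_limit):
    """Scale tenant quotas proportionally to their allocation so that
    their sum stays within the broker-wide limit (overload case only)."""
    total = sum(tenant_quotas.values())
    if total <= broker_limit:
        return dict(tenant_quotas)  # no overload: quotas unchanged
    scale = broker_limit / total
    return {tenant: quota * scale for tenant, quota in tenant_quotas.items()}
```

Each tenant's effective quota shrinks in proportion to its allocated share, which is what gives the "fair sharing of resources among tenants during temporary overload" described in the paper.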

Re: Missing Pulsar entry in Apache Projects list page

2023-11-10 Thread Lari Hotari
Hi Kiryl,

Pulsar is missing a so-called DOAP file, which is an XML RDF file. You
can follow https://projects.apache.org/create.html to create the
required file; another possibility is to take inspiration from
https://github.com/apache/kafka/blob/trunk/doap_Kafka.rdf .

Please send a PR adding a file called doap_Pulsar.rdf to
apache/pulsar. After the PR has been merged, a Pulsar PMC member
will have to register the URL in the project master list, as
explained in the "DOAP Creation" guide.
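For reference, a minimal doap_Pulsar.rdf could look roughly like this (a sketch modeled on the Kafka example; the exact fields, URLs, and category values should be taken from the "DOAP Creation" guide rather than this message):

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#"
         xmlns:asfext="http://projects.apache.org/ns/asfext#">
  <Project rdf:about="https://pulsar.apache.org">
    <name>Apache Pulsar</name>
    <homepage rdf:resource="https://pulsar.apache.org"/>
    <shortdesc>Distributed pub-sub messaging platform</shortdesc>
    <programming-language>Java</programming-language>
    <category rdf:resource="http://projects.apache.org/category/messaging"/>
    <asfext:pmc rdf:resource="https://pulsar.apache.org"/>
  </Project>
</rdf:RDF>
```

The projects.apache.org language and category listings are generated from the programming-language and category elements, which is why Pulsar is missing from those two pages.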

Thanks for volunteering!

-Lari

On Fri, 10 Nov 2023 at 10:30, Kiryl Valkovich
 wrote:
>
> The project has an entry here:
> https://projects.apache.org/projects.html?name
>
> but hasn't here: https://projects.apache.org/projects.html?language
> and here: https://projects.apache.org/projects.html?category
>
> Maybe someone who knows where to fix it can fix it.
>
>
> Best,
> Kiryl


Re: [DISCUSS] PIP-310: Support custom publish rate limiters

2023-11-10 Thread Girish Sharma
Hello Lari, replies inline. It's the festive season here, so I might be late
with my next reply.


On Fri, Nov 10, 2023 at 4:51 PM Lari Hotari  wrote:

> Hi Girish,
>
> >
> >
> > > tokens would be added to the bucket as are consumed by the actual
> > > traffic. The left over tokens 10MB - actual rate would go to a
> > > separate filling bucket that gets poured into the actual bucket every
> > > 10 minutes.
> > > This first bucket with this separate "filling bucket" would handle the
> > > bursting up to 1800MB.
> > >
> >
> > But this isn't the requirement? Let's assume that the actual traffic has
> > been 5MB for a while and this 1800MB capacity bucket is all filled up
> now..
> > What's the real use here for that at all?
>
> How would the rate limiter know if 5MB traffic is degraded traffic
> which would need to be allowed to burst?
>

That's not what I was implying. I was trying to question the need for
1800MB worth of capacity. I am assuming this is to allow a 2-minute burst
at 15MBps? But isn't this bucket only taking care of the delta beyond 10MBps?
Moreover, once the 2 minutes have elapsed, which bucket ensures that the
rate is only allowed to go up to 10MBps?


> > I think this approach of thinking about rate limiter - "earning the right
> > to burst by letting tokens remain into the bucket, (by doing lower than
> > 10MB for a while)" doesn't fit well in a messaging use case, in the real
> > world or in theory.
>
> It might feel like it doesn't fit well, but this is how most rate
> limiters work. The model works very well in practice.
> Without "earning the right to burst", there would have to be some
> other way to detect whether there's a need to burst.
>

Why does the rate limiter need to decide whether there is a need to burst? It
depends on the incoming message rate. Since the rate limiter has no
knowledge of what is going to happen in the future, it cannot assume that
messages beyond a fixed rate (10MB in our example) can be held/paused until
the next second - and thus decide that there is no need to burst right now.

While I understand that the earning-the-right-to-burst model works well in
practice, another way to think about this is that the bucket, initially,
starts filled. Moreover, the tokens here represent the available disk,
CPU, and network bandwidth. The general approach where tokens initially
have to be earned might help tackle cold starts, but beyond that, a topic
doing 5MBps to accumulate enough tokens to burst for a few minutes in the
future doesn't really translate to the physical world, does it? The 1:1
translation here is to the SSD where the topic's data is actually being
written - so my bursting up to 15MBps doesn't really depend on the fact
that I was only doing 5MBps in the last few minutes (and thus
accumulating the remaining 5MBps worth of tokens towards the burst), now
does it? The SSD won't gain the ability to absorb a burst just because
there was low throughput in the last few minutes - not even from a space
point of view.



> The need to burst could be detected by calculating the time that the
> message to be sent has been in queues until it's about to be sent.
> In other words, from the end-to-end latency.
>

In practice, this level of cross-component coordination would never result
in a responsive and spontaneous system, unless each message came along
with an "I waited in the queue for this long" timestamp and, on the server
side, we read the netty channel, parsed the message, and figured this out
before checking for rate limiting. At that point, rate limiting isn't
really doing anything if the broker is reading every message anyway. This
may also require prioritization of messages, as some messages from a
different producer to the same partition may have waited longer than
others. This is out of scope for a rate limiter at this point.

> > Imagine a 100 of such topics going around with similar configuration of
> > fixed+burst limits and were doing way lower than the fixed rate for the
> > past couple of hours. Now that they've earned enough tokens, if they all
> > start bursting, this will bring down the system, which is probably not
> > capable of supporting simultaneous peaks of all possible topics at all.
>
> This is the challenge of capacity management and end-to-end flow
> control and backpressure.
>

A rate limiter is a major facilitator and guardian of capacity planning, so
it does relate here.


> With proper system wide capacity management and end-to-end back
> pressure, the system won't collapse.
>

At no point am I trying to go beyond the purview of a single broker here.
Unless, by system, you meant a single broker itself, for which I talk
about a broker-level rate limiter further below in the example.


> In Pulsar, we have "PIP 82: Tenant and namespace level rate limiting"
> [4] which introduced the "resource group" concept. There is the
> resourcegroup resource in the Admin REST API [5]. There's also a
> resource ca