Missing Pulsar entry in Apache Projects list page
The project has an entry here: https://projects.apache.org/projects.html?name

but is missing here: https://projects.apache.org/projects.html?language
and here: https://projects.apache.org/projects.html?category

Maybe someone who knows where this is configured can fix it.

Best,
Kiryl
Re: [VOTE] Pulsar Release 3.0.2 Candidate 2
Hi Lari,

I was waiting for https://github.com/apache/pulsar/pull/21545 before proceeding; it was merged today. I will push Candidate 3 next week.

Thanks,
Yubiao

On Thu, Nov 9, 2023 at 7:56 PM Lari Hotari wrote:
> Any updates on the 3.0.2 release?
>
> -Lari
>
> On 2023/10/30 03:23:06 Yubiao Feng wrote:
> > Update:
> > blocked by https://github.com/apache/pulsar/pull/21445
> >
> > I will do a new candidate later
> >
> > Thanks
> > Yubiao Feng
> >
> > On Fri, Oct 27, 2023 at 10:50 PM Yubiao Feng <yubiao.f...@streamnative.io> wrote:
> > >
> > > This is the first release candidate for Apache Pulsar version 3.0.2.
> > >
> > > It fixes the following issues:
> > > https://github.com/apache/pulsar/pulls?q=is%3Apr+is%3Amerged+label%3Arelease%2F3.0.2+label%3Acherry-picked%2Fbranch-3.0+
> > >
> > > *** Please download, test and vote on this release. This vote will
> > > stay open for at least 72 hours ***
> > >
> > > Note that we are voting upon the source (tag); binaries are provided
> > > for convenience.
> > >
> > > Source and binary files:
> > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-3.0.2-candidate-2/
> > >
> > > SHA-512 checksums:
> > >
> > > a4eaf3afabfe89f34d4cd29b2bc63ad2219a729319c159bae17940cede3afcf8aebd4d467b9f5226a17ab0b7e878300038364d4d122193aa3494f3b9bad0b3cc
> > > apache-pulsar-3.0.2-bin.tar.gz
> > >
> > > 9e0103f93e00c09c5db8a4cdf1b7d135bed5f0aa5f1c40a52d8caf4f3d269ca4972e25e87d8c0254212e7e089421ede1a92a608824fd1a240c05372b349ed095
> > > apache-pulsar-3.0.2-src.tar.gz
> > >
> > > Maven staging repo:
> > > https://repository.apache.org/content/repositories/orgapachepulsar-1245/
> > >
> > > The tag to verify:
> > > v3.0.2-candidate-2 (cd5d2bef8d65c0f6158b8eb4b7ca7fbbde7028c1)
> > > https://github.com/apache/pulsar/releases/tag/v3.0.2-candidate-2
> > >
> > > Pulsar's KEYS file containing the PGP keys used to sign the release:
> > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > >
> > > Docker images:
> > >
> > > pulsar-all images:
> > > https://hub.docker.com/repository/docker/9947090/pulsar-all
> > >
> > > pulsar images:
> > > https://hub.docker.com/repository/docker/9947090/pulsar
> > >
> > > Please download the source package, and follow the README to build
> > > and run the Pulsar standalone service.
> > >
> > > Regards
> > > Yubiao Feng (poorbarcode)
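[Editor's note] For anyone verifying the candidate, the posted SHA-512 values can be checked against the downloaded artifacts with "sha512sum" or "shasum -a 512". As a convenience, here is an equivalent self-contained Java sketch; the class name and argument handling are illustrative only and not part of the release tooling.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

/**
 * Convenience sketch for release verification: compute the SHA-512 of a
 * downloaded artifact and compare it to the checksum posted in the vote
 * email. Requires Java 17+ (HexFormat).
 */
public class ChecksumCheck {
    public static void main(String[] args) throws Exception {
        Path artifact = Path.of(args[0]);   // e.g. apache-pulsar-3.0.2-bin.tar.gz
        String expected = args[1];          // checksum copied from the vote email

        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(artifact)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        String actual = HexFormat.of().formatHex(digest.digest());
        System.out.println(actual.equalsIgnoreCase(expected) ? "Checksum OK" : "Checksum MISMATCH");
    }
}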
Re: [VOTE] Pulsar Release 3.0.2 Candidate 2
Hi Lari,

I hit a conflict when cherry-picking #21445 and #21545 into branch-3.0, so Candidate 3 will take a bit longer.

Thanks,
Yubiao Feng

On Thu, Nov 9, 2023 at 7:56 PM Lari Hotari wrote:
> Any updates on the 3.0.2 release?
>
> -Lari
>
> On 2023/10/30 03:23:06 Yubiao Feng wrote:
> > Update:
> > blocked by https://github.com/apache/pulsar/pull/21445
> >
> > I will do a new candidate later
> >
> > Thanks
> > Yubiao Feng
> >
> > On Fri, Oct 27, 2023 at 10:50 PM Yubiao Feng <yubiao.f...@streamnative.io> wrote:
> > >
> > > This is the first release candidate for Apache Pulsar version 3.0.2.
> > >
> > > It fixes the following issues:
> > > https://github.com/apache/pulsar/pulls?q=is%3Apr+is%3Amerged+label%3Arelease%2F3.0.2+label%3Acherry-picked%2Fbranch-3.0+
> > >
> > > *** Please download, test and vote on this release. This vote will
> > > stay open for at least 72 hours ***
> > >
> > > Note that we are voting upon the source (tag); binaries are provided
> > > for convenience.
> > >
> > > Source and binary files:
> > > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-3.0.2-candidate-2/
> > >
> > > SHA-512 checksums:
> > >
> > > a4eaf3afabfe89f34d4cd29b2bc63ad2219a729319c159bae17940cede3afcf8aebd4d467b9f5226a17ab0b7e878300038364d4d122193aa3494f3b9bad0b3cc
> > > apache-pulsar-3.0.2-bin.tar.gz
> > >
> > > 9e0103f93e00c09c5db8a4cdf1b7d135bed5f0aa5f1c40a52d8caf4f3d269ca4972e25e87d8c0254212e7e089421ede1a92a608824fd1a240c05372b349ed095
> > > apache-pulsar-3.0.2-src.tar.gz
> > >
> > > Maven staging repo:
> > > https://repository.apache.org/content/repositories/orgapachepulsar-1245/
> > >
> > > The tag to verify:
> > > v3.0.2-candidate-2 (cd5d2bef8d65c0f6158b8eb4b7ca7fbbde7028c1)
> > > https://github.com/apache/pulsar/releases/tag/v3.0.2-candidate-2
> > >
> > > Pulsar's KEYS file containing the PGP keys used to sign the release:
> > > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> > >
> > > Docker images:
> > >
> > > pulsar-all images:
> > > https://hub.docker.com/repository/docker/9947090/pulsar-all
> > >
> > > pulsar images:
> > > https://hub.docker.com/repository/docker/9947090/pulsar
> > >
> > > Please download the source package, and follow the README to build
> > > and run the Pulsar standalone service.
> > >
> > > Regards
> > > Yubiao Feng (poorbarcode)
Re: [DISCUSS] PIP-310: Support custom publish rate limiters
Hi Girish,

replies inline.

> > I'd pick the first bucket for handling the 10MB rate.
> > The capacity of the first bucket would be 15MB * 120 = 1800MB. The fill
> > would happen in a special way. I'm not sure if Bucket4J has this at all.
> > So describing the way of adding tokens to the bucket: the tokens in
> > the bucket would remain the same when the rate is <10MB. As many
>
> How is this special behavior (tokens in the bucket remaining the same when
> the rate is <10MB) achieved? I would assume that to even figure out that
> the rate is less than 10MB, there is some counter going around?

It's possible to be creative in implementing the token bucket algorithm.
New tokens will be added at the configured rate. When the actual traffic
rate is less than the token bucket's rate, there will be leftover tokens.
One way to implement this is to calculate the amount of new tokens before
using the tokens. The immediately used tokens are subtracted from the new
tokens, and the leftover tokens are added to the separate "filling bucket".
I explained earlier that this filling bucket would be poured into the
actual bucket every 10 minutes in the example scenario (a sketch of this
scheme follows at the end of this message).

> > tokens would be added to the bucket as are consumed by the actual
> > traffic. The left over tokens (10MB - actual rate) would go to a
> > separate filling bucket that gets poured into the actual bucket every
> > 10 minutes.
> > This first bucket with this separate "filling bucket" would handle the
> > bursting up to 1800MB.
>
> But this isn't the requirement? Let's assume that the actual traffic has
> been 5MB for a while and this 1800MB capacity bucket is all filled up now.
> What's the real use here for that at all?

How would the rate limiter know whether the 5MB traffic is degraded traffic
which would need to be allowed to burst?

> I think this approach of thinking about a rate limiter - "earning the right
> to burst by letting tokens remain in the bucket (by doing lower than
> 10MB for a while)" - doesn't fit well in a messaging use case, in the real
> world or in theory.

It might feel like it doesn't fit well, but this is how most rate limiters
work. The model works very well in practice.
Without "earning the right to burst", there would have to be some other way
to detect whether there's a need to burst.
The need to burst could be detected by calculating the time that a message
to be sent has spent in queues until it's about to be sent. In other words,
from the end-to-end latency.

> For a 10MB topic, if the actual produce rate has been, say, 5MB for a long
> while, this shouldn't give that topic the right to burst to 15MB for as
> long as tokens are present. This is purely due to the fact that this will
> then start stressing the network and bookie disks.

Well, there's always the option to not configure bursting this way, or to
limit the maximum rate of bursting, as is possible in a dual token bucket
implementation.

> Imagine 100 such topics with similar configurations of fixed+burst limits
> that were doing way lower than the fixed rate for the past couple of
> hours. Now that they've earned enough tokens, if they all start bursting,
> this will bring down the system, which is probably not capable of
> supporting simultaneous peaks of all possible topics at all.

This is the challenge of capacity management and end-to-end flow control
and backpressure. With proper system-wide capacity management and
end-to-end backpressure, the system won't collapse.
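[Editor's note] To make the "filling bucket" variant described above concrete, here is a minimal single-threaded sketch. It is one possible reading of the scheme, not Pulsar's rate limiter and not a Bucket4J feature; the class name, method names, and refill policy are assumptions for illustration only.

/**
 * Sketch of the "filling bucket" variant: burst tokens start at zero and
 * are earned over time ("earning the right to burst"). While traffic stays
 * below the configured rate, the leftover tokens are parked in a separate
 * filling bucket, which is poured into the main bucket on a fixed interval
 * (10 minutes in the example scenario).
 */
class FillingTokenBucket {
    private final long bytesPerSecond;    // configured rate, e.g. 10MB/s
    private final long maxBurstCapacity;  // e.g. 1800MB of burst budget
    private final long pourIntervalNanos; // e.g. 10 minutes in nanoseconds

    private long burstTokens;             // tokens available for bursting
    private long fillingBucket;           // leftover tokens parked here
    private long lastRefillNanos;
    private long lastPourNanos;

    FillingTokenBucket(long bytesPerSecond, long maxBurstCapacity,
                       long pourIntervalNanos, long nowNanos) {
        this.bytesPerSecond = bytesPerSecond;
        this.maxBurstCapacity = maxBurstCapacity;
        this.pourIntervalNanos = pourIntervalNanos;
        this.lastRefillNanos = nowNanos;
        this.lastPourNanos = nowNanos;
    }

    /** Returns true if publishing sizeBytes is currently within the allowed rate. */
    boolean tryAcquire(long sizeBytes, long nowNanos) {
        // Tokens added since the last call, at the configured rate.
        long newTokens = (long) ((nowNanos - lastRefillNanos) / 1e9 * bytesPerSecond);
        lastRefillNanos = nowNanos;

        long leftover = newTokens - sizeBytes;
        if (leftover >= 0) {
            // Traffic below the configured rate: the main bucket stays flat
            // and the unused tokens are parked in the filling bucket.
            fillingBucket = Math.min(fillingBucket + leftover, maxBurstCapacity - burstTokens);
        } else {
            // Traffic above the configured rate: the deficit is charged
            // against the earned burst tokens.
            burstTokens += leftover;
        }

        // Pour the filling bucket into the main bucket on the configured interval.
        if (nowNanos - lastPourNanos >= pourIntervalNanos) {
            burstTokens = Math.min(burstTokens + fillingBucket, maxBurstCapacity);
            fillingBucket = 0;
            lastPourNanos = nowNanos;
        }
        return burstTokens >= 0;
    }
}

In this reading, the token count stays flat while traffic is below the configured rate (the surplus is parked), and the parked tokens only become spendable when they are poured in on the 10-minute interval, which is the "special" fill behavior being asked about.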
As a reference, let's take a look at the Confluent Kora paper [1].

In 5.2.1 Back Pressure and Auto-Tuning:

"Backpressure is achieved via auto-tuning tenant quotas on the broker such that the combined tenant usage remains below the broker-wide limit. The tenant quotas are auto-tuned proportionally to their total quota allocation on the broker. This mechanism ensures fair sharing of resources among tenants during temporary overload and re-uses the quota enforcement mechanism for backpressure."

"The broker-wide limits are generally defined by benchmarking brokers across clouds. The CPU-related limit is unique because there is no easy way to measure and attribute CPU usage to a tenant. Instead, the quota is defined as the clock time the broker spends processing requests and connections, and the safe limit is variable. So to protect CPU, request backpressure is triggered when request queues reach a certain threshold."

In 5.2.2 Dynamic Quota Management:

"A straightforward method for distributing tenant-level quotas among the brokers hosting the tenant is to statically divide the quota evenly across the brokers. This static approach, which was deployed initially, worked reasonably well on lower subscribed clusters." ... "Kora addresses this issue by using a dynamic quota mechanism that adjusts bandwidth distribution based on a tenant's bandwidth consumption. This is achieved through the use of a shared quota service to manage quota distribution, a design similar to that used by other
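[Editor's note] The auto-tuning behaviour quoted above ("tenant quotas are auto-tuned proportionally to their total quota allocation on the broker") can be illustrated in a few lines. This is only a sketch of that one sentence, with hypothetical tenant names and numbers; it is not Kora's or Pulsar's implementation.

import java.util.Map;
import java.util.stream.Collectors;

/**
 * Illustration of proportional quota auto-tuning: when the sum of configured
 * tenant quotas exceeds the broker-wide limit, each tenant's effective quota
 * is scaled down proportionally to its configured share.
 */
class ProportionalQuotaTuner {

    /** Returns effective per-tenant quotas whose sum does not exceed brokerLimit. */
    static Map<String, Double> tune(Map<String, Double> configuredQuotas, double brokerLimit) {
        double total = configuredQuotas.values().stream().mapToDouble(Double::doubleValue).sum();
        if (total <= brokerLimit) {
            return configuredQuotas; // no overload: everyone keeps their configured quota
        }
        double scale = brokerLimit / total; // proportional to total quota allocation
        return configuredQuotas.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue() * scale));
    }

    public static void main(String[] args) {
        // Hypothetical example: 60 + 60 + 30 = 150 MB/s configured against a
        // broker-wide limit of 100 MB/s, so quotas are scaled by 100/150.
        Map<String, Double> tuned = tune(
                Map.of("tenant-a", 60.0, "tenant-b", 60.0, "tenant-c", 30.0), 100.0);
        tuned.forEach((t, q) -> System.out.printf("%s -> %.1f MB/s%n", t, q));
    }
}

When the combined configured quotas already fit under the broker-wide limit nothing changes; only under overload are the effective quotas scaled down, which matches the "fair sharing of resources among tenants during temporary overload" wording in the quote.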
Re: Missing Pulsar entry in Apache Projects list page
Hi Kiryl,

Pulsar is missing a so-called DOAP file, which is an XML RDF file.

You can follow https://projects.apache.org/create.html to create the required file. Another possibility is to take inspiration from https://github.com/apache/kafka/blob/trunk/doap_Kafka.rdf .

Please send a PR adding a file called doap_Pulsar.rdf to apache/pulsar. After the PR has been merged, a Pulsar PMC member would have to take care of registering the URL to the projects master list, as explained in the "DOAP Creation" guide. A minimal sketch of such a file follows below.

Thanks for volunteering!

-Lari

On Fri, 10 Nov 2023 at 10:30, Kiryl Valkovich wrote:
>
> The project has an entry here:
> https://projects.apache.org/projects.html?name
>
> but is missing here: https://projects.apache.org/projects.html?language
> and here: https://projects.apache.org/projects.html?category
>
> Maybe someone who knows where this is configured can fix it.
>
> Best,
> Kiryl
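[Editor's note] For reference, here is a minimal sketch of what a doap_Pulsar.rdf could look like. The element names follow the common DOAP vocabulary used by Apache projects, but the exact namespaces, category value, and remaining fields should be cross-checked against the "DOAP Creation" guide and the Kafka file linked above; the values shown are placeholders.

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://usefulinc.com/ns/doap#"
         xmlns:asfext="http://projects.apache.org/ns/asfext#">
  <Project rdf:about="https://pulsar.apache.org">
    <name>Apache Pulsar</name>
    <homepage rdf:resource="https://pulsar.apache.org"/>
    <shortdesc>Distributed pub-sub messaging and streaming platform</shortdesc>
    <!-- These two fields feed the ?language and ?category listings -->
    <programming-language>Java</programming-language>
    <category rdf:resource="https://projects.apache.org/category/messaging"/>
    <license rdf:resource="https://spdx.org/licenses/Apache-2.0"/>
    <bug-database rdf:resource="https://github.com/apache/pulsar/issues"/>
    <download-page rdf:resource="https://pulsar.apache.org/download/"/>
    <asfext:pmc rdf:resource="https://pulsar.apache.org"/>
  </Project>
</rdf:RDF>

The ?language and ?category pages are generated from the programming-language and category fields of the registered DOAP files, which is why Pulsar is currently missing from those two listings.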
Re: [DISCUSS] PIP-310: Support custom publish rate limiters
Hello Lari,

replies inline. It's festive season here, so I might be late with the next reply.

On Fri, Nov 10, 2023 at 4:51 PM Lari Hotari wrote:
> Hi Girish,
>
> > > tokens would be added to the bucket as are consumed by the actual
> > > traffic. The left over tokens (10MB - actual rate) would go to a
> > > separate filling bucket that gets poured into the actual bucket every
> > > 10 minutes.
> > > This first bucket with this separate "filling bucket" would handle the
> > > bursting up to 1800MB.
> >
> > But this isn't the requirement? Let's assume that the actual traffic has
> > been 5MB for a while and this 1800MB capacity bucket is all filled up now.
> > What's the real use here for that at all?
>
> How would the rate limiter know whether the 5MB traffic is degraded traffic
> which would need to be allowed to burst?

That's not what I was implying. I was trying to question the need for
1800MB worth of capacity. I am assuming this is to allow a 2 minute burst
of 15MBps? But isn't this bucket only taking care of the delta beyond
10MBps? Moreover, once the 2 minutes have elapsed, which bucket is ensuring
that the rate is only allowed to go up to 10MBps?

> > I think this approach of thinking about a rate limiter - "earning the right
> > to burst by letting tokens remain in the bucket (by doing lower than
> > 10MB for a while)" - doesn't fit well in a messaging use case, in the real
> > world or in theory.
>
> It might feel like it doesn't fit well, but this is how most rate limiters
> work. The model works very well in practice.
> Without "earning the right to burst", there would have to be some other way
> to detect whether there's a need to burst.

Why does the rate limiter need to decide if there is a need to burst? It is
dependent on the incoming message rate. Since the rate limiter has no
knowledge of what is going to happen in the future, it cannot assume that
the messages beyond a fixed rate (10MB in our example) can be held/paused
until the next second, thus deciding that there is no need to burst right
now.

While I understand that the "earning the right to burst" model works well
in practice, another approach to think about this is that the bucket,
initially, starts filled (see the dual-bucket sketch after this message).
Moreover, the tokens here are being filled into the bucket due to the
available disk, CPU and network bandwidth. The general approach where the
tokens initially have to be earned might be helpful to tackle cold starts,
but beyond that, a topic doing 5MBps to accumulate enough tokens to burst
for a few minutes in the future doesn't really translate to the physical
world, does it? The 1:1 translation here is to the SSD where the topic's
data is actually being written. My bursting up to 15MBps doesn't really
depend on the fact that I was only doing 5MBps in the last few minutes
(and thus accumulating the remaining 5MBps worth of tokens towards the
burst), now does it? The SSD won't gain the ability to absorb a burst just
because there was low throughput in the last few minutes. Not even from a
space POV.

> The need to burst could be detected by calculating the time that a message
> to be sent has spent in queues until it's about to be sent. In other words,
> from the end-to-end latency.

In practice, this level of cross-component coordination would never result
in a responsive and spontaneous system, unless each message comes along
with an "I waited in the queue for this long" timestamp and, on the server
side, we read the netty channel, parse the message and figure this out
before checking for rate limiting. At that point, rate limiting isn't
really doing anything, since the broker is reading every message anyway.
This may also require prioritization of messages, as some messages from a
different producer to the same partition may have waited longer than
others. This is out of scope for a rate limiter at this point.

> > Imagine 100 such topics with similar configurations of fixed+burst limits
> > that were doing way lower than the fixed rate for the past couple of
> > hours. Now that they've earned enough tokens, if they all start bursting,
> > this will bring down the system, which is probably not capable of
> > supporting simultaneous peaks of all possible topics at all.
>
> This is the challenge of capacity management and end-to-end flow control
> and backpressure.

The rate limiter is a major facilitator and guardian of capacity planning,
so it does relate here.

> With proper system-wide capacity management and end-to-end backpressure,
> the system won't collapse.

At no point am I trying to go beyond the purview of a single broker here,
unless, by system, you meant a single broker itself, for which I talk about
a broker-level rate limiter further below in the example.

> In Pulsar, we have "PIP 82: Tenant and namespace level rate limiting"
> [4] which introduced the "resource group" concept. There is the
> resourcegroup resource in the Admin REST API [5]. There's also a
> resource ca
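[Editor's note] The archived message is cut off above. To tie the dual token bucket mentioned earlier in the thread to Girish's point that the bucket could start out full, here is a generic sketch: one bucket enforces the long-term 10MB/s rate but carries the large burst budget, while a second bucket refills at 15MB/s and caps the instantaneous rate. The class names and the 10MB/15MB/1800MB figures come from the example scenario; this is not Pulsar's implementation.

/**
 * Generic dual token bucket sketch. Both buckets start full, per the
 * "bucket starts filled" point above. A publish is admitted only if both
 * buckets can cover it, so the short-term bucket caps the instantaneous
 * rate at 15MB/s while the long-term bucket limits sustained traffic.
 */
class DualTokenBucket {

    static final class Bucket {
        final double ratePerSec;  // refill rate in bytes/s
        final double capacity;    // maximum tokens (burst budget)
        double tokens;            // starts full
        long lastRefillNanos;

        Bucket(double ratePerSec, double capacity, long nowNanos) {
            this.ratePerSec = ratePerSec;
            this.capacity = capacity;
            this.tokens = capacity;
            this.lastRefillNanos = nowNanos;
        }

        void refill(long nowNanos) {
            double elapsedSec = (nowNanos - lastRefillNanos) / 1e9;
            tokens = Math.min(capacity, tokens + elapsedSec * ratePerSec);
            lastRefillNanos = nowNanos;
        }
    }

    private final Bucket longTerm;  // e.g. 10MB/s refill, 1800MB capacity
    private final Bucket shortTerm; // e.g. 15MB/s refill, 15MB capacity

    DualTokenBucket(Bucket longTerm, Bucket shortTerm) {
        this.longTerm = longTerm;
        this.shortTerm = shortTerm;
    }

    /** A publish is admitted only if BOTH buckets can cover its size. */
    boolean tryAcquire(long sizeBytes, long nowNanos) {
        longTerm.refill(nowNanos);
        shortTerm.refill(nowNanos);
        if (longTerm.tokens >= sizeBytes && shortTerm.tokens >= sizeBytes) {
            longTerm.tokens -= sizeBytes;
            shortTerm.tokens -= sizeBytes;
            return true;
        }
        return false;
    }
}

In this arrangement, once the long-term bucket's burst budget is spent, its 10MB/s refill rate is what pulls traffic back down to the fixed rate, which is the role asked about above ("which bucket is ensuring that the rate is only allowed to go up to 10MBps").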