[GitHub] [pulsar-manager] nicknezis commented on pull request #438: Updated to fix Log4J security vulnerabilities
nicknezis commented on pull request #438: URL: https://github.com/apache/pulsar-manager/pull/438#issuecomment-1008307363

Looks like CI passed this time. I was not able to build locally either before or after the change I'm introducing, but I did verify that the Gradle dependencies were updated to point to the new Log4j. I was getting a similar error during `./gradlew build` to the one in the failed output of #420. Having a release with this update would be greatly appreciated, because at the moment we are not allowed to use Pulsar Manager.
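For anyone who wants to repeat the dependency check mentioned above without a full local build, one way is to query Gradle's dependency reports. This is a hedged sketch only: the configuration name (`runtimeClasspath`) is an assumption and may differ per module in the pulsar-manager build.

```shell
# Show where log4j-core comes from and which version wins (run per module in a multi-module build)
./gradlew dependencyInsight --dependency log4j-core --configuration runtimeClasspath

# Or dump the whole dependency tree and filter for Log4j artifacts
./gradlew dependencies --configuration runtimeClasspath | grep -i log4j
```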
[GitHub] [pulsar-helm-chart] Anonymitaet commented on a change in pull request #192: [CI] Upgrade k8s to 1.18 and also upgrade helm, kind & chart releaser versions
Anonymitaet commented on a change in pull request #192: URL: https://github.com/apache/pulsar-helm-chart/pull/192#discussion_r780866433

## File path: README.md ##

@@ -72,9 +72,9 @@ It includes support for:

 In order to use this chart to deploy Apache Pulsar on Kubernetes, the followings are required.

-1. kubectl 1.14 or higher, compatible with your cluster ([+/- 1 minor release from your cluster](https://kubernetes.io/docs/tasks/tools/install-kubectl/#before-you-begin))
+1. kubectl 1.18 or higher, compatible with your cluster ([+/- 1 minor release from your cluster](https://kubernetes.io/docs/tasks/tools/install-kubectl/#before-you-begin))

Review comment: Thanks for updating docs!
Re: [DISCUSSION] PIP-124: Create init subscription before sending message to DLQ
> I think we should consider adding a new policy for Pulsar topics: a namespace (or topic) policy that makes it possible to retain messages indefinitely when a topic has no subscriptions.

It looks like a feature for retaining data while a topic has no subscriptions. With infinite data retention, the data will not be removed even after all subscriptions have acked the messages. But with "retain_data_no_subscriptions", the data will be removed once all subscriptions have acked the messages. Subsequent subscriptions still can't retrieve all the data, so it only guarantees that the first subscription can retrieve all the data. If users want that guarantee for all subscriptions (all existing and yet-to-be-created ones), that is equivalent to infinite data retention.

For an auto-created topic, the subscription can only be determined at the time of creation, and it may or may not be created. If users are able to determine who the consumers are, and those consumers need to receive every message sent by the producer, they should create the topic and subscription manually, or use the consumer (not the producer) to trigger the topic auto-creation. It is not easy to determine consumer behavior on the producer side.

But a DLQ is not a normal topic from the user's point of view; it's a local container for a subscription to store the messages that the consumer can't process. It's a case of "consumers determining consumer behavior", and I think this is the most essential difference.

Regards,
Penghui

On Sat, Jan 8, 2022 at 12:34 PM Michael Marshall wrote:

> Thanks for your response, Penghui.
>
> I support simplifying message loss prevention for DLQ topics. However, it's not clear to me why we should only simplify it for DLQ topics.
>
> As a Pulsar user, I encountered many of the challenges you mention when producing to auto-created topics. In my architecture, I had consumers reading from an input topic, transforming the data, and then producing to an arbitrary number of output topics. My business logic required that I not lose any messages, which is essentially the same expectation DLQ users have here. I ended up increasing the retention policy to about 4 hours on the output topics to minimize the possibility of losing data. I had to scale up my BookKeeper cluster because of the extra retention. If I had been able to ensure my auto-created topic would not delete messages before I created my subscriptions, I would have had no retention policy and a smaller bookie cluster.
>
> > Yes, essentially, the DLQ is only a topic, no other specific behaviors. But the issue that the proposal wants to resolve is not to introduce a specific behavior for the DLQ topic or something
>
> I'm not sure this statement aligns with the PIP. It seems to me that the PIP proposes solving the message loss issues by adding a DLQ-specific feature to the Pulsar client.
>
> Earlier, I proposed expanding the CreateProducer command to be able to create a subscription. This solution is not right: it tightly couples producers and consumers, which we want to avoid.
>
> I think we should consider adding a new policy for Pulsar topics: a namespace (or topic) policy that makes it possible to retain messages indefinitely when a topic has no subscriptions.
>
> Our message retention feature is very valuable. However, message retention doesn't solve the "slow to subscribe" consumer problem. In the event of long network partitions, a consumer might not be able to subscribe before messages are deleted.
> This feature mitigates that risk and allows users to set message retention time based on other needs, not based on calculations about how long it could take to subscribe to a topic.
>
> This feature solves the DLQ message loss issue because the DLQ producer can produce to any namespace, which is important for clusters that do not have topic-level policies enabled.
>
> Let me know what you think.
>
> Thanks,
> Michael
>
> On Tue, Jan 4, 2022 at 10:33 PM PengHui Li wrote:
> >
> > Thanks for the great comments, Michael.
> >
> > Let me try to clarify some context about the issue that users encountered and the improvement that the proposal wants to introduce.
> >
> > > Before we get further into the implementation, I'd like to discuss whether the current behavior is the expected behavior, as this is the key motivation for this feature.
> >
> > The DLQ can be created dynamically, and users might have short data retention for a namespace by time or by size. But the messages in the DLQ are usually compensated afterward, and we should allow users to keep the data in the DLQ until they want to delete it manually.
> >
> > The DLQ is always for a subscriber, so a subscriber can use an init subscription name to keep messages from being cleaned up from the DLQ.
> >
> > So the key point for this proposal is to keep data in the lazily created DLQ topic until users want to delete it manually.
> >
> > > I think
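To make the DLQ scenario in this thread concrete, here is a minimal, hypothetical Java sketch of how the init-subscription idea from PIP-124 could surface on the consumer side, next to the existing `DeadLetterPolicy` API. The `initialSubscriptionName(...)` setter is the proposed addition under discussion (its final name and shape may differ), and the service URL, topic, and subscription names are placeholders.

```Java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.DeadLetterPolicy;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class DlqInitSubscriptionSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder URL
                .build();

        // Today the DLQ topic is created lazily by the client once maxRedeliverCount is exceeded.
        // PIP-124 proposes also creating an initial subscription on that DLQ topic, so that
        // retention/TTL cannot delete DLQ messages before anyone has subscribed.
        DeadLetterPolicy dlqPolicy = DeadLetterPolicy.builder()
                .maxRedeliverCount(3)
                .deadLetterTopic("persistent://public/default/my-topic-DLQ") // placeholder DLQ topic
                .initialSubscriptionName("dlq-init-sub")                     // proposed by PIP-124; name is illustrative
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-sub")
                .subscriptionType(SubscriptionType.Shared)
                .deadLetterPolicy(dlqPolicy)
                .subscribe();

        consumer.close();
        client.close();
    }
}
```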
Re: [DISCUSSION] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder
Looks like there is no objection, so I will start the official vote for PIP-122.

Regards,
Hang

Michael Marshall wrote on Sat, Jan 8, 2022 at 07:44:
>
> Hi Frank,
>
> You can configure the load shedding strategy in the `broker.conf` file using `loadBalancerLoadSheddingStrategy`.
>
> The configuration is missing from the document you referenced; it should be there, though.
>
> There was a recent PR to add documentation describing the different configuration options: [0]. Those docs won't go live until 2.10.0 is released. We could probably add parts of that documentation to the historical docs, since the shedding strategy has existed for some time.
>
> - Michael
>
> [0] https://github.com/apache/pulsar/pull/12902/files
>
> On Fri, Jan 7, 2022 at 1:29 PM Frank Kelly wrote:
> >
> > This is very cool.
> >
> > How is `ThresholdShedder` chosen? I don't see it referenced here:
> > https://pulsar.apache.org/docs/en/reference-configuration/
> >
> > -F
> >
> > On Thu, Jan 6, 2022 at 7:45 PM Sijie Guo wrote:
> > >
> > > +1 ThresholdShedder is a much better policy
> > >
> > > On Tue, Jan 4, 2022 at 6:24 AM Hang Chen wrote:
> > > >
> > > > https://github.com/apache/pulsar/issues/13340
> > > >
> > > > Pasted below for quoting convenience.
> > > >
> > > > ### Motivation
> > > > The ThresholdShedder load balance policy has been available since Pulsar 2.6.0, introduced by https://github.com/apache/pulsar/pull/6772. It can resolve many load balance issues of `OverloadShedder` and works well in many Pulsar production clusters.
> > > >
> > > > In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance policy is `OverloadShedder`.
> > > >
> > > > I think 2.10 is a good time to change the default load balance policy to `ThresholdShedder`; it will make throughput more balanced across brokers.
> > > >
> > > > ### Proposed Changes
> > > > In the 2.10 release, in `broker.conf`, change `loadBalancerLoadSheddingStrategy` from `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`.
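For anyone looking for the exact knob discussed above, a minimal `broker.conf` sketch is shown below; the class names are taken directly from the PIP text quoted in this thread.

```
# broker.conf
# Default shipped in Pulsar 2.6.0 - 2.9.0:
#loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.OverloadShedder

# Proposed default from 2.10 on (PIP-122):
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
```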
Re: [DISCUSSION] PIP-121: Pulsar cluster level auto failover
Looks like there is no objection, so I will start the official vote for PIP-121.

Regards,
Hang

Enrico Olivelli wrote on Thu, Jan 6, 2022 at 01:52:
>
> I am commenting on the GH issue
>
> Thanks
>
> Enrico
>
> On Wed, Jan 5, 2022 at 04:56 PengHui Li wrote:
>
> > +1
> >
> > Penghui
> >
> > On Tue, Jan 4, 2022 at 4:51 PM Hang Chen wrote:
> >
> > > https://github.com/apache/pulsar/issues/13315
> > >
> > > Pasted below for quoting convenience.
> > >
> > > ### Motivation
> > > We have geo-replication to support Pulsar cluster-level failover. We can set up Pulsar cluster A as a primary cluster in data center A, and set up Pulsar cluster B as a backup cluster in data center B. Then we configure geo-replication between cluster A and cluster B. All the clients are connected to the Pulsar cluster by DNS. If cluster A is down, we should switch the DNS to point from cluster A to cluster B. After the clients resolve to cluster B, they can produce and consume messages normally. After cluster A recovers, the administrator should switch the DNS back to cluster A.
> > >
> > > However, the current method has two shortcomings.
> > > 1. The administrator has to monitor the status of all Pulsar clusters and switch the DNS as soon as possible when cluster A is down. The switch and recovery are not automatic, and the recovery time is controlled by the administrator, which puts the administrator under heavy load.
> > > 2. The Pulsar client and the DNS system have caches. When the administrator switches the DNS from cluster A to cluster B, it takes some time for the caches to expire, which delays client recovery and leads to produce/consume failures.
> > >
> > > ### Goal
> > > It's better to provide an automatic cluster-level failure recovery mechanism to make Pulsar cluster failover more effective. We should support Pulsar clients automatically switching from cluster A to cluster B when they detect that cluster A is down, according to the configured detection policy, and switching back to cluster A when it has recovered. The reason why we should switch back to cluster A is that most applications may be deployed in data center A and have a low network cost for communicating with Pulsar cluster A. If they keep visiting Pulsar cluster B, they have a high network cost, which causes high produce/consume latency.
> > >
> > > In order to mitigate the DNS cache problem, we should provide an administrator-controlled switch provider for administrators to update service URLs.
> > >
> > > In the end, we should provide an auto service URL switch provider and an administrator-controlled switch provider.
> > >
> > > ### Design
> > > We already provide the `ServiceUrlProvider` interface to support different service URLs. In order to support automatic cluster-level failure recovery, we can provide different ServiceUrlProvider implementations. For the current requirements, we can provide `AutoClusterFailover` and `ControlledClusterFailover`.
> > >
> > > AutoClusterFailover
> > > In order to support auto switching from the primary cluster to the secondary, we can provide a probe task which probes the activity of the primary cluster and the secondary one. When it finds that the primary cluster has failed for more than `failoverDelayMs`, it will switch to the secondary cluster by calling `updateServiceUrl`.
> > > After switching to the secondary cluster, the `AutoClusterFailover` will continue to probe the primary cluster. If the primary cluster comes back and remains active for `switchBackDelayMs`, it will switch back to the primary cluster. The APIs are listed as follows.
> > >
> > > In order to support multiple secondary clusters, we use a List to store the secondary cluster URLs. When the primary cluster probe fails for `failoverDelayMs`, it will start to probe the secondary cluster list one by one; once it finds an active cluster, it will switch to that target cluster. Notice: if you configure multiple clusters, you should turn on cluster-level geo-replication to ensure the topic data is synced between all primary and secondary clusters. Otherwise, topic data may end up distributed across different clusters, and the consumers won't get the whole data of the topic.
> > >
> > > In order to support different authentication configurations between clusters, we provide authentication configurations that are updated along with the target cluster.
> > >
> > > ```Java
> > > public class AutoClusterFailover implements ServiceUrlProvider {
> > >
> > >     private AutoClusterFailover(String primary, List secondary,
> > >                                 long failoverDelayNs, long switchBackDelayNs,
> > >                                 long intervalMs, Authentication primaryAuthentication,
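Since the quoted design above centers on probing cluster availability (the `probeAvailable(String url, int timeout)` method sketched in the PIP), here is a small, self-contained Java illustration of one way such a probe could work: a plain TCP connection attempt against the host and port in the service URL. This is only a sketch under that assumption, not the actual Pulsar implementation; the URL and fallback port are placeholders.

```Java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

public class ProbeSketch {

    // Illustrative only: treat the cluster as "available" if a TCP connection
    // to the host/port in the service URL succeeds within the timeout.
    static boolean probeAvailable(String serviceUrl, int timeoutMs) {
        URI uri = URI.create(serviceUrl);            // e.g. pulsar://broker.example.com:6650 (placeholder)
        int port = uri.getPort() != -1 ? uri.getPort() : 6650; // fall back to the default broker port
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(uri.getHost(), port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;                            // unreachable or timed out
        }
    }

    public static void main(String[] args) {
        System.out.println(probeAvailable("pulsar://localhost:6650", 1000));
    }
}
```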
[VOTE] PIP-121: Pulsar cluster level auto failover on client side
This is the voting thread for PIP-121. It will stay open for at least 48 hours.

https://github.com/apache/pulsar/issues/13315

Pasted below for quoting convenience.

-

### Motivation
We have geo-replication to support Pulsar cluster-level failover. We can set up Pulsar cluster A as a primary cluster in data center A, and set up Pulsar cluster B as a backup cluster in data center B. Then we configure geo-replication between cluster A and cluster B. All the clients are connected to the Pulsar cluster by DNS. If cluster A is down, we should switch the DNS to point from cluster A to cluster B. After the clients resolve to cluster B, they can produce and consume messages normally. After cluster A recovers, the administrator should switch the DNS back to cluster A.

However, the current method has two shortcomings.
1. The administrator has to monitor the status of all Pulsar clusters and switch the DNS as soon as possible when cluster A is down. The switch and recovery are not automatic, and the recovery time is controlled by the administrator, which puts the administrator under heavy load.
2. The Pulsar client and the DNS system have caches. When the administrator switches the DNS from cluster A to cluster B, it takes some time for the caches to expire, which delays client recovery and leads to produce/consume failures.

### Goal
It's better to provide an automatic cluster-level failure recovery mechanism to make Pulsar cluster failover more effective. We should support Pulsar clients automatically switching from cluster A to cluster B when they detect that cluster A is down, according to the configured detection policy, and switching back to cluster A when it has recovered. The reason why we should switch back to cluster A is that most applications may be deployed in data center A and have a low network cost for communicating with Pulsar cluster A. If they keep visiting Pulsar cluster B, they have a high network cost, which causes high produce/consume latency.

In order to mitigate the DNS cache problem, we should provide an administrator-controlled switch provider for administrators to update service URLs.

In the end, we should provide an auto service URL switch provider and an administrator-controlled switch provider.

### Design
We already provide the `ServiceUrlProvider` interface to support different service URLs. In order to support automatic cluster-level failure recovery, we can provide different ServiceUrlProvider implementations. For the current requirements, we can provide `AutoClusterFailover` and `ControlledClusterFailover`.

AutoClusterFailover
In order to support auto switching from the primary cluster to the secondary, we can provide a probe task which probes the activity of the primary cluster and the secondary one. When it finds that the primary cluster has failed for more than `failoverDelayMs`, it will switch to the secondary cluster by calling `updateServiceUrl`. After switching to the secondary cluster, the `AutoClusterFailover` will continue to probe the primary cluster. If the primary cluster comes back and remains active for `switchBackDelayMs`, it will switch back to the primary cluster. The APIs are listed as follows.

In order to support multiple secondary clusters, we use a List to store the secondary cluster URLs. When the primary cluster probe fails for `failoverDelayMs`, it will start to probe the secondary cluster list one by one; once it finds an active cluster, it will switch to that target cluster.
Notice: if you configure multiple clusters, you should turn on cluster-level geo-replication to ensure the topic data is synced between all primary and secondary clusters. Otherwise, topic data may end up distributed across different clusters, and the consumers won't get the whole data of the topic.

In order to support different authentication configurations between clusters, we provide authentication configurations that are updated along with the target cluster.

```Java
public class AutoClusterFailover implements ServiceUrlProvider {

    private AutoClusterFailover(AutoClusterFailoverBuilderImpl builder) {
        //
    }

    @Override
    public void initialize(PulsarClient client) {
        this.pulsarClient = client;

        // start to probe whether the primary cluster is active or not
        executor.scheduleAtFixedRate(catchingAndLoggingThrowables(() -> {
            // probe and switch
        }), intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public String getServiceUrl() {
        return this.currentPulsarServiceUrl;
    }

    @Override
    public void close() {
        this.executor.shutdown();
    }

    // probe whether a pulsar cluster is available
    private boolean probeAvailable(String url, int timeout) {

    }
```

In order to create an `AutoClusterFailover` instance, we use the `AutoClusterFailoverBuilder` interface to build the target instance. The `AutoClusterFailoverBuilder` interface is located in the `pulsar-client-api` package. In the `probeAva
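To show how a builder like the one described above might be consumed, here is a hypothetical usage sketch. The builder method names (`primary`, `secondary`, `failoverDelay`, `switchBackDelay`, `checkInterval`) are assumptions derived from the parameter names in the PIP and may not match the final API; the URLs are placeholders. `ClientBuilder.serviceUrlProvider(...)` is the existing hook a `ServiceUrlProvider` plugs into.

```Java
import java.util.Collections;
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.ServiceUrlProvider;

public class AutoFailoverUsageSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical builder, modeled on the parameters named in the PIP
        // (primary URL, secondary URL list, failoverDelayMs, switchBackDelayMs, intervalMs).
        // The final package of AutoClusterFailover is not fixed by the PIP, so no import is shown for it.
        ServiceUrlProvider failoverProvider = AutoClusterFailover.builder()
                .primary("pulsar://cluster-a.example.com:6650")
                .secondary(Collections.singletonList("pulsar://cluster-b.example.com:6650"))
                .failoverDelay(30, TimeUnit.SECONDS)   // how long the primary must be down before switching
                .switchBackDelay(5, TimeUnit.MINUTES)  // how long the primary must be healthy before switching back
                .checkInterval(1, TimeUnit.SECONDS)    // probe frequency
                .build();

        // The client invokes initialize(client) on the provider; when the provider decides to
        // switch clusters, it pushes the new URL to the client via updateServiceUrl().
        PulsarClient client = PulsarClient.builder()
                .serviceUrlProvider(failoverProvider)
                .build();

        client.close();
    }
}
```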
[VOTE] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder
This is the voting thread for PIP-122. It will stay open for at least 48 hours.

https://github.com/apache/pulsar/issues/13340

Pasted below for quoting convenience.

### Motivation
The ThresholdShedder load balance policy has been available since Pulsar 2.6.0, introduced by https://github.com/apache/pulsar/pull/6772. It can resolve many load balance issues of `OverloadShedder` and works well in many Pulsar production clusters.

In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance policy is `OverloadShedder`.

I think 2.10 is a good time to change the default load balance policy to `ThresholdShedder`; it will make throughput more balanced across brokers.

### Proposed Changes
In the 2.10 release, in `broker.conf`, change `loadBalancerLoadSheddingStrategy` from `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`.
Re: Lifting Kubernetes minimum version requirement for Apache Pulsar Helm Charts from k8s 1.14 to 1.18
I'll merge the PR https://github.com/apache/pulsar-helm-chart/pull/192 today to lift the k8s minimum requirement from 1.14 to 1.18 in the Apache Pulsar Helm Charts. The main change is that the CI will run with k8s (kind) 1.18 instead of 1.14. Other tools (such as helm) are upgraded to the latest released versions.

BR,
Lari

On Tue, Jan 4, 2022 at 11:40 AM Lari Hotari wrote:
> Hi all,
>
> Currently, k8s 1.14 is used in CI to verify the Helm chart changes.
>
> k8s 1.14 reached end of life on 2019-12-11, over two years ago [1]. The oldest maintained Kubernetes version is 1.20, and it will reach end of life on 2022-02-28, less than two months from now [2].
>
> There's a PR to lift the minimum requirement for the Apache Pulsar Helm Charts to 1.18 so that we don't fall too far behind: https://github.com/apache/pulsar-helm-chart/pull/192
>
> Please review. If the PR gets approved and merged, the Kubernetes minimum version requirement will be lifted to Kubernetes 1.18.
>
> Best regards,
> Lari
>
> [1] https://kubernetes.io/releases/patch-releases/#non-active-branch-history
> [2] https://kubernetes.io/releases/patch-releases/#1-20