[GitHub] [pulsar-manager] nicknezis commented on pull request #438: Updated to fix Log4J security vulnerabilities

2022-01-09 Thread GitBox


nicknezis commented on pull request #438:
URL: https://github.com/apache/pulsar-manager/pull/438#issuecomment-1008307363


   Looks like CI passed this time. I was not able to build locally before or
after the change I'm introducing, but I did verify that the Gradle dependencies
were updated to point to the new log4j. I was getting a similar error during
`./gradlew build` to the one I see in the failed output of #420.

   Having a release with this update would be greatly appreciated, because at
the moment we are not allowed to use Pulsar Manager.






[GitHub] [pulsar-helm-chart] Anonymitaet commented on a change in pull request #192: [CI] Upgrade k8s to 1.18 and also upgrade helm, kind & chart releaser versions

2022-01-09 Thread GitBox


Anonymitaet commented on a change in pull request #192:
URL: https://github.com/apache/pulsar-helm-chart/pull/192#discussion_r780866433



##
File path: README.md
##
@@ -72,9 +72,9 @@ It includes support for:
 
 In order to use this chart to deploy Apache Pulsar on Kubernetes, the followings are required.
 
-1. kubectl 1.14 or higher, compatible with your cluster ([+/- 1 minor release from your cluster](https://kubernetes.io/docs/tasks/tools/install-kubectl/#before-you-begin))
+1. kubectl 1.18 or higher, compatible with your cluster ([+/- 1 minor release from your cluster](https://kubernetes.io/docs/tasks/tools/install-kubectl/#before-you-begin))

Review comment:
   Thanks for updating docs!








Re: [DISCUSSION] PIP-124: Create init subscription before sending message to DLQ

2022-01-09 Thread PengHui Li
> I think we should consider adding a new policy for Pulsar topics: a
> namespace (or topic) policy that makes it possible to retain messages
> indefinitely when a topic has no subscriptions.

It looks like a feature that supports retaining data while a topic has no
subscriptions. With infinite data retention, the data will not be removed even
after all the subscriptions have acked the messages. But with
"retain_data_no_subscriptions", the data will be removed once all the
subscriptions have acked the messages. Subsequent subscriptions still can't
retrieve all the data, so it only guarantees that the first subscription can
retrieve all the data. If users want that guarantee for all subscriptions
(both the existing ones and ones created later), that is equivalent to
infinite data retention.

For an auto-created topic, the subscription can only be determined at the
time of creation, and it may or may not be created. If users can determine
which consumers there are, and these consumers need to receive every message
sent by the producer, they should create the topic and subscription manually,
or use the consumer (not the producer) to trigger the topic auto-creation.
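
As a rough illustration of the "create the topic and subscription manually"
approach, here is a minimal sketch using the existing Java admin and client
APIs; the service URLs, topic, and subscription names are placeholders.

```Java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class ManualTopicSetup {
    public static void main(String[] args) throws Exception {
        String topic = "persistent://public/default/orders"; // placeholder topic
        String subscription = "orders-processor";            // placeholder subscription

        // Create the topic and its subscription up front, so acknowledgement-based
        // cleanup cannot remove data before the real consumer ever connects.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            admin.topics().createNonPartitionedTopic(topic);
            admin.topics().createSubscription(topic, subscription, MessageId.earliest);
        }

        // Only start producing once the subscription exists.
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
             Producer<String> producer = client.newProducer(Schema.STRING)
                     .topic(topic)
                     .create()) {
            producer.send("hello");
        }
    }
}
```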

It is not easy to determine consumer behavior on the producer side. But a DLQ
is not a normal topic from the user's point of view; it is a local container
for a subscription, used to store the messages that the consumer can't process.
Here it is the consumer that determines consumer behavior, and I think this is
the most essential difference.

Regards,
Penghui

On Sat, Jan 8, 2022 at 12:34 PM Michael Marshall wrote:

> Thanks for your response, Penghui.
>
> I support simplifying message loss prevention for DLQ topics. However,
> it's not clear to me why we should only simplify it for DLQ topics.
>
> As a Pulsar user, I encountered many of the challenges you mention
> when producing to auto created topics. In my architecture, I had
> consumers reading from an input topic, transforming the data, and then
> producing to an arbitrary number of output topics. My business logic
> required that I not lose any messages, which is essentially the same
> expectation from DLQ users here. I ended up increasing the retention
> policy to about 4 hours on the output topics to minimize the possibility
> of losing data. I had to scale up my bookkeeper cluster because of the
> extra retention. If I had been able to ensure my auto created topic
> would not delete messages before I created my subscriptions, I would
> have had no retention policy and a smaller bookie cluster.
>
> > Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> > But the issue that the proposal wants to resolve is not to introduce a
> > specific behavior for the DLQ topic or something
>
> I'm not sure this statement aligns with the PIP. It seems to me that
> the PIP proposes solving the message loss issues by adding a DLQ
> specific feature to the pulsar client.
>
> Earlier, I proposed expanding the CreateProducer command to be able to
> create a subscription. This solution is not right: it tightly couples
> producers and consumers, which we want to avoid.
>
> I think we should consider adding a new policy for Pulsar topics: a
> namespace (or topic) policy that makes it possible to retain messages
> indefinitely when a topic has no subscriptions.
>
> Our message retention feature is very valuable. However,
> message retention doesn't solve the "slow to subscribe" consumer
> problem. In the event of long network partitions, a consumer might not be
> able to subscribe before messages are deleted. This feature
> mitigates that risk and allows users to set message retention time
> based on other needs, not based on calculations about how long it
> could take to subscribe to a topic.
>
> This feature solves the DLQ message loss issue because the DLQ
> producer can produce to any namespace, which is important for clusters
> that do not have topic level policies enabled.
>
> Let me know what you think.
>
> Thanks,
> Michael
>
> On Tue, Jan 4, 2022 at 10:33 PM PengHui Li  wrote:
> >
> > Thanks for the great comments, Michael.
> >
> > Let me try to clarify some context about the issue that users encountered
> > and the improvement that the proposal wants to introduce.
> >
> > > Before we get further into the implementation, I'd like to discuss
> > whether the current behavior is the expected behavior, as this is
> > the key motivation for this feature.
> >
> > The DLQ can be created dynamically, and users might have short
> > data retention for a namespace by time or by size. But the messages
> > in the DLQ are usually processed (compensated for) afterward, and we should
> > allow users to keep the data in the DLQ until they choose to delete it manually.
> >
> > The DLQ is always for a subscriber, so a subscriber can use an init
> > subscription name to keep its data from being cleaned up from the DLQ.
> >
> > So the key point of this proposal is to keep data in the lazily created
> > DLQ topic until users want to delete it manually.
> >
> > > I think

Re: [DISCUSSION] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder

2022-01-09 Thread Hang Chen
Looks like there is no objection, so I will start the official vote for PIP-122.

Regards,
Hang

On Sat, Jan 8, 2022 at 7:44 AM Michael Marshall wrote:
>
> Hi Frank,
>
> You can configure the Load Shedding Strategy in the `broker.conf` file
> using `loadBalancerLoadSheddingStrategy`.
>
> The configuration is missing from the document you referenced--it
> should be there, though.
>
> There was a recent PR to add documentation describing the different
> configuration options: [0]. Those docs won't go live until 2.10.0 is
> released. We could probably add parts of that documentation to the
> historical docs, since the Shedding Strategy has existed for some
> time.
>
> - Michael
>
> [0] https://github.com/apache/pulsar/pull/12902/files
>
> On Fri, Jan 7, 2022 at 1:29 PM Frank Kelly  wrote:
> >
> > This is very cool.
> >
> > How is `ThresholdShedder` chosen - I don't see it referenced here?
> > https://pulsar.apache.org/docs/en/reference-configuration/
> >
> > -F
> >
> > On Thu, Jan 6, 2022 at 7:45 PM Sijie Guo  wrote:
> >
> > > +1 ThresholdShedder is a much better policy
> > >
> > > On Tue, Jan 4, 2022 at 6:24 AM Hang Chen  wrote:
> > >
> > > > https://github.com/apache/pulsar/issues/13340
> > > >
> > > > Pasted below for quoting convenience.
> > > >
> > > > 
> > > > ### Motivation
> > > > The ThresholdShedder load balance policy has been available since Pulsar
> > > > 2.6.0, introduced by https://github.com/apache/pulsar/pull/6772. It can resolve many load
> > > > balance issues of `OverloadShedder` and works well in many Pulsar
> > > > production clusters.
> > > >
> > > > In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance
> > > > policy is `OverloadShedder`.
> > > >
> > > > I think 2.10 is a good time to change the default load balance
> > > > policy to `ThresholdShedder`; it will make throughput more balanced
> > > > between brokers.
> > > >
> > > > ### Proposed Changes
> > > > In the 2.10 release, for `broker.conf`, change
> > > > `loadBalancerLoadSheddingStrategy` from
> > > > `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to
> > > > `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
> > > >
> > >


Re: [DISCUSSION] PIP-121: Pulsar cluster level auto failover

2022-01-09 Thread Hang Chen
Looks like there is no objection, so I will start the official vote for PIP-121.

Regards,
Hang

On Thu, Jan 6, 2022 at 1:52 AM Enrico Olivelli wrote:
>
> I am commenting on the GH issue
>
> Thanks
>
> Enrico
>
> On Wed, Jan 5, 2022 at 4:56 AM PengHui Li wrote:
>
> > +1
> >
> > Penghui
> >
> > On Tue, Jan 4, 2022 at 4:51 PM Hang Chen  wrote:
> >
> > > https://github.com/apache/pulsar/issues/13315
> > >
> > > Pasted below for quoting convenience.
> > >
> > > 
> > > ### Motivation
> > > We have geo-replication to support Pulsar cluster level failover. We
> > > can set up Pulsar cluster A as the primary cluster in data center A, and
> > > set up Pulsar cluster B as the backup cluster in data center B. Then we
> > > configure geo-replication between cluster A and cluster B. All the
> > > clients are connected to the Pulsar cluster by DNS. If cluster A is
> > > down, we should switch the DNS so that it points to cluster B instead
> > > of cluster A. After the clients are resolved to cluster B,
> > > they can produce and consume messages normally. After cluster A
> > > recovers, the administrator should switch the DNS back to cluster A.
> > >
> > > However, the current method has two shortcomings.
> > > 1. The administrator should monitor the status of all Pulsar clusters,
> > > and switch the DNS as soon as possible when cluster A is down. The
> > > switch and recovery are not automatic, and recovery time is controlled
> > > by the administrator, which will put the administrator under heavy
> > > load.
> > > 2. The Pulsar client and DNS system have a cache. When the
> > > administrator switches the DNS from cluster A to cluster B, it will
> > > take some time for the cache to expire, which will delay client
> > > recovery and lead to produce/consume failures.
> > >
> > > ### Goal
> > > It's better to provide an automatic cluster-level failure recovery
> > > mechanism to make Pulsar cluster failover more effective. We should
> > > support Pulsar clients automatically switching from cluster A to cluster B
> > > when they detect, according to the configured detection policy, that
> > > cluster A is down, and switching back to cluster A when it has recovered.
> > > The reason to switch back to cluster A is that most applications may be
> > > deployed in data center A and have a low network cost for communicating
> > > with Pulsar cluster A. If they keep using Pulsar cluster B, they incur a
> > > higher network cost and higher produce/consume latency.
> > >
> > > In order to mitigate the DNS cache problem, we should provide an
> > > administrator controlled switch provider for administrators to update
> > > service URLs.
> > >
> > > In the end, we should provide an auto service URL switch provider and
> > > administrator controlled switch provider.
> > >
> > > ### Design
> > > We have already provided the `ServiceUrlProvider` interface to support
> > > different service URLs. In order to support automatic cluster level
> > > failure auto recovery, we can provide different ServiceUrlProvider
> > > implementations. For current requirements, we can provide
> > > `AutoClusterFailover` and `ControlledClusterFailover`.
> > >
> > > #### AutoClusterFailover
> > > In order to support auto switching from the primary cluster to the
> > > secondary, we can provide a probe task, which will probe the activity
> > > of the primary cluster and the secondary one. When it finds that the
> > > primary cluster has been failing for more than `failoverDelayMs`, it will switch to
> > > the secondary cluster by calling `updateServiceUrl`. After switching
> > > to the secondary cluster, the `AutoClusterFailover` will continue to
> > > probe the primary cluster. If the primary cluster comes back and
> > > remains active for `switchBackDelayMs`, it will switch back to the
> > > primary cluster.
> > > The APIs are listed as follows.
> > >
> > > In order to support multiple secondary clusters, we use a `List<String>` to
> > > store the secondary cluster URLs. When the primary cluster probe has failed
> > > for `failoverDelayMs`, it will start to probe the secondary clusters one by
> > > one; once it finds an active cluster, it will switch to that cluster. Note:
> > > if you configure multiple clusters, you should turn on cluster-level
> > > geo-replication to keep the topic data in sync between the primary and all
> > > secondary clusters. Otherwise, the topic data may end up distributed across
> > > different clusters, and the consumers won't get the whole data of the topic.
> > >
> > > In order to support different authentication configurations between
> > > clusters, we allow the authentication-related configuration to be
> > > updated together with the target cluster.
> > >
> > > ```Java
> > > public class AutoClusterFailover implements ServiceUrlProvider {
> > >
> > >private AutoClusterFailover(String primary, List secondary,
> > > long failoverDelayNs, long switchBackDelayNs,
> > > long intervalMs, Authentication
> > > primaryAuthentication,
> 

[VOTE] PIP-121: Pulsar cluster level auto failover on client side

2022-01-09 Thread Hang Chen
This is the voting thread for PIP-121. It will stay open for at least 48
hours.

https://github.com/apache/pulsar/issues/13315

Pasted below for quoting convenience.

-
### Motivation
We have geo-replication to support Pulsar cluster level failover. We
can set up Pulsar cluster A as the primary cluster in data center A, and
set up Pulsar cluster B as the backup cluster in data center B. Then we
configure geo-replication between cluster A and cluster B. All the
clients are connected to the Pulsar cluster by DNS. If cluster A is
down, we should switch the DNS so that it points to cluster B instead of
cluster A. After the clients are resolved to cluster B,
they can produce and consume messages normally. After cluster A
recovers, the administrator should switch the DNS back to cluster A.

However, the current method has two shortcomings.
1. The administrator should monitor the status of all Pulsar clusters,
and switch the DNS as soon as possible when cluster A is down. The
switch and recovery are not automatic, and recovery time is controlled
by the administrator, which will put the administrator under heavy
load.
2. The Pulsar client and DNS system have a cache. When the
administrator switches the DNS from cluster A to cluster B, it will
take some time for the cache to expire, which will delay client
recovery and lead to produce/consume failures.

### Goal
It's better to provide an automatic cluster-level failure recovery
mechanism to make Pulsar cluster failover more effective. We should
support Pulsar clients automatically switching from cluster A to cluster B
when they detect, according to the configured detection policy, that
cluster A is down, and switching back to cluster A when it has recovered.
The reason to switch back to cluster A is that most applications may be
deployed in data center A and have a low network cost for communicating
with Pulsar cluster A. If they keep using Pulsar cluster B, they incur a
higher network cost and higher produce/consume latency.

In order to mitigate the DNS cache problem, we should provide an
administrator controlled switch provider for administrators to update
service URLs.

In the end, we should provide an auto service URL switch provider and
administrator controlled switch provider.

### Design
We have already provided the `ServiceUrlProvider` interface to support
different service URLs. In order to support automatic cluster level
failure auto recovery, we can provide different ServiceUrlProvider
implementations. For current requirements, we can provide
`AutoClusterFailover` and `ControlledClusterFailover`.
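
To make the plug-in point concrete, here is a minimal hand-rolled
`ServiceUrlProvider` wired into a client through the existing
`ClientBuilder#serviceUrlProvider` hook; the URLs are placeholders and the
probe task is omitted, so this is only a sketch of the shape
`AutoClusterFailover` would take.

```Java
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.ServiceUrlProvider;

// Minimal failover-style provider sketch; the real AutoClusterFailover adds
// probing, failoverDelayMs/switchBackDelayMs handling, and multiple secondaries.
public class StaticFailoverProvider implements ServiceUrlProvider {
    private final String primary = "pulsar://cluster-a:6650";   // placeholder URL
    private final String secondary = "pulsar://cluster-b:6650"; // placeholder URL
    private volatile PulsarClient client;

    @Override
    public void initialize(PulsarClient client) {
        // Called by the client builder; keep the reference so the provider can
        // later call client.updateServiceUrl(...) when a switch is needed.
        this.client = client;
    }

    @Override
    public String getServiceUrl() {
        return primary; // start on the primary cluster
    }

    // Would be invoked by the (omitted) probe task once the primary is considered down.
    void switchToSecondary() throws PulsarClientException {
        client.updateServiceUrl(secondary);
    }

    public static void main(String[] args) throws PulsarClientException {
        PulsarClient pulsarClient = PulsarClient.builder()
                .serviceUrlProvider(new StaticFailoverProvider())
                .build();
    }
}
```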

#### AutoClusterFailover
In order to support auto switching from the primary cluster to the
secondary, we can provide a probe task, which will probe the activity
of the primary cluster and the secondary one. When it finds that the
primary cluster has been failing for more than `failoverDelayMs`, it will switch to
the secondary cluster by calling `updateServiceUrl`. After switching
to the secondary cluster, the `AutoClusterFailover` will continue to
probe the primary cluster. If the primary cluster comes back and
remains active for `switchBackDelayMs`, it will switch back to the
primary cluster.
The APIs are listed as follows.

In order to support multiple secondary clusters, we use a `List<String>` to
store the secondary cluster URLs. When the primary cluster probe has failed
for `failoverDelayMs`, it will start to probe the secondary clusters one by
one; once it finds an active cluster, it will switch to that cluster. Note:
if you configure multiple clusters, you should turn on cluster-level
geo-replication to keep the topic data in sync between the primary and all
secondary clusters. Otherwise, the topic data may end up distributed across
different clusters, and the consumers won't get the whole data of the topic.
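
For reference, enabling that geo-replication at the namespace level can be
done with the existing admin API; the admin URL, tenant/namespace, and cluster
names below are placeholders.

```Java
import java.util.HashSet;
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class EnableGeoReplication {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://cluster-a-broker:8080") // placeholder admin URL
                .build()) {
            // Replicate the namespace across the primary and secondary clusters so
            // topic data stays in sync for the failover scenario described above.
            Set<String> clusters = new HashSet<>();
            clusters.add("cluster-a");
            clusters.add("cluster-b");
            admin.namespaces().setNamespaceReplicationClusters("my-tenant/my-namespace", clusters);
        }
    }
}
```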

In order to support different authentication configurations between
clusters, we allow the authentication-related configuration to be
updated together with the target cluster.

```Java
public class AutoClusterFailover implements ServiceUrlProvider {

    // Fields implied by the methods below; names and types are illustrative.
    private String primary;
    private List<String> secondary;
    private long failoverDelayMs;
    private long switchBackDelayMs;
    private long intervalMs;
    private ScheduledExecutorService executor;
    private volatile String currentPulsarServiceUrl;
    private volatile PulsarClient pulsarClient;

    private AutoClusterFailover(AutoClusterFailoverBuilderImpl builder) {
        // copy primary/secondary service URLs, delays and probe interval from the builder
    }

    @Override
    public void initialize(PulsarClient client) {
        this.pulsarClient = client;

        // start to probe whether the primary cluster is active or not
        executor.scheduleAtFixedRate(catchingAndLoggingThrowables(() -> {
            // probe and switch
        }), intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public String getServiceUrl() {
        return this.currentPulsarServiceUrl;
    }

    @Override
    public void close() {
        this.executor.shutdown();
    }

    // probe whether the pulsar cluster behind the given url is available
    private boolean probeAvailable(String url, int timeout) {
        // e.g. try to open a TCP connection to the url within the timeout
    }
}
```

In order to create an `AutoClusterFailover` instance, we use the
`AutoClusterFailoverBuilder` interface to build the target instance.
The `AutoClusterFailoverBuilder` interface is located in the
`pulsar-client-api` package.
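
A hypothetical usage sketch follows; the builder method names (`primary`,
`secondary`, `failoverDelay`, `switchBackDelay`, `checkInterval`) are
assumptions based on the parameters described above, not a final API.

```Java
import java.util.Collections;
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.ServiceUrlProvider;

public class AutoFailoverUsage {
    public static void main(String[] args) throws PulsarClientException {
        // AutoClusterFailover is the provider sketched above; the builder()
        // entry point and setter names are illustrative.
        ServiceUrlProvider failover = AutoClusterFailover.builder()
                .primary("pulsar://cluster-a:6650")
                .secondary(Collections.singletonList("pulsar://cluster-b:6650"))
                .failoverDelay(30, TimeUnit.SECONDS)
                .switchBackDelay(5, TimeUnit.MINUTES)
                .checkInterval(1, TimeUnit.SECONDS)
                .build();

        // The provider plugs into the existing ClientBuilder#serviceUrlProvider hook.
        PulsarClient client = PulsarClient.builder()
                .serviceUrlProvider(failover)
                .build();
    }
}
```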

In the `probeAva

[VOTE] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder

2022-01-09 Thread Hang Chen
This is the voting thread for PIP-122. It will stay open for at least 48
hours.

https://github.com/apache/pulsar/issues/13340

Pasted below for quoting convenience.



### Motivation
The ThresholdShedder load balance policy has been available since Pulsar
2.6.0, introduced by https://github.com/apache/pulsar/pull/6772. It can resolve many load
balance issues of `OverloadShedder` and works well in many Pulsar
production clusters.

In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance
policy is `OverloadShedder`.

I think 2.10 is a good time to change the default load balance
policy to `ThresholdShedder`; it will make throughput more balanced
between brokers.

### Proposed Changes
In the 2.10 release, for `broker.conf`, change
`loadBalancerLoadSheddingStrategy` from
`org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to
`org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
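
For reference, the change amounts to a single line in `broker.conf` (a sketch;
surrounding settings elided):

```
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder
```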


Re: Lifting Kubernetes minimum version requirement for Apache Pulsar Helm Charts from k8s 1.14 to 1.18

2022-01-09 Thread Lari Hotari
I'll merge the PR https://github.com/apache/pulsar-helm-chart/pull/192 today
to lift the k8s minimum requirement from 1.14 to 1.18 in the Apache Pulsar
Helm Charts.
The main change is that the CI will run with k8s (kind) 1.18 instead of
1.14. Other tools (such as Helm) are upgraded to the latest released versions.

BR, Lari

On Tue, Jan 4, 2022 at 11:40 AM Lari Hotari  wrote:

> Hi all,
>
> Currently, k8s 1.14 is used in CI to verify the Helm chart changes.
>
> k8s 1.14 became end-of-life on 2019-12-11, over 2 years ago [1].
> The oldest maintained Kubernetes version is 1.20, and it will become
> end-of-life on 2022-02-28, less than 2 months from now [2].
>
> There's a PR to lift the minimum requirement for Apache Pulsar Helm Charts
> to 1.18 so that we don't fall too far behind.
> https://github.com/apache/pulsar-helm-chart/pull/192
>
> Please review. If the PR gets approved and merged, the Kubernetes minimum
> version requirement will be lifted to Kubernetes 1.18.
>
>
> Best regards,
>
> Lari
>
>
> [1]
> https://kubernetes.io/releases/patch-releases/#non-active-branch-history
> [2] https://kubernetes.io/releases/patch-releases/#1-20
>