[jira] [Updated] (PULSAR-9) Improve the message written count metrics for the topic

2021-01-26 Thread Penghui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/PULSAR-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Penghui Li updated PULSAR-9:

Description: 
Currently, Pulsar exposes the message written count metrics through the
Prometheus endpoint, but the metrics are maintained in the broker and are not
persisted. So if the topic ownership changes or the broker restarts, the
message written count of the topic is reset to 0. This confuses users, who
cannot get the correct message written count.

PIP-70
(https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata)
introduced broker-level entry metadata, which can support a message index for
a topic (or the message offset of a topic). This makes it possible to
calculate the precise message written count for a topic, so we can leverage
PIP-70 to improve the message written count metrics for the topic.

  was:
Currently, Pulsar exposes the message written count metrics through the
Prometheus endpoint, but the metrics are maintained in the broker and are not
persisted. So if the topic ownership changes or the broker restarts, the
message written count of the topic is reset to 0. This confuses users, who
cannot get the correct message written count.

PIP-70
(https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata)
introduced broker-level entry metadata, which can support a message index for
a topic (or the message offset of a topic). This makes it possible to
calculate the precise message written count for a topic, so we can leverage
PIP-70 to improve the message written count metrics for the topic.


> Improve the message written count metrics for the topic
> ---
>
> Key: PULSAR-9
> URL: https://issues.apache.org/jira/browse/PULSAR-9
> Project: Pulsar
> Issue Type: Improvement
> Reporter: Penghui Li
> Priority: Major
> Labels: Pulsar, gsoc, gsoc2021, mentor
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Currently, Pulsar exposes the message written count metrics through the
> Prometheus endpoint, but the metrics are maintained in the broker and are not
> persisted. So if the topic ownership changes or the broker restarts, the
> message written count of the topic is reset to 0. This confuses users, who
> cannot get the correct message written count.
>
> PIP-70
> (https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata)
> introduced broker-level entry metadata, which can support a message index for
> a topic (or the message offset of a topic). This makes it possible to
> calculate the precise message written count for a topic, so we can leverage
> PIP-70 to improve the message written count metrics for the topic.
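
To illustrate why an index carried with each persisted entry survives an ownership change while a broker-local counter does not, here is a toy Java model (Java 16+ for records). It is a sketch under stated assumptions, not Pulsar code: the `Entry` record, its `lastIndex` field, and the convention that indexes start at 0 and advance by the number of messages in each entry are hypothetical stand-ins for the broker entry metadata described in PIP-70.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Pulsar code: compares a broker-local counter with a count
// derived from a per-entry message index (the idea behind using PIP-70).
public class WrittenCountSketch {

    // Hypothetical stand-in for a persisted entry: how many messages it holds
    // and the index of its last message (indexes start at 0).
    record Entry(int numMessages, long lastIndex) {}

    // "Persisted" entries: they survive ownership changes and broker restarts.
    private final List<Entry> ledger = new ArrayList<>();
    // Broker-local counter, comparable to an in-memory metric.
    private long inMemoryCounter = 0;

    void append(int numMessages) {
        // The owner continues from the last persisted index, so a new owner
        // picks up exactly where the previous one stopped.
        long previous = ledger.isEmpty() ? -1 : ledger.get(ledger.size() - 1).lastIndex();
        ledger.add(new Entry(numMessages, previous + numMessages));
        inMemoryCounter += numMessages;
    }

    // Simulates the topic being loaded by another broker: in-memory state is lost.
    void changeOwnership() {
        inMemoryCounter = 0;
    }

    long inMemoryCount() {
        return inMemoryCounter;
    }

    // Indexes start at 0, so the total written count is the last index plus one.
    long writtenCountFromIndex() {
        return ledger.isEmpty() ? 0 : ledger.get(ledger.size() - 1).lastIndex() + 1;
    }

    public static void main(String[] args) {
        WrittenCountSketch topic = new WrittenCountSketch();
        topic.append(10);        // one batched entry with 10 messages
        topic.append(1);
        topic.changeOwnership(); // e.g. a broker restart or bundle unload
        topic.append(5);
        System.out.println("in-memory counter: " + topic.inMemoryCount());          // 5
        System.out.println("index-derived count: " + topic.writtenCountFromIndex()); // 16
    }
}
```

Because the new owner continues from the last persisted index, the derived count is 16 after the ownership change, while the broker-local counter only reports the 5 messages written since.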



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (PULSAR-9) Improve the message written count metrics for the topic

2021-01-26 Thread Penghui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/PULSAR-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Penghui Li reopened PULSAR-9:
-



[jira] [Closed] (PULSAR-9) Improve the message written count metrics for the topic

2021-01-26 Thread Penghui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/PULSAR-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Penghui Li closed PULSAR-9.
---
Resolution: Won't Do



[jira] [Updated] (PULSAR-9) Improve the message written count metrics for the topic

2021-01-26 Thread Penghui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/PULSAR-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Penghui Li updated PULSAR-9:

Description: 
Currently, Pulsar exposes the message written count metrics through the
Prometheus endpoint, but the metrics are maintained in the broker and are not
persisted. So if the topic ownership changes or the broker restarts, the
message written count of the topic is reset to 0. This confuses users, who
cannot get the correct message written count.

PIP-70
(https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata)
introduced broker-level entry metadata, which can support a message index for
a topic (or the message offset of a topic). This makes it possible to
calculate the precise message written count for a topic, so we can leverage
PIP-70 to improve the message written count metrics for the topic.

  was:
Currently, Pulsar exposes the message written count metrics through the
Prometheus endpoint, but the metrics are maintained in the broker and are not
persisted. So if the topic ownership changes or the broker restarts, the
message written count of the topic is reset to 0. This confuses users, who
cannot get the correct message written count.

PIP-70
(https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata)
introduced broker-level entry metadata, which can support a message index for
a topic (or the message offset of a topic). This makes it possible to
calculate the precise message written count for a topic, so we can leverage
PIP-70 to improve the message written count metrics for the topic.




[jira] [Updated] (PULSAR-9) Improve the message written count metrics for the topic

2021-01-26 Thread Penghui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/PULSAR-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Penghui Li updated PULSAR-9:

Description: 
Currently, Pulsar exposes the message written count metrics through the
Prometheus endpoint, but the metrics are maintained in the broker and are not
persisted. So if the topic ownership changes or the broker restarts, the
message written count of the topic is reset to 0. This confuses users, who
cannot get the correct message written count.

PIP-70
(https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata)
introduced broker-level entry metadata, which can support a message index for
a topic (or the message offset of a topic). This makes it possible to
calculate the precise message written count for a topic, so we can leverage
PIP-70 to improve the message written count metrics for the topic.

  was:
Currently, Pulsar exposes the message written count metrics through the
Prometheus endpoint, but the metrics are maintained in the broker and are not
persisted. So if the topic ownership changes or the broker restarts, the
message written count of the topic is reset to 0. This confuses users, who
cannot get the correct message written count.

PIP-70
(https://github.com/apache/pulsar/wiki/PIP-70%3A-Introduce-lightweight-raw-Message-metadata)
introduced broker-level entry metadata, which can support a message index for
a topic (or the message offset of a topic). This makes it possible to
calculate the precise message written count for a topic, so we can leverage
PIP-70 to improve the message written count metrics for the topic.




[jira] [Closed] (PULSAR-9) Improve the message written count metrics for the topic

2021-01-26 Thread Penghui Li (Jira)


 [ 
https://issues.apache.org/jira/browse/PULSAR-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Penghui Li closed PULSAR-9.
---
Resolution: Won't Do



[ANNOUNCE] Apache Pulsar 2.6.3 released

2021-01-26 Thread PengHui Li
The Apache Pulsar team is proud to announce Apache Pulsar version 2.6.3.

Pulsar is a highly scalable, low-latency messaging platform running on
commodity hardware. It provides simple pub-sub semantics over topics,
guaranteed at-least-once delivery of messages, automatic cursor management for
subscribers, and cross-datacenter replication.

For Pulsar release details and downloads, visit:
https://pulsar.apache.org/download

Release notes are at: http://pulsar.apache.org/release-notes

We would like to thank the contributors that made the release possible.

Regards,

The Pulsar Team


[Webinar] How to implement OpenTelemetry with Pulsar - Ricardo Ferreira from Elastic

2021-01-26 Thread Anonymitaet _
Hi Pulsar enthusiasts,

Here is an upcoming webinar:

Last chance to sign up for the Pulsar webinar (01/26) and hear Pulsar expert
Ricardo Ferreira from Elastic share how to implement OpenTelemetry with Pulsar.

In this webinar, you will learn:
1. How tracing technologies work in the context of OpenTelemetry, an
observability framework for cloud-native software.
2. The architecture of OpenTelemetry deployments and how to instrument Java
applications that use Apache Pulsar to emit traces compatible with the
OpenTelemetry specification.
3. How to perform root cause analysis (RCA) on a transaction composed of
different services that write and read data streams.

Save your spot today!

Sincerely,
Anonymitaet





Re: [ANNOUNCE] Apache Pulsar 2.6.3 released

2021-01-26 Thread Jinfeng Huang
It's really exciting news.
It is a great milestone for the fast-growing project and the whole Pulsar
community. Everyone is welcome to download and use Pulsar.

Best Regards,
Jennifer


On Tue, Jan 26, 2021 at 7:07 PM PengHui Li  wrote:

> The Apache Pulsar team is proud to announce Apache Pulsar version 2.6.3.
>
> Pulsar is a highly scalable, low latency messaging platform running on
> commodity hardware. It provides simple pub-sub semantics over topics,
> guaranteed at-least-once delivery of messages, automatic cursor management
> for subscribers, and cross-datacenter replication.
>
> For Pulsar release details and downloads, visit:
> https://pulsar.apache.org/download
>
> Release Notes are at:http://pulsar.apache.org/release-notes
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
>
> The Pulsar Team
>


Re: quick question

2021-01-26 Thread Jesse Davidson
I wanted to check in and see if you got my note about RAID data recovery?

Thanks,
Jesse

On Thursday, January 21, 2021 at 1:10 PM, Jesse Davidson <
jdavid...@madberry.com> wrote:

> I noticed you shared an article from En.Wikipedia.org when you talked
> about RAID data recovery, here:
> http://pulsar.apache.org/docs/v2.0.0-rc1-incubating/getting-started/ConceptsAndArchitecture/
> .
>
>
> We recently published an article that examines the latest technologies in
> data storage and recovery. We start by looking at innovative techniques
> such as RAID (redundant arrays of independent disks that can mirror the same data),
> Internet of Things (IoT), machine learning, and cloud computing. We examine
> how these widely-used technologies are likely to impact data storage and
> recovery practices over the next 5-10 years.
>
>
> May I send you the article to see if you think it’s worth linking to from
> your already excellent page?
>
>
> Thanks,
>
> -Jesse
>
> --
> Jesse Davidson, Editor
> 5 Ross Rd
> Durham, NH 03824
>
> BTW, if you didn't like getting this email, please reply with something
> like "please don't email me anymore", and I'll make sure that we don't.
>


[GitHub] [pulsar-helm-chart] miecio45 opened a new pull request #99: Fix namespace handling and missing dnsNames

2021-01-26 Thread GitBox


miecio45 opened a new pull request #99:
URL: https://github.com/apache/pulsar-helm-chart/pull/99


   Fixes wrong namespace handling in some RBAC definitions and missing dnsNames for TLS.
   
   ### Motivation
   
   Fixes old, unused handling of the namespace name in the RBAC definitions for autorecovery and bookkeeper.
   Fixes a Helm exception about a missing key when TLS dnsNames are not defined.
   
   ### Modifications
   
   Use the namespace template in the RBAC definitions for bookkeeper and autorecovery. Wrap every `toYaml .Values.tls.bookie.dnsNames` clause in the TLS certificate definitions in an `if` guard.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [pulsar-helm-chart] miecio45 opened a new pull request #100: Extend podmonitor and add relabels

2021-01-26 Thread GitBox


miecio45 opened a new pull request #100:
URL: https://github.com/apache/pulsar-helm-chart/pull/100


   ### Motivation
   
   I wanted to use
[streamnative/apache-pulsar-grafana-dashboard](https://github.com/streamnative/apache-pulsar-grafana-dashboard)
with this Helm chart and my own cluster-wide Prometheus stack, and I decided
that using the PodMonitor CRD is a good way to do it. Unfortunately, the
Prometheus config has some metric relabelings that the Grafana dashboard
requires, so I decided to port them directly to the PodMonitor definitions.
   
   ### Modifications
   
   * Added missing PodMonitor for autorecovery
   * Port relabelings from `prometheus-configmap.yaml` to each PodMonitor
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   







[GitHub] [pulsar-helm-chart] miecio45 opened a new pull request #101: Fix for missing PSP for bookie initialize

2021-01-26 Thread GitBox


miecio45 opened a new pull request #101:
URL: https://github.com/apache/pulsar-helm-chart/pull/101


   ### Motivation
   
   When using the standard BookKeeper installation on a cluster with PodSecurityPolicy (PSP) enabled, cluster initialization fails because it has to be started as root.
   
   ### Modifications
   
   Add the same ServiceAccount and SecurityContext to bookkeeper-cluster-initialize as in the bookkeeper specification.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   







Re: [E] Re: [PIP-78] Split the individual acknowledgments into multiple entries

2021-01-26 Thread PengHui Li
Thanks for all your suggestions. LinLin and I will work on abstracting this
acknowledgment management as an interface.

- Penghui
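
Abstracting acknowledgment management behind an interface, as discussed in this thread, could look roughly like the sketch below. `IndividualAckTracker`, its methods, the `create` factory, and both implementations are hypothetical names chosen for illustration; they are not existing Pulsar or managed-ledger APIs, and persistence is left out entirely.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class AckTrackerSketch {

    // Hypothetical plug-in point for individual-ack bookkeeping (not a Pulsar API).
    interface IndividualAckTracker {
        void ack(long ledgerId, long entryId);
        boolean isAcked(long ledgerId, long entryId);
        long ackedCount();
    }

    // One boxed entry id per acked message, grouped by ledger.
    static final class SortedSetTracker implements IndividualAckTracker {
        private final Map<Long, TreeSet<Long>> acked = new HashMap<>();

        public void ack(long ledgerId, long entryId) {
            acked.computeIfAbsent(ledgerId, k -> new TreeSet<>()).add(entryId);
        }

        public boolean isAcked(long ledgerId, long entryId) {
            TreeSet<Long> entries = acked.get(ledgerId);
            return entries != null && entries.contains(entryId);
        }

        public long ackedCount() {
            return acked.values().stream().mapToLong(TreeSet::size).sum();
        }
    }

    // One bit per entry, grouped by ledger: more compact for dense ack patterns.
    static final class BitSetTracker implements IndividualAckTracker {
        private final Map<Long, BitSet> acked = new HashMap<>();

        public void ack(long ledgerId, long entryId) {
            acked.computeIfAbsent(ledgerId, k -> new BitSet()).set(Math.toIntExact(entryId));
        }

        public boolean isAcked(long ledgerId, long entryId) {
            BitSet entries = acked.get(ledgerId);
            return entries != null && entries.get(Math.toIntExact(entryId));
        }

        public long ackedCount() {
            return acked.values().stream().mapToLong(BitSet::cardinality).sum();
        }
    }

    // The implementation is selected by name (e.g. from a config value);
    // callers only depend on the interface.
    static IndividualAckTracker create(String name) {
        switch (name) {
            case "sorted-set": return new SortedSetTracker();
            case "bitset": return new BitSetTracker();
            default: throw new IllegalArgumentException("unknown tracker: " + name);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"sorted-set", "bitset"}) {
            IndividualAckTracker tracker = create(name);
            tracker.ack(1, 0);
            tracker.ack(1, 2); // entry 1 of ledger 1 stays unacked: a "hole"
            tracker.ack(2, 5);
            System.out.println(name + ": acked=" + tracker.ackedCount()
                + ", hole acked=" + tracker.isAcked(1, 1));
        }
    }
}
```

In this sketch the caller only sees the interface, so a bitset-based implementation and a per-entry implementation can be swapped by name without changing the code that records acknowledgments.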

Sijie Guo wrote on Monday, January 25, 2021 at 11:14 AM:

> Rajan,
>
> Thank you for sharing the prototype! That looks great to me.
>
> In order for the community to evolve and experiment with different
> approaches, how about we abstract this acknowledgment management as an
> interface?
>
> If this approach works for you, maybe Lin Lin can focus on abstracting this
> interface. Then we can implement your approach and his approach using this
> interface. Users can decide which implementation to use.
>
> Thanks,
> Sijie
>
> On Sat, Jan 23, 2021 at 7:46 PM Rajan Dhabalia 
> wrote:
>
> > Hi,
> >
> > >> Delayed messages or certain user logic can introduce a lot of
> > >> message-holes. We have seen this issue in quite a lot of customers'
> > >> production environment.
> >
> > I agree that we need a larger buffer to store and recover individually
> > deleted messages, and the existing 150K limit might not be enough for many
> > use cases. However, I would also like to highlight issues that happened in
> > the past due to keeping a large number of deleted messages in the broker's
> > memory, one of which was high GC pauses. Therefore, we introduced
> > ConcurrentOpenLongPairRangeSet to manage deleted messages without actually
> > storing range objects in memory. OpenRangeSet uses a bitset to store
> > ranges in memory, and we can also use it to persist them to disk for
> > recovery. This approach has several advantages: a simple implementation, a
> > large enough range for recovery, and it skips the intermediate conversion
> > from unack-messages to the bitset in OpenRangeSet, which saves extra CPU
> > during recovery.
> >
> > I implemented a simple prototype
> > <https://github.com/rdhabalia/pulsar/commit/1f8e5e745e9f1d1429697b5dee1da70545385653>
> > to store deleted messages using a bitset in OpenRangeSet, and we can
> > persist 10M ranges with 5MB of data, which I guess is large enough for any
> > use case. So, we can use this approach to solve the problem without
> > introducing unnecessary complexity in managed-cursor.
> >
> > Thanks,
> > Rajan
> >
> > On Fri, Jan 22, 2021 at 7:52 PM Sijie Guo  wrote:
> >
> > > Joe - Delayed messages or certain user logic can introduce a lot of
> > > message holes. We have seen this issue in quite a lot of customers'
> > > production environments. Hence we need to find a solution to these
> > > problems. If you are skeptical of an implementation like that, how about
> > > making the cursor implementation pluggable? This proposal could then be
> > > implemented as one plugin, so it would not impact any existing logic but
> > > would allow people to use a plugin to solve this problem.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Fri, Jan 22, 2021 at 5:00 PM Joe Francis wrote:
> > >
> > > > Let me take a step back and explain how I am looking at this from a
> > > > high-level design viewpoint.
> > > >
> > > > BookKeeper (BK) is like an LSM implementation of a KV store. Writes to
> > > > all keys are appended to a single file; deletes are logical. Compaction
> > > > reclaims space. An index is used to locate entries and to track logical
> > > > deletes so that space can be reclaimed.
> > > >
> > > > The index in BK is another LSM. Again, writes are appended, deletes are
> > > > logical, an index is used to locate entries and account for deletes, and
> > > > compaction reclaims space (the implementation within RocksDB is far more
> > > > complex, with bloom filters and memtables, but you get the idea). BK
> > > > just uses a sophisticated index (RocksDB) which is tiny and cacheable,
> > > > and RocksDB has within it a sophisticated index which is small and
> > > > cacheable.
> > > >
> > > > So when I look at this proposal, what I see is the same: another
> > > > attempt to build an LSM with a sophisticated index/cache mechanism using
> > > > log-structured storage. So I am quite skeptical that this needs to be
> > > > solved this way, within Pulsar.
> > > >
> > > > Joe
> > > >
> > > > On Wed, Jan 20, 2021 at 12:30 AM linlin  wrote:
> > > >
> > > > > We can look at
> > > > > ManagedCursorImpl.buildIndividualDeletedMessageRanges.
> > > > >
> > > > > What is saved in the entry is not a BitSet but the messageRanges, one
> > > > > by one, which contain information such as ledgerId and entryId. The
> > > > > BitSet only exists in memory and is used to quickly determine whether a
> > > > > position already exists.
> > > > > In addition, the position of each ack is stored in the
> > > > > individualDeletedMessages queue. When it is persisted to the entry, the
> > > > > queue is traversed, and the position information of each ack generates
> > > > > a messageRange.
> > > > > A messageRange contains lowerEndpoint (ledgerId+entryId), upperEndpoint

Re: [ANNOUNCE] Apache Pulsar 2.6.3 released

2021-01-26 Thread Anonymitaet _
It is wonderful to have 2.6.3 on board. A huge well done and heartfelt thanks
go to the whole team for always working hard and remaining focused and
dedicated.

On 2021/1/26, 19:07, "PengHui Li"  wrote:

> The Apache Pulsar team is proud to announce Apache Pulsar version 2.6.3.
>
> Pulsar is a highly scalable, low latency messaging platform running on
> commodity hardware. It provides simple pub-sub semantics over topics,
> guaranteed at-least-once delivery of messages, automatic cursor management
> for subscribers, and cross-datacenter replication.
>
> For Pulsar release details and downloads, visit:
> https://pulsar.apache.org/download
>
> Release Notes are at:http://pulsar.apache.org/release-notes
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
>
> The Pulsar Team




Re: [E] Re: [PIP-78] Split the individual acknowledgments into multiple entries

2021-01-26 Thread Rajan Dhabalia
I have created a PR that should allow brokers to store up to 10M
unack-message ranges. I think that should be large enough for any use case,
and now we probably do not need to introduce an abstraction for ack
management, which also avoids adding complexity to the message
acknowledgement path.
https://github.com/apache/pulsar/pull/9292

Thanks,
Rajan
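
The bitset idea from Rajan's earlier message can be sketched with `java.util.BitSet` alone: acked entries of one ledger are set bits, ranges are recovered as runs of set bits, and the bitset's words are persisted and restored directly instead of being converted to a list of range objects. This models a single hypothetical ledger and is only an illustration; it does not reproduce `ConcurrentOpenLongPairRangeSet` or the code in the linked PR.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BitSetRangesSketch {

    // Bit i set = entry i of this (single, hypothetical) ledger is acked.
    // Returns closed ranges {from, to} of consecutive acked entries.
    static List<long[]> ranges(BitSet acked) {
        List<long[]> out = new ArrayList<>();
        int from = acked.nextSetBit(0);
        while (from >= 0) {
            int to = acked.nextClearBit(from) - 1; // last set bit of this run
            out.add(new long[] {from, to});
            from = acked.nextSetBit(to + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet acked = new BitSet();
        acked.set(0, 3);  // entries 0..2 (toIndex is exclusive)
        acked.set(5);     // entry 5; entries 3 and 4 are holes
        acked.set(8, 10); // entries 8..9

        // Persist the raw 64-bit words instead of one record per range...
        long[] words = acked.toLongArray();
        // ...and rebuild the bitset directly from them on recovery.
        BitSet restored = BitSet.valueOf(words);

        System.out.println("words persisted: " + words.length + ", equal: " + restored.equals(acked));
        for (long[] r : ranges(restored)) {
            System.out.println("[" + r[0] + ", " + r[1] + "]");
        }
    }
}
```

Here three ranges fit in a single 64-bit word. In this representation the persisted size depends on the highest acked entry id per ledger rather than on the number of ranges alone.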



On Sun, Jan 24, 2021 at 7:14 PM Sijie Guo  wrote:

> Rajan,
>
> Thank you for sharing the prototype! That looks great to me.
>
> In order for the community to evolve and experiment with different
> approaches, how about we abstract this acknowledgment management as an
> interface?
>
> If this approach works for you, maybe Lin Lin can focus on abstracting this
> interface. Then we can implement your approach and his approach using this
> interface. Users can decide which implementation to use.
>
> Thanks,
> Sijie
>
> On Sat, Jan 23, 2021 at 7:46 PM Rajan Dhabalia 
> wrote:
>
> > *Hi, >> Delayed messages or certain user logic can introduce a lot of
> > message-holes. We have seen this issue in quite a lot of customers'
> > production environment.I agree that we need a larger buffer to store and
> > recover individually deleted messages, and the existing 150K limit might
> > not be enough for many usescases. However, I would also like to highlight
> > issues that happened in the past due to keeping a large number of deleted
> > messages in the broker’s memory and one of them was high GC pauses.
> > Therefore, we introduced ConcurrentOpenLongPairRangeSet
> >  to manage deleted messages
> > without actually storing range objects in memory. OpenRangeSet uses
> bitset
> > to store ranges in memory and we can also utilize it to persist in disk
> for
> > the recovery. This approach has various advantages: simple
> implementation,
> > large enough range for recovery, and it skips intermediate conversion
> from
> > unack-messages to bitset in OpenRangeSet which saves extra CPU while
> > recovery.I implemented a simple prototype
> > <
> >
> https://github.com/rdhabalia/pulsar/commit/1f8e5e745e9f1d1429697b5dee1da70545385653
> > >
> > to store deleted messages using bitset in OpenRangeSet and we can persist
> > 10M ranges with 5MB data size which I guess is large enough for any
> > usecases. So, we can use this approach to solve the problem without
> > introducing unnecessary complexity in managed-cursor.Thanks,Rajan*
> >
> > On Fri, Jan 22, 2021 at 7:52 PM Sijie Guo  wrote:
> >
> > > Joe - Delayed messages or certain user logic can introduce a lot of
> > message
> > > holes. We have seen this issue in quite a lot of customers' production
> > > environment. Hence we need to find a solution for solving these
> problems.
> > > If you are skeptical of an implementation like that, how about us
> making
> > > cursor implementation pluggable. We can make this proposal implemented
> as
> > > one plugin. So it will not impact any existing logic but allowing
> people
> > > use a plugin to solve this problem.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Fri, Jan 22, 2021 at 5:00 PM Joe Francis
> >  > > >
> > > wrote:
> > >
> > > > Let me take a step back and explain  how I am looking at this from a
> > > > high-level
> > > > design viewpoint
> > > >
> > > >
> > > > Bookkeeper (BK) is like an LSM implementation of a KV store. Writes
> to
> > > all
> > > > keys are appended to a single file; deletes are logical.  Compaction
> > > > reclaims space.  An Index is used locate entries, tracking logical
> > > deletes
> > > > and reclaim space.
> > > >
> > > >
> > > > The index in BK  is another LSM.  Again, writes are appended, deletes
> > are
> > > > logical, and  an index is used to  locate entries , account for
> deletes
> > > and
> > > > compaction to reclaim space (the implementation within rocksdb is far
> > > more
> > > > complex with bloom filters and memtables, but you get the idea )   BK
> > > just
> > > > uses a sophisticated index (rocksdb) which is tiny and cacheable and
> > > > rocksdb has within it a sophisticated index which is small and
> > cacheable
> > > >
> > > >
> > > > So when I look at this proposal, what I see is the same - another
> > attempt
> > > > to build an LSM with a sophisticated index/cache mechanism using log
> > > > structured storage. So I am quite skeptical that this needs to solved
> > > this
> > > > way,  within Pulsar.
> > > >
> > > >
> > > >
> > > > Joe
> > > >
> > > > On Wed, Jan 20, 2021 at 12:30 AM linlin wrote:
> > > >
> > > > > We can look at ManagedCursorImpl.buildIndividualDeletedMessageRanges.
> > > > >
> > > > > What is saved in the entry is not a BitSet but the message ranges, one
> > > > > by one, each containing information such as ledgerId and entryId. The
> > > > > BitSet exists only in memory and is used to quickly determine whether
> > > > > a position already exists.
> > > > > In addition, the position of each ack is stored in the
> > > > > individualDeletedMessages queue. When p
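The layout linlin describes (ranges persisted one by one, with a BitSet kept only in memory for fast lookups) can be sketched roughly as follows. This is a simplified, hypothetical model: `AckRangeSketch` and its `MessageRange` type are illustrative and are not Pulsar's `ManagedCursorImpl` or its protobuf `MessageRange`. It also assumes entry IDs fit in an `int` so they can index a `java.util.BitSet`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified model: acks are tracked in memory as one BitSet per ledger,
 * but what gets persisted is a list of contiguous acked ranges.
 */
class AckRangeSketch {

    /** Hypothetical persisted record: entries [lowerEntryId, upperEntryId] of one ledger are acked. */
    static final class MessageRange {
        final long ledgerId;
        final long lowerEntryId;
        final long upperEntryId;

        MessageRange(long ledgerId, long lowerEntryId, long upperEntryId) {
            this.ledgerId = ledgerId;
            this.lowerEntryId = lowerEntryId;
            this.upperEntryId = upperEntryId;
        }
    }

    // In-memory only: ledgerId -> one bit per entry, set when that entry is acked.
    private final Map<Long, BitSet> acked = new TreeMap<>();

    void ack(long ledgerId, long entryId) {
        acked.computeIfAbsent(ledgerId, k -> new BitSet()).set(Math.toIntExact(entryId));
    }

    /** Fast membership check against the in-memory BitSet. */
    boolean isAcked(long ledgerId, long entryId) {
        BitSet bits = acked.get(ledgerId);
        return bits != null && bits.get(Math.toIntExact(entryId));
    }

    /** Collapse each BitSet into maximal runs of set bits, i.e. acked ranges. */
    List<MessageRange> toRanges() {
        List<MessageRange> ranges = new ArrayList<>();
        for (Map.Entry<Long, BitSet> e : acked.entrySet()) {
            BitSet bits = e.getValue();
            int start = bits.nextSetBit(0);
            while (start >= 0) {
                int end = bits.nextClearBit(start) - 1;
                ranges.add(new MessageRange(e.getKey(), start, end));
                start = bits.nextSetBit(end + 1);
            }
        }
        return ranges;
    }

    /** Persist ranges one by one: three longs (24 bytes) per range. */
    static ByteBuffer serialize(List<MessageRange> ranges) {
        ByteBuffer buf = ByteBuffer.allocate(ranges.size() * 3 * Long.BYTES);
        for (MessageRange r : ranges) {
            buf.putLong(r.ledgerId).putLong(r.lowerEntryId).putLong(r.upperEntryId);
        }
        buf.flip();
        return buf;
    }
}
```

Because every persisted range costs a fixed number of bytes in this layout, the serialized size grows with the number of ranges, which is why heavily fragmented acknowledgments are expensive to persist.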

Re: [E] Re: [PIP-78] Split the individual acknowledgments into multiple entries

2021-01-26 Thread PengHui Li
Hi Rajan,

10M unacked-message ranges can cover most use cases,
but there are still some scenarios that cannot be handled.
We can introduce an abstraction for users who want to extend the implementation.
I think this will not conflict with your improvement.

This PIP is not just about improving the persistence of individual acks;
it also covers things that the PR does not, such as the memory limit for
individual acks.
Every time the mark-delete position changes, we need to write a few MB of data
to the bookies; if you have 1000 topics in this scenario, that is also a point
to worry about.
We do not expect one implementation to solve all of users' problems.

BTW, I don't think introducing an abstraction will add complexity to
message acknowledgment.
It helps make the responsibilities of the various managed-ledger components
clear.

Thanks
Penghui
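An abstraction like the one proposed in this thread could take the shape of a small interface plus a registry that selects an implementation by name. The sketch below is hypothetical: `AckTracker`, `TreeSetAckTracker`, and `AckTrackers` are illustrative names, not existing managed-ledger APIs, and a real interface would also need persistence and recovery hooks.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical pluggable abstraction for individual-ack tracking. */
interface AckTracker {
    void ack(long ledgerId, long entryId);

    boolean isAcked(long ledgerId, long entryId);

    /** Number of individually acked positions currently tracked. */
    int size();
}

/** Baseline implementation: a sorted set of (ledgerId, entryId) pairs. */
class TreeSetAckTracker implements AckTracker {
    private final TreeSet<long[]> acks = new TreeSet<>(
            Comparator.<long[]>comparingLong(p -> p[0]).thenComparingLong(p -> p[1]));

    @Override
    public void ack(long ledgerId, long entryId) {
        acks.add(new long[] {ledgerId, entryId});
    }

    @Override
    public boolean isAcked(long ledgerId, long entryId) {
        return acks.contains(new long[] {ledgerId, entryId});
    }

    @Override
    public int size() {
        return acks.size();
    }
}

/** Registry so a broker setting could pick an implementation by name. */
class AckTrackers {
    private static final Map<String, Supplier<AckTracker>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("default", TreeSetAckTracker::new);
    }

    static void register(String name, Supplier<AckTracker> factory) {
        REGISTRY.put(name, factory);
    }

    static AckTracker create(String name) {
        Supplier<AckTracker> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown ack tracker: " + name);
        }
        return factory.get();
    }
}
```

With this shape, a bitset-based tracker and a multi-entry tracker could both be registered side by side, and users would choose one through configuration.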



On Wed, Jan 27, 2021 at 12:00 PM Rajan Dhabalia wrote:

> I have created a PR which should allow brokers to store up to 10M
> unacked-message ranges. I think that should be large enough for any use
> case, and we probably no longer need to introduce an abstraction for ack
> management, which avoids further complexity in the message acknowledgment
> path.
> https://github.com/apache/pulsar/pull/9292
>
> Thanks,
> Rajan
>
>
>
> On Sun, Jan 24, 2021 at 7:14 PM Sijie Guo wrote:
>
> > Rajan,
> >
> > Thank you for sharing the prototype! That looks great to me.
> >
> > In order for the community to evolve and experiment with different
> > approaches, how about we abstract this acknowledgment management as an
> > interface?
> >
> > If this approach works for you, maybe Lin Lin can focus on abstracting
> > this interface. Then we can implement your approach and his approach
> > using this interface. Users can decide which implementation to use.
> >
> > Thanks,
> > Sijie
> >
> > On Sat, Jan 23, 2021 at 7:46 PM Rajan Dhabalia wrote:
> >
> > > Hi,
> > >
> > > >> Delayed messages or certain user logic can introduce a lot of
> > > >> message holes. We have seen this issue in quite a lot of customers'
> > > >> production environments.
> > >
> > > I agree that we need a larger buffer to store and recover individually
> > > deleted messages, and the existing 150K limit might not be enough for
> > > many use cases. However, I would also like to highlight issues that
> > > happened in the past due to keeping a large number of deleted messages
> > > in the broker's memory, one of which was long GC pauses. Therefore, we
> > > introduced ConcurrentOpenLongPairRangeSet to manage deleted messages
> > > without actually storing range objects in memory. OpenRangeSet uses a
> > > bitset to store ranges in memory, and we can also use it to persist them
> > > to disk for recovery. This approach has several advantages: a simple
> > > implementation, a large enough number of ranges for recovery, and it
> > > skips the intermediate conversion from unacked messages to the bitset in
> > > OpenRangeSet, which saves extra CPU during recovery.
> > >
> > > I implemented a simple prototype
> > > <https://github.com/rdhabalia/pulsar/commit/1f8e5e745e9f1d1429697b5dee1da70545385653>
> > > to store deleted messages using the bitset in OpenRangeSet, and we can
> > > persist 10M ranges with 5MB of data, which I guess is large enough for
> > > any use case. So we can use this approach to solve the problem without
> > > introducing unnecessary complexity in the managed cursor.
> > >
> > > Thanks,
> > > Rajan
> > >
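The point above is that a bitset can be persisted directly, skipping the conversion to range objects. A minimal sketch of that idea, assuming a single ledger whose entry IDs fit in an `int`, uses `java.util.BitSet.toLongArray()` to snapshot and `BitSet.valueOf(long[])` to recover. `BitSetAckSnapshot` and its methods are hypothetical names; this is not the `ConcurrentOpenLongPairRangeSet` code or the prototype linked above.

```java
import java.util.BitSet;

/**
 * Sketch: persist acknowledgment state as raw bitset words instead of range
 * objects. Bit i set means entry i of a single ledger is acknowledged.
 */
class BitSetAckSnapshot {

    /** Snapshot for persistence: bit n lives in word n / 64. */
    static long[] snapshot(BitSet acked) {
        return acked.toLongArray();
    }

    /** Recovery: rebuild the bitset directly, with no per-range objects. */
    static BitSet recover(long[] words) {
        return BitSet.valueOf(words);
    }

    /** Count unacked ranges (maximal runs of clear bits) among entries [0, limit). */
    static int countUnackedRanges(BitSet acked, int limit) {
        int ranges = 0;
        int start = acked.nextClearBit(0);
        while (start < limit) {
            ranges++;
            int nextAcked = acked.nextSetBit(start);
            if (nextAcked < 0 || nextAcked >= limit) {
                break;
            }
            start = acked.nextClearBit(nextAcked);
        }
        return ranges;
    }
}
```

In this layout the snapshot size depends on the highest acknowledged entry ID rather than on the number of unacked ranges, which is what makes a very large number of ranges cheap to store.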
> > > On Fri, Jan 22, 2021 at 7:52 PM Sijie Guo wrote:
> > >
> > > > Joe, delayed messages or certain user logic can introduce a lot of
> > > > message holes. We have seen this issue in quite a lot of customers'
> > > > production environments. Hence we need to find a solution to these
> > > > problems.
> > > > If you are skeptical of an implementation like that, how about making
> > > > the cursor implementation pluggable? We can implement this proposal as
> > > > one plugin. It will not impact any existing logic but will allow
> > > > people to use a plugin to solve this problem.
> > > >
> > > > Thanks,
> > > > Sijie
> > > >
> > > > On Fri, Jan 22, 2021 at 5:00 PM Joe Francis wrote:
> > > >
> > > > > Let me take a step back and explain how I am looking at this from a
> > > > > high-level design viewpoint.
> > > > >
> > > > >
> > > > > BookKeeper (BK) is like an LSM implementation of a KV store. Writes to
> > > > > all keys are appended to a single file; deletes are logical. Compaction
> > > > > reclaims space. An index is used to locate entries and track logical
> > > > > deletes so that space can be reclaimed.
> > > > >
> > > > >
> > > > > The index in BK is another LSM. Again, writes are appended, deletes are
> > > > > logical, an index is used to locate entries and account for deletes,
> > > > > and compaction reclaims space (the implementation within RocksDB is far
> > > > > more complex with 

[GitHub] [pulsar-helm-chart] Xellendor opened a new issue #102: Pulsar python function not working with TLS enabled

2021-01-26 Thread GitBox


Xellendor opened a new issue #102:
URL: https://github.com/apache/pulsar-helm-chart/issues/102


   **Describe the bug**
   
   `functionAuthProviderClassName: org.apache.pulsar.functions.auth.KubernetesSecretsTokenAuthProvider`
   expects `tlsTrustCertsFilePath: /pulsar/certs/ca/ca.crt` to be set in
   functions_worker.yaml. Adding `PF_tlsTrustCertsFilePath: /pulsar/certs/ca/ca.crt`
   to the broker ConfigMap in the Helm chart (when TLS is turned on) fixes the issue.
   
   **To Reproduce**
   1.) Deploy with `tls.enabled: true` in values.yaml.
   2.) Create a Python function with `pulsar-admin functions create ...`.
   3.) The TLS handshake fails because ca.crt and the TLS config are not passed
   into the function pod.
   
   **Expected behavior**
   If TLS is enabled on the broker, Python functions should run out of the box
   without the need to manually adapt the Helm chart.
   
   **Screenshots**
   "Downloaded successfully"
   shardId=0
   [2021-01-25 08:43:11 +] [INFO] python_instance_main.py: Starting Python 
instance with Namespace(client_auth_params=None, client_auth_plugin=None, 
cluster_name='neuron-dev01', dependency_repository=None, 
expected_healthcheck_interval=-1, extra_dependency_repository=None, 
function_details='{"tenant":"31000","namespace":"jwt","name":"f_dummy","className":"f_dummy.DummyFunction","logTopic":"31000/jwt/log_partition","runtime":"PYTHON","autoAck":true,"parallelism":1,"source":{"inputSpecs":{"31000/jwt/inputtopic":{}},"cleanupSubscription":true},"sink":{"topic":"31000/jwt/output","forwardSourceMessageProperty":true},"resources":{"cpu":1.0,"ram":"1073741824","disk":"10737418240"},"componentType":"FUNCTION"}',
 function_id='e0e084c9-62ef-4236-9d12-f79bf13633cd', 
function_version='e393f52d-3adb-4fd7-97a0-dc7aeae80c3f', 
hostname_verification_enabled=None, install_usercode_dependencies=True, 
instance_id='0', 
logging_config_file='/pulsar/conf/functions-logging/console_logging_config.ini',
 logging_directory='logs/func...
   [2021-01-25 08:43:11 +] [INFO] log.py: Setting up producer for log topic 
31000/jwt/log_partition
   2021-01-25 08:43:11.555 INFO  [139914923747136] ConnectionPool:85 | Created 
connection for pulsar+ssl://pulsar-broker:6651/
   2021-01-25 08:43:11.558 INFO  [139914856883968] ClientConnection:353 | 
[10.129.2.96:35334 -> 10.129.2.94:6651] Connected to broker
   2021-01-25 08:43:11.564 ERROR [139914856883968] ClientConnection:411 | 
[10.129.2.96:35334 -> 10.129.2.94:6651] Handshake failed: certificate verify 
failed
   2021-01-25 08:43:11.564 INFO  [139914856883968] ClientConnection:1425 | 
[10.129.2.96:35334 -> 10.129.2.94:6651] Connection closed
   2021-01-25 08:43:11.564 ERROR [139914856883968] ClientImpl:181 | Error 
Checking/Getting Partition Metadata while creating producer on 
persistent://31000/jwt/log_partition -- ConnectError
   2021-01-25 08:43:11.564 INFO  [139914856883968] ClientConnection:242 | 
[10.129.2.96:35334 -> 10.129.2.94:6651] Destroyed connection
   
   **Desktop (please complete the following information):**
- OKD 4.6
   
   **Additional context**
   Suggested solution in pulsar/templates/broker-configmap.yaml:
   
   ...
 {{- if and .Values.tls.enabled .Values.tls.broker.enabled }}
 brokerServicePortTls: "{{ .Values.broker.ports.pulsarssl }}"
 webServicePortTls: "{{ .Values.broker.ports.https }}"
 # TLS Settings
 tlsCertificateFilePath: "/pulsar/certs/broker/tls.crt"
 tlsKeyFilePath: "/pulsar/certs/broker/tls.key"
 tlsTrustCertsFilePath: "/pulsar/certs/ca/ca.crt"
 # For function pods to also run with TLS enabled
 PF_tlsTrustCertsFilePath: "/pulsar/certs/ca/ca.crt"
 {{- end }}
   ...
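The suggested fix relies on the convention, implied by the issue, that `PF_`-prefixed entries from the broker ConfigMap end up as functions worker settings with the prefix removed. The Java sketch below only illustrates that naming convention; `PfEnvOverrides` is a hypothetical name, and this is not the script the Pulsar image actually uses to generate the worker configuration.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration of the PF_ naming convention: keys prefixed with
 * "PF_" are treated as functions-worker settings with the prefix stripped.
 */
class PfEnvOverrides {
    static final String PREFIX = "PF_";

    static Map<String, String> overrides(Map<String, String> env) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            String key = e.getKey();
            // Ignore unrelated variables and a bare "PF_" with no setting name.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                result.put(key.substring(PREFIX.length()), e.getValue());
            }
        }
        return result;
    }
}
```

With the suggested ConfigMap entry, `overrides` yields `tlsTrustCertsFilePath=/pulsar/certs/ca/ca.crt`, while broker-only keys such as `tlsKeyFilePath` are left out.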
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org