Re: [DISCUSS] PIP-166: Function add NONE delivery semantics

2022-06-02 Thread Asaf Mesika
>
> I lean toward failing. Although this breaks the current logic, the current
> implementation can be considered a bug.

OK. I would add another section to the Compatibility change, in bold or
capital letters, to highlight that you are creating a breaking change. It
should be reflected in the release notes somehow; I don't know the process
for that.
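
The two options weighed in this thread (failing at start-up when ATMOST_ONCE
is combined with autoAck=false, versus logging a WARN only once) could be
sketched roughly as follows. This is only an illustration under stated
assumptions: `Guarantee`, `AckPolicy`, `onUserAck` and `warningsEmitted` are
hypothetical names, not the actual Pulsar Functions code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

// Hypothetical names throughout; an illustration, not Pulsar's implementation.
enum Guarantee { ATLEAST_ONCE, ATMOST_ONCE, EFFECTIVELY_ONCE, MANUAL }

final class AckPolicy {
    private static final Logger LOG = Logger.getLogger(AckPolicy.class.getName());

    private final Guarantee guarantee;
    private final AtomicBoolean warned = new AtomicBoolean(false);
    private final AtomicInteger warningCount = new AtomicInteger();

    AckPolicy(Guarantee guarantee, boolean autoAck) {
        // Fail-fast option: reject the ambiguous combination when the function starts.
        if (guarantee == Guarantee.ATMOST_ONCE && !autoAck) {
            throw new IllegalArgumentException("ATMOST_ONCE requires autoAck=true");
        }
        this.guarantee = guarantee;
    }

    /** Returns true if a user-issued ack should be honoured. */
    boolean onUserAck() {
        if (guarantee == Guarantee.MANUAL) {
            return true;
        }
        // compareAndSet succeeds for exactly one caller, so the WARN is logged once.
        if (warned.compareAndSet(false, true)) {
            warningCount.incrementAndGet();
            LOG.warning("record.ack() is ignored unless the guarantee is MANUAL");
        }
        return false;
    }

    int warningsEmitted() {
        return warningCount.get();
    }
}
```

With this shape, a function that calls `record.ack()` on every message under a
non-MANUAL guarantee produces a single log line instead of one per message.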

On Tue, May 31, 2022 at 7:16 PM 石宝迪 wrote:

> > If you fail to start the function, you immediately break people's
> > functions when they upgrade to this version. How about notifying them
> > once via logger (WARN)?
>
> I lean toward failing. Although this breaks the current logic, the current
> implementation can be considered a bug.
>
> > It will flood their logs if they used it wrong. Maybe write to log once?
>
> Agreed, I changed the PIP.
>
> Thanks,
> Baodi Shi
>
> > On May 31, 2022, at 23:5720, Asaf Mesika wrote:
> >
> > Hi Baodi,
> >
> > Regarding
> >
> >>   1. When the delivery semantic is ATMOST_ONCE, verify that autoAck is
> >>   true. If the validation fails, let the function fail to start (this
> >>   temporarily resolves the configuration ambiguity). Once autoAck is
> >>   subsequently removed, the message will be acked immediately after it
> >>   is received.
> >
> > If you fail to start the function, you immediately break people's
> > functions when they upgrade to this version. How about notifying them
> > once via logger (WARN)?
> >
> > Regarding
> >
> >>   1. When the user calls record.ack() in a function, it only takes
> >>   effect when ProcessingGuarantees == MANUAL. Conversely, when
> >>   ProcessingGuarantees != MANUAL, calling record.ack() is invalid
> >>   (a WARN log is printed).
> >
> > It will flood their logs if they used it wrong. Maybe write to log once?
> >
> > On Tue, May 31, 2022 at 12:24 PM Baozi wrote:
> >
> >> Hi, Asaf.
> >>
> >> Thanks for the review.
> >>
> >>> I'm not entirely sure that is accurate. Effectively-Once, as I
> >>> understand it, is achieved using transactions, so the consumption of
> >>> that message and the production of any resulting messages are
> >>> considered one atomic unit: either the message is acknowledged and the
> >>> messages are produced, or neither.
> >>
> >> Transactions are not used now. As I understand it: EFFECTIVELY_ONCE =
> >> ATLEAST_ONCE + Message Deduplication.
> >>
> >> @Neng Lu @Rui Fu Can you help confirm?
> >>
> >>> I would issue a WARN when reading the function configuration (thus
> >>> emitted once) when the user actively configured autoAck=false, warning
> >>> them that this configuration is deprecated and that they should switch
> >>> to the MANUAL ProcessingGuarantee configuration option.
> >>
> >> Added to API Change(2)
> >>
> >>> I suggest you clarify what existing behavior remains for backward
> >>> compatibility, with an appropriate comment that this is deprecated and
> >>> will be removed.
> >>
> >> Yes, I have rewritten it, please see Implementation(1)
> >>
> >>> 5. Regarding Test Plan
> >>> * I would add: Validate the test of autoAck=false still works (you
> >>> haven't broken anything)
> >>> * I would add: Validate existing ProcessingGuarantee test for
> >>> AtMostOnce, AtLeastOnce, ExactlyOnce still works (when autoAck=true)
> >>
> >> Nice, I added it to the PIP.
> >>
> >> Thanks,
> >> Baodi Shi
> >>
> >>> On May 30, 2022, at 22:0011, Asaf Mesika wrote:
> >>>
> >>> Thanks for applying the fixes.
> >>>
> >>> 1. Regarding
> >>>
> >>>>   - EFFECTIVELY_ONCE: The message is acknowledged *after* the function
> >>>>   finished execution. Depends on pulsar deduplication, and provides
> >>>>   end-to-end effectively once processing.
> >>>
> >>> I'm not entirely sure that is accurate. Effectively-Once, as I
> >>> understand it, is achieved using transactions, so the consumption of
> >>> that message and the production of any resulting messages are
> >>> considered one atomic unit: either the message is acknowledged and the
> >>> messages are produced, or neither.
> >>>
> >>> 2. Regarding
> >>>
> >>>>   1. Indicate that autoAck is deprecated, and do not use it in the
> >>>>   code (nor in Function.proto).
> >>>
> >>> * I would issue a WARN when reading the function configuration (thus
> >>> emitted once) when the user actively configured autoAck=false, warning
> >>> them that this configuration is deprecated and that they should switch
> >>> to the MANUAL ProcessingGuarantee configuration option.
> >>>
> >>> 3. Regarding
> >>>
> >>>>   1. When the delivery semantic is ATMOST_ONCE, the message will be
> >>>>   acked immediately after it is received, and is no longer affected by
> >>>>   the autoAck configuration.
> >>>
> >>> I suggest you clarify what existing behavior remains for backward
> >>> compatibility, with an appropriate comment that this is deprecated and
> >>> will be removed.
> >>>
> >>> 4. Regarding
> >>>
> >>>>   1. When the user calls record.ack() in a function, it only takes
> >>>>   effect when ProcessingGuarantees == MANUAL. In t

Re: [DISCUSS] PIP-166: Function add NONE delivery semantics

2022-06-02 Thread 石宝迪
> OK. I would add another section to the Compatibility change, in bold or
> capital letters, to highlight that you are creating a breaking change. It
> should be reflected in the release notes somehow; I don't know the process
> for that.

OK, I added it to `Incompatible case`. PTAL.


Thanks,
Baodi Shi

> On Jun 2, 2022, at 18:0404, Asaf Mesika wrote:
>
>> I lean toward failing. Although this breaks the current logic, the current
>> implementation can be considered a bug.
>
> OK. I would add another section to the Compatibility change, in bold or
> capital letters, to highlight that you are creating a breaking change. It
> should be reflected in the release notes somehow; I don't know the process
> for that.

Pulsar Flaky test report 2022-06-02 for PR builds in CI

2022-06-02 Thread Lari Hotari
Dear Pulsar community members,

Here's a report of the flaky tests in Pulsar CI during the observation
period of 2022-05-26 to 2022-06-02.
The full report is available as a Google Sheet,
https://docs.google.com/spreadsheets/d/165FHpHjs5fHccSsmQM4beeg6brn-zfUjcrXf6xAu4yQ/edit?usp=sharing

The report contains a subset of the test failures.
The flaky tests are observed from builds of merged PRs.
The GitHub Actions logs will be checked for builds where the SHA of the
head of the PR matches the SHA which got merged.
This ensures that all found exceptions are real flakes, since no changes
were made to the PR to make the tests pass later
so that the PR was merged successfully.

Here are the most flaky test methods:
Test method name   Number of build failures due to this test
org.apache.pulsar.tests.integration.functions.java.PulsarFunctionsJavaThreadTest.testJavaLoggingFunction   32
org.apache.pulsar.client.impl.MessageImplTest.testMessageBrokerAndEntryMetadataTimestampMissed   18
org.apache.pulsar.client.impl.MultiTopicsConsumerImplTest.testParallelSubscribeAsync   12
org.apache.pulsar.functions.worker.PulsarFunctionTlsTest.tearDown   10
org.apache.pulsar.client.impl.BinaryProtoLookupServiceTest.maxLookupRedirectsTest1   8
org.apache.pulsar.tests.integration.functions.python.PulsarFunctionsPythonProcessTest.testPythonExclamationFunction   8
org.apache.pulsar.broker.admin.PersistentTopicsTest.testTriggerCompactionTopic   7
org.apache.pulsar.broker.service.RackAwareTest.testRackUpdate   6
org.apache.pulsar.metadata.bookkeeper.PulsarLedgerIdGeneratorTest.testGenerateLedgerId   5
org.apache.pulsar.broker.service.RackAwareTest.testPlacement   5
org.apache.pulsar.client.impl.ClientCnxTest.testClientCnxTimeout   5
org.apache.pulsar.tests.integration.functions.PulsarStateTest.testPythonWordCountFunction   5
org.apache.pulsar.metadata.ZKSessionTest.testSessionLost   4
org.apache.pulsar.tests.integration.io.sources.debezium.PulsarDebeziumOracleSourceTest.testDebeziumOracleDbSource   4
org.apache.pulsar.client.impl.ConnectionTimeoutTest.testLowTimeout   4
org.apache.pulsar.client.impl.ProducerCloseTest.brokerCloseTopicTest   3

Markdown formatted summary reports for each test class can be accessed at
https://github.com/lhotari/pulsar-flakes/tree/master/2022-05-26-to-2022-06-02
The summary report links are now available in the Google sheet
https://docs.google.com/spreadsheets/d/165FHpHjs5fHccSsmQM4beeg6brn-zfUjcrXf6xAu4yQ/edit?usp=sharing

We need more help in addressing the flaky tests. Please join the efforts
so that we can get CI to a more stable state.

To coordinate the work,
1) please search for existing issues, or search all flaky issues with
"flaky" or the test class name (without package) in the search:
https://github.com/apache/pulsar/issues?q=is%3Aopen+flaky+sort%3Aupdated-desc
2) If there isn't an issue for a particular flaky test failure that you'd
like to fix, please create an issue using the "Flaky test" template at
https://github.com/apache/pulsar/issues/new/choose
3) Please comment on the issue that you are working on it.

We have a few active contributors working on the flaky tests, thanks for
the contributions.

I'm looking forward to more contributors joining the efforts. Please join
the #testing channel on Slack if you'd like to ask questions or get tips about
reproducing flaky tests locally and fixing them.
Sharing stories about fixing flaky tests is also helpful for sharing the
knowledge about how flaky tests can be fixed. That's also a valuable way to
contribute.
Some flaky tests might be actual real production code bugs. Fixing
the flaky test might result in fixing a real production code bug.

Current contributors, please keep up the good work! 
New contributors, you are welcome to join the efforts! You will learn about 
Pulsar and its internals as a side effect.  If you'd love to learn Pulsar 
internals and Pulsar OSS development, start by fixing flaky tests. :)

BR, -Lari


Re: [VOTE] PIP-165: Auto release client useless connections

2022-06-02 Thread guo jiwei
+1


Regards
Jiwei Guo (Tboy)


On Wed, Jun 1, 2022 at 3:12 PM Yubiao Feng wrote:

> Hi Pulsar Community:
>
> There were some mistakes in the last email, so I have corrected them.
>
> I would like to start a VOTE on "Auto release client useless connections"
> (PIP-165).
>
> Proposal Link: [PIP-165] Auto release client useless connections · Issue
> #15516 · apache/pulsar (github.com)
> 
>
> Discuss Link: [DISCUSS] [PIP-165] Auto release client useless
> connections-Apache Mail Archives
> 
>
> Voting will stay open for at least 48h.
> Thanks, Yubiao Feng
>
> On Wed, Jun 1, 2022 at 2:40 PM Yubiao Feng wrote:
>
> > [VOTE] PIP-165: Auto release client useless connections
> >
> > Hi Pulsar Community, I would like to start a VOTE on "Auto release
> > client useless connections" (PIP-165). The proposal can be read at
> > https://github.com/apache/pulsar/issues/15516
> > and the discussion thread is available at
> > https://lists.apache.org/thread/t6h98qs2coc56z06tw38hdlljl67ft4n
> > Voting will stay open for at least 48h. Thanks, Yubiao Feng
> >
>


Re: [VOTE] PIP-165: Auto release client useless connections

2022-06-02 Thread PengHui Li
+1

Penghui

On Thu, Jun 2, 2022 at 8:21 PM guo jiwei  wrote:

> +1
>
>
> Regards
> Jiwei Guo (Tboy)


Re: [DISCUSS] PIP-166: Function add NONE delivery semantics

2022-06-02 Thread PengHui Li
> It should be reflected in the release notes somehow - don't know the
> process for that.

Yes, we use the label `release/important-notice` to track the
important things we need to highlight in the release notes.
I have added the label.

I support the proposal.

Thanks,
Penghui

On Thu, Jun 2, 2022 at 6:42 PM 石宝迪 wrote:

> > OK. I would add another section to the Compatibility change, in bold or
> > capital letters, to highlight that you are creating a breaking change. It
> > should be reflected in the release notes somehow; I don't know the
> > process for that.
>
> OK, I added it to `Incompatible case`. PTAL.
>
> Thanks,
> Baodi Shi

[DISCUSS] PIP-173 : Create a built-in Function implementing the most common basic transformations

2022-06-02 Thread Christophe Bornet
Dear Pulsar community,

I opened PIP-173 https://github.com/apache/pulsar/issues/15902 to create a
built-in Function implementing the most common basic transformations.

Let me know what you think.

Best regards,

Christophe

--

## Motivation

Currently, when users want to modify the data in Pulsar, they need to write
a Function.
For a lot of use cases, it would be handy for them to be able to use a
ready-made built-in Function that implements the most common basic
transformations like the ones available in [Kafka Connect’s SMTs](
https://docs.confluent.io/platform/current/connect/transforms/overview.html
).
This relieves users of the burden of writing the Function themselves, of
having to understand the intricacies of Pulsar Schemas, and of coding in a
language that they may not have mastered (probably Java if they want to do
advanced things). They also benefit from battle-tested, maintained,
performance-optimised code.

## Goal

This PIP is about providing a `TransformFunction` that executes a sequence
of basic transformations on the data.
The `TransformFunction` shall be easy to configure and launchable as a
built-in NAR.
The `TransformFunction` shall be able to apply a sequence of common
transformations in-memory so we don’t need to execute the
`TransformFunction` multiple times and read/write to a topic each time.

This PIP is not about appending such a Function to a Source or a Sink.
While this is the ultimate goal, so we can provide an experience similar to
Kafka SMTs and avoid a read/write to a topic, this work will be done in a
future PIP.
It is expected that the code written for this PIP will be reusable in this
future work.

## API Changes

This PIP will introduce a new `transform` module in the `pulsar-function`
multi-module project. The produced artifact will be a NAR of the
`TransformFunction`.

## Implementation

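When it processes a record, `TransformFunction` will:

* Create a mutable structure `TransformContext` that contains

```java
import java.util.Map;
import lombok.Data;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.common.schema.KeyValueEncodingType;
import org.apache.pulsar.functions.api.Context;

@Data
public class TransformContext {
    private Context context;
    private Schema<?> keySchema;
    private Object keyObject;
    private boolean keyModified;
    private Schema<?> valueSchema;
    private Object valueObject;
    private boolean valueModified;
    private KeyValueEncodingType keyValueEncodingType;
    private String key;
    private Map<String, String> properties;
    private String outputTopic;
    // The send() method described below is not shown here.
}
```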

If the record is a `KeyValue`, the key and value schemas and object are
unpacked. Otherwise the `keySchema` and `keyObject` are null.

* Call in sequence the process method of a series of `TransformStep` on
this `TransformContext`

```java
public interface TransformStep {
    void process(TransformContext transformContext) throws Exception;
}
```

Each `TransformStep` can then modify the `TransformContext` as needed.

* Call the `send()` method of the `TransformContext` which will create the
message to send to the outputTopic, repacking the KeyValue if needed.

The `TransformFunction` will read its configuration as Json from
`userConfig` in the format:

```json
{
  "steps": [
    {
      "type": "drop-fields", "fields": "keyField1,keyField2", "part": "key"
    },
    {
      "type": "merge-key-value"
    },
    {
      "type": "unwrap-key-value"
    },
    {
      "type": "cast", "schema-type": "STRING"
    }
  ]
}
```

Each step is defined by its `type` and uses its own arguments.

This example config applied on a KeyValue input record with
value `{key={keyField1: key1, keyField2: key2, keyField3: key3},
value={valueField1: value1, valueField2: value2, valueField3: value3}}`
will give after each step:
```
{key={keyField1: key1, keyField2: key2, keyField3: key3},
value={valueField1: value1, valueField2: value2, valueField3: value3}}
(KeyValue)
   |
   | "type": "drop-fields", "fields": "keyField1,keyField2", "part": "key"
   |
{key={keyField3: key3}, value={valueField1: value1, valueField2: value2,
valueField3: value3}} (KeyValue)
   |
   | "type": "merge-key-value"
   |
{key={keyField3: key3}, value={keyField3: key3, valueField1: value1,
valueField2: value2, valueField3: value3}} (KeyValue)
   |
   | "type": "unwrap-key-value"
   |
{keyField3: key3, valueField1: value1, valueField2: value2, valueField3:
value3} (AVRO)
   |
   | "type": "cast", "schema-type": "STRING"
   |
{"keyField3": "key3", "valueField1": "value1", "valueField2": "value2",
"valueField3": "value3"} (STRING)
```

`TransformFunction` will be built as a NAR including a `pulsar-io.yaml`
service file so it can be registered as a built-in function with name
`transform`.
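
To make the step sequence above more concrete, here is a minimal sketch of
how such a chain could behave. It is an illustration only, under simplifying
assumptions: `SketchContext`, `SketchStep`, `DropFieldsStep`,
`MergeKeyValueStep`, `UnwrapKeyValueStep` and `SketchPipeline` are
hypothetical names, keys and values are plain maps rather than
schema-backed objects, and the `cast` step and `send()` are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for TransformContext: key and value are plain ordered maps.
final class SketchContext {
    Map<String, Object> key = new LinkedHashMap<>();
    Map<String, Object> value = new LinkedHashMap<>();
}

// The PIP's TransformStep declares `throws Exception`; omitted here for brevity.
interface SketchStep {
    void process(SketchContext ctx);
}

// Removes the listed fields from either the key or the value part.
final class DropFieldsStep implements SketchStep {
    private final List<String> fields;
    private final String part; // "key" or "value"

    DropFieldsStep(String commaSeparatedFields, String part) {
        this.fields = Arrays.asList(commaSeparatedFields.split(","));
        this.part = part;
    }

    @Override
    public void process(SketchContext ctx) {
        Map<String, Object> target = "key".equals(part) ? ctx.key : ctx.value;
        if (target == null) {
            return;
        }
        for (String field : fields) {
            target.remove(field);
        }
    }
}

// Copies the key fields into the value, key fields first.
final class MergeKeyValueStep implements SketchStep {
    @Override
    public void process(SketchContext ctx) {
        if (ctx.key == null) {
            return;
        }
        Map<String, Object> merged = new LinkedHashMap<>(ctx.key);
        merged.putAll(ctx.value);
        ctx.value = merged;
    }
}

// Drops the key so that only the value remains.
final class UnwrapKeyValueStep implements SketchStep {
    @Override
    public void process(SketchContext ctx) {
        ctx.key = null;
    }
}

final class SketchPipeline {
    // Applies every step in order on the same mutable context, in memory.
    static SketchContext run(SketchContext ctx, List<SketchStep> steps) {
        for (SketchStep step : steps) {
            step.process(ctx);
        }
        return ctx;
    }
}
```

Because all steps mutate one shared context, the record is read and written
only once regardless of how many steps are configured.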

## Rejected Alternatives

None


Re: [DISCUSS] [PIP-165] Auto release client useless connections

2022-06-02 Thread Ran Gao
This is a good idea, but I have a concern: Pulsar has the config
`brokerMaxConnections` to limit the number of connections to a single broker.
If a connection is closed, the client will reconnect when its consumers or
producers start consuming and producing messages again, but by then the
broker may already have reached its maximum connection count.


On 2022/05/26 06:31:37 Yubiao Feng wrote:
> I opened a PIP to discuss automatically releasing unused client
> connections. Could you help me review it?
> 
> 
> ## Motivation
> Currently, the Pulsar client keeps a connection open even if no producers
> or consumers use it.
> Suppose a client produces messages to topic A and we have 3 brokers: 1, 2,
> and 3. Due to bundle unloading (load manager), topic ownership changes
> from broker 1 to broker 2 and finally to broker 3. Today, the client keeps
> 3 connections, one to each of the 3 brokers.
> We can optimize this to reduce broker-side connections: the client
> should close the unused connections.
>
> So a mechanism needs to be added to release unwanted connections.
> 
> ### Why are there idle connections?
> 
> 1. When the configuration `maxConnectionsPerHosts` is not set to 0, the
> connection is never closed.
> The design is to hold a fixed number of connections per host, avoiding
> frequent closing and creation.
>
> https://github.com/apache/pulsar/blob/72349117c4fd9825adaaf16d3588a695e8a9dd27/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionPool.java#L325-L335
>
> 2-1. When a client receives `command-close`, it reconnects immediately.
> This is designed to make reconnecting, rebalancing, and unloading possible.
>
> https://github.com/apache/pulsar/blob/72349117c4fd9825adaaf16d3588a695e8a9dd27/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionHandler.java#L122-L141
>
> 2-2. The broker closes client connections before writing ownership info
> to ZK, so clients get a stale broker address when they try a lookup.
> 
> https://github.com/apache/pulsar/blob/72349117c4fd9825adaaf16d3588a695e8a9dd27/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L1282-L1293
> 
> ## Goal
> Automatically release connections that are no longer used.
> 
> - Scope
>   - **Pulsar client**
> Contains connections used by consumers, Producers, and Transactions.
> 
>   - **Pulsar proxy**
> Contains only the connection between Proxy and broker
> 
> ## Approach
> Periodically check for idle connections and close them.
> 
> ## Changes
> 
> ### API changes
> **ClientCnx** adds an idle check method and fields to track idle time.
>
> ```java
> /** Create time. **/
> private final long createTime;
> /** The time when the connection was marked idle. **/
> private long idleMarkTime;
> /** The time when the last valid data was transmitted. **/
> private long lastWorkTime;
> /** Stat. Enumerated values: using, idle_marked, before_release, released. **/
> private int stat;
> /**
>   * Check client connection is now free. This method may change the state to idle.
>   * This method will not change the state to idle.
>   */
> public boolean doIdleCheck();
> /** Get stat **/
> public int getStat();
> /** Change stat **/
> public int setStat(int originalStat, int newStat);
> ```
> 
> ### Configuration changes
> We can set the check frequency and release rule for idle connections in
> `ClientConfigurationData`.
>
> ```java
> @ApiModelProperty(
>     name = "autoReleaseIdleConnectionsEnabled",
>     value = "Whether to automatically clean up unused connections"
> )
> private boolean autoReleaseIdleConnectionsEnabled = true;
>
> @ApiModelProperty(
>     name = "connectionMaxIdleSeconds",
>     value = "Release the connection if it is not used for more than "
>         + "[connectionMaxIdleSeconds] seconds"
> )
> private int connectionMaxIdleSeconds = 180;
>
> @ApiModelProperty(
>     name = "connectionIdleDetectionIntervalSeconds",
>     value = "How often to check for idle connections"
> )
> private int connectionIdleDetectionIntervalSeconds = 60;
> ```
> 
> ## Implementation
> 
> - **Pulsar client**
> If no consumer, producer, or transaction uses the current connection,
> release it.
> 
> - **Pulsar proxy**
> If the connection has not transmitted valid data for a long time, release
> it.
> 
> 
> Yubiao Feng
> Thanks
> 
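
The periodic idle check described in the quoted proposal could be sketched
roughly like this. It is only an illustration under stated assumptions:
`IdleTracker`, `tryRelease` and the `activeUsers` parameter are hypothetical,
the state values mirror the ones named in the `stat` comment above, and
nothing here is the actual `ClientCnx` implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the using -> idle_marked -> before_release -> released flow.
final class IdleTracker {
    static final int USING = 0;
    static final int IDLE_MARKED = 1;
    static final int BEFORE_RELEASE = 2;
    static final int RELEASED = 3;

    private final AtomicInteger stat = new AtomicInteger(USING);
    private volatile long idleMarkTimeMs;

    /**
     * Marks the connection idle when nothing uses it, and puts an idle-marked
     * connection back into use when it is needed again.
     * Returns true if the connection is idle-marked after the check.
     */
    boolean doIdleCheck(int activeUsers, long nowMs) {
        if (activeUsers > 0) {
            stat.compareAndSet(IDLE_MARKED, USING);
            return false;
        }
        if (stat.compareAndSet(USING, IDLE_MARKED)) {
            idleMarkTimeMs = nowMs;
        }
        return stat.get() == IDLE_MARKED;
    }

    /** Releases the connection only if it has been idle-marked for at least maxIdleMs. */
    boolean tryRelease(long nowMs, long maxIdleMs) {
        if (stat.get() != IDLE_MARKED || nowMs - idleMarkTimeMs < maxIdleMs) {
            return false;
        }
        // The CAS fails if the connection was put back into use in the meantime.
        if (!stat.compareAndSet(IDLE_MARKED, BEFORE_RELEASE)) {
            return false;
        }
        stat.set(RELEASED);
        return true;
    }

    int getStat() {
        return stat.get();
    }
}
```

A scheduled task running every `connectionIdleDetectionIntervalSeconds` would
call `doIdleCheck` and then `tryRelease` with `connectionMaxIdleSeconds`
converted to milliseconds.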


Re: [DISCUSS] Apache Pulsar 2.9.3 release

2022-06-02 Thread mattison chao
Hi Dave Fisher,

> There are some PRs that are coming in that must be included.

How's the progress on these PRs?

Best,
Mattison

On Wed, 25 May 2022 at 21:33, Just do it wrote:

> +1
> Thanks,
> Dezhi
>
>
>
>
>
> -- Original --
> From: Hang Chen
> Date: Wed, May 25, 2022 9:10 AM
> To: dev
> Subject: Re: [DISCUSS] Apache Pulsar 2.9.3 release
>
> +1
>
> Thanks,
> Hang
>
> Dave Fisher wrote:
> > There are some PRs that are coming in that must be included.
> >
> > Thanks,
> > Dave
> >
> >
> > > On May 23, 2022, at 4:29 AM, PengHui Li  wrote:
> > >
> > > +1
> > >
> > > Thanks
> > > Penghui
> > >
> > > On Mon, May 23, 2022 at 3:31 PM mattison chao <
> mattisonc...@apache.org>
> > > wrote:
> > >
> > >> Hello, Pulsar community:
> > >>
> > >> I'd like to propose releasing Apache Pulsar 2.9.3.
> > >>
> > >> Currently, we have 192 commits [0], including many transaction
> > >> fixes and security fixes.
> > >>
> > >> There are also 22 open PRs [1]. I will follow them to make sure that
> > >> the important fixes are included in 2.9.3.
> > >>
> > >> If you have any important fixes or any questions,
> > >> please reply to this email; we will evaluate whether to
> > >> include them in 2.9.3.
> > >>
> > >> [0]
> > >> https://github.com/apache/pulsar/pulls?q=is%3Amerged+is%3Apr+label%3Arelease%2F2.9.3+
> > >> [1]
> > >> https://github.com/apache/pulsar/pulls?q=is%3Aopen+is%3Apr+label%3Arelease%2F2.9.3+
> > >>
> > >> Best Regards
> > >> Mattison
> > >>
> >