[VOTE] KIP-714: Client metrics and observability

2023-08-04 Thread Andrew Schofield
Hi,
After almost 2 1/2 years in the making, I would like to call a vote for KIP-714 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability).

This KIP aims to improve monitoring and troubleshooting of client performance 
by enabling clients to push metrics to brokers.

I’d like to thank everyone who participated in the discussion, especially the 
librdkafka team, since one of the aims of the KIP is to enable any client to 
participate, not just the Apache Kafka project’s Java clients.

Thanks,
Andrew

[jira] [Resolved] (KAFKA-14598) Fix flaky ConnectRestApiTest

2023-08-04 Thread Ashwin Pankaj (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Pankaj resolved KAFKA-14598.
---
Resolution: Fixed

Did not observe this recently

> Fix flaky ConnectRestApiTest
> 
>
> Key: KAFKA-14598
> URL: https://issues.apache.org/jira/browse/KAFKA-14598
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Ashwin Pankaj
>Assignee: Ashwin Pankaj
>Priority: Minor
>  Labels: flaky-test
>
> ConnectRestApiTest sometimes fails with the message
> {{ConnectRestError(404, 'Error 404 Not Found; HTTP ERROR 404 Not Found;
> URI: /connector-plugins/; STATUS: 404; MESSAGE: Not Found; SERVLET: -',
> 'http://172.31.1.75:8083/connector-plugins/')}}
> This happens because ConnectDistributedService.start() by default waits until 
> the line
> {{Joined group at generation ..}} is visible in the logs.
> In most cases this is sufficient. But in the cases where the test fails, we 
> see that this message appears even before Connect RestServer has finished 
> initialization.
>  {quote}   - [2022-12-15 15:40:29,064] INFO [Worker clientId=connect-1, 
> groupId=connect-cluster] Joined group at generation 2 with protocol version 1 
> and got assignment: Assignment{error=0, 
> leader='connect-1-07d9da63-9acb-4633-aee4-1ab79f4ab1ae', 
> leaderUrl='http://worker34:8083/', offset=-1, connectorIds=[], taskIds=[], 
> revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> - [2022-12-15 15:40:29,560] INFO 172.31.5.66 - - [15/Dec/2022:15:40:29 
> +] "GET /connector-plugins/ HTTP/1.1" 404 375 "-" 
> "python-requests/2.24.0" 71 (org.apache.kafka.connect.runtime.rest.RestServer)
> - [2022-12-15 15:40:29,579] INFO REST resources initialized; server is 
> started and ready to handle requests 
> (org.apache.kafka.connect.runtime.rest.RestServer)
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2068

2023-08-04 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 399101 lines...]
Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerInner[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerOuter[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerOuter[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerLeft[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerLeft[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterInner[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterInner[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterOuter[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterOuter[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithRightVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithRightVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testLeftWithLeftVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testLeftWithLeftVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithLeftVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithLeftVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TaskAssignorIntegrationTest > shouldProperlyConfigureTheAssignor STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TaskAssignorIntegrationTest > shouldProperlyConfigureTheAssignor PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TaskMetadataIntegrationTest > shouldReportCorrectEndOffsetInformation STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TaskMetadataIntegrationTest > shouldReportCorrectEndOffsetInformation PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TaskMetadataIntegrationTest > shouldReportCorrectCommittedOffsetInformation 
STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
TaskMetadataIntegrationTest > shouldReportCorrectCommittedOffsetInformation 
PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
EmitOnChangeIntegrationTest > shouldEmitSameRecordAfterFailover() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
EmitOnChangeIntegrationTest > shouldEmitSameRecordAfterFailover() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndPersistentStores(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndPersistentStores(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndInMemoryStores(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndInMemoryStores(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 183 > 
KStreamAggregationDedup

[jira] [Created] (KAFKA-15303) Foreign key joins no longer triggered by events on the right side of the join after deployment with a new compatible Avro schema

2023-08-04 Thread Charles-Eddy (Jira)
Charles-Eddy created KAFKA-15303:


 Summary: Foreign key joins no longer triggered by events on the 
right side of the join after deployment with a new compatible Avro schema
 Key: KAFKA-15303
 URL: https://issues.apache.org/jira/browse/KAFKA-15303
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 3.4.0
Reporter: Charles-Eddy


Hello everyone, I am currently working on a project that uses Kafka Streams 
(version 3.4.0) with a Kafka broker (version 2.7.0) managed by Amazon MSK.
Our goal is to join offer information from our sellers with additional data 
from various input topics, and then feed the resulting joined information into 
an output topic.

Our application is deployed in Kubernetes using the StatefulSet feature, with 
one EBS volume per Kafka Streams pod and 5 Streams Threads per pod.

We are using Avro to serialize/deserialize the input topics and the data stored 
in the Kafka Streams state stores.

We have encountered a bug in Kafka Streams that prevents us from deploying new 
versions of Kafka Streams containing new compatible Avro schemas of our input 
topics.

The symptom is that after deploying our new version, which contains no changes 
in topology but only changes to the Avro schema used, we discard every event 
coming from the right part of the join affected by these Avro schema changes 
until we receive something from the left part of the join.

As a result, we are losing events and corrupting our output topics and stores 
with outdated information.

After checking the local help for which priority to assign, I have set it to 
*CRITICAL* because we are losing data (for example, tombstones are not 
propagated to the following joins, so some products are still visible on our 
website when they should not be).

Please feel free to change the priority if you think it is not appropriate.

 

*The bug:*
After multiple hours of investigation we found out that the bug is located in 
the foreign key join feature, specifically in the class 
*SubscriptionResolverJoinProcessorSupplier* on the left side of a foreign key 
join. 
Its process(...) method computes a hash by re-serializing the deserialized 
value from the left state store and compares it with the hash of the original 
message from the subscription-response-topic. 
 
This means that when we deploy a new version of our Kafka Streams instance with 
a new compatible Avro schema on the left side of a join, every join triggered 
by the right side of the join is discarded until we receive all the events 
again on the left side, because the hashes now computed by Kafka Streams differ 
from those of the original messages.
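
To make the failure mode concrete, here is a hedged Java sketch of the comparison 
described above (names and types are illustrative, not the actual Kafka Streams code):

    // The left-side value was stored as v1 bytes, but after the upgrade it is
    // deserialized and re-serialized with the v2 Avro serde before hashing.
    byte[] storedV1Bytes = leftStore.get(orderId);               // written pre-upgrade (v1 encoding)
    Offer offer = v2Serde.deserializer().deserialize(topic, storedV1Bytes); // v1 bytes -> object
    byte[] v2Bytes = v2Serde.serializer().serialize(topic, offer);          // object -> v2 encoding

    long storeHash = hash(v2Bytes);                    // hash of the v2 encoding
    long responseHash = response.originalValueHash();  // hash of the original v1 encoding

    // The hashes differ even though the offer itself never changed,
    // so the join result is dropped.
    boolean joinEmitted = (storeHash == responseHash); // false after the upgrade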
 
*How to reproduce it:*
If we take a working and a non-working workflow, it will do something like this:

+Normal non-breaking+ workflow from the left part of the FK join:
 # A new offer event occurs. The offer is received and stored (v1).
 # A subscription registration is sent with the offer-hash (v1).
 # The subscription is saved to the store with the v1 offer-hash.
 # Product data is searched for.
 # If product data is found, a subscription response is sent back, including the v1 offer-hash.
 # The offer data in the store is searched for and the offer hashes between the store (v1) and response event (also v1) are compared.
 # Finally, the join result is sent.

New product event from the right part of the FK join:
 # The product is received and stored.
 # All related offers in the registration store are searched for.
 # A subscription response is sent for each offer, including their offer hash (v1).
 # The offer data in the store is searched for and the offer hashes between the store (v1) and response event (also v1) are compared.
 # Finally, the join result is sent.

 
+Breaking workflow:+
The offer serializer is changed to offer v2.

New product event from the right part of the FK join:
 # The product is received and stored.
 # All related offers in the registration store are searched for.
 # A subscription response is sent for each offer, including their offer hash (v1).
 # The offer data in the store is searched for and the offer hashes between the store (v2) and response event (still v1) are compared.
 # The join result is not sent since the hashes differ.

 

*Potential Fix:*
Currently, the offer’s hash is computed from the serialization of deserialized 
offer data. This means that binary offer data v1 is deserialized into a v2 data 
class using a v1-compatible deserializer, and then serialized into binary v2 
data.
As a result, we are comparing a v2 hash (from the store) with a v1 hash (from 
the response event).
Is it possible to avoid this binary v1 -> data class v2 -> binary v2 step by 
directly retrieving the bytes from the store to compute the hash?

Re: Apache Kafka 3.6.0 release

2023-08-04 Thread Satish Duggana
Hi,
Divij and I discussed and added the wiki for the Kafka Tiered Storage
Early Access Release [1]. If you have any comments or feedback, please
feel free to share them.

1. 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes

Thanks,
Satish.

On Fri, 4 Aug 2023 at 08:40, Satish Duggana  wrote:
>
> Hi Chris,
> Thanks for the update. This looks to be a minor change and is also
> useful for backward compatibility. I added it to the release plan as
> an exceptional case.
>
> ~Satish.
>
> On Thu, 3 Aug 2023 at 21:34, Chris Egerton  wrote:
> >
> > Hi Satish,
> >
> > Would it be possible to include KIP-949 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy)
> > in the 3.6.0 release? It passed voting yesterday, and is a very small,
> > low-risk change that we'd like to put out as soon as possible in order to
> > patch an accidental break in backwards compatibility caused a few versions
> > ago.
> >
> > Best,
> >
> > Chris
> >
> > On Fri, Jul 28, 2023 at 2:35 AM Satish Duggana 
> > wrote:
> >
> > > Hi All,
> > > Whoever has KIP entries in the 3.6.0 release plan. Please update it
> > > with the latest status by tomorrow(end of the day 29th Jul UTC ).
> > >
> > > Thanks
> > > Satish.
> > >
> > > On Fri, 28 Jul 2023 at 12:01, Satish Duggana 
> > > wrote:
> > > >
> > > > Thanks Ismael and Divij for the suggestions.
> > > >
> > > > One way was to follow the earlier guidelines that we set for any early
> > > > access release. It looks Ismael already mentioned the example of
> > > > KRaft.
> > > >
> > > > KIP-405 mentions upgrade/downgrade and limitations sections. We can
> > > > clarify that in the release notes for users on how this feature can be
> > > > used for early access.
> > > >
> > > > Divij, We do not want users to enable this feature on production
> > > > environments in early access release. Let us work together on the
> > > > followups Ismael suggested.
> > > >
> > > > ~Satish.
> > > >
> > > > On Fri, 28 Jul 2023 at 02:24, Divij Vaidya 
> > > wrote:
> > > > >
> > > > > Those are great suggestions, thank you. We will continue this
> > > discussion
> > > > > forward in a separate KIP for release plan for Tiered Storage.
> > > > >
> > > > > On Thu 27. Jul 2023 at 21:46, Ismael Juma  wrote:
> > > > >
> > > > > > Hi Divij,
> > > > > >
> > > > > > I think the points you bring up for discussion are all good. My main
> > > > > > feedback is that they should be discussed in the context of KIPs vs
> > > the
> > > > > > release template. That's why we have a backwards compatibility
> > > section for
> > > > > > every KIP, it's precisely to ensure we think carefully about some of
> > > the
> > > > > > points you're bringing up. When it comes to defining the meaning of
> > > early
> > > > > > access, we have two options:
> > > > > >
> > > > > > 1. Have a KIP specifically for tiered storage.
> > > > > > 2. Have a KIP to define general guidelines for what early access
> > > means.
> > > > > >
> > > > > > Does this make sense?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Thu, Jul 27, 2023 at 6:38 PM Divij Vaidya <
> > > divijvaidy...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thank you for the response, Ismael.
> > > > > > >
> > > > > > > 1. Specifically in context of 3.6, I wanted this compatibility
> > > > > > > guarantee point to encourage a discussion on
> > > > > > >
> > > > > > >
> > > > > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-952%3A+Regenerate+segment-aligned+producer+snapshots+when+upgrading+to+a+Kafka+version+supporting+Tiered+Storage
> > > > > > > .
> > > > > > > Due to lack of producer snapshots in <2.8 versions, a customer may
> > > not
> > > > > > > be able to upgrade to 3.6 and use TS on a topic which was created
> > > when
> > > > > > > the cluster was on <2.8 version (see motivation for details). We
> > > can
> > > > > > > discuss and agree that it does not break compatibility, which is
> > > fine.
> > > > > > > But I want to ensure that we have a discussion soon on this to
> > > reach a
> > > > > > > conclusion.
> > > > > > >
> > > > > > > 2. I will start a KIP on this for further discussion.
> > > > > > >
> > > > > > > 3. In the context of 3.6, this would mean that there should be
> > > > > > > no-regression, if a user does "not" turn-on remote storage (early
> > > > > > > access feature) at a cluster level. We have some known cases (such
> > > as
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-15189) which violate
> > > this
> > > > > > > compatibility requirement. Having this guarantee mentioned in the
> > > > > > > release plan will ensure that we are all in agreement with which
> > > cases
> > > > > > > are truly blockers and which aren't.
> > > > > > >
> > > > > > > 4. Fair, instead of a general goal, let me put it specifically in
> > > the
> > > > > > > context of 3.6. Let me know if this is not the r

Fwd: [DISCUSS] KIP-714: Client metrics and observability

2023-08-04 Thread Doğuşcan Namal
Hi Andrew, thanks a lot for this KIP. I was thinking of something similar
so thanks for writing this down 😊



Couple of questions related to the design:



1. Can we investigate the option of using the KRaft controllers instead of
the brokers for sending metrics? The disadvantage of sending these metrics
directly to the brokers is that it tightly couples metric observability to data
plane availability. If the broker is unhealthy, then the root cause of an
incident is clear; however, partial failures make it hard to debug these
incidents from the broker's perspective.



2. Rate limiting will be disabled if the `PushTelemetryRequest.Terminating`
flag is set. However, this may cause unavailability on the broker if too
many clients are terminated at once; in particular, network threads could
become busy and introduce latency on the produce/consume paths of other,
non-terminating client connections. I think there is room for
improvement here. If the client is gracefully shutting down, it could wait
for the request to be handled if it is being rate-limited; it doesn't need
to "force push" the metrics. For that reason, maybe we could define a
separate rate limit for telemetry data?



3. `PushIntervalMs` is set on the client side by a response from
`GetTelemetrySubscriptionsResponse`. If the broker sets this value too
low, e.g. 1 ms, this may hog all of the client's activity and cause an
impact on the client side. I think we should introduce a configuration on
both the client and the broker side for the minimum and maximum values
allowed, to fence out misconfigurations.



4. One of the important things I face when debugging client-side
failures is understanding the client-side configurations. Can the client
send these configs in the GetTelemetrySubscriptions request as well?



Small comments:

5. Default PushIntervalMs is 5 minutes. Can we make it 1 minute instead? I
think 5 minutes of aggregated data is not very helpful in the world of
telemetry 😊

6. UnsupportedCompressionType: Shall we fall back to non-compressed mode in
that case? I think compression is nice to have, but non-compressed
telemetry data is valuable as well. Especially for low-throughput clients,
compressing telemetry data may cause more CPU load than the actual data
plane work.


Thanks again.

Doguscan



> On Jun 13, 2023, at 8:06 AM, Andrew Schofield
>  wrote:
>
> Hi,
> I would like to start a new discussion thread on KIP-714: Client metrics and
> observability.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
>
> I have edited the proposal significantly to reduce the scope. The overall
> mechanism for client metric subscriptions is unchanged, but the
> KIP is now based on the existing client metrics, rather than introducing new
> metrics. The purpose remains helping cluster operators
> investigate performance problems experienced by clients without requiring
> changes to the client application code or configuration.
>
> Thanks,
> Andrew


Re: Apache Kafka 3.6.0 release

2023-08-04 Thread Chris Egerton
Thanks for adding KIP-949, Satish!

On Fri, Aug 4, 2023 at 7:06 AM Satish Duggana 
wrote:

> Hi,
> Divij and I discussed and added the wiki for the Kafka Tiered Storage
> Early Access Release [1]. If you have any comments or feedback, please
> feel free to share them.
>
> 1.
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
>
> Thanks,
> Satish.
>
> On Fri, 4 Aug 2023 at 08:40, Satish Duggana 
> wrote:
> >
> > Hi Chris,
> > Thanks for the update. This looks to be a minor change and is also
> > useful for backward compatibility. I added it to the release plan as
> > an exceptional case.
> >
> > ~Satish.
> >
> > On Thu, 3 Aug 2023 at 21:34, Chris Egerton 
> wrote:
> > >
> > > Hi Satish,
> > >
> > > Would it be possible to include KIP-949 (
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy
> )
> > > in the 3.6.0 release? It passed voting yesterday, and is a very small,
> > > low-risk change that we'd like to put out as soon as possible in order
> to
> > > patch an accidental break in backwards compatibility caused a few
> versions
> > > ago.
> > >
> > > Best,
> > >
> > > Chris
> > >
> > > On Fri, Jul 28, 2023 at 2:35 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > > Whoever has KIP entries in the 3.6.0 release plan. Please update it
> > > > with the latest status by tomorrow(end of the day 29th Jul UTC ).
> > > >
> > > > Thanks
> > > > Satish.
> > > >
> > > > On Fri, 28 Jul 2023 at 12:01, Satish Duggana <
> satish.dugg...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Thanks Ismael and Divij for the suggestions.
> > > > >
> > > > > One way was to follow the earlier guidelines that we set for any
> early
> > > > > access release. It looks Ismael already mentioned the example of
> > > > > KRaft.
> > > > >
> > > > > KIP-405 mentions upgrade/downgrade and limitations sections. We can
> > > > > clarify that in the release notes for users on how this feature
> can be
> > > > > used for early access.
> > > > >
> > > > > Divij, We do not want users to enable this feature on production
> > > > > environments in early access release. Let us work together on the
> > > > > followups Ismael suggested.
> > > > >
> > > > > ~Satish.
> > > > >
> > > > > On Fri, 28 Jul 2023 at 02:24, Divij Vaidya <
> divijvaidy...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > Those are great suggestions, thank you. We will continue this
> > > > discussion
> > > > > > forward in a separate KIP for release plan for Tiered Storage.
> > > > > >
> > > > > > On Thu 27. Jul 2023 at 21:46, Ismael Juma 
> wrote:
> > > > > >
> > > > > > > Hi Divij,
> > > > > > >
> > > > > > > I think the points you bring up for discussion are all good.
> My main
> > > > > > > feedback is that they should be discussed in the context of
> KIPs vs
> > > > the
> > > > > > > release template. That's why we have a backwards compatibility
> > > > section for
> > > > > > > every KIP, it's precisely to ensure we think carefully about
> some of
> > > > the
> > > > > > > points you're bringing up. When it comes to defining the
> meaning of
> > > > early
> > > > > > > access, we have two options:
> > > > > > >
> > > > > > > 1. Have a KIP specifically for tiered storage.
> > > > > > > 2. Have a KIP to define general guidelines for what early
> access
> > > > means.
> > > > > > >
> > > > > > > Does this make sense?
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Thu, Jul 27, 2023 at 6:38 PM Divij Vaidya <
> > > > divijvaidy...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thank you for the response, Ismael.
> > > > > > > >
> > > > > > > > 1. Specifically in context of 3.6, I wanted this
> compatibility
> > > > > > > > guarantee point to encourage a discussion on
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-952%3A+Regenerate+segment-aligned+producer+snapshots+when+upgrading+to+a+Kafka+version+supporting+Tiered+Storage
> > > > > > > > .
> > > > > > > > Due to lack of producer snapshots in <2.8 versions, a
> customer may
> > > > not
> > > > > > > > be able to upgrade to 3.6 and use TS on a topic which was
> created
> > > > when
> > > > > > > > the cluster was on <2.8 version (see motivation for
> details). We
> > > > can
> > > > > > > > discuss and agree that it does not break compatibility,
> which is
> > > > fine.
> > > > > > > > But I want to ensure that we have a discussion soon on this
> to
> > > > reach a
> > > > > > > > conclusion.
> > > > > > > >
> > > > > > > > 2. I will start a KIP on this for further discussion.
> > > > > > > >
> > > > > > > > 3. In the context of 3.6, this would mean that there should
> be
> > > > > > > > no-regression, if a user does "not" turn-on remote storage
> (early
> > > > > > > > access feature) at a cluster level. We have some known cases
> (such
> > > > as
> > > 

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-08-04 Thread Andrew Schofield
Hi Doguscan,
Thanks for your comments. I’m glad to hear you’re interested in this KIP.

1) It is preferred that a client sends its metrics over the same broker connection,
but it is actually able to send them to any broker. As a result, if a broker becomes
unhealthy, the client can push its metrics to any other broker. It seems to me that
pushing to KRaft controllers instead just has the effect of increasing the load on
the controllers, while still having the characteristic that an unhealthy controller
would be an inconvenience for collecting metrics.

2) When the `PushTelemetryRequest.Terminating` flag is set, the standard request
throttling is not disabled. The metrics rate-limiting based on the push interval
is not applied in this case for a single request for the combination of client
instance ID and subscription ID.

(I have corrected the KIP text because it erroneously said “client ID and 
subscription ID”.)

3) While this is a theoretical problem, I’m not keen on adding yet more 
configurations
to the broker or client. The `interval.ms` configuration on the CLIENT_METRICS
resource could perhaps have a minimum and maximum value to prevent accidental
misconfiguration.
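
For illustration, a hedged sketch of the kind of bounds check being discussed 
(the limits below are invented for illustration, not taken from the KIP):

    static int boundedPushIntervalMs(int requestedMs) {
        final int MIN_PUSH_INTERVAL_MS = 100;        // assumed floor, illustrative only
        final int MAX_PUSH_INTERVAL_MS = 3_600_000;  // assumed ceiling (1 hour), illustrative only
        // Clamp the broker-supplied interval into a sane range on the client.
        return Math.min(MAX_PUSH_INTERVAL_MS, Math.max(MIN_PUSH_INTERVAL_MS, requestedMs));
    }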

4) One of the reasons that this KIP has taken so long to get to this stage is 
that
it tried to do many things all at once. So, it’s greatly simplified compared 
with
6 months ago. I can see the value of collecting client configurations for 
problem
determination, but I don’t want to make this KIP more complicated. I think the
idea has merit as a separate follow-on KIP. I would be happy to collaborate
with you on this.

5) The default is set to 5 minutes to minimise the load on the broker for
situations in which the administrator didn’t set an interval on a metrics
subscription. To use an interval of 1 minute, it is only necessary to set
`interval.ms` on the metrics subscription to 60000ms.

6) Uncompressed data is always supported. The KIP says:
“The CompressionType of NONE will not be present in the response from the broker,
though the broker does support uncompressed client telemetry if none of the
accepted compression codecs are supported by the client.”
So in your example, the client need only use CompressionType=NONE.

Thanks,
Andrew

> On 4 Aug 2023, at 14:04, Doğuşcan Namal  wrote:
>
> Hi Andrew, thanks a lot for this KIP. I was thinking of something similar
> so thanks for writing this down 😊
>
>
>
> Couple of questions related to the design:
>
>
>
> 1. Can we investigate the option of using the KRaft controllers instead of
> the brokers for sending metrics? The disadvantage of sending these metrics
> directly to the brokers is that it tightly couples metric observability to data
> plane availability. If the broker is unhealthy, then the root cause of an
> incident is clear; however, partial failures make it hard to debug these
> incidents from the broker's perspective.
>
>
>
> 2. Rate limiting will be disabled if the `PushTelemetryRequest.Terminating`
> flag is set. However, this may cause unavailability on the broker if too
> many clients are terminated at once; in particular, network threads could
> become busy and introduce latency on the produce/consume paths of other,
> non-terminating client connections. I think there is room for
> improvement here. If the client is gracefully shutting down, it could wait
> for the request to be handled if it is being rate-limited; it doesn't need
> to "force push" the metrics. For that reason, maybe we could define a
> separate rate limit for telemetry data?
>
>
>
> 3. `PushIntervalMs` is set on the client side by a response from
> `GetTelemetrySubscriptionsResponse`. If the broker sets this value too
> low, e.g. 1 ms, this may hog all of the client's activity and cause an
> impact on the client side. I think we should introduce a configuration on
> both the client and the broker side for the minimum and maximum values
> allowed, to fence out misconfigurations.
>
>
>
> 4. One of the important things I face when debugging client-side
> failures is understanding the client-side configurations. Can the client
> send these configs in the GetTelemetrySubscriptions request as well?
>
>
>
> Small comments:
>
> 5. Default PushIntervalMs is 5 minutes. Can we make it 1 minute instead? I
> think 5 minutes of aggregated data is not very helpful in the world of
> telemetry 😊
>
> 6. UnsupportedCompressionType: Shall we fall back to non-compressed mode in
> that case? I think compression is nice to have, but non-compressed
> telemetry data is valuable as well. Especially for low-throughput clients,
> compressing telemetry data may cause more CPU load than the actual data
> plane work.
>
>
> Thanks again.
>
> Doguscan
>
>
>
>> On Jun 13, 2023, at 8:06 AM, Andrew Schofield
>>  wrote:
>>
>> Hi,
>> I would like to start a new discussion thread on KIP-714: Client metrics and
>> observability.
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+

[DISCUSS] Cluster-wide disablement of Tiered Storage

2023-08-04 Thread Christo Lolov
Hello all!

I wanted to gather more opinions for
https://issues.apache.org/jira/browse/KAFKA-15267

In summary, the problem which I would like to solve is disabling TS (and
freeing the resources used by RemoteLog*Manager) once I have decided I
no longer want to use it, without having to provision a whole new cluster
which just doesn't have it enabled.

My preference would be for option 4.1 without a KIP followed by option 4.2
in the future with a KIP once KIP-950 makes it in.

Please let me know your thoughts!

Best,
Christo


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2069

2023-08-04 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Cluster-wide disablement of Tiered Storage

2023-08-04 Thread Andrew Schofield
Hi Christo,
I agree with you.

Option 4.1 without a KIP seems like an acceptable starting point for something
which will be relatively rare, provided that it’s easy for the user to get a
list of the topics that have to be deleted before they can successfully start
the broker with TS turned off.

Option 4.2 in the future with a KIP improves things later on.

Thanks,
Andrew

> On 4 Aug 2023, at 16:12, Christo Lolov  wrote:
>
> Hello all!
>
> I wanted to gather more opinions for
> https://issues.apache.org/jira/browse/KAFKA-15267
>
> In summary, the problem which I would like to solve is disabling TS (and
> freeing the resources used by RemoteLog*Manager) once I have decided I
> no longer want to use it, without having to provision a whole new cluster
> which just doesn't have it enabled.
>
> My preference would be for option 4.1 without a KIP followed by option 4.2
> in the future with a KIP once KIP-950 makes it in.
>
> Please let me know your thoughts!
>
> Best,
> Christo



Re: [DISCUSS] Cluster-wide disablement of Tiered Storage

2023-08-04 Thread Divij Vaidya
Hey folks

Option 4.1 is what was posted as the proposed solution for "early
access" in the 3.6 release email thread earlier today.
Please refer to the document attached to that email thread:
https://lists.apache.org/thread/9chh6h52xf2p6fdsqojy25w7k6jqlrkj and
leave your thoughts on that thread.

--
Divij Vaidya

On Fri, Aug 4, 2023 at 6:03 PM Andrew Schofield
 wrote:
>
> Hi Christo,
> I agree with you.
>
> Option 4.1 without a KIP seems like an acceptable starting point for something
> which will be relatively rare, provided that it’s easy for the user to get a
> list of the topics that have to be deleted before they can successfully start
> the broker with TS turned off.
>
> Option 4.2 in the future with a KIP improves things later on.
>
> Thanks,
> Andrew
>
> > On 4 Aug 2023, at 16:12, Christo Lolov  wrote:
> >
> > Hello all!
> >
> > I wanted to gather more opinions for
> > https://issues.apache.org/jira/browse/KAFKA-15267
> >
> > In summary, the problem which I would like to solve is disabling TS (and
> > freeing the resources used by RemoteLog*Manager) once I have decided I
> > no longer want to use it, without having to provision a whole new cluster
> > which just doesn't have it enabled.
> >
> > My preference would be for option 4.1 without a KIP followed by option 4.2
> > in the future with a KIP once KIP-950 makes it in.
> >
> > Please let me know your thoughts!
> >
> > Best,
> > Christo
>


[jira] [Created] (KAFKA-15304) CompletableApplicationEvents aren't being completed when the consumer is closing

2023-08-04 Thread Philip Nee (Jira)
Philip Nee created KAFKA-15304:
--

 Summary: CompletableApplicationEvents aren't being completed when 
the consumer is closing
 Key: KAFKA-15304
 URL: https://issues.apache.org/jira/browse/KAFKA-15304
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Philip Nee
Assignee: Philip Nee


If the background thread is closed before ingesting all ApplicationEvents, we 
should drain the background queue and try to cancel these events before 
closing. We can try to process these events before closing down the consumer; 
however, we assume that when the user issues a close command, the consumer 
should be shut down promptly.
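
A minimal sketch of the proposed drain-and-cancel step, with hypothetical names 
for the queue and helper (this is a sketch, not the actual consumer code):

    void drainAndCancelOnClose(BlockingQueue<CompletableApplicationEvent<?>> queue) {
        List<CompletableApplicationEvent<?>> pending = new ArrayList<>();
        queue.drainTo(pending);  // drain the background queue in one pass
        for (CompletableApplicationEvent<?> event : pending) {
            // Complete exceptionally so any caller blocked on the future is
            // released promptly instead of hanging until its own timeout.
            event.future().completeExceptionally(
                    new KafkaException("The consumer is closing"));
        }
    }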



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15305) The background thread should try to process the remaining task until the shutdown timer is expired

2023-08-04 Thread Philip Nee (Jira)
Philip Nee created KAFKA-15305:
--

 Summary: The background thread should try to process the remaining 
task until the shutdown timer is expired
 Key: KAFKA-15305
 URL: https://issues.apache.org/jira/browse/KAFKA-15305
 Project: Kafka
  Issue Type: Sub-task
  Components: consumer
Reporter: Philip Nee
Assignee: Philip Nee


While working on https://issues.apache.org/jira/browse/KAFKA-15304

The close() API supplies a timeout parameter so that the consumer can have a grace 
period to process things before shutting down. The background thread currently 
doesn't do that: when close() is initiated, it immediately closes all of 
its dependencies.

 

This might not be desirable because there could be remaining tasks to be 
processed before closing. Maybe the correct thing to do is to first stop 
accepting API requests; second, let runOnce() continue to run until the 
shutdown timer expires; then force-close all of its dependencies.
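
A hedged sketch of that ordering, assuming Kafka's Timer utility and hypothetical 
method names for the surrounding pieces:

    void closeGracefully(Duration timeout) {
        Timer timer = time.timer(timeout);
        stopAcceptingApplicationEvents();      // 1. stop accepting API requests
        while (hasRemainingTasks() && timer.notExpired()) {
            runOnce();                         // 2. keep processing until done or the timer expires
            timer.update();
        }
        closeAllDependencies();                // 3. then force-close everything
    }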



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: [DISCUSS] KIP-714: Client metrics and observability

2023-08-04 Thread Doğuşcan Namal
Hi Andrew, thanks a lot for this KIP. I was thinking of something similar
so thanks for writing this down 😊



Couple of questions related to the design:



1. Can we investigate the option of using the KRaft controllers instead of
the brokers for sending metrics? The disadvantage of sending these metrics
directly to the brokers is that it tightly couples metric observability to data
plane availability. If the broker is unhealthy, then the root cause of an
incident is clear; however, partial failures make it hard to debug these
incidents from the broker's perspective.



2. Rate limiting will be disabled if the `PushTelemetryRequest.Terminating`
flag is set. However, this may cause unavailability on the broker if too
many clients are terminated at once; in particular, network threads could
become busy and introduce latency on the produce/consume paths of other,
non-terminating client connections. I think there is room for
improvement here. If the client is gracefully shutting down, it could wait
for the request to be handled if it is being rate-limited; it doesn't need
to "force push" the metrics. For that reason, maybe we could define a
separate rate limit for telemetry data?



3. `PushIntervalMs` is set on the client side by a response from
`GetTelemetrySubscriptionsResponse`. If the broker sets this value too
low, e.g. 1 ms, this may hog all of the client's activity and cause an
impact on the client side. I think we should introduce a configuration on
both the client and the broker side for the minimum and maximum values
allowed, to fence out misconfigurations.



4. One of the important things I face when debugging client-side
failures is understanding the client-side configurations. Can the client
send these configs in the GetTelemetrySubscriptions request as well?



Small comments:

5. Default PushIntervalMs is 5 minutes. Can we make it 1 minute instead? I
think 5 minutes of aggregated data is not very helpful in the world of
telemetry 😊

6. UnsupportedCompressionType: Shall we fall back to non-compressed mode in
that case? I think compression is nice to have, but non-compressed
telemetry data is valuable as well. Especially for low-throughput clients,
compressing telemetry data may cause more CPU load than the actual data
plane work.


Thanks again.

Doguscan



> On Jun 13, 2023, at 8:06 AM, Andrew Schofield
>  wrote:
>
> Hi,
> I would like to start a new discussion thread on KIP-714: Client metrics and
> observability.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
>
> I have edited the proposal significantly to reduce the scope. The overall
> mechanism for client metric subscriptions is unchanged, but the
> KIP is now based on the existing client metrics, rather than introducing new
> metrics. The purpose remains helping cluster operators
> investigate performance problems experienced by clients without requiring
> changes to the client application code or configuration.
>
> Thanks,
> Andrew


[jira] [Created] (KAFKA-15306) Integrate committed offsets logic when updating fetching positions

2023-08-04 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-15306:
--

 Summary: Integrate committed offsets logic when updating fetching 
positions
 Key: KAFKA-15306
 URL: https://issues.apache.org/jira/browse/KAFKA-15306
 Project: Kafka
  Issue Type: Task
Reporter: Lianet Magrans
Assignee: Lianet Magrans


Integrate refreshCommittedOffsets logic, currently performed by the 
coordinator, into the update fetch positions performed on every iteration of 
the consumer poll loop. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [kafka-site] mjsax opened a new pull request, #536: MINOR: update Kafka Streams state.dir doc

2023-08-04 Thread via GitHub


mjsax opened a new pull request, #536:
URL: https://github.com/apache/kafka-site/pull/536

   Default state directory was changed in the 2.8.0 release (cf. KAFKA-10604)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (KAFKA-15307) Kafka Streams configuration docs outdated

2023-08-04 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-15307:
---

 Summary: Kafka Streams configuration docs outdated
 Key: KAFKA-15307
 URL: https://issues.apache.org/jira/browse/KAFKA-15307
 Project: Kafka
  Issue Type: Task
  Components: docs, streams
Reporter: Matthias J. Sax


[https://kafka.apache.org/35/documentation/streams/developer-guide/config-streams.html]
 needs to be updated.

It's missing a lot of newly added configs, and it still lists already-removed 
configs.

For deprecated configs, we could consider removing them as well, or add a 
"deprecated configs" section and keep them for the time being.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-962 Relax non-null key requirement in Kafka Streams

2023-08-04 Thread Matthias J. Sax

Guozhang,

thanks for pointing out ValueJoinerWithKey. In the end, it's just a 
documentation change, i.e., pointing out that the passed-in key could be 
`null` and similar?
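
For illustration, a hedged sketch of the distinction such documentation would 
call out (the value types here are hypothetical):

    ValueJoinerWithKey<String, Order, Customer, String> joiner = (key, order, customer) -> {
        if (key == null) {
            return "unkeyed order";               // null-key record, newly allowed by KIP-962
        }
        return customer == null
                ? "no matching customer"          // left join: key present, no right-side match
                : "order for " + customer;        // regular match
    };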


-Matthias


On 8/2/23 3:20 PM, Guozhang Wang wrote:

Thanks Florin for the writeup,

One quick thing I'd like to bring up is that in KIP-149
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-149%3A+Enabling+key+access+in+ValueTransformer%2C+ValueMapper%2C+and+ValueJoiner)
we introduced ValueJoinerWithKey, which is aimed at enhancing
ValueJoiner. It would have a benefit for this KIP in that
implementers can distinguish "null-key" vs. "not-null-key but
null-value" scenarios.

Hence I'd suggest we also include the semantic changes with
ValueJoinerWithKey, which can help distinguish these two scenarios,
and also document that if users apply ValueJoiner only, they may not
have this benefit; hence we suggest users use the former.


Guozhang

On Mon, Jul 31, 2023 at 12:11 PM Florin Akermann
 wrote:


https://cwiki.apache.org/confluence/display/KAFKA/KIP-962%3A+Relax+non-null+key+requirement+in+Kafka+Streams


Re: [DISCUSS] KIP-955: Add stream-table join on foreign key

2023-08-04 Thread Matthias J. Sax
Thanks a lot for providing more background. It's getting much clearer to 
me now.


Couple of follow up questions:



It is not possible to use table-table join in this case because triggering
events are supplied separately from the actual data entity that needs to be
"assembled" and these events could only be presented as KStream due to
their nature.


Not sure if I understand this part? Why can't those events be 
represented as a KTable? You say "could only be presented as KStream due 
to their nature" -- what do you mean by this?


In the end, my understanding is the following (using the example from the 
KIP):


For the shipments <-> orders and order-details <-> orders joins, shipment 
and order-details are the fact tables, which is the "reverse" of what you 
want. Using the existing FK join, you would get two enriched tables 
that you cannot join to each other any further (because we don't support 
n:m joins): in the end, shipmentId+orderDetailId would be the PK of such 
an n:m join?


If that's correct (just to make sure I understand correctly): if we 
added an n:m join, you could join shipment <-> order-details first, and 
use a FK join to enrich the result with orders. -- In addition, you could 
also do a FK join to events if you represent events as a table (this 
relates to my question from above, why events cannot be represented as a 
KTable).



As for the KIP itself, I am still wondering about details: if we get an event 
in, and we do a lookup into the "FK table" and find multiple matches, 
would we emit multiple results? This would kinda defeat the purpose of 
re-assembling everything into a single entity? (And it might require an 
additional aggregation downstream to put the entity together.) -- Or 
would we join the single event with all found table rows, and emit a 
single "enriched" event?



Thus, I am actually wondering if you could not pre-process both the 
shipment and order-details tables, via `groupBy(orderId)`, and assemble a 
list (or similar) of all shipments (or order-details) per order? If you 
do this pre-processing, you can do a PK-PK (1:1) join with the orders 
table, and also do a stream-table join to enrich your events with the 
full order information?
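
A hedged sketch of that pre-processing, with hypothetical topic names, value 
types, and serdes:

    KTable<String, Order> orders = builder.table("orders");

    // Re-key each shipment by its orderId and collect all shipments per order.
    KTable<String, List<Shipment>> shipmentsByOrder = builder
            .stream("shipments", Consumed.with(Serdes.String(), shipmentSerde))
            .groupBy((shipmentId, shipment) -> shipment.orderId())
            .aggregate(ArrayList::new,
                       (orderId, shipment, list) -> { list.add(shipment); return list; },
                       Materialized.with(Serdes.String(), shipmentListSerde));

    // PK-PK (1:1) join: both tables are now keyed by orderId.
    KTable<String, EnrichedOrder> enriched = orders.join(shipmentsByOrder, EnrichedOrder::new);

    // Stream-table join: enrich each triggering event with the assembled entity.
    KStream<String, Event> events = builder.stream("order-events");
    KStream<String, Result> results = events.join(enriched, Result::new);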




-Matthias

On 7/26/23 7:13 AM, Igor Fomenko wrote:

Hello Matthias,

Thank you for this response. It provides the context for a good discussion
related to the need for this new interface.

The use case I have in mind is not really a stream enrichment which usually
implies that the event has a primary key to some external info and that
external info could be just looked up in some other data source.

The pattern this KIP proposes is more akin to the data entity assembly
pattern from the persistence layer so it is not purely integration pattern
but rather a pattern that enables an event stream from persistence layer of
a data source application. The main driver here is the ability to stream a
data entity of any complexity (complexity in terms of the relational model)
from an application database to some data consumers. The technical
precondition here is of course that data is already extracted from the
relational database with something like Change Data Capture (CDC) and
placed to Kafka topics. Also due to CDC limitations, each database table
that is related to the entity relational data model is extracted to the
separate Kafka topic.

So to answer your first question, the entity that needs to be "assembled"
from Kafka topics in the very common use case has 1:n relations where 1
corresponds to the triggering event enriched with the data from the main
(or parent) table of the data entity (for example completion of the
purchase order event + order data from the order table) and n corresponds
to the many children that needs to be joined with the order table to have
the full data entity (for example multiple line items of the purchase order
needs to be added from the line items child table).

It is not possible to use table-table join in this case because triggering
events are supplied separately from the actual data entity that needs to be
"assembled" and these events could only be presented as KStream due to
their nature. Also currently the FK table in table-table join is on the
"wrong" side of the join.
It is possible to use the existing stream-table join only to get data from the
parent entity table (order table) because the event-to-order relation is 1:1. After
that it is required to add the "children" tables of the order to complete the
entity assembly; these children are related as 1:n, with a foreign key field
in each child table (which is the order ID).

This use case is typically implemented with some sort of ESB (like
Mulesoft) where the ESB receives an event and then uses a JDBC adapter to issue
a SQL query with a left join on the foreign key for the child tables. The ESB then
loops through the returned record set to assemble the full data entity. However,
in many cases, for various architecture reasons, there is a desire to remove
JDBC queries from the data sourc

[jira] [Created] (KAFKA-15308) Wipe Stores upon OffsetOutOfRangeException in ALOS

2023-08-04 Thread Colt McNealy (Jira)
Colt McNealy created KAFKA-15308:


 Summary: Wipe Stores upon OffsetOutOfRangeException in ALOS
 Key: KAFKA-15308
 URL: https://issues.apache.org/jira/browse/KAFKA-15308
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 3.5.0, 3.4.0, 3.3.0
Reporter: Colt McNealy


As per this [Confluent Community Slack 
Thread|https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1690843733272449?thread_ts=1690663361.858559&cid=C48AHTCUQ],
 Streams currently does not wipe away RocksDB state upon encountering an 
`OffsetOutOfRangeException` in ALOS.

 

`OffsetOutOfRangeException` is a rare case that occurs when a standby task 
requests offsets that no longer exist in the topic. We should wipe the store 
for three reasons:
 # Not wiping the store can be a violation of ALOS since some of the 
now-missing offsets could have contained tombstone records.
 # Wiping the store has no performance cost since we need to replay the 
entirety of what's in the changelog topic anyways.
 # I have heard (not yet confirmed myself) that we wipe the store in EOS 
anyways, so fixing this bug could remove a bit of complexity from supporting 
EOS and ALOS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Testing FixedKeyProcessor implementation using unit tests

2023-08-04 Thread Matthias J. Sax
Thanks for filing a ticket for it: 
https://issues.apache.org/jira/browse/KAFKA-15242




On 7/14/23 1:06 AM, EXT.Zlatibor.Veljkovic wrote:

Hi Matthias,

Here's the repro of the project that has these issues 
https://github.com/zveljkovic/kafka-repro.

Please look at the:
Topology definition: 
https://github.com/zveljkovic/kafka-repro/blob/master/src/main/java/com/example/demo/DemoApplication.java
FixedKeyProcessor: 
https://github.com/zveljkovic/kafka-repro/blob/master/src/main/java/com/example/demo/MyFixedKeyProcessor.java
Test of FixedKeyProcessor: 
https://github.com/zveljkovic/kafka-repro/blob/master/src/test/java/com/example/demo/MyFixedKeyProcessorTest.java

Test is where I am having issues.

Thanks,
Zed


-Original Message-
From: Matthias J. Sax 
Sent: Tuesday, July 11, 2023 1:13 AM
To: dev@kafka.apache.org
Subject: Re: Testing FixedKeyProcessor implementation using unit tests

External email:Be careful with links and attachments


Not sure right now, but could be a bug.

Can you maybe share the full stack trace and the test program?

-Matthias

On 7/10/23 3:47 AM, EXT.Zlatibor.Veljkovic wrote:

Hi, I am using kafka-streams-test-utils and have a problem with testing 
FixedKeyProcessor [KIP-820 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-820%3A+Extend+KStream+process+with+new+Processor+API#KIP820:ExtendKStreamprocesswithnewProcessorAPI-InfrastructureforFixedKeyRecords].

Using mock processor context to get the forwarded message doesn't work.

class org.apache.kafka.streams.processor.api.MockProcessorContext cannot be 
cast to class org.apache.kafka.streams.processor.api.FixedKeyProcessorContext
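
For context, a minimal sketch of the failing pattern (MyFixedKeyProcessor is the 
processor from the repro; generics abbreviated):

    MockProcessorContext<String, String> context = new MockProcessorContext<>();
    FixedKeyProcessor<String, String, String> processor = new MyFixedKeyProcessor();
    // Fails at runtime with the ClassCastException above: MockProcessorContext
    // does not implement FixedKeyProcessorContext, and test-utils currently
    // provides no fixed-key variant of the mock context.
    processor.init((FixedKeyProcessorContext<String, String>) context);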

Anything I can do to get forwarded records?

Thanks,
Zed



[jira] [Created] (KAFKA-15309) Add custom error handler to Producer

2023-08-04 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-15309:
---

 Summary: Add custom error handler to Producer
 Key: KAFKA-15309
 URL: https://issues.apache.org/jira/browse/KAFKA-15309
 Project: Kafka
  Issue Type: New Feature
  Components: producer 
Reporter: Matthias J. Sax


The producer batches up multiple records into batches, and a single record-specific 
error might fail the whole batch.

This ticket suggests adding a per-record error handler that allows users to opt 
into skipping bad records without failing the whole batch (similar to Kafka 
Streams' `ProductionExceptionHandler`).
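
A purely hypothetical sketch of what such a handler could look like; no such 
interface exists in the producer today, and the names below are invented 
(loosely modeled on Streams' ProductionExceptionHandler):

    public interface ProducerRecordErrorHandler extends Configurable {
        enum HandlerResponse { FAIL, SKIP }
        // Decide whether a record-specific send error should fail the whole
        // batch (FAIL) or drop just this record and continue (SKIP).
        HandlerResponse handle(ProducerRecord<byte[], byte[]> record, Exception exception);
    }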



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2070

2023-08-04 Thread Apache Jenkins Server
See