Request for contributor permission

2018-10-25 Thread Andrew Schofield
Please could I have contributor permission for Apache Kafka. Here are my details:

JIRA ID: schofielaj
Wiki ID: schofielaj

Thanks,
Andrew Schofield
IBM Event Streams


Re: transaction 2PC protocol

2018-12-19 Thread Andrew Schofield
Hi,
This is very similar to traditional two-phase commit. There are essentially
multiple logs being used - one per TopicPartition involved, plus the overall
transaction log. At the point where COMMIT is being written to the
TopicPartitions, it is assumed that it will be possible to write all of these
without error, and that it will be possible to write COMMITTED to the
transaction log. Once a single COMMIT has been written, it's too late to abort
the transaction and everything has to go forward to complete the commit and
end up in a consistent state.
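As a toy illustration of the ordering described above (this is a simulation only, not Kafka's actual broker code; the names are invented):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy simulation of the commit phase: a COMMIT marker is written to each
// participating TopicPartition's log, and finally COMMITTED is written to
// the transaction log. After the first COMMIT marker there is no abort
// path - the protocol can only roll forward.
public class CommitPhase {
    public static List<String> commit(List<String> topicPartitions) {
        List<String> writes = new ArrayList<>();
        for (String tp : topicPartitions) {
            writes.add("COMMIT -> " + tp);            // assumed to succeed
        }
        writes.add("COMMITTED -> __transaction_log"); // transaction fully complete
        return writes;
    }

    public static void main(String[] args) {
        for (String w : commit(Arrays.asList("topicA-0", "topicB-1"))) {
            System.out.println(w);
        }
    }
}
```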

When a consumer sees COMMIT for a transaction, it can deliver any messages it
has held buffered for that transaction. It cannot really be sure whether
COMMITTED was written successfully and fully replicated, but it is generally
considered "good enough" to make this assumption.
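From the application side, this buffering behaviour is what a consumer opts into with read_committed isolation. A minimal configuration sketch (the broker address and group id are placeholders):

```java
import java.util.Properties;

// Consumer configuration sketch: with isolation.level=read_committed, the
// consumer only delivers messages from committed transactions, holding back
// records until it sees the COMMIT marker for their transaction.
public class ReadCommittedConfig {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "txn-demo");                // placeholder
        props.put("isolation.level", "read_committed");   // default is read_uncommitted
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("isolation.level"));
    }
}
```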

Andrew Schofield
IBM Event Streams

On 18/12/2018, 21:51, "Jose Raul Perez Rodriguez" wrote:

Hi all,

Reading this document about transactions in Kafka
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging>,
specifically sections 5.2 WriteTxnMarkerRequest and 5.3 Writing the final
Commit or Abort Message.

What I understand from *5.2* is that the coordinator sends a
WriteTxnMarkerRequest to each topic-partition leader in the transaction;
each leader broker then writes a Commit or Abort marker to its log, and
using this information the consumer for that particular topic-partition
decides to read the messages (in the Commit case) or to drop them (in the
Abort case).

My doubt is: are the messages that passed the WriteTxnMarkerRequest phase
for a particular topic-partition (the Commit cases) delivered to the user
immediately, or hidden from the user until the full transaction confirms it
committed? If they are delivered immediately, there could be inconsistent
reads, because another topic-partition could fail and then the full
transaction would need to abort. On the other hand, if those messages are
not delivered until the consumer reads a Commit marker for the full
transaction (*5.3*), my second question is: how does this work? That is,
how does a consumer know that a particular message belonging to a
transaction can be delivered to the user?

I hope this explanation is clear; I have not found this question addressed
in any doc.

Thanks in advance,





Re: transaction 2PC protocol

2018-12-19 Thread Andrew Schofield
Hi,
There is a design document for transaction support linked at the bottom of 
KIP-98 that you can read here
https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8 
.
That describes some of the recovery/retry mechanisms. The design relies on 
partition availability
to make forward progress. If components restart, they carry on from where they 
left off.

Andrew Schofield
IBM Event Streams

On 19/12/2018, 13:37, "Jose Raul Perez Rodriguez" wrote:

Hi, thanks for the answer, it was helpful.

So, if there are several topic-partitions in a transaction, are the reads
eventually consistent? That is, is it possible that some messages from that
transaction are not available yet, until some recovery/retry mechanism
completes for the failed topic-partitions? If so, what kind of
recovery/retry mechanism is implemented to deal with this and keep Kafka
transactions eventually consistent?

Thanks in advance,


On 12/19/18 2:17 PM, Andrew Schofield wrote:
> [quoted message trimmed; see the full reply earlier in this digest]




Re: [VOTE] KIP-382 MirrorMaker 2.0

2018-12-21 Thread Andrew Schofield
+1 (non-binding)

Andrew Schofield
IBM Event Streams

On 21/12/2018, 01:23, "Srinivas Reddy"  wrote:

+1 (non binding)

Thank you Ryan for the KIP, let me know if you need support in implementing
it.

-
Srinivas

- Typed on tiny keys. pls ignore typos.{mobile app}


On Fri, 21 Dec, 2018, 08:26 Ryanne Dolan wrote:

> Thanks for the votes so far!
>
> Due to recent discussions, I've removed the high-level REST API from the
> KIP.
>
> On Thu, Dec 20, 2018 at 12:42 PM Paul Davidson 
> wrote:
>
> > +1
> >
> > Would be great to see the community build on the basic approach we took
> > with Mirus. Thanks Ryanne.
> >
> > On Thu, Dec 20, 2018 at 9:01 AM Andrew Psaltis  >
> > wrote:
> >
> > > +1
> > >
> > > Really looking forward to this and to helping in any way I can. Thanks
> > for
> > > kicking this off Ryanne.
> > >
> > > On Thu, Dec 20, 2018 at 10:18 PM Andrew Otto 
> wrote:
> > >
> > > > +1
> > > >
> > > > This looks like a huge project! Wikimedia would be very excited to
> have
> > > > this. Thanks!
> > > >
> > > > On Thu, Dec 20, 2018 at 9:52 AM Ryanne Dolan 
> > > > wrote:
> > > >
> > > > > Hey y'all, please vote to adopt KIP-382 by replying +1 to this
> > thread.
> > > > >
> > > > > For your reference, here are the highlights of the proposal:
> > > > >
> > > > > - Leverages the Kafka Connect framework and ecosystem.
> > > > > - Includes both source and sink connectors.
> > > > > - Includes a high-level driver that manages connectors in a
> dedicated
> > > > > cluster.
> > > > > - High-level REST API abstracts over connectors between multiple
> > Kafka
> > > > > clusters.
> > > > > - Detects new topics, partitions.
> > > > > - Automatically syncs topic configuration between clusters.
> > > > > - Manages downstream topic ACL.
> > > > > - Supports "active/active" cluster pairs, as well as any number of
> > > active
> > > > > clusters.
> > > > > - Supports cross-data center replication, aggregation, and other
> > > complex
> > > > > topologies.
> > > > > - Provides new metrics including end-to-end replication latency
> > across
> > > > > multiple data centers/clusters.
> > > > > - Emits offsets required to migrate consumers between clusters.
> > > > > - Tooling for offset translation.
> > > > > - MirrorMaker-compatible legacy mode.
> > > > >
> > > > > Thanks, and happy holidays!
> > > > > Ryanne
> > > > >
> > > >
> > >
> >
> >
> > --
> > Paul Davidson
> > Principal Engineer, Ajna Team
> > Big Data & Monitoring
> >
>




Re: [VOTE] KIP-389: Introduce a configurable consumer group size limit

2019-01-14 Thread Andrew Schofield
+1 (non-binding)

Looks like a good improvement.

Andrew Schofield
IBM Event Streams

On 11/01/2019, 17:33, "Boyang Chen"  wrote:

+1 (non-binding)

Thanks for the great work!



From: Gwen Shapira 
Sent: Friday, January 11, 2019 9:13 AM
To: dev
Subject: Re: [VOTE] KIP-389: Introduce a configurable consumer group size 
limit

+1
Thank you for driving this change!

On Fri, Jan 11, 2019, 1:09 AM Stanislav Kozlovski wrote:

> Hey folks,
>
> I'd like to initiate a vote thread about KIP-389
> <
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-389%3A+Introduce+a+configurable+consumer+group+size+limit
> >
> .
>
> --
> Best,
> Stanislav
>




[DISCUSS] KIP-419 Safely notify Kafka Connect SourceTask is stopped

2019-01-18 Thread Andrew Schofield
Hi,
I’ve created a new KIP to enhance the SourceTask interface in Kafka Connect.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-419:+Safely+notify+Kafka+Connect+SourceTask+is+stopped

Comments welcome.

Andrew Schofield
IBM Event Streams



Re: [DISCUSS] KIP-419 Safely notify Kafka Connect SourceTask is stopped

2019-01-21 Thread Andrew Schofield
Ryanne,
Thanks for your comments. I think my overarching point is that the various
states of a SourceTask and the transitions between them seem a bit loose, and
that makes it difficult to figure out when the resources held by a SourceTask
can be safely released. Your "I can't tell from the documentation" comment is
key here. Neither could I.

The problem is that stop() is a signal to stop polling. It's basically a
request from the framework to the task, and it doesn't tell the task that it's
actually finished. One of the purposes of the KC framework is to make life easy
for a connector developer, and a nice clean "all done now" method would help.

I think I'll add a diagram to the KIP to illustrate this.
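The distinction can be shown with a minimal stand-in class (not the real Connect API; the session field and the stopped() hook reflect the KIP's proposal, and the names are illustrative only):

```java
// Minimal sketch: a task whose poll loop is told to stop via stop(), but
// which only releases its source-system session once the framework confirms
// all work has quiesced via the proposed stopped() hook.
public class SessionSourceTask {
    private volatile boolean stopping = false;
    private String session = "open";   // stands in for a source-system connection

    // Signal from the framework: stop polling soon. poll() may still be
    // mid-flight on another thread, so the session cannot be closed here.
    public void stop() {
        stopping = true;
    }

    // Proposed final callback: all work has finished, safe to release resources.
    public void stopped() {
        session = null;
    }

    public boolean isStopping() { return stopping; }
    public boolean sessionReleased() { return session == null; }

    public static void main(String[] args) {
        SessionSourceTask t = new SessionSourceTask();
        t.stop();
        System.out.println(t.sessionReleased()); // false: stop() alone is only a signal
        t.stopped();
        System.out.println(t.sessionReleased()); // true: resources released at the end
    }
}
```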

Andrew Schofield
IBM Event Streams

On 18/01/2019, 19:02, "Ryanne Dolan"  wrote:

Andrew, do we know whether the SourceTask may be start()ed again? If this
is the last call to a SourceTask I suggest we call it close(). I can't tell
from the documentation.

Also, do we need this if a SourceTask can keep track of whether it was
start()ed since the last stop()?
    
    Ryanne
    

On Fri, Jan 18, 2019, 12:02 PM Andrew Schofield wrote:

> Hi,
> I’ve created a new KIP to enhance the SourceTask interface in Kafka
> Connect.
>
>
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-419:+Safely+notify+Kafka+Connect+SourceTask+is+stopped
>
> Comments welcome.
>
> Andrew Schofield
> IBM Event Streams
>
>




Re: [VOTE] KIP-416: Notify SourceTask of ACK'd offsets, metadata

2019-01-21 Thread Andrew Schofield
Hi,
I'm not quite sure about the etiquette here but I wonder whether the KIP could 
be improved. I think I missed the DISCUSS thread.

I think that your recordLogged(SourceRecord, RecordMetadata) method is really
a better version of commitRecord(), and perhaps it ought to be an overload.
This is similar to the situation in which the Serializer interface was
enhanced when record headers were added.

public abstract class SourceTask implements Task {
  // New overload: by default, delegates to the existing method
  public void commitRecord(SourceRecord record, RecordMetadata metadata)
      throws InterruptedException {
    this.commitRecord(record);
  }

  // Existing method, unchanged
  public void commitRecord(SourceRecord record) throws InterruptedException {
  }
}

Or something like that. I do understand that the KIP mentions that
recordLogged() is only called for records that are actually ACKed, but it's
very similar in intent to commitRecord() in my view.

Just my 2 cents.

Andrew Schofield
IBM Event Streams


On 17/01/2019, 23:54, "Ryanne Dolan"  wrote:

Hey y'all, please vote for KIP-416 by replying +1 to this thread.

Right now, there is no way for a SourceConnector/Task to know:

- whether a record was successfully sent to Kafka, vs filtered out or
skipped.
- the downstream offsets and metadata of sent records

KIP-416 proposes adding a recordLogged() callback for this purpose.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-416%3A+Notify+SourceTask+of+ACK%27d+offsets%2C+metadata

Thanks!
Ryanne




Re: [VOTE] KIP-396: Add Commit/List Offsets Operations to AdminClient

2019-01-21 Thread Andrew Schofield
+1 (non-binding). Thanks for the KIP.

On 21/01/2019, 12:45, "Eno Thereska"  wrote:

+1 (non binding). Thanks.

On Mon, Jan 21, 2019 at 12:30 PM Mickael Maison 
wrote:

> Bumping this thread. Considering this KIP is relatively straightforward,
> can we get some votes or feedback if you think it's not?
> Thanks
>
> On Tue, Jan 8, 2019 at 5:40 PM Edoardo Comar  wrote:
> >
> > +1 (non-binding)
> > Thanks Mickael!
> >
> > On Tue, 8 Jan 2019 at 17:39, Patrik Kleindl  wrote:
> >
> > > +1 (non-binding)
> > > Thanks, sounds very helpful
> > > Best regards
> > > Patrik
> > >
> > > > Am 08.01.2019 um 18:10 schrieb Mickael Maison <
> mickael.mai...@gmail.com
> > > >:
> > > >
> > > > Hi all,
> > > >
> > > > I'd like to start the vote on KIP-396:
> > > >
> > >
> 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484
> > > >
> > > > Thanks
> > >
> >
> >
> > --
> > "When the people fear their government, there is tyranny; when the
> > government fears the people, there is liberty." [Thomas Jefferson]
>




Re: [DISCUSS] KIP-419 Safely notify Kafka Connect SourceTask is stopped

2019-01-24 Thread Andrew Schofield
I've now added a diagram to illustrate the states of a SourceTask. The KIP is
essentially trying to give the SourceTask a clear signal when all work has
stopped. In particular, if a SourceTask has a session to the source system
that it uses in poll() and commit(), it now has a safe way to release it.

Andrew Schofield
IBM Event Streams

On 21/01/2019, 10:13, "Andrew Schofield" wrote:

> [quoted message trimmed; see the full reply earlier in this digest]






Re: [DISCUSS] KIP-416: Notify SourceTask of ACK'd offsets, metadata

2019-01-31 Thread Andrew Schofield
As you might expect, I like the overloaded commitRecord(), but I think the
overloaded method should be called in exactly the same situations as the
previous method. When it does not reflect an ACK, the second parameter could
be null. The text of the KIP says that the overloaded method is only called
when a record is ACKed, and I would have thought that a connector implementor
would want to provide only a single variant of commitRecord().
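A sketch of how a connector might then implement only the two-argument variant, treating a null RecordMetadata as "not ACKed" (these are stand-in types, not the real Connect API):

```java
// Self-contained sketch with stand-in types: the two-argument commitRecord
// is called for every record, with metadata == null meaning the record was
// filtered/skipped rather than ACKed by the broker.
public class CommitRecordSketch {
    static class SourceRecord {
        final String value;
        SourceRecord(String v) { value = v; }
    }

    static class RecordMetadata {
        final long offset;
        RecordMetadata(long o) { offset = o; }
    }

    static String commitRecord(SourceRecord rec, RecordMetadata meta) {
        return (meta == null)
            ? rec.value + ": skipped/filtered"       // no downstream offset exists
            : rec.value + ": acked@" + meta.offset;  // safe to advance source position
    }

    public static void main(String[] args) {
        System.out.println(commitRecord(new SourceRecord("r1"), new RecordMetadata(42)));
        System.out.println(commitRecord(new SourceRecord("r2"), null));
    }
}
```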

Andrew Schofield
IBM Event Streams

On 31/01/2019, 03:00, "Ryanne Dolan"  wrote:

I've updated the KIP and PR to overload commitRecord instead of adding a
new method. Here's the PR:


https://github.com/apache/kafka/pull/6171

Ryanne

On Mon, Jan 21, 2019 at 6:29 PM Ryanne Dolan  wrote:

> Andrew Schofield suggested we overload the commitRecord method instead of
> adding a new one. Thoughts?
>
> Ryanne
>
> On Thu, Jan 17, 2019, 5:34 PM Ryanne Dolan wrote:
>> I had to change the KIP number (concurrency is hard!) so the link is now:
>>
>>
>> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-416%3A+Notify+SourceTask+of+ACK%27d+offsets%2C+metadata
>>
>> Ryanne
>>
>> On Fri, Jan 11, 2019 at 2:43 PM Ryanne Dolan 
>> wrote:
>>
>>> Hey y'all,
>>>
>>> Please review the following small KIP:
>>>
>>>
>>> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-414%3A+Notify+SourceTask+of+ACK%27d+offsets%2C+metadata
>>>
>>> Thanks!
>>> Ryanne
>>>
>>




Re: [VOTE] KIP-412: Extend Admin API to support dynamic application log levels

2019-02-19 Thread Andrew Schofield
Thanks for the KIP.

+1 (non-binding)

On 18/02/2019, 12:48, "Stanislav Kozlovski"  wrote:

Hey everybody, I'm starting a VOTE thread for KIP-412. This feature should
significantly improve the flexibility and ease of debugging Kafka at run
time.

KIP-412 -

https://cwiki.apache.org/confluence/display/KAFKA/KIP-412%3A+Extend+Admin+API+to+support+dynamic+application+log+levels


-- 
Best,
Stanislav




Re: [VOTE] 2.2.0 RC2

2019-03-21 Thread Andrew Schofield
+1 (non-binding)

- Downloaded the artifacts
- Ran Kafka Connect connectors

Thanks,
Andrew Schofield
IBM Event Streams

On 19/03/2019, 19:13, "Manikumar"  wrote:

+1 (non-binding)

- Verified the artifacts, build from src, ran tests
- Verified the quickstart, ran producer/consumer performance tests.

Thanks for running the release!

Thanks,
Manikumar

On Wed, Mar 20, 2019 at 12:19 AM David Arthur 
wrote:

> +1
>
> Validated signatures, and ran through quick-start.
>
> Thanks!
>
> On Mon, Mar 18, 2019 at 4:00 AM Jakub Scholz  wrote:
>
> > +1 (non-binding). I used the staged binaries and run some of my tests
> > against them. All seems to look good to me.
> >
> > On Sat, Mar 9, 2019 at 11:56 PM Matthias J. Sax 
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the third candidate for release of Apache Kafka 2.2.0.
> > >
> > >  - Added SSL support for custom principal name
> > >  - Allow SASL connections to periodically re-authenticate
> > >  - Command line tool bin/kafka-topics.sh adds AdminClient support
> > >  - Improved consumer group management
> > >- default group.id is `null` instead of empty string
> > >  - API improvement
> > >- Producer: introduce close(Duration)
> > >- AdminClient: introduce close(Duration)
> > >- Kafka Streams: new flatTransform() operator in Streams DSL
> > >- KafkaStreams (and other classed) now implement AutoClosable to
> > > support try-with-resource
> > >- New Serdes and default method implementations
> > >  - Kafka Streams exposed internal client.id via ThreadMetadata
> > >  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will 
now
> > > output `NaN` as default value
> > > Release notes for the 2.2.0 release:
> > > 
https://home.apache.org/~mjsax/kafka-2.2.0-rc2/RELEASE_NOTES.html
> > >
> > > *** Please download, test, and vote by Thursday, March 14, 9am PST.
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > 
https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > 
https://home.apache.org/~mjsax/kafka-2.2.0-rc2/
> > >
> > > * Maven artifacts to be voted upon:
> > > 
https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > 
https://home.apache.org/~mjsax/kafka-2.2.0-rc2/javadoc/
> > >
> > > * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> > > 
https://github.com/apache/kafka/releases/tag/2.2.0-rc2
> > >
> > > * Documentation:
> > > 
https://kafka.apache.org/22/documentation.html

[VOTE] KIP-419: Safely notify Kafka Connect SourceTask is stopped

2019-04-08 Thread Andrew Schofield
Hi,
I’d like to begin the voting thread for KIP-419. This is a minor KIP to add a 
new stopped() method to the SourceTask interface in Kafka Connect. Its purpose 
is to give the task a safe opportunity to clean up its resources, in the 
knowledge that this is the final call to the task.

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped
PR: https://github.com/apache/kafka/pull/6551
JIRA: https://issues.apache.org/jira/browse/KAFKA-7841

Thanks,
Andrew Schofield
IBM


Re: [VOTE] KIP-419: Safely notify Kafka Connect SourceTask is stopped

2019-04-25 Thread Andrew Schofield
I'd like to encourage some more votes on KIP-419. It's a pretty small KIP that
makes it easier to handle resource clean-up in Kafka Connect SourceTasks.

Currently it has only +2 non-binding votes.

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped
PR: https://github.com/apache/kafka/pull/6551

Thanks,
Andrew Schofield
IBM Event Streams

On 15/04/2019, 15:59, "Edoardo Comar"  wrote:

Thanks Andrew.

+1 (non-binding)

--

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN




From:   Mickael Maison 
To: dev 
Date:   10/04/2019 10:14
Subject:Re: [VOTE] KIP-419: Safely notify Kafka Connect SourceTask 
is stopped



+1 (non-binding)
Thanks for the KIP!

On Mon, Apr 8, 2019 at 8:07 PM Andrew Schofield
 wrote:
>
> Hi,
> I’d like to begin the voting thread for KIP-419. This is a minor KIP to 
add a new stopped() method to the SourceTask interface in Kafka Connect. 
Its purpose is to give the task a safe opportunity to clean up its 
resources, in the knowledge that this is the final call to the task.
>
> KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped
> PR: https://github.com/apache/kafka/pull/6551
> JIRA: https://issues.apache.org/jira/browse/KAFKA-7841

>
> Thanks,
> Andrew Schofield
> IBM








Re: [DISCUSS] KIP-419 Safely notify Kafka Connect SourceTask is stopped

2019-05-03 Thread Andrew Schofield
Hi Vahid,
Thanks for taking a look at this KIP.

- The KIP is proposing a new interface because the existing stop() method
isn't called at the end of the SourceTask's existence. It's a signal to the
task to stop, rather than a signal that it has actually stopped. Essentially,
if the task has resources to clean up, there's no clear point at which to do
this before this KIP. I'm trying to make it a bit easier to write a connector
with this need.

- The stop() method can be called multiple times, which you can see by
setting breakpoints. That could be a defect in the implementation, but I
think it's risky to change the timing of that call because some connector
somewhere might cease working. Unlikely, but compatibility is important.
Also, it's important that the stop() signal is noticed, and a SourceTask runs
on multiple threads, so it's tricky. The new method is called exactly once,
after everything has quiesced.

- I don't disagree that a verb sounds better, but I couldn't really think of
a more final alternative to stop(). That's why I went with stopped(). It
could be cleanup() or release(). Suggestions are welcome.
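Since stop() may be invoked more than once, a task can at least guard its own shutdown logic today by making stop() idempotent; the proposed stopped() would then be the single guaranteed-final callback. A minimal sketch (illustrative only, not the Connect API):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of an idempotent stop() guard: however many times the framework
// calls stop(), the shutdown action runs at most once.
public class IdempotentStop {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private int stopActions = 0;

    public void stop() {
        // compareAndSet ensures only the first caller performs the action,
        // even if stop() is invoked concurrently from multiple threads.
        if (stopRequested.compareAndSet(false, true)) {
            stopActions++;
        }
    }

    public int stopActions() { return stopActions; }

    public static void main(String[] args) {
        IdempotentStop t = new IdempotentStop();
        t.stop();
        t.stop();
        t.stop();
        System.out.println(t.stopActions()); // 1
    }
}
```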

Thanks.
Andrew 

On 03/05/2019, 06:16, "Vahid Hashemian"  wrote:

Hi Andrew,

Thanks for the KIP. I'm not too familiar with the internals of KC so I hope
you can clarify a couple of things:

   - It seems the KIP is proposing a new interface because the existing
   "stop()" interface doesn't fully perform what it should ideally be doing.
   Is that a fair statement?
   - You mentioned the "stop()" interface can be called multiple times.
   Would the same thing be true for the proposed interface? Does it matter?
   Or is there a guard against that?
   - I also agree with Ryan that using a verb sounds more intuitive for an
   interface that's supposed to trigger some action.

Regards,
--Vahid


On Thu, Jan 24, 2019 at 9:23 AM Ryanne Dolan  wrote:

> Ah, I'm sorta wrong -- in the current implementation, restartTask()
> stops the task and starts a *new* task instance with the same task ID.
> (I'm not certain that is clear from the documentation or interfaces,
> or if that may change in the future.)
>
> Ryanne
>
> On Thu, Jan 24, 2019 at 10:25 AM Ryanne Dolan 
> wrote:
> >
> > Andrew, I believe the task can be started again with start() during the
> stopping and stopped states in your diagram.
> >
> > Ryanne
> >
> > On Thu, Jan 24, 2019, 10:20 AM Andrew Schofield <
> andrew_schofi...@live.com wrote:
> >>
> >> I've now added a diagram to illustrate the states of a SourceTask. The
> KIP is essentially trying to give a clear signal to SourceTask when all
> work has stopped. In particular, if a SourceTask has a session to the
> source system that it uses in poll() and commit(), it now has a safe way 
to
> release this.
> >>
> >> Andrew Schofield
> >> IBM Event Streams
> >>
> >> On 21/01/2019, 10:13, "Andrew Schofield" 
> wrote:
> >>
> >> Ryanne,
> >> Thanks for your comments. I think my overarching point is that the
> various states of a SourceTask and the transitions between them seem a bit
> loose and that makes it difficult to figure out when the resources held by
> a SourceTask can be safely released. Your "I can't tell from the
> documentation" comment is key here. Neither could I.
> >>
> >> The problem is that stop() is a signal to stop polling. It's
> basically a request from the framework to the task and it doesn't tell the
> task that it's actually finished. One of the purposes of the KC framework
> is to make life easy for a connector developer and a nice clean "all done
> now" method would help.
> >>
> >> I think I'll add a diagram to illustrate to the KIP.
> >>
> >> Andrew Schofield
> >> IBM Event Streams
> >>
> >> On 18/01/2019, 19:02, "Ryanne Dolan"  wrote:
> >>
> >> Andrew, do we know whether the SourceTask may be start()ed
> again? If this
> >> is the last call to a SourceTask I suggest we call it close().
> I can't tell
> >> from the documentation.
> >>
> >> Also, do we need this if a SourceTask can keep track of whether
  

Re: [DISCUSS] KIP-419 Safely notify Kafka Connect SourceTask is stopped

2019-05-05 Thread Andrew Schofield
Hi Ryanne,
I also worked around it, but would rather not have needed to. So, I'm in the
position of creating a small KIP that I personally don't need to use right now.
But, I do have a PR and it's really simple, so I reckon it's a no-brainer to
enhance the framework in this small way so future connector implementors
can avoid the slight inconvenience of working out when it's safe to clean up
their resources.

Andrew

On 04/05/2019, 14:27, "Ryanne Dolan"  wrote:

>  That could be a defect in the implementation

That's what I am thinking. I don't think there is anything in the interface
or documentation that would imply that stop() could be called multiple
times, so I wouldn't expect any code to break if you fixed this behavior.
Quite the contrary -- I think most implementers would assume stop would be
called exactly once. Also, I don't think I've observed this behavior myself
and would be surprised if I did.

That said, I believe the problem is not that stop() is called multiple
times, but that poll() etc may still have work to do after the framework
calls stop(). This is a signal to the implementation to stop calling
poll(), not to clean up resources, per se.

I handle this by locking resources that I need to close, and then in stop()
I kick off a thread that waits for the locks to be released, at which point
I can cleanup the resources. I agree that it would be nicer if the
framework handled this for me, but the current API isn't too difficult to
work around.
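
The pattern described above could look roughly like this — a minimal, self-contained Java sketch. GuardedSourceTask and its method names are illustrative stand-ins for discussion, not the Kafka Connect API:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the workaround: stop() is only a signal, so real cleanup is
// deferred to a thread that waits for in-flight poll() work to release
// the lock before closing the resource.
class GuardedSourceTask {
    private final ReentrantLock sessionLock = new ReentrantLock();
    private volatile boolean stopping = false;
    private Object session = new Object(); // stand-in for a source-system connection
    private Thread cleaner;

    public void poll() {
        if (stopping) return;              // honour the stop signal
        sessionLock.lock();
        try {
            // ... fetch records using 'session' ...
        } finally {
            sessionLock.unlock();
        }
    }

    public void stop() {
        stopping = true;                   // ask poll() to stop doing work
        cleaner = new Thread(() -> {
            sessionLock.lock();            // blocks until any in-flight poll() exits
            try {
                session = null;            // now safe to release the resource
            } finally {
                sessionLock.unlock();
            }
        });
        cleaner.start();
    }

    public void awaitStopped() {
        try {
            if (cleaner != null) cleaner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean released() {
        return session == null;
    }
}
```

This is exactly the extra thread management the KIP argues the framework could take off the connector author's hands.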
    
Ryanne

On Fri, May 3, 2019, 10:45 AM Andrew Schofield 
wrote:

> Hi Vahid,
> Thanks for taking a look at this KIP.
>
> - The KIP is proposing a new interface because the existing "stop()"
> interface
> isn't called at the end of the SourceTask's existence. It's a signal to 
the
> task to stop, rather than a signal that it has actually stopped.
> Essentially, if
> the task has resources to clean up, there's no clear point at which to do
> this before
> this KIP. I'm trying to make it a bit easier to write a connector with
> this need.
>
> - The "stop()" interface can be called multiple times which you can see by
> setting
> breakpoints. That could be a defect in the implementation but I think it's
> a bit
> risky changing the timing of that call because some connector somewhere
> might
> cease working. Unlikely, but compatibility is important. Also, it's
> important that
> the stop() signal is noticed and a SourceTask runs on multiple threads so
> it's tricky.
> The new method is called exactly once after everything has quiesced.
>
> - I don't disagree that a verb sounds better but couldn't really think of
> a more
> final alternative to "stop()". That's why I went with "stopped()". Could
> be "cleanup()"
> or "release()". Suggestions are welcome.
>
> Thanks.
> Andrew
>
> On 03/05/2019, 06:16, "Vahid Hashemian" 
> wrote:
>
> Hi Andrew,
>
> Thanks for the KIP. I'm not too familiar with the internals of KC so I
> hope
> you can clarify a couple of things:
>
>- It seems the KIP is proposing a new interface because the 
existing
>"stop()" interface doesn't fully perform what it should ideally be
> doing.
>Is that a fair statement?
>- You mentioned the "stop()" interface can be called multiple 
times.
>Would the same thing be true for the proposed interface? Does it
> matter? Or
>there is a guard against that?
>- I also agree with Ryanne that using a verb sounds more intuitive
> for an
>interface that's supposed to trigger some action.
>
> Regards,
> --Vahid
>
>
> On Thu, Jan 24, 2019 at 9:23 AM Ryanne Dolan 
> wrote:
>
> > Ah, I'm sorta wrong -- in the current implementation, restartTask()
> > stops the task and starts a *new* task instance with the same task
> ID.
> > (I'm not certain that is clear from the documentation or interfaces,
    > > or if that may change in the future.)
> >
> > Ryanne
> >
> > On Thu, Jan 24, 2019 at 10:25 AM Ryanne Dolan  >
> > wrote:
> > >
> > > Andrew, I believe the task can be started again with start()
> 

Re: [VOTE] KIP-419: Safely notify Kafka Connect SourceTask is stopped

2019-05-07 Thread Andrew Schofield
It should be possible to get this small KIP into Kafka 2.3 but it's proving 
hard to get some binding votes. Please could the committers with background in 
Kafka Connect take a look and vote this week.

Thanks,
Andrew Schofield

On 25/04/2019, 16:11, "Andrew Schofield"  wrote:

I'd like to encourage some more votes on KIP-419. It's a pretty small KIP 
to make it easier to handle resource clean up in Kafka Connect SourceTasks.

Currently only +2 non-binding.

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped
PR: https://github.com/apache/kafka/pull/6551
    
Thanks,
Andrew Schofield
IBM Event Streams

On 15/04/2019, 15:59, "Edoardo Comar"  wrote:

Thanks Andrew.

+1 (non-binding)

--

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN




From:   Mickael Maison 
To: dev 
Date:   10/04/2019 10:14
Subject:Re: [VOTE] KIP-419: Safely notify Kafka Connect 
SourceTask 
is stopped



+1 (non-binding)
Thanks for the KIP!

On Mon, Apr 8, 2019 at 8:07 PM Andrew Schofield
 wrote:
>
> Hi,
> I’d like to begin the voting thread for KIP-419. This is a minor KIP 
to 
add a new stopped() method to the SourceTask interface in Kafka 
Connect. 
Its purpose is to give the task a safe opportunity to clean up its 
resources, in the knowledge that this is the final call to the task.
>
> KIP: 

https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped

> PR: 

https://github.com/apache/kafka/pull/6551

> JIRA: 

https://issues.apache.org/jira/browse/KAFKA-7841

>
> Thanks,
> Andrew Schofield
> IBM




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with 
number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 
3AU







Re: [VOTE] KIP-440: Extend Connect Converter to support headers

2019-05-09 Thread Andrew Schofield
+1 (non-binding).

Looks good.

On 09/05/2019, 15:55, "Gwen Shapira"  wrote:

+1 (binding)
Thank you!

On Mon, May 6, 2019, 11:25 PM Yaroslav Tkachenko  wrote:

> Hi everyone,
>
> I'd like to start a vote for KIP-440: Extend Connect Converter to support
> headers (
>
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-440%3A+Extend+Connect+Converter+to+support+headers
> )
>
> Discussion:
>
> 
https://lists.apache.org/thread.html/1fc1e3d2cddd8311d3db7c98f0d09a1a137ca4b20d1f3c8ab203a855@%3Cdev.kafka.apache.org%3E
>
> Thanks!
>




Re: [VOTE] KIP-475: New Metric to Measure Number of Tasks on a Connector

2019-06-05 Thread Andrew Schofield
+1 (non-binding)

Andrew Schofield

On 05/06/2019, 14:04, "Ryanne Dolan"  wrote:

+1 (non-binding)

Thanks
Ryanne

On Tue, Jun 4, 2019, 11:29 PM Cyrus Vafadari  wrote:

> Hi all,
>
> I'd like to start voting on the following KIP:
>
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-475%3A+New+Metric+to+Measure+Number+of+Tasks+on+a+Connector
>
> Discussion thread:
>
> 
https://lists.apache.org/thread.html/bf7c92224aa798336c14d7e96ec8f2e3406c61879ec381a50652acfe@%3Cdev.kafka.apache.org%3E
>
> Thanks!
>
> Cyrus
>




Re: [VOTE] KIP-434: Dead replica fetcher and log cleaner metrics

2019-06-06 Thread Andrew Schofield
+1 (non-binding)

Andrew

On 06/06/2019, 15:15, "Ryanne Dolan"  wrote:

+1 (non-binding)

Thanks
Ryanne

On Wed, Jun 5, 2019, 9:31 PM Satish Duggana 
wrote:

> Thanks Viktor, proposed metrics are really useful to monitor replication
> status on brokers.
>
> +1 (non-binding)
>
> On Thu, Jun 6, 2019 at 2:05 AM Colin McCabe  wrote:
>
> > +1 (binding)
> >
> > best,
> > Colin
> >
> >
> > On Wed, Jun 5, 2019, at 03:38, Viktor Somogyi-Vass wrote:
> > > Hi Folks,
> > >
> > > This vote sunk a bit, I'd like to draw some attention to this again in
> > the
> > > hope I get some feedback or votes.
> > >
> > > Thanks,
> > > Viktor
> > >
> > > On Tue, May 7, 2019 at 4:28 PM Harsha  wrote:
> > >
> > > > Thanks for the kip. LGTM +1.
> > > >
> > > > -Harsha
> > > >
> > > > On Mon, Apr 29, 2019, at 8:14 AM, Viktor Somogyi-Vass wrote:
> > > > > Hi Jason,
> > > > >
> > > > > I too agree this is more of a problem in older versions and
> > therefore we
> > > > > could backport it. Were you thinking of any specific versions? I
> > guess
> > > > the
> > > > > 2.x and 1.x versions are definitely targets here but I was 
thinking
> > that
> > > > we
> > > > > might not want to go further.
> > > > >
> > > > > Viktor
> > > > >
> > > > > On Mon, Apr 29, 2019 at 12:55 AM Stanislav Kozlovski <
> > > > stanis...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the work done, Viktor! +1 (non-binding)
> > > > > >
> > > > > > I strongly agree with Jason that this monitoring-focused KIP is
> > worth
> > > > > > porting back to older versions. I am sure users will find it 
very
> > > > useful
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > > On Fri, Apr 26, 2019 at 9:38 PM Jason Gustafson <
> > ja...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks, that works for me. +1
> > > > > > >
> > > > > > > By the way, we don't normally port KIPs to older releases, but
> I
> > > > wonder
> > > > > > if
> > > > > > > it's worth making an exception here. From recent experience, 
it
> > > > tends to
> > > > > > be
> > > > > > > the older versions that are more prone to fetcher failures.
> > Thoughts?
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Fri, Apr 26, 2019 at 5:18 AM Viktor Somogyi-Vass <
> > > > > > > viktorsomo...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Let me have a second thought, I'll just add the clientId
> > instead to
> > > > > > > follow
> > > > > > > > the convention, so it'll change DeadFetcherThreadCount but
> > with the
> > > > > > > > clientId tag.
> > > > > > > >
> > > > > > > > On Fri, Apr 26, 2019 at 11:29 AM Viktor Somogyi-Vass <
> > > > > > > > viktorsomo...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Jason,
> > > > > > > > >
> > > > > > > > > Yea I think it could make sense. In this case I would
> rename
> > the
> > > > > > > > > DeadFetcherThreadCount to DeadReplicaFetcherThreadCount 
and
> > > > introduce
> > > > > > > the
> > > > > > > > > metric you're referring to as 
DeadLogDirFetcherThreadCount.
> > > > > > > > > I'll update the KIP to reflect this.
> > > > > > > > >
> > > > > > > > > Viktor
> > > > > > > > >
> > > > > > > > > On Thu, Apr 25, 2019 at 8:07 PM Jason Gustafson <
> > > > ja...@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi Viktor,
> > > > > > > > >>
> > > > > > > > >> This looks good. Just one question I had is whether we 
may
> > as
> > > > well
> > > > > > > cover
> > > > > > > > >> the log dir fetchers as well.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >> Jason
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Thu, Apr 25, 2019 at 7:46 AM Viktor Somogyi-Vass <
> > > > > > > > >> viktorsomo...@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Folks,
> > > > > > > > >> >
> > > > > > > > >> > This thread sunk a bit but I'd like to bump it hoping 
to
> > get
> > > > some
> > > > > > > > >> feedback
> > > > > > > > >> > and/or votes.
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> > Viktor
> > > > > > > > >> >
> > > > > > > > >> > On Thu, Mar 28, 2019 at 8:47 PM Viktor Somogyi-Vass <
> > > > > > > > >> > viktorsomo...@gmail.com>
> > > > > > > > >> > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Sorry, the end of the message cut off.
> > > > > > > > >> > >
> > > > > > > > >> > > So I tried to be consistent

Re: [VOTE] KIP-475: New Metric to Measure Number of Tasks on a Connector

2019-06-23 Thread Andrew Schofield
add 2
> additional
> > >>> > > statuses analogous to the 3 proposed, it is a very minor change
> of
> > no
> > >>> > > structural consequence to the KIP.
> > >>> > >
> > >>> > > I've updated the KIP to incorporate your suggestion, and any
> voters
> > >>> who
> > >>> > > disagree should definitely respond in the thread.
> > >>> > >
> > >>> > > Cyrus
> > >>> > >
> > >>> > > On Thu, Jun 6, 2019 at 11:16 AM Konstantine Karantasis <
> > >>> > > konstant...@confluent.io> wrote:
> > >>> > >
> > >>> > > > Thanks Cyrus,
> > >>> > > >
> > >>> > > > this is a nice and straightforward addition.
> > >>> > > >
> > >>> > > > I'm +1 too, but I'd like to return with a question here as 
well
> > >>> > regarding
> > >>> > > > whether the unassigned tasks will be taken into account.
> > >>> > > > Especially after KIP-415 we might start seeing this status for
> > >>> specific
> > >>> > > > periods of time. Therefore, I think it's a meaningful 
addition.
> > >>> > > > Then there's the `destroyed` status which might be a lot more
> > >>> > transient but
> > >>> > > > we could also include for the sake of completion.
> > >>> > > > Check org.apache.kafka.connect.runtime.AbstractStatus for the
> > list
> > >>> of
> > >>> > all
> > >>> > > > possible statuses.
> > >>> > > >
> > >>> > > > Konstantine
> > >>> > > >
> > >>> > > > On Wed, Jun 5, 2019 at 4:32 PM Randall Hauch  >
> > >>> wrote:
> > >>> > > >
> > >>> > > > > Thanks, Cyrus.
> > >>> > > > >
> > >>> > > > > +1 (binding)
> > >>> > > > >
> > >>> > > > > Randall Hauch
> > >>> > > > >
> > >>> > > > > On Wed, Jun 5, 2019 at 10:36 AM Andrew Schofield <
> > >>> > > > > andrew_schofi...@live.com>
> > >>> > > > > wrote:
> > >>> > > > >
> > >>> > > > > > +1 (non-binding)
> > >>> > > > > >
> > >>> > > > > > Andrew Schofield
> > >>> > > > > >
> > >>> > > > > > On 05/06/2019, 14:04, "Ryanne Dolan" <
> ryannedo...@gmail.com
> > >
> > >>> > wrote:
> > >>> > > > > >
> > >>> > > > > > +1 (non-binding)
> > >>> > > > > >
> > >>> > > > > > Thanks
> > >>> > > > > > Ryanne
> > >>> > > > > >
> > >>> > > > > > On Tue, Jun 4, 2019, 11:29 PM Cyrus Vafadari <
> > >>> > cy...@confluent.io>
> > >>> > > > > > wrote:
> > >>> > > > > >
> > >>> > > > > > > Hi all,
> > >>> > > > > > >
> > >>> > > > > > > I'd like to start voting on the following KIP:
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> > >>>
> >
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-475%3A+New+Metric+to+Measure+Number+of+Tasks+on+a+Connector
> > >>> > > > > > >
> > >>> > > > > > > Discussion thread:
> > >>> > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> > >>>
> >
> 
https://lists.apache.org/thread.html/bf7c92224aa798336c14d7e96ec8f2e3406c61879ec381a50652acfe@%3Cdev.kafka.apache.org%3E
> > >>> > > > > > >
> > >>> > > > > > > Thanks!
> > >>> > > > > > >
> > >>> > > > > > > Cyrus
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > Gwen Shapira
> > >>> > Product Manager | Confluent
> > >>> > 650.450.2760 | @gwenshap
> > >>> > Follow us: Twitter | blog
> > >>> >
> > >>>
> > >>>
> > >>> --
> > >>> -- Guozhang
> > >>>
> > >>
> >
>
>
> --
> -- Guozhang
>




Re: [VOTE] KIP-419: Safely notify Kafka Connect SourceTask is stopped

2019-08-05 Thread Andrew Schofield
Hi,
I'd like to have a final try at getting some voting on this KIP. I'd like to 
get it into Kafka 2.4 so get your votes in please.

Currently only +2 non-binding votes.

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped
PR: https://github.com/apache/kafka/pull/6551

The idea is that source connectors are multi-threaded and the signal to stop a 
running connector can be delivered while messages are being requested from the 
source system. It's not rocket science to handle clean up correctly in a 
connector, but you might need to create a thread to wait for activity to 
quieten down so you can do it safely. This KIP just gives a way to be called by 
the KC framework when the connector has properly quiesced. Makes the connector 
code a bit simpler, and that's particularly helpful for someone just knocking 
up a simple connector without worrying about thread management.
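
To illustrate, here's a rough self-contained Java sketch of what the proposed callback enables. SourceTaskBase below is a reduced stand-in for Kafka Connect's SourceTask class, and the task itself is hypothetical:

```java
import java.io.Closeable;
import java.io.IOException;

// Stand-in for Kafka Connect's SourceTask, reduced to the methods
// relevant to KIP-419.
abstract class SourceTaskBase {
    public abstract void stop();   // signal: may arrive while poll() is in flight
    public void stopped() { }      // proposed: called exactly once, after quiescence
}

class SimpleSourceTask extends SourceTaskBase {
    private Closeable connection;  // resource tied to the source system
    private volatile boolean stopRequested = false;

    SimpleSourceTask(Closeable connection) {
        this.connection = connection;
    }

    @Override
    public void stop() {
        stopRequested = true;      // just set the flag; no cleanup here
    }

    @Override
    public void stopped() {
        // The framework guarantees no further poll()/commit() calls,
        // so the connection can be closed without a watcher thread.
        try {
            if (connection != null) connection.close();
        } catch (IOException e) {
            // log and ignore during shutdown
        }
        connection = null;
    }
}
```

With stopped() in place, the connector never needs the waiting-thread workaround: cleanup moves into a method the framework promises to call last.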

Thanks,
Andrew Schofield

On 25/04/2019, 16:11, "Andrew Schofield"  wrote:

I'd like to encourage some more votes on KIP-419. It's a pretty small KIP 
to make it easier to handle resource clean up in Kafka Connect SourceTasks.

Currently only +2 non-binding.

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped
PR: https://github.com/apache/kafka/pull/6551

Thanks,
Andrew Schofield
IBM Event Streams

On 15/04/2019, 15:59, "Edoardo Comar"  wrote:

Thanks Andrew.

+1 (non-binding)

--

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN




From:   Mickael Maison 
To: dev 
Date:   10/04/2019 10:14
Subject:Re: [VOTE] KIP-419: Safely notify Kafka Connect 
SourceTask 
is stopped



+1 (non-binding)
Thanks for the KIP!
    
    On Mon, Apr 8, 2019 at 8:07 PM Andrew Schofield
 wrote:
>
> Hi,
> I’d like to begin the voting thread for KIP-419. This is a minor KIP 
to 
add a new stopped() method to the SourceTask interface in Kafka 
Connect. 
Its purpose is to give the task a safe opportunity to clean up its 
resources, in the knowledge that this is the final call to the task.
>
> KIP: 

https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped

> PR: 

https://github.com/apache/kafka/pull/6551

> JIRA: 

https://issues.apache.org/jira/browse/KAFKA-7841

>
> Thanks,
> Andrew Schofield
> IBM




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with 
number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 
3AU







Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-07 Thread Andrew Schofield
Hi,
I think this is a useful KIP and it looks good in principle. While it can all 
be done using
interceptors, if the brokers do not know anything about it, you need to 
maintain the
mapping from topics to key ids somewhere external. I'd prefer the way you've 
done it.

I'm not sure whether you'll manage to interest any committers in volunteering an
opinion, and you'll need that before you can get the KIP accepted into Kafka.

Thanks,
Andrew Schofield (IBM)

On 06/08/2019, 15:46, "Sönke Liebau"  
wrote:

Hi,

I have so far received pretty much no comments on the technical details
outlined in the KIP. While I am happy to continue with my own ideas of how
to implement this, I would much prefer to at least get a very broad "looks
good in principle, but still lots to flesh out" from a few people before I
put more work into this.

Best regards,
Sönke




On Tue, 21 May 2019 at 14:15, Sönke Liebau 
wrote:

> Hi everybody,
>
> I'd like to rekindle the discussion around KIP-317.
> I have reworked the KIP a little bit in order to design everything as a
> pluggable implementation. During the course of that work I've also decided
> to rename the KIP, as encryption will only be transparent in some cases. 
It
> is now called "Add end to end data encryption functionality to Apache
> Kafka" [1].
>
> I'd very much appreciate it if you could give the KIP a quick read. This
> is not at this point a fully fleshed out design, as I would like to agree
> on the underlying structure that I came up with first, before spending 
time
> on details.
>
> TL/DR is:
> Create three pluggable classes:
> KeyManager runs on the broker and manages which keys to use, key rollover
> etc
> KeyProvider runs on the client and retrieves keys based on what the
> KeyManager tells it
> EncryptionEngine runs on the client and handles the actual encryption
> First idea of control flow between these components can be seen at [2]
>
> Please let me know any thoughts or concerns that you may have!
>
> Best regards,
> Sönke
>
> [1]
> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> [2]
> 
https://cwiki.apache.org/confluence/download/attachments/85479936/kafka_e2e-encryption_control-flow.png?version=1&modificationDate=1558439227551&api=v2
>
>
>
> On Fri, 10 Aug 2018 at 14:05, Sönke Liebau 
> wrote:
>
>> Hi Viktor,
>>
>> thanks for your input! We could accommodate magic headers by removing any
>> known fixed bytes pre-encryption, sticking them in a header field and
>> prepending them after decryption. However, I am not sure whether this is
>> actually necessary, as most modern (AES for sure) algorithms are 
considered
>> to be resistant to known-plaintext types of attack. Even if the entire
>> plaintext is known to the attacker he still needs to brute-force the key 
-
>> which may take a while.
>>
>> Something different to consider in this context are compression
>> sidechannel attacks like CRIME or BREACH, which may be relevant depending
>> on what type of data is being sent through Kafka. Both these attacks 
depend
>> on the encrypted record containing a combination of secret and user
>> controlled data.
>> For example if Kafka was used to forward data that the user entered on a
>> website along with a secret API key that the website adds to a back-end
>> server and the user can obtain the Kafka messages, these attacks would
>> become relevant. Not much we can do about that except disallow encryption
>> when compression is enabled (TLS chose this approach in version 1.3)
>>
>> I agree with you, that we definitely need to clearly document any risks
>> and how much security can reasonably be expected in any given scenario. 
We
>> might even consider logging a warning message when sending data that is
>&

Re: [VOTE] KIP-499 - Unify connection name flag for command line tool

2019-08-09 Thread Andrew Schofield
+1 (non-binding)

On 09/08/2019, 08:39, "Sönke Liebau"  wrote:

+1 (non-binding)



On Fri, 9 Aug 2019 at 04:45, Harsha Chintalapani  wrote:

> +1  (binding). much needed!!
>
>
> On Thu, Aug 08, 2019 at 6:43 PM, Gwen Shapira  wrote:
>
> > +1 (binding) THANK YOU. It would be +100 if I could.
> >
> > On Thu, Aug 8, 2019 at 6:37 PM Mitchell  wrote:
> >
> > Hello Dev,
> > After the discussion I would like to start the vote for KIP-499
> >
> > The following command line tools will have the `--bootstrap-server`
> > command line argument added: kafka-console-producer.sh,
> > kafka-consumer-groups.sh, kafka-consumer-perf-test.sh,
> > kafka-verifiable-consumer.sh, kafka-verifiable-producer.sh
> >
> > Thanks,
> > -Mitch
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany




Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Andrew Schofield
I've been pondering this question of coordinating other resource managers with
Kafka transactions for a while and I'm not convinced it's a good idea. My
reservations come down to the guarantees that it would provide in failure
scenarios.

I don't think KIP-98 gives proper ACID semantics in the presence of all
failures. For a transaction which contains a mixture of publishes and offset
updates, a bunch of topics are involved and it appears to me that an
uncontrolled shutdown could result in some but not all of the lazy writes
making it to disk.

Here are some of the failures that I'm worried about:

* A message is published to a topic which crashes the leader Kafka node, as
  it's replicated across the cluster, it crashes all of the other Kafka nodes
  (we've really had this - SEGV, our fault and we've fixed it, but it happened)
  so this is a kind of rolling node crash in a cluster
* Out of memory error in one or more Kafka nodes
* Disk fills in one or more Kafka nodes
* Uncontrolled power-off to all nodes in the cluster

Does KIP-98 guarantee atomicity for transactions in all of these cases?
Unless all of the topics involved in a transaction are recovered to the
same point in time, you can't consider a transaction to be properly atomic.
If application code is designed expecting atomicity, there are going to be
tears. Perhaps only when disaster strikes, but the risk is there.

I think KIP-98 is interesting, but I wouldn't equate what it provides
with the transactions provided by resource managers with traditional transaction
logging. It's not better or worse, just different. If you tried to migrate
from a previous transactional system to Kafka transactions, I think you'd
better have procedures for reconciliation with the other resource managers.
Better still, don't build applications that are so fragile. The principle
of dumb pipes and smart endpoints is good in my view.

If you're creating a global transaction containing two or more resource
managers and using two-phase commit, it's very important that all of the
resource managers maintain a coherent view of the sequence of events. If any
part fails due to a software or hardware failure, once the system is
recovered, nothing must be forgotten. If you look at how presumed-abort
works, you'll see how important this is.

Kafka doesn't really fit very nicely in this kind of environment because of
the way that it writes lazily to disk. The theory is that you must avoid at all
costs having an uncontrolled shutdown of an entire cluster because you'll lose
a little data at the end of the logs. So, if you are coordinating Kafka and a
relational database in a global transaction, it's theoretically possible that
a crashed Kafka would be a little forgetful while a crashed database would not.
The database would be an order of magnitude or more slower because of the way
its recovery logs are handled, but it would not be forgetful in the same way.

You get exactly the same kind of worries when you implement some kind of
asynchronous replication for disaster recovery, even if all of the resource
managers force all of their log writes to disk eagerly. The replica at the DR
site is slightly behind the primary site, so if you have to recover from an
outage and switch to the DR site, it can be considered to be slightly forgetful
about the last few moments before the outage. This is why a DR plan usually has
some concept of compensation or reconciliation to make good any forgotten work.

In summary, I think Kafka would have to change in ways which would negate many
of its good points in order to support XA transactions. It would be better to
design applications to be resilient to message duplication and loss, rather
than tightly coupling resource managers and ending up with something fragile.

Don't get me wrong. This is not an anti-Kafka rant. I just work with people
used to traditional transactional systems, making use of Kafka for business
applications, and it's important to understand the concepts on both sides
and make sure your assumptions are valid.

Andrew Schofield
IBM Watson and Cloud Platform


> From: Michael Pearce 
> Sent: 09 December 2016 06:19
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional 
> Messaging
>     
> Usecase in IG:
> 
> Fund transfer between accounts. When we debit one account and fund another we 
> must ensure the records to both occur > as an acid action, and as a single 
> transaction.
> 
> Today we achieve this because we have jms, as such we can do the actions 
> needed in an xa transaction across both the > accounts. To move this flow to 
> Kafka we would need support of XA transaction.
> 
> 
> 
> Sent using OWA for iPhone
> 
> From: Michael Pearce 
> Sent: Friday, December 9, 2

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-12 Thread Andrew Schofield
Guozhang,
Exactly. This is the crux of the matter. Because it's async, the log is basically
slightly out of date with respect to the run-time state, and a failure of all replicas
might take the data slightly back in time.

Given this, do you think that KIP-98 gives an all-or-nothing, no-matter-what 
guarantee
for Kafka transactions? I think the key is whether the data which is 
asynchronously
flushed is guaranteed to be recovered atomically in all cases. Asynchronous but
atomic would be good.

Andrew Schofield
IBM Watson and Cloud Platform


>
> From: Guozhang Wang 
> Sent: 09 December 2016 22:59
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional 
> Messaging
>  
> Onur,
> 
> I understand your question now. So it is indeed possible that after
> commitTxn() returned the messages could still be lost permanently if all
> replicas failed before the data was flushed to disk. This is the virtue of
> Kafka's design to reply on replication (probably in memory) for high
> availability, hence async flushing. This scenario already exist today and
> KIP-98 did not intend to change this factor in any ways.
> 
> Guozhang
>
>
>

RE: [VOTE] KIP-51 - List Connectors REST API

2016-03-23 Thread Andrew Schofield
+1 (non-binding)

+1 for re-evaluating the KIP process. This one went through very quickly, but 
they can drag on sometimes.


> Date: Wed, 23 Mar 2016 11:31:37 -0700
> Subject: Re: [VOTE] KIP-51 - List Connectors REST API
> From: wangg...@gmail.com
> To: dev@kafka.apache.org
>
> +1.
>
> On Wed, Mar 23, 2016 at 10:24 AM, Ashish Singh  wrote:
>
>> +1 (non-binding)
>>
>> On Wed, Mar 23, 2016 at 10:00 AM, Gwen Shapira  wrote:
>>
>>> Very large +1 on re-evaluating the KIP process.
>>>
>>> I was hoping we can do a meta-kip meeting after the release (Maybe even
>>> in-person at Kafka Summit?) to discuss.
>>>
>>> On Tue, Mar 22, 2016 at 7:59 PM, Grant Henke 
>> wrote:
>>>
 +1 (non-binding)

 I am also a +1 to evaluating the KIP process and ways to make it more
 effective and streamlined.

 On Tue, Mar 22, 2016 at 6:04 PM, Neha Narkhede 
>>> wrote:

> +1 (binding)
>
> On Tue, Mar 22, 2016 at 3:56 PM, Liquan Pei 
>>> wrote:
>
>> +1
>>
>> On Tue, Mar 22, 2016 at 3:54 PM, Gwen Shapira 
 wrote:
>>
>>> +1
>>>
>>> Straight forward enough and can't possibly break anything.
>>>
>>> On Tue, Mar 22, 2016 at 3:46 PM, Ewen Cheslack-Postava <
>> e...@confluent.io>
>>> wrote:
>>>
 Since it's pretty minimal, we'd like to squeeze it into 0.10 if
>> possible,
 and VOTE threads take 3 days, it was suggested it might make
>>> sense
 to
>>> just
 kick off voting on this KIP immediately (and restart it if
>>> someone
>> raises
 an issue). Feel free to object and comment in the DISCUSS
>> thread
>>> if
> you
 feel there's something to still be discussed.



>>>
>>
>

>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-51+-+List+Connectors+REST+API

 I'll obviously kick things off with a +1.

 -Ewen

>>>
>>
>>
>>
>> --
>> Liquan Pei
>> Department of Physics
>> University of Massachusetts Amherst
>>
>
>
>
> --
> Thanks,
> Neha
>



 --
 Grant Henke
 Software Engineer | Cloudera
 gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

>>>
>>
>>
>>
>> --
>>
>> Regards,
>> Ashish
>>
>
>
>
> --
> -- Guozhang
  

RE: [VOTE] KIP-43: Kafka SASL enhancements

2016-03-24 Thread Andrew Schofield
+1 (non-binding) on the revised KIP


> Date: Thu, 24 Mar 2016 08:21:14 +
> Subject: Re: [VOTE] KIP-43: Kafka SASL enhancements
> From: rajinisiva...@googlemail.com
> To: dev@kafka.apache.org
>
> Gwen,
>
> Is it still possible to include this in 0.10.0.0?
>
> Thanks,
>
> Rajini
>
> On Wed, Mar 23, 2016 at 11:08 PM, Gwen Shapira  wrote:
>
>> Sorry! Got distracted by the impending release!
>>
>> +1 on the current revision of the KIP.
>>
>> On Wed, Mar 23, 2016 at 3:33 PM, Harsha  wrote:
>>
>>> Any update on this. Gwen since the KIP is adjusted to address the
>>> pluggable classes we should make a move on this.
>>>
>>> Rajini,
>>> Can you restart the voting thread.
>>>
>>> Thanks,
>>> Harsha
>>>
>>> On Wed, Mar 16, 2016, at 06:42 AM, Rajini Sivaram wrote:
>>>> As discussed in the KIP meeting yesterday, the scope of KIP-43 has been
>>>> reduced so that it can be integrated into 0.10.0.0. The updated KIP is
>>>> here:
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-43%3A+Kafka+SASL+enhancements
>>>> .
>>>>
>>>> Can we continue the vote on the updated KIP?
>>>>
>>>> Thank you,
>>>>
>>>> Rajini
>>>>
>>>> On Thu, Mar 10, 2016 at 2:09 AM, Gwen Shapira 
>> wrote:
>>>>
>>>>> Harsha,
>>>>>
>>>>> Since you are clearly in favor of the KIP, do you mind jumping into
>>>>> the discussion thread and help me understand the decision behind the
>>>>> configuration parameters only allowing a single Login and
>>>>> CallbackHandler class? This seems too limiting to me, and while
>> Rajini
>>>>> is trying hard to convince me otherwise, I remain doubtful. Perhaps
>>>>> (since we have similar experience with Hadoop), you can help me see
>>>>> what I am missing.
>>>>>
>>>>> Gwen
>>>>>
>>>>> On Wed, Mar 9, 2016 at 12:02 PM, Harsha  wrote:
>>>>>> +1 (binding)
>>>>>>
>>>>>> On Tue, Mar 8, 2016, at 02:37 AM, tao xiao wrote:
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> On Tue, 8 Mar 2016 at 05:33 Andrew Schofield <
>>>>>>> andrew_schofield_j...@outlook.com> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> 
>>>>>>>>> From: ism...@juma.me.uk
>>>>>>>>> Date: Mon, 7 Mar 2016 19:52:11 +
>>>>>>>>> Subject: Re: [VOTE] KIP-43: Kafka SASL enhancements
>>>>>>>>> To: dev@kafka.apache.org
>>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> On Thu, Mar 3, 2016 at 10:37 AM, Rajini Sivaram <
>>>>>>>>> rajinisiva...@googlemail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I would like to start the voting process for *KIP-43: Kafka
>>> SASL
>>>>>>>>>> enhancements*. This KIP extends the SASL implementation in
>>> Kafka to
>>>>>>>> support
>>>>>>>>>> new SASL mechanisms to enable Kafka to be integrated with
>>> different
>>>>>>>>>> authentication servers.
>>>>>>>>>>
>>>>>>>>>> The KIP is available here for reference:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-43:+Kafka+SASL+enhancements
>>>>>>>>>>
>>>>>>>>>> And here's is a link to the discussion on the mailing list:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201601.mbox/%3CCAOJcB39b9Vy7%3DZEM3tLw2zarCS4A_s-%2BU%2BC%3DuEcWs0712UaYrQ%40mail.gmail.com%3E
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you...
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Rajini
>>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Rajini
>>>
>>
>
>
>
> --
> Regards,
>
> Rajini
  

RE: Interacting with a secured Kafka cluster via GSS-API

2015-12-10 Thread Andrew Schofield
Wouldn't you use TLS to secure the connections? Encrypting just the 
credentials but not the connection seems brave.

Andrew



From:   Dave Ariens 
To: "dev@kafka.apache.org" 
Date:   10/12/2015 15:43
Subject:RE: Interacting with a secured Kafka cluster via GSS-API



> Is there a reason why you are using GSS-API directly instead of via 
SASL?

There sure is--because I have no clue what I'm doing :)

Our Kafka 0.9.0 cluster is currently only configured for SASL_PLAINTEXT so 
we're not encrypting anything at the moment.  I'll take a look through 
SaslClientAuthenticator and try and come back with either confirmation 
that everything is working as expected (hopefully) or at least more 
intelligent questions...

Thanks!


From: isma...@gmail.com [isma...@gmail.com] on behalf of Ismael Juma 
[ism...@juma.me.uk]
Sent: Thursday, December 10, 2015 10:36 AM
To: dev@kafka.apache.org
Subject: Re: Interacting with a secured Kafka cluster via GSS-API

Hi Dave,

Is there a reason why you are using GSS-API directly instead of via SASL?
It should still work, but if you do the latter, you can potentially reuse
the existing code (or at least use it as inspiration), see
`org.apache.kafka.common.security.authenticator.SaslClientAuthenticator`.

Also, please keep in mind that we are only using SASL for authentication
and that to encrypt the communication, you have to use SASL_SSL (ie we
don't support the SASL confidentiality QOP, for example).

I hope this helps.

Ismael



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


RE: [DISCUSS] KIP-47 - Add timestamp-based log deletion policy

2016-02-13 Thread Andrew Schofield
This KIP is related to KIP-32, but it strikes me that it only makes sense with 
one of the two proposed message timestamp types. If I understand correctly, 
message timestamps are only certain to be monotonically increasing in the log 
if message.timestamp.type=LogAppendTime.



Does timestamp-based auto-expiration require use of 
message.timestamp.type=LogAppendTime?




I think this KIP is a good idea, but I think it relies on strict ordering of 
timestamps to be workable.
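A toy example of why strict ordering matters (using a hypothetical record type, not the Kafka API): a prefix-deletion cutoff computed by scanning timestamps in offset order only behaves sensibly when timestamps are monotonic, as with LogAppendTime.

```java
import java.util.List;

// Illustrative sketch: prefix deletion by timestamp needs monotonic timestamps.
public class TimestampRetentionSketch {
    public static final class LogRecord {
        final long offset;
        final long timestamp;
        public LogRecord(long offset, long timestamp) {
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    // First offset whose timestamp >= minTimestamp, scanning in offset order.
    // Only a meaningful cutoff if timestamps are monotonic (LogAppendTime).
    public static long firstRetainedOffset(List<LogRecord> log, long minTimestamp) {
        for (LogRecord r : log) {
            if (r.timestamp >= minTimestamp) return r.offset;
        }
        return -1;
    }

    public static void main(String[] args) {
        // LogAppendTime: monotonic, so the cutoff is a clean prefix.
        List<LogRecord> appendTime = List.of(
                new LogRecord(0, 100), new LogRecord(1, 150), new LogRecord(2, 200));
        System.out.println(firstRetainedOffset(appendTime, 150)); // 1

        // CreateTime: a slow producer's clock puts an older timestamp at offset 1,
        // so the scan retains offset 0 while offset 1 is actually "too old".
        List<LogRecord> createTime = List.of(
                new LogRecord(0, 160), new LogRecord(1, 120), new LogRecord(2, 200));
        System.out.println(firstRetainedOffset(createTime, 150)); // 0
    }
}
```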



Andrew Schofield




> Date: Fri, 12 Feb 2016 10:38:46 -0800
> Subject: Re: [DISCUSS] KIP-47 - Add timestamp-based log deletion policy
> From: n...@confluent.io
> To: dev@kafka.apache.org
> 
> Adding a timestamp based auto-expiration is useful and this proposal makes
> sense. Thx!
> 
> On Wed, Feb 10, 2016 at 3:35 PM, Jay Kreps  wrote:
> 
>> I think this makes a lot of sense and won't be hard to implement and
>> doesn't create too much in the way of new interfaces.
>>
>> -Jay
>>
>> On Tue, Feb 9, 2016 at 8:13 AM, Bill Warshaw  wrote:
>>
>>> Hello,
>>>
>>> I just submitted KIP-47 for adding a new log deletion policy based on a
>>> minimum timestamp of messages to retain.
>>>
>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
>>>
>>> I'm open to any comments or suggestions.
>>>
>>> Thanks,
>>> Bill Warshaw
>>>
>>
> 
> 
> 
> -- 
> Thanks,
> Neha

RE: [DISCUSS] Deprecating the old Scala producers for the next release

2016-02-23 Thread Andrew Schofield
From my point of view, it seems very odd to deprecate the Scala producer but 
not the consumer. So, I would vote to deprecate them both in 0.10.

It doesn't sound like there's an established mechanism for deprecation. So, for 
the sake of discussion, how about:
* Start with deprecation annotations. It's just a marker that they're now 
living on borrowed time.
* Remove the ability to connect from these deprecated clients two releases 
later - so I mean 0.12, not 0.10.0.2.
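The first step is cheap to sketch (class and method names here are illustrative, not the actual client API): a deprecation annotation keeps the old path working while compilers warn every caller, and removal follows two releases later.

```java
// Illustrative sketch of the deprecation step; names are hypothetical.
public class DeprecationSketch {
    @Deprecated // callers get a compile-time warning, behaviour is unchanged
    public static String sendViaOldProducer(String msg) { return "old:" + msg; }

    public static String sendViaNewProducer(String msg) { return "new:" + msg; }

    public static void main(String[] args) {
        System.out.println(sendViaOldProducer("hello")); // old:hello - still works, but warned
        System.out.println(sendViaNewProducer("hello")); // new:hello
    }
}
```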

Andrew Schofield


> Subject: Re: [DISCUSS] Deprecating the old Scala producers for the next 
> release
> From: f...@apache.org
> Date: Tue, 23 Feb 2016 18:46:27 +
> To: dev@kafka.apache.org
>
> It does make sense, thanks for the clarification. If we deprecate the 
> producer first, does it mean that the following release won't have a scala 
> producer but will have a scala consumer? Actually I should have asked this 
> question first: what's the deprecation path precisely?
>
> -Flavio
>
>> On 23 Feb 2016, at 17:28, Ismael Juma  wrote:
>>
>> Hi Flavio,
>>
>> On Tue, Feb 23, 2016 at 9:23 AM, Flavio Junqueira  wrote:
>>
>>> Ismael, I'm curious about why you want to deprecate one and not the other.
>>> You say that it is premature to deprecate the old scala consumers and I'm
>>> wondering about the reasoning behind it.
>>>
>>
>> The new Java producer was introduced in 0.8.2 and it's considered
>> production-ready. The new Java consumer was introduced in 0.9.0.0 and it's
>> still marked as beta. We would like to have at least one full release cycle
>> where the new consumer is no longer in beta, before we consider deprecating
>> the old consumers. Does that make sense?
>>
>> Ismael
>
  

RE: [VOTE] KIP-43: Kafka SASL enhancements

2016-03-07 Thread Andrew Schofield
+1 (non-binding)


> From: ism...@juma.me.uk
> Date: Mon, 7 Mar 2016 19:52:11 +
> Subject: Re: [VOTE] KIP-43: Kafka SASL enhancements
> To: dev@kafka.apache.org
>
> +1 (non-binding)
>
> On Thu, Mar 3, 2016 at 10:37 AM, Rajini Sivaram <
> rajinisiva...@googlemail.com> wrote:
>
>> I would like to start the voting process for *KIP-43: Kafka SASL
>> enhancements*. This KIP extends the SASL implementation in Kafka to support
>> new SASL mechanisms to enable Kafka to be integrated with different
>> authentication servers.
>>
>> The KIP is available here for reference:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-43:+Kafka+SASL+enhancements
>>
>> And here's is a link to the discussion on the mailing list:
>>
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201601.mbox/%3CCAOJcB39b9Vy7%3DZEM3tLw2zarCS4A_s-%2BU%2BC%3DuEcWs0712UaYrQ%40mail.gmail.com%3E
>>
>>
>> Thank you...
>>
>> Regards,
>>
>> Rajini
>>
  

RE: [DISCUSS] Time-based releases for Apache Kafka

2016-08-25 Thread Andrew Schofield
The proposal sounds pretty good, but the main thing currently missing is a 
proper long-term support release.

Having 3 releases a year sounds OK, but if they're all equivalent and bugfix 
releases are produced for the most
recent 2 or 3 releases, anyone wanting to run on an "in support" release of 
Kafka has to upgrade every 8-12 months.
If you don't actually want anything specific from the newer releases, it's just 
unnecessary churn.

Wouldn't it be better to designate one release every 12-18 months as a 
long-term support release with bugfix releases
produced for those for a longer period of say 24 months. That halves the 
upgrade work for people just wanting to keep
"in support". Now that adoption is increasing, there are plenty of users that 
just want a dependable messaging system
without having to be deeply knowledgeable about its innards.

LTS works nicely for plenty of open-source projects. I think it would work well 
for Kafka too.

Andrew Schofield


> From: ofir.ma...@equalum.io
> Date: Thu, 25 Aug 2016 16:07:07 +0300
> Subject: Re: [DISCUSS] Time-based releases for Apache Kafka
> To: dev@kafka.apache.org
>
> Regarding bug fixes, you may want to consider to have an LTS release once a
> year - designating it for longer-term support / better for the masses.
> If you like that - then fix bugs in trunk, backport important ones to
> latest release + the last two LTS releases.
> Even if you don't - if a downstream distribution picks a Kafka version and
> plans to support it over a few years, it could be nice of them to "own"
> that older release - volunteer to be a release manager for bug backports to
> that version over a longer period...
> Just my two cents :)
>
> Ofir Manor
>
> Co-Founder & CTO | Equalum
>
> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io
>
> On Thu, Aug 25, 2016 at 12:32 PM, Ismael Juma  wrote:
>
>> Thanks for putting this together Gwen. I think it sounds reasonable and
>> instead of trying to optimise every aspect of it ahead of time (which is
>> hard, subjective and time-consuming), I am happy to try what is being
>> proposed and tweak based on experience. One thing we should pay particular
>> attention to is how the stabilisation process works out in practice.
>>
>> A couple of comments:
>>
>> "Given 3 releases a year and the fact that no one upgrades three times a
>> year, we propose making sure (by testing!) that rolling upgrade can be done
>> from each release in the past year (i.e. last 3 releases) to the latest
>> version."
>>
>> Because the cost of doing this for a larger number of releases is
>> relatively low, I still think we should go for 6 here (our code currently
>> supports 5 versions as I said in a previous message, so we're close to that
>> target already). I'm generally very keen to make upgrades as easy as
>> possible so that people have no reason not to upgrade. :)
>>
>> "We will also attempt, as a community to do bugfix releases as needed for
>> the last 3 releases."
>>
>> I would suggest 2, personally, but since this is a bit fuzzy, I am OK with
>> 3 if people prefer that.
>>
>> Ismael
>>
>> On Thu, Aug 25, 2016 at 6:22 AM, Gwen Shapira  wrote:
>>
>>> Hi Team Kafka,
>>>
>>> As per the KIP meeting, I created a wiki:
>>> https://cwiki.apache.org/confluence/display/KAFKA/Time+
>> Based+Release+Plan
>>> Summarizing most of the discussion so far.
>>>
>>> Comments and additional discussion is welcome :)
>>>
>>> Gwen
>>>
>>> On Wed, Aug 17, 2016 at 12:31 PM, Vahid S Hashemian
>>>  wrote:
>>>> Time-based releases is a good idea and something that has proved to be
>>>> working in a number of open source projects. One successful example is
>>>> Node.js, that goes through two major releases a year. The interesting
>>> fact
>>>> about the two releases is that only one (the even-number release) comes
>>>> with a long term support (LTS) plan (30 months). More can be read here:
>>>> https://github.com/nodejs/LTS. The odd-number releases still come with
>>>> major changes and help build the ecosystem, but as far as LTS goes,
>> there
>>>> is only one per year. This LTS plan makes most enterprises want to
>> stick
>>>> to even-number releases, which is okay since frequent upgrades is not
>>>> something they are normally interested in anyway.
>>>>
>>>> There could be several minor releases (non-breaking) in bet

RE: [DISCUSS] Time-based releases for Apache Kafka

2016-08-25 Thread Andrew Schofield
I agree that the Kafka community has managed to maintain a very high quality 
level, so I'm not concerned
about the quality of non-LTS releases. If the principle is that every release 
is supported for 2 years, that
would be good. I suppose that if the burden of having that many in-support 
releases proves too heavy,
as you say we could reconsider.

Andrew Schofield


> From: g...@confluent.io
> Date: Thu, 25 Aug 2016 09:57:30 -0700
> Subject: Re: [DISCUSS] Time-based releases for Apache Kafka
> To: dev@kafka.apache.org
>
> I prefer Ismael's suggestion for supporting 2-years (6 releases)
> rather than have designated LTS releases.
>
> The LTS model seems to work well when some releases are high quality
> (LTS) and the rest are a bit more questionable. It is great for
> companies like Redhat, where they have to invest less to support few
> releases and let the community deal with everything else.
>
> Until now the Kafka community has managed to maintain very high
> quality level. Not just for releases, our trunk is often of better
> quality than other project's releases - we don't think of stability as
> something you tuck into a release (and just some releases) but rather
> as an on-going concern. There are costs to doing things that way, but
> in general, I think it has served us well - allowing even conservative
> companies to run on the latest released version.
>
> I hope we can agree to at least try maintaining last 6 releases as LTS
> (i.e. every single release is supported for 2 years) rather than
> designate some releases as better than others. Of course, if this
> totally fails, we can reconsider.
>
> Gwen
>
> On Thu, Aug 25, 2016 at 9:51 AM, Andrew Schofield
>  wrote:
>> The proposal sounds pretty good, but the main thing currently missing is a 
>> proper long-term support release.
>>
>> Having 3 releases a year sounds OK, but if they're all equivalent and bugfix 
>> releases are produced for the most
>> recent 2 or 3 releases, anyone wanting to run on an "in support" release of 
>> Kafka has to upgrade every 8-12 months.
>> If you don't actually want anything specific from the newer releases, it's 
>> just unnecessary churn.
>>
>> Wouldn't it be better to designate one release every 12-18 months as a 
>> long-term support release with bugfix releases
>> produced for those for a longer period of say 24 months. That halves the 
>> upgrade work for people just wanting to keep
>> "in support". Now that adoption is increasing, there are plenty of users 
>> that just want a dependable messaging system
>> without having to be deeply knowledgeable about its innards.
>>
>> LTS works nicely for plenty of open-source projects. I think it would work 
>> well for Kafka too.
>>
>> Andrew Schofield
>>
>> 
>>> From: ofir.ma...@equalum.io
>>> Date: Thu, 25 Aug 2016 16:07:07 +0300
>>> Subject: Re: [DISCUSS] Time-based releases for Apache Kafka
>>> To: dev@kafka.apache.org
>>>
>>> Regarding bug fixes, you may want to consider to have an LTS release once a
>>> year - designating it for longer-term support / better for the masses.
>>> If you like that - then fix bugs in trunk, backport important ones to
>>> latest release + the last two LTS releases.
>>> Even if you don't - if a downstream distribution picks a Kafka version and
>>> plans to support it over a few years, it could be nice of them to "own"
>>> that older release - volunteer to be a release manager for bug backports to
>>> that version over a longer period...
>>> Just my two cents :)
>>>
>>> Ofir Manor
>>>
>>> Co-Founder & CTO | Equalum
>>>
>>> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io
>>>
>>> On Thu, Aug 25, 2016 at 12:32 PM, Ismael Juma  wrote:
>>>
>>>> Thanks for putting this together Gwen. I think it sounds reasonable and
>>>> instead of trying to optimise every aspect of it ahead of time (which is
>>>> hard, subjective and time-consuming), I am happy to try what is being
>>>> proposed and tweak based on experience. One thing we should pay particular
>>>> attention to is how the stabilisation process works out in practice.
>>>>
>>>> A couple of comments:
>>>>
>>>> "Given 3 releases a year and the fact that no one upgrades three times a
>>>> year, we propose making sure (by testing!) that rolling upgrade can be done
>>

Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-07 Thread Andrew Schofield
There's a massive difference between the governance of Kafka and the governance 
of the REST proxy.

In Kafka, there is a broad community of people contributing their opinions 
about future enhancements in the form of KIPs. There's some really deep 
consideration that goes into some of the trickier KIPs. There are people 
outside Confluent deeply knowledgeable in Kafka and building the reputations 
to become committers. I get the impression that the roadmap of Kafka is not 
really community-owned (what's the big feature for Kafka 0.11, for example), 
but the conveyor belt of smaller features in the form of KIPs works  nicely. 
It's a good example of open-source working well.

The equivalent for the REST proxy is basically issues on GitHub. The roadmap is 
less clear. There's not really a community properly engaged in the way that 
there is with Kafka. So, you could say that it's clear that fewer people are 
interested, but I think the whole governance thing is a big barrier to 
engagement. And it's looking like it's getting out of date.

In technical terms, I can think of two big improvements to the REST proxy. 
First, it needs to use the new consumer API so that it's possible to secure 
connections between the REST proxy and Kafka. I don't care too much which 
method calls it uses actually  uses to consume messages, but I do care that I 
cannot secure connections because of network security rules. Second, there's an 
affinity between a consumer and the instance of the REST proxy to which it 
first connected. Kafka itself avoids this kind of affinity for good reason, and 
in the name of availability the REST proxy should too. These are natural KIPs.

I think it would be good to have the code for the REST proxy contributed to 
Apache Kafka so that it would be able to be developed in the same way.

Andrew Schofield
  
From: Suresh Srinivas 
Sent: 07 October 2016 22:41:52
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-80: Kafka REST Server
    
ASF already gives us a clear framework and governance model for community
development. This is already understood by the people contributing to
Apache Kafka project, and they are the same people who want to contribute
to the REST server capability as well. Everyone is in agreement on the
need for collaborating on this effort. So why not contribute the code to
Apache Kafka. This will help avoid duplication of effort and forks that
may crop up, hugely benefitting the user community. This will also avoid
having to define a process similar to ASF on a GitHub project and instead
there is a single community with clear understanding community process as
defined in ASF.

As others have said, this is an important capability for Apache Kafka. It
is worth maintaining this as a part of the project.

Regards,
Suresh

On 10/6/16, 8:32 AM, "Ofir Manor"  wrote:

>I personally think it would be quite wasteful to re-implement the REST
>gateway just because that an actively-maintained piece of Apache-licensed
>software is not governed directly by the Apache Kafka community... While
>kafka-rest repo is owned by Confluent, the contributors including the main
>one are also part of the Apache Kafka  community, so there is a chance to
>work this out.
>
>However, there are two valid concerns here that could be addressed, around
>community and accessibility:
>>> What we are worried about is a project
>>> that's not maintained in a community. So the process of accepting
>>>patches
>>> and priorities is not clear, and it's not developed in Apache
>>>community.
>>> Not only that, existing REST API project doesn't support new client API
>and
>>> hence there is no security support either.
>
>This might be easy to fix. Maybe Confluent / kafka-rest community can
>clarify that - what is their contribution policy, dev style, roadmap etc.
>If they want, they can make an effort to encourage participation from
>people outside Confluent (easily accept contributions, invite external
>commiters or have open dev process similar to Apache Kafka etc), as there
>is definitely seems to be some interest on the list. That might clear the
>community concern and help kafka-rest project (but that is a calculation
>Confluent will have to make).
>
>The other, independent, concern is that REST is something that is expected
>to be available out of the box with Kafka. I personally don't feel
>strongly
>about it (better use proper, efficient APIs from day one), though it is
>definitely way smaller than adding a stream processing engine to the
>project :)
>Again,the kafka-rest "community" could take steps to make it even easier
>to
>install, configure and run kafka-rest for new users on vanilla Apache
>Kafka
>(outside the Confluent platform), if they wish that (or welc

KIP-382: MirrorMaker 2.0 progress to delivery?

2019-08-28 Thread Andrew Schofield
Hi,
KIP-382 (MirrorMaker 2.0) has been approved for a while now but the code hasn’t 
yet made it into Kafka. If my memory serves me well, it looked like a candidate 
for 2.3 and it’s now a candidate for 2.4.

For such a significant feature, I guess it’s going to take a little time to 
mature and that’s easiest with a broad range of people using it, and that’s 
only going to happen once it’s in a release. Does anyone have a view on the 
likelihood of it making 2.4?

Thanks,
Andrew Schofield


Re: [DISCUSS] KIP-416: Notify SourceTask of ACK'd offsets, metadata

2019-10-01 Thread Andrew Schofield
I favour this approach too.

Andrew Schofield

On 01/10/2019, 09:15, "Ryanne Dolan"  wrote:

Thanks Randall, that works for me.

Ryanne

On Tue, Oct 1, 2019 at 9:09 AM Randall Hauch  wrote:

> Apologies for the late entry -- I entirely missed this KIP and discussion.
> :-(
>
> Thanks for creating the KIP and proposing this change. I do think it's
> useful for source connector tasks to get more information about the
> acknowledgement after the record was written.
>
> However, given the KIPs suggestion that the two `commitRecord(...)` method
> variants are disjoint, I'm a bit surprised that the WorkerSourceTask would
> do the following:
>
> task.commitRecord(preTransformRecord);
> if (recordMetadata != null)
> task.commitRecord(preTransformRecord, recordMetadata);
>
> rather than:
>
> if (recordMetadata != null)
> task.commitRecord(preTransformRecord, recordMetadata);
> else
> task.commitRecord(preTransformRecord);
>
> But if this is the case, I would argue that it is better to simply have 
one
> `commitRecord(SourceRecord record, RecordMetadata metadata)` method that
> clearly denotes that the metadata may be null if the record was not 
written
> (e.g., an SMT caused it to be dropped) or could not be written (after
> giving up retrying after failures in the SMTs and/or the converter), and
> let the implementation deal with the differences. Essentially, we've be
> deprecating the existing `commitRecord(SourceRecord)` method, changing the
> framework to always use the new method, and having the new method by
> default delegate to the existing method. (This is what Jun also suggested
> on the PR request,
> https://github.com/apache/kafka/pull/6295#discussion_r330097541). This is
> backwards compatible for connector implementations that only override the
> old method, yet provides a way for connectors that do implement the new 
API
> to override the new method without having to also implement the old 
method,
> too.
>
> IOW:
>
> @deprecated
> public void commitRecord(SourceRecord sourceRecord) {
>   // nop
> }
>
> /**
>  * <p>
>  * Commit an individual {@link SourceRecord} when the callback from the
> producer client is received, or if a record is filtered by a
> transformation and not sent to the producer.
>  * By default, this method delegates to the {@link
> #commitRecord(SourceRecord)} method to maintain backward compatibility.
> Tasks can choose to override this method,
>  * override the {@link #commitRecord(SourceRecord)} method, or not
> override either one.
>  * </p>
>  * <p>
>  * SourceTasks are not required to implement this functionality; Kafka
> Connect will record offsets
>  * automatically. This hook is provided for systems that also need to
> store offsets internally
>  * in their own system.
>  * </p>
>  *
>  * @param sourceRecord the {@link SourceRecord} that was successfully sent
> via the producer.
>  * @param recordMetadata the metadata from the producer's write
> acknowledgement, or null if the record was not sent to the producer
> because it was filtered by an SMT or could not be transformed and/or
> converted
>  * @throws InterruptedException
>  */
> public void commitRecord(SourceRecord sourceRecord, RecordMetadata
> recordMetadata) {
>   commitRecord(sourceRecord);
> }
>
> Best regards,
>
> Randall
>
>
> On Thu, Jan 31, 2019 at 9:02 AM Ryanne Dolan 
> wrote:
>
> > Andrew, I have considered this, but I think passing null for
> RecordMetadata
> > would be surprising and error prone for anyone implementing SourceTask. 
I
> > figure the only use-case for overriding this variant (and not the
> existing
> > one) is to capture the RecordMetadata. If that's the case, every
> > implementation would need to check for null. What worries me is that an
> > implementation that does not check for null will seem to work until an
> SMT
> > is configured to filter records, which I believe would be exceedingly
> rare.
> > Moreover, the presence of the RecordMetadata parameter strongly implies
> > that the record has been sent and ACK'd, and it would be surprising to
> > discover otherwise.
> >
> > On the other hand, the current PR makes it difficult to distinguish
> between
> > recor

Re: [VOTE] KIP-959 Add BooleanConverter to Kafka Connect

2023-07-25 Thread Andrew Schofield
Thanks for the KIP. As you say, not that controversial.

+1 (non-binding)

Thanks,
Andrew

> On 25 Jul 2023, at 18:22, Hector Geraldino (BLOOMBERG/ 919 3RD A) 
>  wrote:
>
> Hi everyone,
>
> The changes proposed by KIP-959 (Add BooleanConverter to Kafka Connect) have 
> a limited scope and shouldn't be controversial. I'm opening a voting thread 
> with the hope that it can be included in the next upcoming 3.6 release.
>
> Here are some links:
>
> KIP: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-959%3A+Add+BooleanConverter+to+Kafka+Connect
> JIRA: https://issues.apache.org/jira/browse/KAFKA-15248
> Discussion thread: 
> https://lists.apache.org/thread/15c2t0kl9bozmzjxmkl5n57kv4l4o1dt
> Pull Request: https://github.com/apache/kafka/pull/14093
>
> Thanks!
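For readers unfamiliar with what such a converter does: a boolean converter maps a Connect boolean value to and from bytes on the wire. The sketch below is an illustrative assumption about the encoding (a single byte, 0x01 for true and 0x00 for false, null passed through) — it is not the actual KIP-959 implementation, whose `Converter` methods also take a topic and schema:

```java
public class BooleanCodecSketch {
    // Hypothetical encoding: true -> {0x01}, false -> {0x00}, null -> null.
    static byte[] fromBoolean(Boolean value) {
        if (value == null) return null;
        return new byte[] { value ? (byte) 0x01 : (byte) 0x00 };
    }

    static Boolean toBoolean(byte[] bytes) {
        if (bytes == null) return null;
        if (bytes.length != 1) throw new IllegalArgumentException("expected 1 byte");
        return bytes[0] != (byte) 0x00;
    }

    public static void main(String[] args) {
        // Round trip: serialize then deserialize.
        System.out.println(toBoolean(fromBoolean(true)));   // true
    }
}
```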




Re: [DISCUSS] KIP-953: partition method to be overloaded to accept headers as well.

2023-07-27 Thread Andrew Schofield
Hi Jack,
Thanks for the KIP. I have a few concerns about the idea.

1) I think that while a client-side partitioner seems like a neat idea and it’s 
an established part of Kafka,
it’s one of the things which makes Kafka clients quite complicated. Just as 
KIP-848 is moving from
client-side assignors to server-side assignors, I wonder whether really we 
should be looking to make
partitioning a server-side capability too over time. So, I’m not convinced that 
making the Partitioner
interface richer is moving in the right direction.

2) For records with a key, the partitioner usually calculates the partition 
from the key. This means
that records with the same key end up on the same partition. Many applications 
expect this to give ordering.
Log compaction expects this. There are situations in which records have to be 
repartitioned, such as
sometimes happens with Kafka Streams. I think that a header-based partitioner 
for records which have
keys is going to be surprising and only going to have limited applicability as 
a result.

The tricky part about clever partitioning is that downstream systems have no 
idea how the partition
number was arrived at, so they do not truly understand how the ordering was 
derived. I do think that
perhaps there’s value to being able to influence the partitioning using the 
headers, but I wonder if actually
transforming the headers into an “ordering context” that then flows with the 
record as it moves through
the system would be a stronger solution. Today, the key is the ordering 
context. Maybe it should be a
concept in its own right and the Producer could configure a converter from 
headers to ordering context.
That is of course a much bigger change.

In one of the examples you mention in the KIP, you mentioned using a header to 
control priority. I guess the
idea is to preferentially process records off specific partitions so they can 
overtake lower priority records.
I suggest just sending the records explicitly to specific partitions to achieve 
this.

Sorry for the essay, but you did ask for people to share their thoughts :)

Just my opinion. Let’s see what others think.

Thanks,
Andrew
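To make the shape of the proposal concrete: KIP-953 overloads the `Partitioner.partition` method to additionally accept headers, with a default implementation that ignores them so existing partitioners are unaffected. The sketch below uses stand-in `Partitioner` and `Headers` interfaces (the real ones are `org.apache.kafka.clients.producer.Partitioner` and `org.apache.kafka.common.header.Headers`, with different parameter lists), purely to show the default-method compatibility trick:

```java
public class HeaderPartitionerSketch {

    // Stand-in for Kafka's Headers interface.
    interface Headers { String lastValue(String key); }

    // Stand-in for the Partitioner interface, showing how the KIP's
    // overload can default to the existing method so that header-unaware
    // implementations need no changes.
    interface Partitioner {
        int partition(String topic, Object key, Object value, int numPartitions);

        default int partition(String topic, Object key, Object value,
                              int numPartitions, Headers headers) {
            return partition(topic, key, value, numPartitions);
        }
    }

    public static void main(String[] args) {
        Partitioner hashOnly = (topic, key, value, n) -> Math.abs(key.hashCode()) % n;
        Headers priority = k -> "priority".equals(k) ? "high" : null;

        // Header-unaware implementation: the default method delegates,
        // so the header-aware call gives the same answer.
        System.out.println(hashOnly.partition("t", "k1", "v", 4, priority)
                == hashOnly.partition("t", "k1", "v", 4));  // true
    }
}
```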

> On 25 Jul 2023, at 14:58, Jack Tomy  wrote:
>
> Hey @Sagar
>
> Thanks again for the review.
> 1. "a null headers value is equivalent to invoking the older partition
> method", this is not true. If someone makes an implementation and the
> headers come as null, still the new implementation will take effect.
> Instead I have added : "Not overriding this method in the Partitioner
> interface has the same behaviour as using the existing method."
> 2. Corrected.
>
> Hey @Sagar and everyone,
> Please have a look at the new version and share your thoughts.
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263424937
> Like Sagar mentioned, I would also request more people who have more
> context on clients to chime in.
>
>
> On Tue, Jul 25, 2023 at 2:58 PM Sagar  wrote:
>
>> Hi Jack,
>>
>> Thanks I have a couple of final comments and then I am good.
>>
>> 1) Can you elaborate on the Javadocs of the partition headers argument to
>> specify that a null headers value is equivalent to invoking the older
>> partition method? It is apparent but generally good to call out.
>> 2) In the Compatibility section, you have mentioned backward comparable. I
>> believe it should be *backward compatible change.*
>>
>> I don't have other comments. Post this, probably someone else who has more
>> context on Clients can also chime in on this before we can move this to
>> Voting.
>>
>> Thanks!
>> Sagar.
>>
>>
>> On Sat, Jul 22, 2023 at 10:09 AM Jack Tomy  wrote:
>>
>>> Hey @Sagar,
>>>
>>> Thank you again for the response and feedback.
>>>
>>>   1. Though the ask wasn't very clear to me I have attached the Javadoc
>> as
>>>   per your suggestion. Please have a look and let me know if this meets
>>> the
>>>   expectations.
>>>   2. Done.
>>>   3. Done
>>>   4. Done
>>>
>>> Hey @Sagar and everyone,
>>> Please have a look at the new version and share your thoughts.
>>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263424937
>>>
>>> On Thu, Jul 20, 2023 at 9:46 PM Sagar  wrote:
>>>
 Thanks Jack for the updates.

 Some more feedback:

 1) It would be better if you can add the Javadoc in the Public
>> interfaces
 section. That is a general practice used which gives the readers of the
>>> KIP
 a high level idea of the Public Interfaces.

 2) In the proposed section, the bit about marking headers as read only
 seems like an implementation detail. This can generally be avoided in
>>> KIPs.

 3) Also, in the Deprecation section, can you mention again that this
>> is a
 backward compatible change and the reason for it (already done in the
 Proposed Changes section).

 4) In the Testing Plan section, there is still the KIP template bit
>>> copied
 over. That can be removed.

 Thanks!
 Sagar.


 On T

Re: [DISCUSS] KIP-953: partition method to be overloaded to accept headers as well.

2023-07-28 Thread Andrew Schofield
Hi Sagar,
Thanks for your comments.

1) Server-side partitioning doesn’t necessarily mean that there’s only one
way to do it. It just means that the partitioning logic runs on the broker and
any configuration of partitioning applies to the broker’s partitioner. If we 
ever
see a KIP for this, that’s the kind of thing I would expect to see.

2) In the priority example in the KIP, there is a kind of contract between the
producers and consumers so that some records can be processed before
others regardless of the order in which they were sent. The producer
wants to apply special significance to a particular header to control which
partition is used. I would simply achieve this by setting the partition number
in the ProducerRecord at the time of sending.

I don’t think the KIP proposes adjusting the built-in partitioner or adding to 
AK
a new one that uses headers in the partitioning decision. So, any configuration
for a partitioner that does support headers would be up to the implementation
of that specific partitioner. Partitioner implements Configurable.

I’m just providing an alternative view and I’m not particularly opposed to the 
KIP.
I just don’t think it quite merits the work involved to get it voted and merged.
As an aside, a long time ago, I created a small KIP that was never adopted
and I didn’t push it because I eventually didn’t need it.

Thanks,
Andrew
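Andrew's alternative — routing priority records without a custom partitioner — amounts to computing the partition in the application and setting it explicitly on the record (the `ProducerRecord` constructors accept an explicit partition number). A self-contained sketch of such a routing rule follows; the "high" header value and the convention of reserving the last partition for priority traffic are illustrative assumptions, not anything from the KIP:

```java
public class PriorityRoutingSketch {
    // Assumed convention: partitions 0..(n-2) carry normal keyed traffic
    // and the last partition is reserved for "high" priority records.
    static int partitionFor(String priorityHeader, Object key, int numPartitions) {
        if ("high".equals(priorityHeader)) {
            return numPartitions - 1;
        }
        return Math.abs(key.hashCode()) % (numPartitions - 1);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("high", "order-42", 6));  // 5
        // In real producer code the result would be passed as the explicit
        // partition, e.g. new ProducerRecord<>(topic, partitionFor(...), key, value)
    }
}
```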

> On 28 Jul 2023, at 05:15, Sagar  wrote:
>
> Hey Andrew,
>
> Thanks for the review. Since I had reviewed the KIP I thought I would also
> respond. Of course Jack has the final say on this since he wrote the KIP.
>
> 1) This is an interesting point and I hadn't considered it. The
> comparison with KIP-848 is a valid one but even within that KIP, it allows
> client side partitioning for power users like Streams. So while we would
> want to move away from client side partitioner as much as possible, we
> still shouldn't do away completely with Client side partitioning and end up
> being in a state of inflexibility for different kinds of usecases. This is
> my opinion though and you have more context on Clients, so would like to
> know your thoughts on this.
>
> 2) Regarding this, I assumed that since the headers are already part of the
> consumer records they should have access to the headers and if there is a
> contract b/w the applications producing and the application consuming, that
> decisioning should be transparent. Was my assumption incorrect? But as you
> rightly pointed out header based partitioning with keys is going to lead to
> surprising results. Assuming there is merit in this proposal, do you think
> we should ignore the keys in this case (similar to the effect of setting
> the *partitioner.ignore.keys* config to false) and document it
> appropriately?
>
> Let me know what you think.
>
> Thanks!
> Sagar.
>
>
> On Thu, Jul 27, 2023 at 9:41 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Jack,
>> Thanks for the KIP. I have a few concerns about the idea.
>>
>> 1) I think that while a client-side partitioner seems like a neat idea and
>> it’s an established part of Kafka,
>> it’s one of the things which makes Kafka clients quite complicated. Just
>> as KIP-848 is moving from
>> client-side assignors to server-side assignors, I wonder whether really we
>> should be looking to make
>> partitioning a server-side capability too over time. So, I’m not convinced
>> that making the Partitioner
>> interface richer is moving in the right direction.
>>
>> 2) For records with a key, the partitioner usually calculates the
>> partition from the key. This means
>> that records with the same key end up on the same partition. Many
>> applications expect this to give ordering.
>> Log compaction expects this. There are situations in which records have to
>> be repartitioned, such as
>> sometimes happens with Kafka Streams. I think that a header-based
>> partitioner for records which have
>> keys is going to be surprising and only going to have limited
>> applicability as a result.
>>
>> The tricky part about clever partitioning is that downstream systems have
>> no idea how the partition
>> number was arrived at, so they do not truly understand how the ordering
>> was derived. I do think that
>> perhaps there’s value to being able to influence the partitioning using
>> the headers, but I wonder if actually
>> transforming the headers into an “ordering context” that then flows with
>> the record as it moves through
>> the system would be a stronger solution. Today, the key is the ordering
>> context. Maybe it should be a
>> concept in its own right and the Producer could configure a converter from
>>

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-07-31 Thread Andrew Schofield
Hi Milind,
Thanks for your question.

On reflection, I agree that INVALID_RECORD is most likely to be caused by a
problem in the serialization in the client. I have changed the client action in 
this case
to “Log an error and stop pushing metrics”.

I have updated the KIP text accordingly.

Thanks,
Andrew
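The client actions settled in this exchange can be summarised as a small decision table. The enum constants below mirror the error codes discussed in the thread, but the class itself is illustrative — it is not part of any Kafka client implementation:

```java
public class TelemetryErrorPolicy {
    enum Error { INVALID_RECORD, UNKNOWN_SUBSCRIPTION_ID, UNSUPPORTED_COMPRESSION_TYPE }
    enum Action { STOP_PUSHING, RESUBSCRIBE_IMMEDIATELY }

    static Action actionFor(Error error) {
        switch (error) {
            case INVALID_RECORD:
                // Almost certainly a client-side serialization bug: log an
                // error and stop pushing metrics (fatal for telemetry only;
                // the client keeps functioning otherwise).
                return Action.STOP_PUSHING;
            case UNKNOWN_SUBSCRIPTION_ID:
            case UNSUPPORTED_COMPRESSION_TYPE:
                // Subscription state is stale: fetch fresh subscriptions with
                // an immediate GetTelemetrySubscriptionsRequest.
                return Action.RESUBSCRIBE_IMMEDIATELY;
            default:
                throw new IllegalArgumentException("unhandled: " + error);
        }
    }

    public static void main(String[] args) {
        System.out.println(actionFor(Error.INVALID_RECORD));  // STOP_PUSHING
    }
}
```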

> On 31 Jul 2023, at 12:09, Milind Luthra  wrote:
>
> Hi Andrew,
> Thanks for the clarifications.
>
> About 2b:
> In case a client has a bug while serializing, it might be difficult for the
> client to recover from that without code changes. In that case, it might be good
> to just log the INVALID_RECORD as an error, and treat the error as fatal
> for the client (only fatal in terms of sending the metrics, the client can
> keep functioning otherwise). What do you think?
>
> Thanks
> Milind
>
> On Mon, Jul 24, 2023 at 8:18 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Milind,
>> Thanks for your questions about the KIP.
>>
>> 1) I did some archaeology and looked at historical versions of the KIP. I
>> think this is
>> just a mistake. 5 minutes is the default metric push interval. 30 minutes
>> is a mystery
>> to me. I’ve updated the KIP.
>>
>> 2) I think there are two situations in which INVALID_RECORD might occur.
>> a) The client might perhaps be using a content-type that the broker does
>> not support.
>> The KIP mentions content-type as a future extension, but there’s only one
>> supported
>> to start with. Until we have multiple content-types, this seems out of
>> scope. I think a
>> future KIP would add another error code for this.
>> b) The client might perhaps have a bug which means the metrics payload is
>> malformed.
>> Logging a warning and attempting the next metrics push on the push
>> interval seems
>> appropriate.
>>
>> UNKNOWN_SUBSCRIPTION_ID would indeed be handled by making an immediate
>> GetTelemetrySubscriptionsRequest.
>>
>> UNSUPPORTED_COMPRESSION_TYPE seems like either a client bug or perhaps
>> a situation in which a broker sends a compression type in a
>> GetTelemetrySubscriptionsResponse
>> which is subsequently not supported when its used with a
>> PushTelemetryRequest.
>> We do want the client to have the opportunity to get an up-to-date list of
>> supported
>> compression types. I think an immediate GetTelemetrySubscriptionsRequest
>> is appropriate.
>>
>> 3) If a client attempts a subsequent handshake with a Null
>> ClientInstanceId, the
>> receiving broker may not already know the client's existing
>> ClientInstanceId. If the
>> receiving broker knows the existing ClientInstanceId, it simply responds
>> the existing
>> value back to the client. If it does not know the existing
>> ClientInstanceId, it will create
>> a new client instance ID and respond with that.
>>
>> I will update the KIP with these clarifications.
>>
>> Thanks,
>> Andrew
>>
>>> On 17 Jul 2023, at 14:21, Milind Luthra 
>> wrote:
>>>
>>> Hi Andrew, thanks for this KIP.
>>>
>>> I had a few questions regarding the "Error handling" section.
>>>
>>> 1. It mentions that "The 5 and 30 minute retries are to eventually
>> trigger
>>> a retry and avoid having to restart clients if the cluster metrics
>>> configuration is disabled temporarily, e.g., by operator error, rolling
>>> upgrades, etc."
>>> But this 30 min interval isn't mentioned anywhere else. What is it
>>> referring to?
>>>
>>> 2. For the actual errors:
>>> INVALID_RECORD : The action required is to "Log a warning to the
>>> application and schedule the next GetTelemetrySubscriptionsRequest to 5
>>> minutes". Why is this 5 minutes, and not something like PushIntervalMs?
>> And
>>> also, why are we scheduling a GetTelemetrySubscriptionsRequest in this
>>> case, if the serialization is broken?
>>> UNKNOWN_SUBSCRIPTION_ID , UNSUPPORTED_COMPRESSION_TYPE : just to confirm,
>>> the GetTelemetrySubscriptionsRequest needs to be scheduled immediately
>>> after the PushTelemetry response, correct?
>>>
>>> 3. For "Subsequent GetTelemetrySubscriptionsRequests must include the
>>> ClientInstanceId returned in the first response, regardless of broker":
>>> Will a broker error be returned in case some implementation of this KIP
>>> violates this accidentally and sends a request with ClientInstanceId =
>> Null
>>> even when it's been obtained already? Or 

Re: [DISCUSS] KIP-953: partition method to be overloaded to accept headers as well.

2023-08-03 Thread Andrew Schofield
Hi Jack,
I do understand the idea of extending the Partitioner interface so that
people are now able to use headers in the partitioning decision, and I see
that it’s filling in gap in the interface which was left when headers were
originally added.

Experience with non-default partitioning schemes in the past makes me
unlikely to use anything other than the default partitioning scheme.
But I wouldn’t let that dissuade you.

Thanks,
Andrew

> On 3 Aug 2023, at 13:43, Jack Tomy  wrote:
>
> Hey Andrew, Sagar
>
> Please share your thoughts. Thanks.
>
>
>
> On Mon, Jul 31, 2023 at 5:58 PM Jack Tomy  wrote:
>
>> Hey Andrew, Sagar
>>
>> Thanks. I'm travelling so sorry for being brief and getting back late.
>>
>> 1. For the first concern, that is moving in a direction of server side
>> partitioner, the idea seems very much promising but I believe we still have
>> a long way to go. Since the proposal/design for the same is still not
>> available, it's hard for me to defend my proposal.
>> 2.  For the second concern:
>> 2.1 Loss of order in messages, I believe the ordering of messages is
>> never promised and the partitioner has no requirement to ensure the same.
>> It is upto the user to implement/use a partitioner which ensures ordering
>> based on keys.
>> 2.2 Key deciding the partitioner, It is totally up to the user to decide
>> the partition regardless of the key, we are also passing the value to the
>> partitioner. Even the existing implementation receives the value which lets
>> the user decide the partition based on value.
>> 2.3 Sending to a specific partition, for this, I need to be aware of the
>> total number of partitions, but if I can do that same in partitioner, the
>> cluster param gives me all the information I want.
>>
>> I would also quote a line from KIP-82 where headers were added to the
>> serializer : The payload is traditionally for the business object, and 
>> *headers
>> are traditionally used for transport routing*, filtering etc. So I
>> believe when a user wants to add some routing information (in this case
>> which set of partitions to go for), headers seem to be the right place.
>>
>>
>>
>> On Sat, Jul 29, 2023 at 8:48 PM Sagar  wrote:
>>
>>> Hi Andrew,
>>>
>>> Thanks for your comments.
>>>
>>> 1) Yes that makes sense and that's what even would expect to see as well.
>>> I
>>> just wanted to highlight that we might still need a way to let client side
>>> partitioning logic be present as well. Anyways, I am good on this point.
>>> 2) The example provided does seem achievable by simply attaching the
>>> partition number in the ProducerRecord. I guess if we can't find any
>>> further examples which strengthen the case of this partitioner, it might
>>> be
>>> harder to justify adding it.
>>>
>>>
>>> Thanks!
>>> Sagar.
>>>
>>> On Fri, Jul 28, 2023 at 2:05 PM Andrew Schofield <
>>> andrew_schofield_j...@outlook.com> wrote:
>>>
>>>> Hi Sagar,
>>>> Thanks for your comments.
>>>>
>>>> 1) Server-side partitioning doesn’t necessarily mean that there’s only
>>> one
>>>> way to do it. It just means that the partitioning logic runs on the
>>> broker
>>>> and
>>>> any configuration of partitioning applies to the broker’s partitioner.
>>> If
>>>> we ever
>>>> see a KIP for this, that’s the kind of thing I would expect to see.
>>>>
>>>> 2) In the priority example in the KIP, there is a kind of contract
>>> between
>>>> the
>>>> producers and consumers so that some records can be processed before
>>>> others regardless of the order in which they were sent. The producer
>>>> wants to apply special significance to a particular header to control
>>> which
>>>> partition is used. I would simply achieve this by setting the partition
>>>> number
>>>> in the ProducerRecord at the time of sending.
>>>>
>>>> I don’t think the KIP proposes adjusting the built-in partitioner or
>>>> adding to AK
>>>> a new one that uses headers in the partitioning decision. So, any
>>>> configuration
>>>> for a partitioner that does support headers would be up to the
>>>> implementation
>>>> of that specific partitioner. Partitioner implements Configurable.
>>>>
>>>> I’m just providing an alternative view and I’m not partic

[VOTE] KIP-714: Client metrics and observability

2023-08-04 Thread Andrew Schofield
Hi,
After almost 2 1/2 years in the making, I would like to call a vote for KIP-714 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability).

This KIP aims to improve monitoring and troubleshooting of client performance 
by enabling clients to push metrics to brokers.

I’d like to thank everyone that participated in the discussion, especially the 
librdkafka team since one of the aims of the KIP is to enable any client to 
participate, not just the Apache Kafka project’s Java clients.

Thanks,
Andrew

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-08-04 Thread Andrew Schofield
Hi Doguscan,
Thanks for your comments. I’m glad to hear you’re interested in this KIP.

1) It is preferred that a client sends its metrics to the same broker connection
but actually it is able to send them to any broker. As a result, if a broker 
becomes
unhealthy, the client can push its metrics to any other broker. It seems to me 
that
pushing to KRaft controllers instead just has the effect of increasing the load 
on
the controllers, while still having the characteristic that an unhealthy 
controller
would present inconvenience for collecting metrics.

2) When the `PushTelemetryRequest.Terminating` flag is set, the standard request
throttling is not disabled. The metrics rate-limiting based on the push 
interval is
not applied in this case for a single request for the combination of client 
instance ID
and subscription ID.

(I have corrected the KIP text because it erroneously said “client ID and 
subscription ID”.)

3) While this is a theoretical problem, I’m not keen on adding yet more 
configurations
to the broker or client. The `interval.ms` configuration on the CLIENT_METRICS
resource could perhaps have a minimum and maximum value to prevent accidental
misconfiguration.

4) One of the reasons that this KIP has taken so long to get to this stage is 
that
it tried to do many things all at once. So, it’s greatly simplified compared 
with
6 months ago. I can see the value of collecting client configurations for 
problem
determination, but I don’t want to make this KIP more complicated. I think the
idea has merit as a separate follow-on KIP. I would be happy to collaborate
with you on this.

5) The default is set to 5 minutes to minimise the load on the broker for 
situations
in which the administrator didn’t set an interval on a metrics subscription. To
use an interval of 1 minute, it is only necessary to set `interval.ms` on the 
metrics
subscription to 60000ms.

6) Uncompressed data is always supported. The KIP says:
  “The CompressionType of NONE will not be present in the response from the
broker, though the broker does support uncompressed client telemetry if
none of the accepted compression codecs are supported by the client.”
So in your example, the client need only use CompressionType=NONE.

Thanks,
Andrew

> On 4 Aug 2023, at 14:04, Doğuşcan Namal  wrote:
>
> Hi Andrew, thanks a lot for this KIP. I was thinking of something similar
> so thanks for writing this down 😊
>
>
>
> Couple of questions related to the design:
>
>
>
> 1. Can we investigate the option for using the Kraft controllers instead of
> the brokers for sending metrics? The disadvantage of sending these metrics
> directly to the brokers tightly couples metric observability to data plane
> availability. If the broker is unhealthy then the root cause of an incident
> is clear however on partial failures it makes it hard to debug these
> incidents from the brokers perspective.
>
>
>
> 2. Ratelimiting will be disabled if the `PushTelemetryRequest.Terminating`
> flag is set. However, this may cause unavailability on the broker if too
> many clients are terminated at once, especially network threads could
> become busy and introduce latency on the produce/consume on other
> non-terminating clients connections. I think there is a room for
> improvement here. If the client is gracefully shutting down, it could wait
> for the request to be handled if it is being ratelimited, it doesn't need
> to "force push" the metrics. For that reason, maybe we could define a
> separate ratelimiting for telemetry data?
>
>
>
> 3. `PushIntervalMs` is set on the client side by a response from
> `GetTelemetrySubscriptionsResponse`. If the broker sets this value to too
> low, like 1msec, this may hog all of the clients activity and cause an
> impact on the client side. I think we should introduce a configuration both
> on the client and the broker side for the minimum and maximum numbers for
> this value to fence out misconfigurations.
>
>
>
> 4. One of the important things I face during debugging the client side
> failures is to understand the client side configurations. Can the client
> sends these configs during the GetTelemetrySubscriptions request as well?
>
>
>
> Small comments:
>
> 5. Default PushIntervalMs is 5 minutes. Can we make it 1 minute instead? I
> think 5 minutes of aggregated data is not all that helpful in the world of
> telemetry 😊
>
> 6. UnsupportedCompressionType: Shall we fallback to non-compression mode in
> that case? I think compression is nice to have, but non-compressed
> telemetry data is valuable as well. Especially for low throughput clients,
> compressing telemetry data may cause more CPU load then the actual data
> plane work.
>
>
> Thanks again.
>
> Doguscan
>
>
>
>> On Jun 13, 2023, at 8:

Re: [DISCUSS] Cluster-wide disablement of Tiered Storage

2023-08-04 Thread Andrew Schofield
Hi Christo,
I agree with you.

Option 4.1 without a KIP seems like an acceptable starting point for something
which will be relatively rare, provided that it’s easy for the user to get a 
list of the
topics that have to be deleted before they can successfully start the broker 
with
TS turned off.

Option 4.2 in the future with a KIP improves things later on.

Thanks,
Andrew

> On 4 Aug 2023, at 16:12, Christo Lolov  wrote:
>
> Hello all!
>
> I wanted to gather more opinions for
> https://issues.apache.org/jira/browse/KAFKA-15267
>
> In summary, the problem which I would like to solve is disabling TS (and
> freeing the resources used by RemoteLog*Manager) because I have decided I
> no longer want to use it without having to provision a whole new cluster
> which just doesn't have it enabled.
>
> My preference would be for option 4.1 without a KIP followed by option 4.2
> in the future with a KIP once KIP-950 makes it in.
>
> Please let me know your thoughts!
>
> Best,
> Christo



Re: [VOTE] KIP-940: Broker extension point for validating record contents at produce time

2023-08-07 Thread Andrew Schofield
Hi Adrian,
Thanks for the KIP. Looks like a nice improvement.

+1 (non-binding)

Thanks,
Andrew

> On 2 Aug 2023, at 12:33, Adrian Preston  wrote:
>
> Hello all,
>
>
>
> Edo and I would like to call for a vote on KIP-940:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-940%3A+Broker+extension+point+for+validating+record+contents+at+produce+time
>
>
>
> Thanks,
>
> Adrian
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU



Re: [DISCUSS] KIP-714: Client metrics and observability

2023-08-10 Thread Andrew Schofield
Hi Doguscan,
Thanks for your question.

If the target broker is unreachable, the client can send the metrics to another
broker. It can select any of the other brokers for this purpose. What I expect 
in
practice is that it loses connection to the broker it’s being using for metrics,
chooses or establishes a connection to another broker, and then selects that
broker for subsequent metrics pushes.

Thanks,
Andrew
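The failover behaviour described above can be sketched as a simple selection rule: stick with the current metrics broker while it is reachable, otherwise fail over to any other reachable broker. This is a self-contained illustration, not client code; in a real client "reachable" would reflect live connection state rather than a supplied predicate:

```java
import java.util.List;
import java.util.function.Predicate;

public class MetricsBrokerSelector {
    // Prefer the broker currently used for metrics pushes; on loss of that
    // connection, select any other reachable broker for subsequent pushes.
    static String selectBroker(String current, List<String> brokers,
                               Predicate<String> reachable) {
        if (current != null && reachable.test(current)) {
            return current;
        }
        for (String b : brokers) {
            if (reachable.test(b)) {
                return b;  // switch metrics pushes to this broker
            }
        }
        return null;  // no broker reachable; drop this push and retry later
    }

    public static void main(String[] args) {
        List<String> brokers = List.of("b1", "b2", "b3");
        // b1 has become unreachable; the client fails over to b2.
        System.out.println(selectBroker("b1", brokers, b -> !b.equals("b1")));  // b2
    }
}
```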

> On 8 Aug 2023, at 08:34, Doğuşcan Namal  wrote:
>
> Thanks for your answers Andrew. I share your pain that it took a while for
> you to get this KIP approved and you want to reduce the scope of it, will
> be happy to help you with the implementation :)
>
> Could you help me walk through what happens if the target broker is
> unreachable? Is the client going to drop these metrics or is it going to
> send it to the other brokers it is connected to? This information is
> crucial to understand the client side impact on leadership failovers.
> Moreover, in case of partial outages, such as only the network between the
> client and the broker is partitioned whereas the network within the cluster
> is healthy, practically there is no other way than the client side metrics
> to identify this problem.
>
> Doguscan
>
> On Fri, 4 Aug 2023 at 15:33, Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Doguscan,
>> Thanks for your comments. I’m glad to hear you’re interested in this KIP.
>>
>> 1) It is preferred that a client sends its metrics to the same broker
>> connection
>> but actually it is able to send them to any broker. As a result, if a
>> broker becomes
>> unhealthy, the client can push its metrics to any other broker. It seems
>> to me that
>> pushing to KRaft controllers instead just has the effect of increasing the
>> load on
>> the controllers, while still having the characteristic that an unhealthy
>> controller
>> would present inconvenience for collecting metrics.
>>
>> 2) When the `PushTelemetryRequest.Terminating` flag is set, the standard
>> request
>> throttling is not disabled. The metrics rate-limiting based on the push
>> interval is
>> not applied in this case for a single request for the combination of
>> client instance ID
>> and subscription ID.
>>
>> (I have corrected the KIP text because it erroneously said “client ID and
>> subscription ID”.)
>>
>> 3) While this is a theoretical problem, I’m not keen on adding yet more
>> configurations
>> to the broker or client. The `interval.ms` configuration on the
>> CLIENT_METRICS
>> resource could perhaps have a minimum and maximum value to prevent
>> accidental
>> misconfiguration.
>>
>> 4) One of the reasons that this KIP has taken so long to get to this stage
>> is that
>> it tried to do many things all at once. So, it’s greatly simplified
>> compared with
>> 6 months ago. I can see the value of collecting client configurations for
>> problem
>> determination, but I don’t want to make this KIP more complicated. I think
>> the
>> idea has merit as a separate follow-on KIP. I would be happy to collaborate
>> with you on this.
>>
>> 5) The default is set to 5 minutes to minimise the load on the broker for
>> situations
>> in which the administrator didn’t set an interval on a metrics
>> subscription. To
>> use an interval of 1 minute, it is only necessary to set `interval.ms` on
>> the metrics
>> subscription to 60000ms.
>>
>> 6) Uncompressed data is always supported. The KIP says:
>> "The CompressionType of NONE will not be
>> "present in the response from the broker, though the broker does support
>> uncompressed
>> "client telemetry if none of the accepted compression codecs are supported
>> by the client.”
>> So in your example, the client need only use CompressionType=NONE.
>>
>> Thanks,
>> Andrew
>>
>>> On 4 Aug 2023, at 14:04, Doğuşcan Namal 
>> wrote:
>>>
>>> Hi Andrew, thanks a lot for this KIP. I was thinking of something similar
>>> so thanks for writing this down 😊
>>>
>>>
>>>
>>> Couple of questions related to the design:
>>>
>>>
>>>
>>> 1. Can we investigate the option for using the Kraft controllers instead
>> of
>>> the brokers for sending metrics? The disadvantage of sending these
>> metrics
>>> directly to the brokers tightly couples metric observability to data
>> plane
>>> availability. If the broker is unhealthy then the root cause of an
>> incident
>>> is clear howe

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-08-25 Thread Andrew Schofield
Hi Tom,
Thanks for your comments. Sorry for the delay in responding. They’re still 
useful
comments in spite of the fact that the voting has begun.

1) This is a good idea. I expect the client will emit the client instance ID
as a log line.

2) I will add PROXY protocol support to the future work. I agree.

3) Thanks for the suggestion.

4) Yes, the client authenticates before it can send any of the RPCs in this KIP.

5) a) Yes, a rogue client could in principle send metrics to all brokers 
resulting
in a lot of data being exported to the back end. Of course, a proper deployment
of a client telemetry reporter plugin would be instrumented to help the operator
diagnose this situation.

b) There are already instances of the client sending compressed data to
the broker. I think it is prudent to limit the maximum metrics payload size.
I will update the KIP accordingly.

6) Yes, bundling and relocating.

7) I will add a broker metric to help with diagnosis.

8) I will add some clarifying text. If the broker does not have a configured
metrics reporter that supports the new interface, it should not push metrics
and will not receive a metrics subscription. I am thinking over the options for
achieving this cleanly and will update the KIP.

Thanks for your interest in the KIP.

Thanks,
Andrew

> On 11 Aug 2023, at 09:48, Tom Bentley  wrote:
>
> Hi Andrew,
>
> Thanks for picking this KIP up. I know you've started a vote, so these are
> unhelpfully late... sorry about that, but hopefully they're still useful.
>
> 1. "The Kafka Java client provides an API for the application to read the
> generated client instance id to assist in mapping and identification of the
> client based on the collected metrics." In the multi-client, single-process
> model perhaps it would be desirable to have the option of including this in
> log messages emitted by the client library.
>
> 2. "Mapping the client instance id to an actual application instance
> running on a (virtual) machine can be done by inspecting the metrics
> resource labels, such as the client source address and source port, or
> security principal, all of which are added by the receiving broker." The
> specific example of correlation given here (source IP address) is
> problematic in environments where there may be network proxies (e.g.
> Kubernetes ingress) on the path between client and broker: The broker sees
> the IP address of the proxy. This is a rough edge which could be smoothed
> out if Kafka supported the PROXY protocol[1] which seems to have become
> something of a defacto standard. I'm not suggesting this need to be part of
> the KIP, but perhaps it could be added to Future Work?
> [1]: http://www.haproxy.org/download/2.9/doc/proxy-protocol.txt
>
> 3. Compression... just an idle idea, but I wonder if a useful further
> improvement in compression ratio could be achieved using zstd's support for
> dictionary compression[2]. I.e. a client could initially use training mode
> when sending metrics, but eventually include a dictionary to be used for
> subsequent metric sends. It's probably not worth the effort (at least for
> the initial implementation), but since you've gone to the effort of
> providing some numbers anyway, maybe it's not much additional effort to at
> least find out whether this makes a useful difference.
> [2]: http://facebook.github.io/zstd/#small-data
>
> 4. Maybe I didn't spot it, but I assume the initial
> GetTelemetrySubscriptionsRequest
> happens after authentication?
>
> 5. Rogue clients -- There are some interesting things to consider if we're
> trying to defend against a genuinely adversarial client.
>
> a) Client sends profiling information to all brokers at the maximum rate.
> Each broker forwards to the time series DB. Obviously this scales linearly
> with number of brokers, but it's clear that the load on the tsdb could be
> many times larger than users might expect.
> b) Client sends crafted compressed data which decompresses to require more
> memory than the broker can allocate.
>
> 6. Shadowing the OLTP and protobuf jars -- to be clear by this you mean
> both bundling _and_ relocating?
>
> 7. "In case the cluster load induced from metrics requests becomes
> unmanageable the remedy is to temporarily remove or limit configured
> metrics subscriptions.  " How would a user know that the observed load was
> due to handling metrics requests?
>
> 8. If I understand correctly, when the configured metrics reporter does not
> implement the new interface the client would still follow the described
> protocol only to have nowhere to send the metrics. Am I overlooking
> something?
>
> Thanks again,
>
> Tom
>
> On Fri, 11 Aug 2023 at 07:52, Andrew Schofield <
>

Re: [VOTE] KIP-970: Deprecate and remove Connect's redundant task configurations endpoint

2023-08-30 Thread Andrew Schofield
Thanks for the KIP. Looks good to me.

+1 (non-binding).

Andrew

> On 30 Aug 2023, at 18:07, Hector Geraldino (BLOOMBERG/ 919 3RD A) 
>  wrote:
>
> This makes sense to me, +1 (non-binding)
>
> From: dev@kafka.apache.org At: 08/30/23 02:58:59 UTC-4:00 To:
> dev@kafka.apache.org
> Subject: [VOTE] KIP-970: Deprecate and remove Connect's redundant task 
> configurations endpoint
>
> Hi all,
>
> This is the vote thread for KIP-970 which proposes deprecating (in the
> Apache Kafka 3.7 release) and eventually removing (in the next major Apache
> Kafka release - 4.0) Connect's redundant task configurations endpoint.
>
> KIP -
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-970%3A+Deprecate+and+remov
> e+Connect%27s+redundant+task+configurations+endpoint
>
> Discussion thread -
> https://lists.apache.org/thread/997qg9oz58kho3c19mdrjodv0n98plvj
>
> Thanks,
> Yash
>
>



Re: [VOTE] KIP-714: Client metrics and observability

2023-09-08 Thread Andrew Schofield
Bumping the voting thread for KIP-714.

So far, we have:
Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)

Thanks,
Andrew

> On 4 Aug 2023, at 09:45, Andrew Schofield  wrote:
> 
> Hi,
> After almost 2 1/2 years in the making, I would like to call a vote for 
> KIP-714 
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability).
> 
> This KIP aims to improve monitoring and troubleshooting of client performance 
> by enabling clients to push metrics to brokers.
> 
> I’d like to thank everyone that participated in the discussion, especially 
> the librdkafka team since one of the aims of the KIP is to enable any client 
> to participate, not just the Apache Kafka project’s Java clients.
> 
> Thanks,
> Andrew




Re: [VOTE] KIP-714: Client metrics and observability

2023-09-12 Thread Andrew Schofield
Hi Philip,
Thanks for your vote and interest in the KIP.

KIP-714 does not introduce any new client metrics, and that’s intentional. It
does describe how all of the client metrics can have their names transformed into
equivalent "telemetry metric names", which can then potentially be used in metrics
subscriptions.
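As a rough sketch of that kind of name transformation (assuming the '-' to '.' and '-' to '_' replacement rules discussed elsewhere in this thread; the authoritative mapping is defined in the KIP itself):

```java
// Sketch only: Kafka client metric names use '-', the equivalent telemetry
// metric names use '.', and attribute (label) keys use '_'. The exact
// prefixes and groups are per the KIP, not reproduced here.
public class TelemetryNames {
    /** e.g. "records-lag-max" -> "records.lag.max" */
    public static String toTelemetryName(String kafkaMetricName) {
        return kafkaMetricName.replace('-', '.');
    }

    /** e.g. "client-id" -> "client_id" */
    public static String toAttributeKey(String kafkaLabelKey) {
        return kafkaLabelKey.replace('-', '_');
    }
}
```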

I am interested in the idea of the client’s leader epoch in this context, but I 
don’t have
an immediate plan for how best to do this, and it would take another KIP to 
enhance
existing metrics or introduce some new ones. Those would then naturally be
applicable to the metrics push introduced in KIP-714.

In a similar vein, there are no existing client metrics specifically for 
auto-commit.
We could add them to Kafka, but I really think this is just an example of 
asynchronous
commit in which the application has decided not to specify when the commit 
should
begin.

It is possible to increase the cadence of pushing by modifying the interval.ms
configuration property of the CLIENT_METRICS resource.

There is an “assigned-partitions” metric for each consumer, but not one for
active partitions. We could add one, again as a follow-on KIP.

I take your point about holding on to a connection in a channel which might
experience congestion. Do you have a suggestion for how to improve on this?
For example, the client does have the concept of a least-loaded node. Maybe
this is something we should investigate in the implementation and decide on the
best approach. In general, I think sticking with the same node for consecutive
pushes is best, but if you choose the “wrong” node to start with, it’s not 
ideal.

Thanks,
Andrew

> On 8 Sep 2023, at 19:29, Philip Nee  wrote:
>
> Hey Andrew -
>
> +1 but I don't have a binding vote!
>
> It took me a while to go through the KIP. Here are some of my notes during
> the reading:
>
> *Metrics*
> - Should we care about the client's leader epoch? There is a case where the
> user recreates the topic, but the consumer thinks it is still the same
> topic and therefore, attempts to start from an offset that doesn't exist.
> KIP-848 addresses this issue, but I can still see some potential benefits
> from knowing the client's epoch information.
> - I assume poll idle is similar to poll interval: I needed to read the
> description a few times.
> - I don't have a clear use case in mind for the commit latency, but I do
> think sometimes people lack clarity about how much progress was tracked by
> the auto-commit.  Would tracking auto-commit-related metrics be useful? I
> was thinking: the last offset committed or the actual cadence in ms.
> - Are there cases when we need to increase the cadence of telemetry data
> push? i.e. variable interval.
> - Thanks for implementing the randomized initial metric push; I think it is
> really important.
> - Is there a potential use case for tracking the number of active
> partitions? The consumer can pause partitions via API, during revocation,
> or during offset reset for the stream.
>
> *Connections*:
> - The KIP stated that it will keep the same connection until the connection
> is disconnected. I wonder if that could potentially cause congestion if it
> is already a busy channel, which leads to connection timeout and
> subsequently disconnection.
>
> Thanks,
> P
>
> On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Bumping the voting thread for KIP-714.
>>
>> So far, we have:
>> Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)
>>
>> Thanks,
>> Andrew
>>
>>> On 4 Aug 2023, at 09:45, Andrew Schofield 
>> wrote:
>>>
>>> Hi,
>>> After almost 2 1/2 years in the making, I would like to call a vote for
>> KIP-714 (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
>> ).
>>>
>>> This KIP aims to improve monitoring and troubleshooting of client
>> performance by enabling clients to push metrics to brokers.
>>>
>>> I’d like to thank everyone that participated in the discussion,
>> especially the librdkafka team since one of the aims of the KIP is to
>> enable any client to participate, not just the Apache Kafka project’s Java
>> clients.
>>>
>>> Thanks,
>>> Andrew




Re: [DISCUSS] KIP-714: Client metrics and observability

2023-09-21 Thread Andrew Schofield
119. "If there is no client metrics receiver plugin configured on the
> broker, it will respond to GetTelemetrySubscriptionsRequest with
> RequestedMetrics set to Null and a -1 SubscriptionId. The client should
> send a new GetTelemetrySubscriptionsRequest after the PushIntervalMs has
> expired. This allows the metrics receiver to be enabled or disabled without
> having to restart the broker or reset the client connection."
> "no client metrics receiver plugin configured" is defined by no metric
> reporter implementing the ClientTelemetry interface, right? In that case,
> it would be useful to avoid the clients sending
> GetTelemetrySubscriptionsRequest periodically to preserve the current
> behavior.
>
> 120. GetTelemetrySubscriptionsResponseV0 and PushTelemetryRequestV0: Could
> we list error codes for each?
>
> 121. "ClientTelemetryReceiver.ClientTelemetryReceiver This method may be
> called from the request handling thread": Where else can this method be
> called?
>
> 122. DescribeConfigs/AlterConfigs already exist. Are we changing the ACL?
>
> Thanks,
>
> Jun
>
> On Mon, Jul 31, 2023 at 4:33 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Milind,
>> Thanks for your question.
>>
>> On reflection, I agree that INVALID_RECORD is most likely to be caused by a
>> problem in the serialization in the client. I have changed the client
>> action in this case
>> to “Log an error and stop pushing metrics”.
>>
>> I have updated the KIP text accordingly.
>>
>> Thanks,
>> Andrew
>>
>>> On 31 Jul 2023, at 12:09, Milind Luthra 
>> wrote:
>>>
>>> Hi Andrew,
>>> Thanks for the clarifications.
>>>
>>> About 2b:
>>> In case a client has a bug while serializing, it might be difficult for
>> the
>>> client to recover from that without code changes. In that, it might be
>> good
>>> to just log the INVALID_RECORD as an error, and treat the error as fatal
>>> for the client (only fatal in terms of sending the metrics, the client
>> can
>>> keep functioning otherwise). What do you think?
>>>
>>> Thanks
>>> Milind
>>>
>>> On Mon, Jul 24, 2023 at 8:18 PM Andrew Schofield <
>>> andrew_schofield_j...@outlook.com> wrote:
>>>
>>>> Hi Milind,
>>>> Thanks for your questions about the KIP.
>>>>
>>>> 1) I did some archaeology and looked at historical versions of the KIP.
>> I
>>>> think this is
>>>> just a mistake. 5 minutes is the default metric push interval. 30
>> minutes
>>>> is a mystery
>>>> to me. I’ve updated the KIP.
>>>>
>>>> 2) I think there are two situations in which INVALID_RECORD might occur.
>>>> a) The client might perhaps be using a content-type that the broker does
>>>> not support.
>>>> The KIP mentions content-type as a future extension, but there’s only
>> one
>>>> supported
>>>> to start with. Until we have multiple content-types, this seems out of
>>>> scope. I think a
>>>> future KIP would add another error code for this.
>>>> b) The client might perhaps have a bug which means the metrics payload
>> is
>>>> malformed.
>>>> Logging a warning and attempting the next metrics push on the push
>>>> interval seems
>>>> appropriate.
>>>>
>>>> UNKNOWN_SUBSCRIPTION_ID would indeed be handled by making an immediate
>>>> GetTelemetrySubscriptionsRequest.
>>>>
>>>> UNSUPPORTED_COMPRESSION_TYPE seems like either a client bug or perhaps
>>>> a situation in which a broker sends a compression type in a
>>>> GetTelemetrySubscriptionsResponse
>>>> which is subsequently not supported when its used with a
>>>> PushTelemetryRequest.
>>>> We do want the client to have the opportunity to get an up-to-date list
>> of
>>>> supported
>>>> compression types. I think an immediate GetTelemetrySubscriptionsRequest
>>>> is appropriate.
>>>>
>>>> 3) If a client attempts a subsequent handshake with a Null
>>>> ClientInstanceId, the
>>>> receiving broker may not already know the client's existing
>>>> ClientInstanceId. If the
>>>> receiving broker knows the existing ClientInstanceId, it simply responds
>>>> the existing
>>>> value back to the client. If it does not know the e

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-09-22 Thread Andrew Schofield
Hi Kirk,
Thanks for your question. You are correct that the presence or absence of the 
new RPCs in the
ApiVersionsResponse tells the client whether to request the telemetry 
subscriptions and push
metrics.

This is of course tricky in practice. It would be conceivable, as a cluster is 
upgraded to AK 3.7
or as a client metrics receiver plugin is deployed across the cluster, that a 
client connects to some
brokers that support the new RPCs and some that do not.

Here’s my suggestion:
* If a client is not connected to any brokers that support the new RPCs, it 
cannot push metrics.
* If a client is only connected to brokers that support the new RPCs, it will 
use the new RPCs in
accordance with the KIP.
* If a client is connected to some brokers that support the new RPCs and some 
that do not, it will
use the new RPCs with the supporting subset of brokers in accordance with the 
KIP.
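In code, that per-broker decision might look like the following sketch (the API key numbers and names here are assumptions for illustration, not the Kafka client internals):

```java
import java.util.Map;
import java.util.Set;

// Sketch (names and key numbers assumed): decide per broker whether the
// client may use the KIP-714 RPCs, based on what each broker advertised
// in its ApiVersionsResponse.
public class TelemetrySupport {
    // Hypothetical API keys for the two new RPCs.
    public static final short GET_TELEMETRY_SUBSCRIPTIONS = 71;
    public static final short PUSH_TELEMETRY = 72;

    /** True if this broker advertised both new RPCs. */
    public static boolean brokerSupportsTelemetry(Set<Short> advertisedApiKeys) {
        return advertisedApiKeys.contains(GET_TELEMETRY_SUBSCRIPTIONS)
            && advertisedApiKeys.contains(PUSH_TELEMETRY);
    }

    /** True if at least one connected broker supports the new RPCs. */
    public static boolean canPushMetrics(Map<String, Set<Short>> apiKeysByBroker) {
        return apiKeysByBroker.values().stream()
            .anyMatch(TelemetrySupport::brokerSupportsTelemetry);
    }
}
```

A client in the mixed situation would simply restrict pushing to the brokers for which `brokerSupportsTelemetry` returns true.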

Comments?

Thanks,
Andrew

> On 22 Sep 2023, at 16:01, Kirk True  wrote:
>
> Hi Andrew/Jun,
>
> I want to make sure I understand question/comment #119… In the case where a 
> cluster without a metrics client receiver is later reconfigured and restarted 
> to include a metrics client receiver, do we want the client to thereafter 
> begin pushing metrics to the cluster? From Andrew’s response to question 
> #119, it sounds like we’re using the presence/absence of the relevant RPCs in 
> ApiVersionsResponse as the to-push-or-not-to-push indicator. Do I have that 
> correct?
>
> Thanks,
> Kirk
>
>> On Sep 21, 2023, at 7:42 AM, Andrew Schofield 
>>  wrote:
>>
>> Hi Jun,
>> Thanks for your comments. I’ve updated the KIP to clarify where necessary.
>>
>> 110. Yes, agree. The motivation section mentions this.
>>
>> 111. The replacement of ‘-‘ with ‘.’ for metric names and the replacement of
>> ‘-‘ with ‘_’ for attribute keys is following the OTLP guidelines. I think 
>> it’s a bit
>> of a debatable point. OTLP makes a distinction between a namespace and a
>> multi-word component. If it was “client.id” then “client” would be a 
>> namespace with
>> an attribute key “id”. But “client_id” is just a key. So, it was 
>> intentional, but debatable.
>>
>> 112. Thanks. The link target moved. Fixed.
>>
>> 113. Thanks. Fixed.
>>
>> 114.1. If a standard metric makes sense for a client, it should use the 
>> exact same
>> name. If a standard metric doesn’t make sense for a client, then it can omit 
>> that metric.
>>
>> For a required metric, the situation is stronger. All clients must implement 
>> these
>> metrics with these names in order to implement the KIP. But the required 
>> metrics
>> are essentially the number of connections and the request latency, which do 
>> not
>> reference the underlying implementation of the client (which 
>> producer.record.queue.time.max
>> of course does).
>>
>> I suppose someone might build a producer-only client that didn’t have 
>> consumer metrics.
>> In this case, the consumer metrics would conceptually have the value 0 and 
>> would not
>> need to be sent to the broker.
>>
>> 114.2. If a client does not implement some metrics, they will not be 
>> available for
>> analysis and troubleshooting. It just makes the ability to combine metrics 
>> from lots
>> different clients less complete.
>>
>> 115. I think it was probably a mistake to be so specific about threading in 
>> this KIP.
>> When the consumer threading refactor is complete, of course, it would do the 
>> appropriate
>> equivalent. I’ve added a clarification and massively simplified this section.
>>
>> 116. I removed “client.terminating”.
>>
>> 117. Yes. Horrid. Fixed.
>>
>> 118. The Terminating flag just indicates that this is the final 
>> PushTelemetryRequest
>> from this client. Any subsequent request will be rejected. I think this flag 
>> should remain.
>>
>> 119. Good catch. This was actually contradicting another part of the KIP. 
>> The current behaviour
>> is indeed preserved. If the broker doesn’t have a client metrics receiver 
>> plugin, the new RPCs
>> in this KIP are “turned off” and not reported in ApiVersionsResponse. The 
>> client will not
>> attempt to push metrics.
>>
>> 120. The error handling table lists the error codes for 
>> PushTelemetryResponse. I’ve added one
>> but it looked good to me. GetTelemetrySubscriptions doesn’t have any error 
>> codes, since the
>> situation in which the client telemetry is not supported is handled by the 
>> RPCs not being offered
>> by the broker.
>>
>> 121. Again

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-09-22 Thread Andrew Schofield
Hi Philip,
No, I do not think it should actively search for a broker that supports the new
RPCs. In general, either all of the brokers or none of the brokers will support 
it.
In the window where the cluster is being upgraded or client telemetry is being
enabled, there might be a mixed situation. I wouldn’t put too much effort into
this mixed scenario. As the client finds brokers which support the new RPCs,
it can begin to follow the KIP-714 mechanism.

Thanks,
Andrew

> On 22 Sep 2023, at 20:01, Philip Nee  wrote:
>
> Hi Andrew -
>
> Question on top of your answers: Do you think the client should actively
> search for a broker that supports this RPC? As previously mentioned, the
> broker uses the leastLoadedNode to find its first connection (am
> I correct?), and what if that broker doesn't support the metric push?
>
> P
>
> On Fri, Sep 22, 2023 at 10:20 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Kirk,
>> Thanks for your question. You are correct that the presence or absence of
>> the new RPCs in the
>> ApiVersionsResponse tells the client whether to request the telemetry
>> subscriptions and push
>> metrics.
>>
>> This is of course tricky in practice. It would be conceivable, as a
>> cluster is upgraded to AK 3.7
>> or as a client metrics receiver plugin is deployed across the cluster,
>> that a client connects to some
>> brokers that support the new RPCs and some that do not.
>>
>> Here’s my suggestion:
>> * If a client is not connected to any brokers that support the new
>> RPCs, it cannot push metrics.
>> * If a client is only connected to brokers that support the new RPCs, it
>> will use the new RPCs in
>> accordance with the KIP.
>> * If a client is connected to some brokers that support the new RPCs and
>> some that do not, it will
>> use the new RPCs with the supporting subset of brokers in accordance with
>> the KIP.
>>
>> Comments?
>>
>> Thanks,
>> Andrew
>>
>>> On 22 Sep 2023, at 16:01, Kirk True  wrote:
>>>
>>> Hi Andrew/Jun,
>>>
>>> I want to make sure I understand question/comment #119… In the case
>> where a cluster without a metrics client receiver is later reconfigured and
>> restarted to include a metrics client receiver, do we want the client to
>> thereafter begin pushing metrics to the cluster? From Andrew’s response to
>> question #119, it sounds like we’re using the presence/absence of the
>> relevant RPCs in ApiVersionsResponse as the to-push-or-not-to-push
>> indicator. Do I have that correct?
>>>
>>> Thanks,
>>> Kirk
>>>
>>>> On Sep 21, 2023, at 7:42 AM, Andrew Schofield <
>> andrew_schofield_j...@outlook.com> wrote:
>>>>
>>>> Hi Jun,
>>>> Thanks for your comments. I’ve updated the KIP to clarify where
>> necessary.
>>>>
>>>> 110. Yes, agree. The motivation section mentions this.
>>>>
>>>> 111. The replacement of ‘-‘ with ‘.’ for metric names and the
>> replacement of
>>>> ‘-‘ with ‘_’ for attribute keys is following the OTLP guidelines. I
>> think it’s a bit
>>>> of a debatable point. OTLP makes a distinction between a namespace and a
>>>> multi-word component. If it was “client.id” then “client” would be a
>> namespace with
>>>> an attribute key “id”. But “client_id” is just a key. So, it was
>> intentional, but debatable.
>>>>
>>>> 112. Thanks. The link target moved. Fixed.
>>>>
>>>> 113. Thanks. Fixed.
>>>>
>>>> 114.1. If a standard metric makes sense for a client, it should use the
>> exact same
>>>> name. If a standard metric doesn’t make sense for a client, then it can
>> omit that metric.
>>>>
>>>> For a required metric, the situation is stronger. All clients must
>> implement these
>>>> metrics with these names in order to implement the KIP. But the
>> required metrics
>>>> are essentially the number of connections and the request latency,
>> which do not
>>>> reference the underlying implementation of the client (which
>> producer.record.queue.time.max
>>>> of course does).
>>>>
>>>> I suppose someone might build a producer-only client that didn’t have
>> consumer metrics.
>>>> In this case, the consumer metrics would conceptually have the value 0
>> and would not
>>>> need to be sent to the broker.
>>>>
>>>> 114.2. If a client does not imp

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-10-03 Thread Andrew Schofield
Hi David,
Thanks for your interest in KIP-714.

Because this KIP is under development at the same time as KIP-848, it will
need to support both the existing KafkaConsumer code and the refactored code
being worked on under KIP-848. I’ve updated the Threading section accordingly.

Thanks,
Andrew

> On 30 Sep 2023, at 01:45, David Jacot  wrote:
>
> Hi Andrew,
>
> Thanks for driving this one. I haven't read all the KIP yet but I already
> have an initial question. In the Threading section, it is written
> "KafkaConsumer: the "background" thread (based on the consumer threading
> refactor which is underway)". If I understand this correctly, it means
> that KIP-714 won't work if the "old consumer" is used. Am I correct?
>
> Cheers,
> David
>
>
> On Fri, Sep 22, 2023 at 12:18 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Philip,
>> No, I do not think it should actively search for a broker that supports
>> the new
>> RPCs. In general, either all of the brokers or none of the brokers will
>> support it.
>> In the window, where the cluster is being upgraded or client telemetry is
>> being
>> enabled, there might be a mixed situation. I wouldn’t put too much effort
>> into
>> this mixed scenario. As the client finds brokers which support the new
>> RPCs,
>> it can begin to follow the KIP-714 mechanism.
>>
>> Thanks,
>> Andrew
>>
>>> On 22 Sep 2023, at 20:01, Philip Nee  wrote:
>>>
>>> Hi Andrew -
>>>
>>> Question on top of your answers: Do you think the client should actively
>>> search for a broker that supports this RPC? As previously mentioned, the
>>> broker uses the leastLoadedNode to find its first connection (am
>>> I correct?), and what if that broker doesn't support the metric push?
>>>
>>> P
>>>
>>> On Fri, Sep 22, 2023 at 10:20 AM Andrew Schofield <
>>> andrew_schofield_j...@outlook.com> wrote:
>>>
>>>> Hi Kirk,
>>>> Thanks for your question. You are correct that the presence or absence
>> of
>>>> the new RPCs in the
>>>> ApiVersionsResponse tells the client whether to request the telemetry
>>>> subscriptions and push
>>>> metrics.
>>>>
>>>> This is of course tricky in practice. It would be conceivable, as a
>>>> cluster is upgraded to AK 3.7
>>>> or as a client metrics receiver plugin is deployed across the cluster,
>>>> that a client connects to some
>>>> brokers that support the new RPCs and some that do not.
>>>>
>>>> Here’s my suggestion:
>>>> * If a client is not connected to any brokers that support the new
>>>> RPCs, it cannot push metrics.
>>>> * If a client is only connected to brokers that support the new RPCs, it
>>>> will use the new RPCs in
>>>> accordance with the KIP.
>>>> * If a client is connected to some brokers that support the new RPCs and
>>>> some that do not, it will
>>>> use the new RPCs with the supporting subset of brokers in accordance
>> with
>>>> the KIP.
>>>>
>>>> Comments?
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>> On 22 Sep 2023, at 16:01, Kirk True  wrote:
>>>>>
>>>>> Hi Andrew/Jun,
>>>>>
>>>>> I want to make sure I understand question/comment #119… In the case
>>>> where a cluster without a metrics client receiver is later reconfigured
>> and
>>>> restarted to include a metrics client receiver, do we want the client to
>>>> thereafter begin pushing metrics to the cluster? From Andrew’s response
>> to
>>>> question #119, it sounds like we’re using the presence/absence of the
>>>> relevant RPCs in ApiVersionsResponse as the to-push-or-not-to-push
>>>> indicator. Do I have that correct?
>>>>>
>>>>> Thanks,
>>>>> Kirk
>>>>>
>>>>>> On Sep 21, 2023, at 7:42 AM, Andrew Schofield <
>>>> andrew_schofield_j...@outlook.com> wrote:
>>>>>>
>>>>>> Hi Jun,
>>>>>> Thanks for your comments. I’ve updated the KIP to clarify where
>>>> necessary.
>>>>>>
>>>>>> 110. Yes, agree. The motivation section mentions this.
>>>>>>
>>>>>> 111. The replacement of ‘-‘ with ‘.’ for metric names and the

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-10-06 Thread Andrew Schofield
Hi Matthias,
Thanks for your comments. I agree that a follow-up KIP for Kafka Streams makes 
sense. This KIP currently has made a bit
of an effort to embrace KS, but it’s not enough by a long way.

I have removed `application.id`. This should be done 
properly in the follow-up KIP. I don’t believe there’s a downside to
removing it from this KIP.

I have reworded the statement about temporality. In practice, the 
implementation of this KIP that’s going on while the voting
progresses happens to use delta temporality, but that’s an implementation 
detail. Supporting clients must support both
temporalities.
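For illustration, deriving delta temporality from cumulative counters is straightforward; a minimal sketch (the KIP's actual payload serialization is OTLP-based and not shown here):

```java
import java.util.Arrays;

// Sketch: a client that keeps cumulative counters can produce delta
// temporality by subtracting the previously reported value at each push.
public class Temporality {
    public static long[] toDeltas(long[] cumulative) {
        long[] deltas = new long[cumulative.length];
        long previous = 0;
        for (int i = 0; i < cumulative.length; i++) {
            deltas[i] = cumulative[i] - previous;
            previous = cumulative[i];
        }
        return deltas;
    }

    public static void main(String[] args) {
        // cumulative {5, 8, 8, 12} -> deltas {5, 3, 0, 4}
        System.out.println(Arrays.toString(toDeltas(new long[]{5, 8, 8, 12})));
    }
}
```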

I thought about exposing the client instance ID as a metric, but non-numeric 
metrics are not usual practice and tools
do not universally support them. I don’t think the KIP is improved by adding 
one now.

I have also added constants for the various Config classes for 
ENABLE_METRICS_PUSH_CONFIG, including to
StreamsConfig. It’s best to be explicit about this.
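For example, an application wanting to opt a client out of metrics pushing would set the property like this (a sketch using a plain Properties object; in real code the constant from the relevant Config class would be used instead of the string literal):

```java
import java.util.Properties;

// Sketch: disabling KIP-714 metrics push for one client via the
// enable.metrics.push property. Broker address and group id are placeholders.
public class DisableMetricsPush {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "example-group");           // placeholder
        props.put("enable.metrics.push", "false");        // opt out of pushing
        return props;
    }
}
```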

Thanks,
Andrew

> On 2 Oct 2023, at 23:47, Matthias J. Sax  wrote:
>
> Hi,
>
> I did not pay attention to this KIP in the past; seems it was on-hold for a 
> while.
>
> Overall it sounds very useful, and I think we should extend this with a 
> follow-up KIP for Kafka Streams. What is unclear to me at this point is the 
> statement:
>
>> Kafka Streams applications have an application.id configured and this 
>> identifier should be included as the application_id metrics label.
>
> The `application.id` is currently only used as the (main) consumer's 
> `group.id` (and is part of an auto-generated `client.id` if the user does not 
> set one).
>
> This comment related to:
>
>> The following labels should be added by the client as appropriate before 
>> metrics are pushed.
>
> Given that Kafka Streams uses the consumer/producer/admin client as "black 
> boxes", a client does at this point not know that it's part of a Kafka 
> Streams application, and thus, it won't be able to attach any such label to 
> the metrics it sends. (Also producer and admin don't even know the value of 
> `application.id` -- only the (main) consumer, indirectly via `group.id`, but 
> also restore and global consumer don't know it, because they don't have 
> `group.id` set).
>
> While I am totally in favor of the proposal, I am wondering how we intend to 
> implement it in a clean way? Or would it be ok to have some internal client 
> APIs that KS can use to "register" itself with the client?
>
>
>
>> While clients must support both temporalities, the broker will initially 
>> only send GetTelemetrySubscriptionsResponse.DeltaTemporality=True
>
> Not sure if I can follow. How is the decision made about DELTA or CUMULATIVE 
> metrics? Should the broker-side plugin not decide which metrics it wants to 
> receive in which form? So what does "initially" mean -- the broker won't ship 
> with a default plugin implementation?
>
>
>
>> The following method is added to the Producer, Consumer, and Admin client 
>> interfaces:
>
> Should we add anything to Kafka Streams to expose the underlying clients' 
> assigned client-instance-ids programmatically? I am also wondering if clients 
> should report their assigned client-instance-ids as metrics themselves (for this 
> case, Kafka Streams won't need to do anything, because we already expose all 
> client metrics).
>
> If we add anything programmatic, we need to make it simple, given that Kafka 
> Streams has many clients per `StreamThread` and may have multiple threads.
>
>
>
>> enable.metrics.push
> It might be worth adding this to `StreamsConfig`, too? If set via 
> StreamsConfig, we would forward it to all clients automatically.
>
>
>
>
> -Matthias
>
>
> On 9/29/23 5:45 PM, David Jacot wrote:
>> Hi Andrew,
>> Thanks for driving this one. I haven't read all the KIP yet but I already
>> have an initial question. In the Threading section, it is written
>> "KafkaConsumer: the "background" thread (based on the consumer threading
>> refactor which is underway)". If I understand this correctly, it means
>> that KIP-714 won't work if the "old consumer" is used. Am I correct?
>> Cheers,
>> David
>> On Fri, Sep 22, 2023 at 12:18 PM Andrew Schofield <
>> andrew_schofield_j...@outlook.com> wrote:
>>> Hi Philip,
>>> No, I do not think it should actively search for a broker that supports
>>> the new
>>> RPCs. In general, either all of the brokers or none of the brokers will
>>> support it.
>>> In the window, where the cluster is being upgraded or client telemetry

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-10-09 Thread Andrew Schofield
Hi Matthias,
Good point. Makes sense to me.

Is this something that can also be included in the proposed Kafka Streams 
follow-on KIP, or would you prefer that I add it to KIP-714?
I have a slight preference for the former to put all of the KS enhancements 
into a separate KIP.

Thanks,
Andrew

> On 7 Oct 2023, at 02:12, Matthias J. Sax  wrote:
>
> Thanks Andrew. SGTM.
>
> One point you did not address is the idea to add a method to `KafkaStreams` 
> similar to the proposed `clientInstanceId()` that will be added to 
> consumer/producer/admin clients.
>
> Without addressing this, Kafka Streams users won't have a way to get the 
> assigned `instanceId` of the internally created clients, and thus it would be 
> very difficult for them to know which metrics that the broker receives belong 
> to a Kafka Streams app. It seems they would only find the `instanceIds` in 
> the log4j output if they enable client logging?
>
> Of course, because there are multiple clients inside Kafka Streams, the return 
> type cannot be a single "String", but must be some more complex data 
> structure -- we could either add a new class, or return a Map 
> using a client key that maps to the `instanceId`.
>
> For example we could use the following key:
>
>   [Global]StreamThread[-<threadIdx>][-restore][consumer|producer]
>
> (Of course, only the valid combination.)
>
> Or maybe even better, we might want to return a `Future` because collecting 
> all the `instanceId`s might be a blocking call on each client? I already have a 
> few ideas how it could be implemented but I don't think it must be discussed 
> on the KIP, as it's an implementation detail.
>
> Thoughts?
>
>
> -Matthias
>
> On 10/6/23 4:21 AM, Andrew Schofield wrote:
>> Hi Matthias,
>> Thanks for your comments. I agree that a follow-up KIP for Kafka Streams 
>> makes sense. This KIP currently has made a bit
>> of an effort to embrace KS, but it’s not enough by a long way.
>> I have removed `application.id`. This should be 
>> done properly in the follow-up KIP. I don’t believe there’s a downside to
>> removing it from this KIP.
>> I have reworded the statement about temporality. In practice, the 
>> implementation of this KIP that’s going on while the voting
>> progresses happens to use delta temporality, but that’s an implementation 
>> detail. Supporting clients must support both
>> temporalities.
>> I thought about exposing the client instance ID as a metric, but non-numeric 
>> metrics are not usual practice and tools
>> do not universally support them. I don’t think the KIP is improved by adding 
>> one now.
>> I have also added constants for the various Config classes for 
>> ENABLE_METRICS_PUSH_CONFIG, including to
>> StreamsConfig. It’s best to be explicit about this.
>> Thanks,
>> Andrew
>>> On 2 Oct 2023, at 23:47, Matthias J. Sax  wrote:
>>>
>>> Hi,
>>>
>>> I did not pay attention to this KIP in the past; seems it was on-hold for a 
>>> while.
>>>
>>> Overall it sounds very useful, and I think we should extend this with a 
>>> follow up KIP for Kafka Streams. What is unclear to me at this point is the 
>>> statement:
>>>
>>>> Kafka Streams applications have an application.id configured and this 
>>>> identifier should be included as the application_id metrics label.
>>>
>>> The `application.id` is currently only used as the (main) consumer's 
>>> `group.id` (and is part of an auto-generated `client.id` if the user does 
>>> not set one).
>>>
>>> This comment related to:
>>>
>>>> The following labels should be added by the client as appropriate before 
>>>> metrics are pushed.
>>>
>>> Given that Kafka Streams uses the consumer/producer/admin client as "black 
>>> boxes", a client does at this point not know that it's part of a Kafka 
>>> Streams application, and thus, it won't be able to attach any such label to 
>>> the metrics it sends. (Also producer and admin don't even know the value of 
>>> `application.id` -- only the (main) consumer, indirectly via `group.id`, 
>>> but also restore and global consumer don't know it, because they don't have 
>>> `group.id` set).
>>>
>>> While I am totally in favor of the proposal, I am wondering how we intend 
>>> to implement it in a clean way? Or would it be ok to have some internal 
>>> client APIs that KS can use to "register" itself with the client?
>>>
>>>

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-10-10 Thread Andrew Schofield
Matthias,
Yes, I think that’s a sensible way forward and the interface you propose looks 
good. I’ll update the KIP accordingly.

Thanks,
Andrew

> On 10 Oct 2023, at 23:01, Matthias J. Sax  wrote:
>
> Andrew,
>
> yes I would like to get this change into KIP-714 right way. Seems to be 
> important, as we don't know if/when a follow-up KIP for Kafka Streams would 
> land.
>
> I was also thinking (and discussed with a few others) how to expose it, and 
> we would propose the following:
>
> We add a new method to `KafkaStreams` class:
>
>public ClientsInstanceIds clientsInstanceIds(Duration timeout);
>
> The returned object is like below:
>
>  public class ClientsInstanceIds {
>// we only have a single admin client per KS instance
>String adminInstanceId();
>
>// we only have a single global consumer per KS instance (if any)
>// Optional<> because we might not have global-thread
>Optional globalConsumerInstanceId();
>
>// return a <threadKey, clientInstanceId> mapping
>// for the underlying (restore-)consumers/producers
>Map mainConsumerInstanceIds();
>Map restoreConsumerInstanceIds();
>Map producerInstanceIds();
> }
>
> For the `threadKey`, we would use some pattern like this:
>
>  [Stream|StateUpdater]Thread-<threadIdx>
>
>
> Would this work from your POV?
>
>
>
> -Matthias
>
>
> On 10/9/23 2:15 AM, Andrew Schofield wrote:
>> Hi Matthias,
>> Good point. Makes sense to me.
>> Is this something that can also be included in the proposed Kafka Streams 
>> follow-on KIP, or would you prefer that I add it to KIP-714?
>> I have a slight preference for the former to put all of the KS enhancements 
>> into a separate KIP.
>> Thanks,
>> Andrew
>>> On 7 Oct 2023, at 02:12, Matthias J. Sax  wrote:
>>>
>>> Thanks Andrew. SGTM.
>>>
>>> One point you did not address is the idea to add a method to `KafkaStreams` 
>>> similar to the proposed `clientInstanceId()` that will be added to 
>>> consumer/producer/admin clients.
>>>
>>> Without addressing this, Kafka Streams users won't have a way to get the 
>>> assigned `instanceId` of the internally created clients, and thus it would 
>>> be very difficult for them to know which metrics that the broker receives 
>>> belong to a Kafka Streams app. It seems they would only find the 
>>> `instanceIds` in the log4j output if they enable client logging?
>>>
>>> Of course, because there are multiple clients inside Kafka Streams, the 
>>> return type cannot be a single "String", but must be some more complex 
>>> data structure -- we could either add a new class, or return a 
>>> Map using a client key that maps to the `instanceId`.
>>>
>>> For example we could use the following key:
>>>
>>>   [Global]StreamThread[-<threadIdx>][-restore][consumer|producer]
>>>
>>> (Of course, only the valid combination.)
>>>
>>> Or maybe even better, we might want to return a `Future` because collecting 
>>> all the `instanceId`s might be a blocking call on each client? I already 
>>> have a few ideas how it could be implemented but I don't think it must be 
>>> discussed on the KIP, as it's an implementation detail.
>>>
>>> Thoughts?
>>>
>>>
>>> -Matthias
>>>
>>> On 10/6/23 4:21 AM, Andrew Schofield wrote:
>>>> Hi Matthias,
>>>> Thanks for your comments. I agree that a follow-up KIP for Kafka Streams 
>>>> makes sense. This KIP currently has made a bit
>>>> of an effort to embrace KS, but it’s not enough by a long way.
>>>> I have removed `application.id`. This should be 
>>>> done properly in the follow-up KIP. I don’t believe there’s a downside to
>>>> removing it from this KIP.
>>>> I have reworded the statement about temporality. In practice, the 
>>>> implementation of this KIP that’s going on while the voting
>>>> progresses happens to use delta temporality, but that’s an implementation 
>>>> detail. Supporting clients must support both
>>>> temporalities.
>>>> I thought about exposing the client instance ID as a metric, but 
>>>> non-numeric metrics are not usual practice and tools
>>>> do not universally support them. I don’t think the KIP is improved by 
>>>> adding one now.
>>>> I have also added constants for the various Config classes for 
>>>> ENABLE_METRICS_PUSH_CONFIG, including to
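[Editor's note: the per-thread keys in the `ClientsInstanceIds` interface proposed in this thread compose straightforwardly; the following sketch, in plain stdlib Java, models the "[Stream|StateUpdater]Thread-" pattern. The `ThreadKeySketch` class, its `threadKey` helper, and the instance-id values are invented for illustration and are not part of the Kafka Streams API.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: models how per-thread keys for the proposed
// mainConsumerInstanceIds()/restoreConsumerInstanceIds()/producerInstanceIds()
// maps could be composed. Not Kafka source code.
public class ThreadKeySketch {
    static String threadKey(String threadType, int index) {
        return threadType + "Thread-" + index;
    }

    public static void main(String[] args) {
        // e.g. a map from thread key to client instance id for main consumers
        Map<String, String> mainConsumerInstanceIds = new LinkedHashMap<>();
        mainConsumerInstanceIds.put(threadKey("Stream", 1), "instance-id-1");
        mainConsumerInstanceIds.put(threadKey("Stream", 2), "instance-id-2");
        System.out.println(mainConsumerInstanceIds.keySet());
        // prints [StreamThread-1, StreamThread-2]
    }
}
```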

Re: [DISCUSS] KIP-932: Queues for Kafka

2023-10-11 Thread Andrew Schofield
Hi Jack,
Thanks for your comments.

I have added a new section on Log Retention which describes the behaviour of 
the SPSO as the LSO advances. That makes total sense
and was an omission from the KIP.

I have added the other ideas as potential future work. I do like the idea of 
having the SPSO influence the advancement of the LSO
for topics which are primarily being used with share groups.

I have published an updated version of the KIP.

Thanks,
Andrew

> On 4 Oct 2023, at 10:09, Jack Vanlightly  wrote:
> 
> I would like to see more explicit discussion of topic retention and share 
> groups. There are a few options here from simple to more sophisticated. There 
> are also topic-level and share-group level options.
> 
> The simple thing would be to ensure that the SPSO of each share group is 
> bounded by the Log Start Offset (LSO) of each partition which itself is 
> managed by the retention policy. This is a topic-level control which applies 
> to all share-groups. I would say that this shared retention is the largest 
> drawback of modeling queues on shared logs and this is worth noting.
> 
> More sophisticated approaches can be to allow the LSO to advance not (only) 
> by retention policy but by the advancement of the lowest SPSO. This can keep 
> the amount of data lower by garbage collecting messages that have been 
> acknowledged by all share groups. Some people may like that behaviour on 
> those topics where share groups are the only consumption model and no replay 
> is needed.
> 
> There are per-share-group possibilities such as share-group TTLs where 
> messages can be archived on a per share group basis.
> 
> Thanks
> Jack

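[Editor's note: the simple topic-level option Jack describes — the SPSO of each share group bounded by the Log Start Offset — amounts to clamping the SPSO forward whenever retention advances the LSO. A minimal sketch, with an invented method name and illustrative offsets, not Kafka internals:]

```java
// Sketch of the simple option above: when retention advances the
// Log Start Offset (LSO), a share group's SPSO is clamped so it never
// points at deleted records.
public class SpsoBoundSketch {
    static long boundSpso(long spso, long logStartOffset) {
        return Math.max(spso, logStartOffset);
    }

    public static void main(String[] args) {
        // retention deleted records below offset 150: a lagging group at 120 jumps forward
        System.out.println(boundSpso(120L, 150L)); // prints 150
        // a group already past the LSO is unaffected
        System.out.println(boundSpso(200L, 150L)); // prints 200
    }
}
```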


Re: [DISCUSS] KIP-714: Client metrics and observability

2023-10-12 Thread Andrew Schofield
>>> the timeout
>>> parameter.
>>>
>>> So the API as proposed makes sense to me.
>>>
>>>
>>> On Wed, Oct 11, 2023 at 6:48 PM Matthias J. Sax  wrote:
>>>
>>>> In can answer 130 and 131.
>>>>
>>>> 130) We cannot guarantee that all clients are already initialized due to
>>>> race conditions. We plan to not allow calling
>>>> `KafkaStreams#clientsInstanceIds()` when the state is not RUNNING (or
>>>> REBALANCING) though -- guess this slipped on the KIP and should be
>>>> added? But because StreamThreads can be added dynamically (and producer
>>>> might be created dynamically at runtime; cf below), we still cannot
>>>> guarantee that all clients are already initialized when the method is
>>>> called. Of course, we assume that all clients are most likely initialized
>>>> on the happy path, and blocking calls to `client.clientInstanceId()`
>>>> should be rare.
>>>>
>>>> To address the worst case, we won't do a naive implementation and just
>>>> loop over all clients, but fan-out the call to the different
>>>> StreamThreads (and GlobalStreamThread if it exists), and use Futures to
>>>> gather the results.
>>>>
>>>> Currently, `StreamThreads` has 3 clients (if ALOS or EOSv2 is used), so
>>>> we might do 3 blocking calls in the worst case (for EOSv1 we get a
>>>> producer per tasks, and we might end up doing more blocking calls if the
>>>> producers are not initialized yet). Note that EOSv1 is already
>>>> deprecated, and we are also working on thread refactoring that will
>>>> reduce the number of clients on StreamThread to 2 -- and we have more
>>>> refactoring planned to reduce the number of clients even further.
>>>>
>>>> Inside `KafkaStreams#clientsInstanceIds()` we might only do a single
>>>> blocking call for the admin client (ie, `admin.clientInstanceId()`).
>>>>
>>>> I agree that we need to do some clever timeout management, but it seems
>>>> to be more of an implementation detail?
>>>>
>>>> Do you have any particular concerns, or does the proposed implementation
>>>> as sketched above address your question?
>>>>
>>>>
>>>> 131) If the Topology does not have a global-state-store, there won't be
>>>> a GlobalThread and thus no global consumer. Thus, we return an Optional.
>>>>
>>>>
>>>>
>>>> Three related questions for Andrew.
>>>>
>>>> (1) Why is the method called `clientInstanceId()` and not just plain
>>>> `instanceId()`?
>>>>
>>>> (2) Why do we return a `String` and not a UUID type? The added
>>>> protocol request/response classes use UUIDs.
>>>>
>>>> (3) Would it make sense to have an overloaded `clientInstanceId()`
>>>> method that does not take any parameter but uses `default.api.timeout`
>>>> config (this config does not exist on the producer though, so we could
>>>> only have it for consumer and admin at this point). We could of course
>>>> also add overloads like this later if users request them (and/or add
>>>> `default.api.timeout.ms` to the producer, too).
>>>>
>>>> Btw: For KafkaStreams, I think `clientsInstanceIds` still makes sense as
>>>> a method name though, as `KafkaStreams` itself does not have an
>>>> `instanceId` -- we can also not have a timeout-less overload, because
>>>> `KafkaStreams` does not have a `default.api.timeout.ms` config either
>>>> (and I don't think it makes sense to add).
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 10/11/23 5:07 PM, Jun Rao wrote:
>>>>> Hi, Andrew,
>>>>>
>>>>> Thanks for the updated KIP. Just a few more minor comments.
>>>>>
>>>>> 130. KafkaStreams.clientsInstanceId(Duration timeout): Does it wait for
>>>> all
>>>>> consumer/producer/adminClient instances to be initialized? Are all
>>> those
>>>>> instances created during KafkaStreams initialization?
>>>>>
>>>>> 131. Why does globalConsumerInstanceId() return Optional while
>>>>> other consumer instances don't return Optional?
>>>>>
>>>>> 132. ClientMetricsSubscriptionRequestCount: Do we need this since we
>>>> have a
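[Editor's note: the fan-out-and-gather approach Matthias describes — ask each thread concurrently, wait under one overall timeout, return a result or fail rather than return a partial result — can be sketched with plain `CompletableFuture`s. Class name, thread keys, and instance ids here are invented for illustration; this is not the Kafka Streams implementation.]

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Sketch only: fan out the (potentially blocking) per-client calls, then
// gather all results within a single overall timeout. If any call misses
// the deadline, the whole operation fails -- no partial result.
public class FanOutSketch {
    static Map<String, String> gather(Map<String, CompletableFuture<String>> perClient,
                                      long timeoutMs) throws Exception {
        CompletableFuture.allOf(perClient.values().toArray(new CompletableFuture[0]))
                .get(timeoutMs, TimeUnit.MILLISECONDS); // all-or-nothing
        Map<String, String> result = new ConcurrentHashMap<>();
        perClient.forEach((key, future) -> result.put(key, future.join()));
        return result;
    }

    public static void main(String[] args) throws Exception {
        Map<String, CompletableFuture<String>> futures = Map.of(
                "StreamThread-1", CompletableFuture.supplyAsync(() -> "id-1"),
                "StreamThread-2", CompletableFuture.supplyAsync(() -> "id-2"));
        System.out.println(new TreeMap<>(gather(futures, 1000L)));
        // prints {StreamThread-1=id-1, StreamThread-2=id-2}
    }
}
```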

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-10-12 Thread Andrew Schofield
Hi Jun,
Thanks for your comments.

130. As Matthias described, and I am adding to the KIP, the 
`KafkaStreams#clientInstanceIds` method
is only permitted when the state is RUNNING or REBALANCING. Also, clients can 
be added dynamically
so the maps might change over time. If it’s in a permitted state, the method is 
prepared to wait up to the
supplied timeout to get the client instance ids. It does not return a partial 
result - it returns a result or
fails.

131. I’ve refactored the `ClientsInstanceIds` object and the global consumer is 
now part of the map
of consumers. There is no need for the Optional any longer. I’ve also renamed 
it `ClientInstanceIds`.

132. My reading of 
`(kafka.network:type=RequestMetrics,name=RequestsPerSec,request=*)` is that
it does not support every request type - it supports Produce, FetchConsumer and 
FetchFollower.
Consequently, I think the ClientMetricsSubscriptionRequestCount is not 
instantly obsolete.

If I’ve misunderstood, please let me know.

Thanks,
Andrew


> On 12 Oct 2023, at 01:07, Jun Rao  wrote:
>
> Hi, Andrew,
>
> Thanks for the updated KIP. Just a few more minor comments.
>
> 130. KafkaStreams.clientsInstanceId(Duration timeout): Does it wait for all
> consumer/producer/adminClient instances to be initialized? Are all those
> instances created during KafkaStreams initialization?
>
> 131. Why does globalConsumerInstanceId() return Optional while
> other consumer instances don't return Optional?
>
> 132. ClientMetricsSubscriptionRequestCount: Do we need this since we have a
> set of generic metrics
> (kafka.network:type=RequestMetrics,name=RequestsPerSec,request=*) that
> report Request rate for every request type?
>
> Thanks,
>
> Jun
>
> On Wed, Oct 11, 2023 at 1:47 PM Matthias J. Sax  wrote:
>
>> Thanks!
>>
>> On 10/10/23 11:31 PM, Andrew Schofield wrote:
>>> Matthias,
>>> Yes, I think that’s a sensible way forward and the interface you propose
>> looks good. I’ll update the KIP accordingly.
>>>
>>> Thanks,
>>> Andrew
>>>
>>>> On 10 Oct 2023, at 23:01, Matthias J. Sax  wrote:
>>>>
>>>> Andrew,
>>>>
>>>> yes I would like to get this change into KIP-714 right way. Seems to be
>> important, as we don't know if/when a follow-up KIP for Kafka Streams would
>> land.
>>>>
>>>> I was also thinking (and discussed with a few others) how to expose it,
>> and we would propose the following:
>>>>
>>>> We add a new method to `KafkaStreams` class:
>>>>
>>>>public ClientsInstanceIds clientsInstanceIds(Duration timeout);
>>>>
>>>> The returned object is like below:
>>>>
>>>>  public class ClientsInstanceIds {
>>>>// we only have a single admin client per KS instance
>>>>String adminInstanceId();
>>>>
>>>>// we only have a single global consumer per KS instance (if any)
>>>>// Optional<> because we might not have global-thread
>>>>Optional globalConsumerInstanceId();
>>>>
>>>>    // return a <threadKey, clientInstanceId> mapping
>>>>// for the underlying (restore-)consumers/producers
>>>>Map mainConsumerInstanceIds();
>>>>Map restoreConsumerInstanceIds();
>>>>Map producerInstanceIds();
>>>> }
>>>>
>>>> For the `threadKey`, we would use some pattern like this:
>>>>
>>>>  [Stream|StateUpdater]Thread-<threadIdx>
>>>>
>>>>
>>>> Would this work from your POV?
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 10/9/23 2:15 AM, Andrew Schofield wrote:
>>>>> Hi Matthias,
>>>>> Good point. Makes sense to me.
>>>>> Is this something that can also be included in the proposed Kafka
>> Streams follow-on KIP, or would you prefer that I add it to KIP-714?
>>>>> I have a slight preference for the former to put all of the KS
>> enhancements into a separate KIP.
>>>>> Thanks,
>>>>> Andrew
>>>>>> On 7 Oct 2023, at 02:12, Matthias J. Sax  wrote:
>>>>>>
>>>>>> Thanks Andrew. SGTM.
>>>>>>
>>>>>> One point you did not address is the idea to add a method to
>> `KafkaStreams` similar to the proposed `clientInstanceId()` that will be
>> added to consumer/producer/admin clients.
>>>>>>
>>>>>> Without addressing this, Kafka Streams users won't have a way to get

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-10-12 Thread Andrew Schofield
Hi Matthias,
I’ll answer (1) to (3).

(1) The KIP uses the phrase “client instance id” and the method name mirrors 
that. Personally, I’m
comfortable with the current name.

(2) That’s a good point. I’ll update it to use a Kafka Uuid instead.

(3) Although it’s a trivial thing to add an overload with no timeout parameter, 
the fact that it doesn’t really fit in the
Producer interface makes me prefer not to. I’d rather keep the timeout explicit 
on the method and keep the signature the
same across all three client interfaces that implement it.

I’ll update the KIP now.

Thanks,
Andrew

> On 12 Oct 2023, at 02:47, Matthias J. Sax  wrote:
>
> In can answer 130 and 131.
>
> 130) We cannot guarantee that all clients are already initialized due to race 
> conditions. We plan to not allow calling `KafkaStreams#clientsInstanceIds()` 
> when the state is not RUNNING (or REBALANCING) though -- guess this slipped 
> on the KIP and should be added? But because StreamThreads can be added 
> dynamically (and producer might be created dynamically at runtime; cf below), 
> we still cannot guarantee that all clients are already initialized when the 
> method is called. Of course, we assume that all clients are most likely 
> initialize on the happy path, and blocking calls to 
> `client.clientInstanceId()` should be rare.
>
> To address the worst case, we won't do a naive implementation and just loop 
> over all clients, but fan-out the call to the different StreamThreads (and 
> GlobalStreamThread if it exists), and use Futures to gather the results.
>
> Currently, `StreamThreads` has 3 clients (if ALOS or EOSv2 is used), so we 
> might do 3 blocking calls in the worst case (for EOSv1 we get a producer per 
> tasks, and we might end up doing more blocking calls if the producers are not 
> initialized yet). Note that EOSv1 is already deprecated, and we are also 
> working on thread refactoring that will reduce the number of clients on 
> StreamThread to 2 -- and we have more refactoring planned to reduce the 
> number of clients even further.
>
> Inside `KafkaStreams#clientsInstanceIds()` we might only do a single blocking 
> call for the admin client (ie, `admin.clientInstanceId()`).
>
> I agree that we need to do some clever timeout management, but it seems to be 
> more of an implementation detail?
>
> Do you have any particular concerns, or does the proposed implementation as 
> sketched above address your question?
>
>
> 131) If the Topology does not have a global-state-store, there won't be a 
> GlobalThread and thus no global consumer. Thus, we return an Optional.
>
>
>
> Three related questions for Andrew.
>
> (1) Why is the method called `clientInstanceId()` and not just plain 
> `instanceId()`?
>
> (2) Why do we return a `String` and not a UUID type? The added protocol 
> request/response classes use UUIDs.
>
> (3) Would it make sense to have an overloaded `clientInstanceId()` method 
> that does not take any parameter but uses `default.api.timeout` config (this 
> config does not exist on the producer though, so we could only have it for 
> consumer and admin at this point). We could of course also add overloads like 
> this later if users request them (and/or add `default.api.timeout.ms` to the 
> producer, too).
>
> Btw: For KafkaStreams, I think `clientsInstanceIds` still makes sense as a 
> method name though, as `KafkaStreams` itself does not have an `instanceId` -- 
> we can also not have a timeout-less overload, because `KafkaStreams` does not 
> have a `default.api.timeout.ms` config either (and I don't think it make 
> sense to add).
>
>
>
> -Matthias
>
> On 10/11/23 5:07 PM, Jun Rao wrote:
>> Hi, Andrew,
>> Thanks for the updated KIP. Just a few more minor comments.
>> 130. KafkaStreams.clientsInstanceId(Duration timeout): Does it wait for all
>> consumer/producer/adminClient instances to be initialized? Are all those
>> instances created during KafkaStreams initialization?
>> 131. Why does globalConsumerInstanceId() return Optional while
>> other consumer instances don't return Optional?
>> 132. ClientMetricsSubscriptionRequestCount: Do we need this since we have a
>> set of generic metrics
>> (kafka.network:type=RequestMetrics,name=RequestsPerSec,request=*) that
>> report Request rate for every request type?
>> Thanks,
>> Jun
>> On Wed, Oct 11, 2023 at 1:47 PM Matthias J. Sax  wrote:
>>> Thanks!
>>>
>>> On 10/10/23 11:31 PM, Andrew Schofield wrote:
>>>> Matthias,
>>>> Yes, I think that’s a sensible way forward and the interface you propose
>>> looks good. I’ll update the KIP accordingly.
>>>>
>>>> Thanks,

Re: [DISCUSS] KIP-714: Client metrics and observability

2023-10-13 Thread Andrew Schofield
Hi Jun,
Thanks for the clarifications.

131. The client instance ids returned from 
KafkaStreams.clientInstanceIds(Duration) correspond to the
client_instance_id labels added by the broker to the metrics pushed from the 
clients. This should be
sufficient information to enable correlation between the metrics available in 
the client, and the metrics
pushed to the broker.

132. Yes, I see. I used JMX to look at the metrics on my broker and you’re 
entirely right. I will
remove the redundant metric from the KIP.

Thanks,
Andrew

> On 12 Oct 2023, at 20:12, Jun Rao  wrote:
>
> Hi, Andrew,
>
> Thanks for the reply.
>
> 131. Could we also document how one could correlate each client instance in
> KStreams with the labels for the metrics received by the brokers?
>
> 132. The documentation for RequestsPerSec is not complete. If you trace
> through how
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L71
> is
> implemented, it includes every API key tagged with the corresponding
> listener.
>
> Jun
>
> On Thu, Oct 12, 2023 at 11:42 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Jun,
>> Thanks for your comments.
>>
>> 130. As Matthias described, and I am adding to the KIP, the
>> `KafkaStreams#clientInstanceIds` method
>> is only permitted when the state is RUNNING or REBALANCING. Also, clients
>> can be added dynamically
>> so the maps might change over time. If it’s in a permitted state, the
>> method is prepared to wait up to the
>> supplied timeout to get the client instance ids. It does not return a
>> partial result - it returns a result or
>> fails.
>>
>> 131. I’ve refactored the `ClientsInstanceIds` object and the global
>> consumer is now part of the map
>> of consumers. There is no need for the Optional any longer. I’ve also
>> renamed it `ClientInstanceIds`.
>>
>> 132. My reading of
>> `(kafka.network:type=RequestMetrics,name=RequestsPerSec,request=*)` is that
>> it does not support every request type - it supports Produce,
>> FetchConsumer and FetchFollower.
>> Consequently, I think the ClientMetricsSubscriptionRequestCount is not
>> instantly obsolete.
>>
>> If I’ve misunderstood, please let me know.
>>
>> Thanks,
>> Andrew
>>
>>
>>> On 12 Oct 2023, at 01:07, Jun Rao  wrote:
>>>
>>> Hi, Andrew,
>>>
>>> Thanks for the updated KIP. Just a few more minor comments.
>>>
>>> 130. KafkaStreams.clientsInstanceId(Duration timeout): Does it wait for
>> all
>>> consumer/producer/adminClient instances to be initialized? Are all those
>>> instances created during KafkaStreams initialization?
>>>
>>> 131. Why does globalConsumerInstanceId() return Optional while
>>> other consumer instances don't return Optional?
>>>
>>> 132. ClientMetricsSubscriptionRequestCount: Do we need this since we
>> have a
>>> set of generic metrics
>>> (kafka.network:type=RequestMetrics,name=RequestsPerSec,request=*) that
>>> report Request rate for every request type?
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Wed, Oct 11, 2023 at 1:47 PM Matthias J. Sax 
>> wrote:
>>>
>>>> Thanks!
>>>>
>>>> On 10/10/23 11:31 PM, Andrew Schofield wrote:
>>>>> Matthias,
>>>>> Yes, I think that’s a sensible way forward and the interface you
>> propose
>>>> looks good. I’ll update the KIP accordingly.
>>>>>
>>>>> Thanks,
>>>>> Andrew
>>>>>
>>>>>> On 10 Oct 2023, at 23:01, Matthias J. Sax  wrote:
>>>>>>
>>>>>> Andrew,
>>>>>>
>>>>>> yes I would like to get this change into KIP-714 right way. Seems to
>> be
>>>> important, as we don't know if/when a follow-up KIP for Kafka Streams
>> would
>>>> land.
>>>>>>
>>>>>> I was also thinking (and discussed with a few others) how to expose
>> it,
>>>> and we would propose the following:
>>>>>>
>>>>>> We add a new method to `KafkaStreams` class:
>>>>>>
>>>>>>   public ClientsInstanceIds clientsInstanceIds(Duration timeout);
>>>>>>
>>>>>> The returned object is like below:
>>>>>>
>>>>>

Re: KIP-991: Allow DropHeaders SMT to drop headers by wildcard/regexp

2023-10-15 Thread Andrew Schofield
Hi Roman,
Thanks for the KIP. I think it’s an interesting idea, but I think the KIP 
document needs some
more details added before it’s ready for review. For example, here’s a KIP in 
the same
area which was delivered in an earlier version of Kafka. I think this is a good 
KIP to copy
for a suitable level of detail and description 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Filter+and+Conditional+SMTs).

Hope this helps.

Thanks,
Andrew

> On 15 Oct 2023, at 21:02, Roman Schmitz  wrote:
>
> Hi all,
>
> While working with different customers I came across the case several times
> that we'd like to not only explicitly remove headers by name but by pattern
> / regexp. Here is a KIP for this feature!
>
> Please let me know if you have any comments, questions, or suggestions!
>
> https://cwiki.apache.org/confluence/x/oYtEE
>
> Thanks,
> Roman
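[Editor's note: for context, the existing `DropHeaders` SMT removes headers by exact name via its `headers` property. A wildcard/regexp option as proposed in KIP-991 would extend a configuration like the one below; the commented-out pattern property is purely hypothetical and does not reflect the final syntax in the KIP.]

```properties
# Existing behaviour: drop headers by exact name
transforms=dropSecrets
transforms.dropSecrets.type=org.apache.kafka.connect.transforms.DropHeaders
transforms.dropSecrets.headers=credentials,internal-trace-id

# Hypothetical extension under KIP-991 (illustrative only, not final syntax):
# transforms.dropSecrets.headers.pattern=internal-.*
```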



Re: [VOTE] KIP-714: Client metrics and observability

2023-10-16 Thread Andrew Schofield
The vote for KIP-714 has now concluded and the KIP is APPROVED.

The votes are:
Binding:
   +4 (Jason, Matthias, Sophie, Jun)
Non-binding:
   +3 (Milind, Kirk, Philip)
   -1 (Ryanne)

This KIP aims to improve monitoring and troubleshooting of client
performance by enabling clients to push metrics to brokers. The lack of
consistent telemetry across clients is an operational gap, and many cluster
operators do not have control over the clients. Often, asking the client owner
to change the configuration or even application code in order to troubleshoot
problems is not workable. This is why the KIP enables the broker to request
metrics from clients, giving a consistent, cross-platform mechanism.

The feature is enabled by configuring a metrics plugin on the brokers which
implements the ClientTelemetry interface. In the absence of a plugin with this
interface, the brokers do not even support the new RPCs in this KIP and the
clients will not attempt or be able to push metrics. So, a vanilla Apache Kafka
broker will not collect metrics.

I would like to make available an open-source implementation of the 
ClientTelemetry
interface that works with an open-source monitoring solution.

The KIP does put support for OTLP serialisation into the client, so there are
new dependencies in the Java client, which are bundled and relocated (shaded).
OTLP also opens up other use cases involving OpenTelemetry in the future, which
is emerging as the de facto standard for telemetry, and observability in 
general.

Thanks to everyone who has contributed to KIP-714 since Magnus Edenhill
kicked it all off in February 2021.

Andrew

> On 14 Oct 2023, at 01:52, Jun Rao  wrote:
>
> Hi, Andrew,
>
> Thanks for the KIP. +1 from me too.
>
> Jun
>
> On Wed, Oct 11, 2023 at 4:00 PM Sophie Blee-Goldman 
> wrote:
>
>> This looks great! +1 (binding)
>>
>> Sophie
>>
>> On Wed, Oct 11, 2023 at 1:46 PM Matthias J. Sax  wrote:
>>
>>> +1 (binding)
>>>
>>> On 9/13/23 5:48 PM, Jason Gustafson wrote:
>>>> Hey Andrew,
>>>>
>>>> +1 on the KIP. For many users of Kafka, it may not be fully understood
>>> how
>>>> much of a challenge client monitoring is. With tens of clients in a
>>>> cluster, it is already difficult to coordinate metrics collection. When
>>>> there are thousands of clients, and when the cluster operator has no
>>>> control over them, it is essentially impossible. For the fat clients
>> that
>>>> we have, the lack of useful telemetry is a huge operational gap.
>>>> Consistency between clients has also been a major challenge. I think
>> the
>>>> effort toward standardization in this KIP will have some positive
>> impact
>>>> even in deployments which have effective client-side monitoring.
>>> Overall, I
>>>> think this proposal will provide a lot of value across the board.
>>>>
>>>> Best,
>>>> Jason
>>>>
>>>> On Wed, Sep 13, 2023 at 9:50 AM Philip Nee 
>> wrote:
>>>>
>>>>> Hey Andrew -
>>>>>
>>>>> Thank you for taking the time to reply to my questions. I'm just
>> adding
>>>>> some notes to this discussion.
>>>>>
>>>>> 1. epoch: It can be helpful to know the delta of the client side and
>> the
>>>>> actual leader epoch.  It is helpful to understand why sometimes commit
>>>>> fails/client not making progress.
>>>>> 2. Client connection: If the client selects the "wrong" connection to
>>> push
>>>>> out the data, I assume the request would timeout; which should lead to
>>>>> disconnecting from the node and reselecting another node as you
>>> mentioned,
>>>>> via the least loaded node.
>>>>>
>>>>> Cheers,
>>>>> P
>>>>>
>>>>>
>>>>> On Tue, Sep 12, 2023 at 10:40 AM Andrew Schofield <
>>>>> andrew_schofield_j...@outlook.com> wrote:
>>>>>
>>>>>> Hi Philip,
>>>>>> Thanks for your vote and interest in the KIP.
>>>>>>
>>>>>> KIP-714 does not introduce any new client metrics, and that’s
>>>>> intentional.
>>>>>> It does
>>>>>> tell how that all of the client metrics can have their names
>>> transformed
>>>>>> into
>>>>>> equivalent "telemetry metric names”, and then potentially used in
>>> metrics
>>>>>> subscriptions.
>>>>>>

[DISCUSS] KIP-1000: List Client Metrics Configuration Resources

2023-11-07 Thread Andrew Schofield
Hi,
I would like to start discussion of a small KIP which fills a gap in the 
administration of client metrics configuration.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1000%3A+List+Client+Metrics+Configuration+Resources

Thanks,
Andrew

Re: [DISCUSS] KIP-994: Minor Enhancements to ListTransactions and DescribeTransactions APIs

2023-11-07 Thread Andrew Schofield
Hi Artem,
I think you make a very good point. This also looks to me like it deserves a 
version bump for the request.

Andrew

> On 8 Nov 2023, at 04:12, Artem Livshits  
> wrote:
>
> Hi Raman,
>
> Thank you for the KIP.  I think using the tagged field
> in DescribeTransactionsResponse should be good -- if either the client or
> the server don't support it, it's not printed, which is reasonable behavior.
>
> For the ListTransactionsRequest, though, I think using the tagged field
> could lead to a subtle compatibility issue if a new client is used with old
> server: the client could specify the DurationFilter, but the old server
> would ignore it and list all transactions instead, which could be
> misleading or potentially even dangerous if the results are used in a
> script for some automation.  I think a more desirable behavior would be to
> fail if the server doesn't support the new filter, which we should be able
> to achieve if we bump version of the ListTransactionsRequest and add
> DurationFilter as a regular field.
>
> -Artem
>
> On Tue, Nov 7, 2023 at 2:20 AM Raman Verma  wrote:
>
>> I would like to start a discussion on KIP-994
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-994%3A+Minor+Enhancements+to+ListTransactions+and+DescribeTransactions+APIs
>>



[VOTE] KIP-1000: List Client Metrics Configuration Resources

2023-11-15 Thread Andrew Schofield
Hi,
I’d like to start the voting for KIP-1000: List Client Metrics Configuration 
Resources.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1000%3A+List+Client+Metrics+Configuration+Resources

Thanks,
Andrew

Re: [VOTE] KIP-1000: List Client Metrics Configuration Resources

2023-11-16 Thread Andrew Schofield
Hi Apoorv,
Thanks for your vote.

Initially, I put support for zkBroker in order to be able to control the error 
response in this case.
I have validated the error handling for this RPC on a ZK cluster in which the 
RPC is not supported,
and the error is entirely understandable. Consequently, I have removed 
`zkBroker` for this new RPC.

Thanks,
Andrew

> On 16 Nov 2023, at 13:51, Apoorv Mittal  wrote:
>
> Thanks a lot for writing the KIP Andrew. This is much required to list all
> configured client metrics resources.
>
> I have one minor question related to the zkBroker listener in the new RPC.
> As the client-metrics resource is not supported in ZooKeeper mode,
> shouldn't we disallow ListClientMetricsResourcesRequest for
> ZooKeeper in the APIVersion request itself?
>
> +1(non-binding)
>
> Regards,
> Apoorv Mittal
> +44 7721681581
>
>
> On Wed, Nov 15, 2023 at 4:58 PM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi,
>> I’d like to start the voting for KIP-1000: List Client Metrics
>> Configuration Resources.
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1000%3A+List+Client+Metrics+Configuration+Resources
>>
>> Thanks,
>> Andrew



Re: [DISCUSS] KIP-1000: List Client Metrics Configuration Resources

2023-11-16 Thread Andrew Schofield
Hi Jun,
KIP-714 includes `kafka-client-metrics.sh` which provides an easier way to work 
with client metrics config
than the general-purpose `kafka-configs.sh`. So, this new RPC will actually be 
used in the
`kafka-client-metrics.sh` tool.

Thanks,
Andrew

> On 16 Nov 2023, at 18:00, Jun Rao  wrote:
>
> Hi, Andrew,
>
> Thanks for the KIP. Just one comment.
>
> Should we extend ConfigCommand or add a new tool to list client metrics?
>
> Thanks,
>
> Jun
>
> On Tue, Nov 7, 2023 at 9:42 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi,
>> I would like to start discussion of a small KIP which fills a gap in the
>> administration of client metrics configuration.
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1000%3A+List+Client+Metrics+Configuration+Resources
>>
>> Thanks,
>> Andrew



Re: [DISCUSS] KIP-1000: List Client Metrics Configuration Resources

2023-11-17 Thread Andrew Schofield
Hi Jason,
Thanks for your comments.

1) Any broker can handle this API, so admin clients will choose a node randomly.
2) I was following the RPCs for configs which support controller and broker. 
However,
looking at all of the List… and Describe… RPCs, I see that the majority are 
broker-only.
I have changed the KIP to have only “broker” in the “listeners”.

Thanks,
Andrew

> On 16 Nov 2023, at 18:16, Jason Gustafson  wrote:
>
> Hey Andrew,
>
> Thanks for the KIP. Just clarifying a couple small details.
>
> 1. I assume any broker can handle this API, so admin clients will choose a
> node randomly?
> 2. Does the controller need to support this API? If not, we can drop
> "controller" from "listeners."
>
> Thanks,
> Jason
>
> On Thu, Nov 16, 2023 at 10:00 AM Jun Rao  wrote:
>
>> Hi, Andrew,
>>
>> Thanks for the KIP. Just one comment.
>>
>> Should we extend ConfigCommand or add a new tool to list client metrics?
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, Nov 7, 2023 at 9:42 AM Andrew Schofield <
>> andrew_schofield_j...@outlook.com> wrote:
>>
>>> Hi,
>>> I would like to start discussion of a small KIP which fills a gap in the
>>> administration of client metrics configuration.
>>>
>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1000%3A+List+Client+Metrics+Configuration+Resources
>>>
>>> Thanks,
>>> Andrew
>>



Re: [DISCUSS] KIP-1000: List Client Metrics Configuration Resources

2023-11-18 Thread Andrew Schofield
Hi Jun,
This is an example of inconsistency between the tools. I do agree that
it would be nice to have `--list` on the `kafka-client-metrics.sh` tool.
However, `kafka-configs.sh` has the convention that `--describe` with no
entity name supplied first gets a list of all the entities, and then describes
them all (ConfigCommand.scala/describeResourceConfig). `kafka-topics.sh`
also describes all topics if you use `--describe` with no topic name.

So, I have added a new `--list` option to `kafka-client-metrics.sh` in this KIP.
The `--describe` option remains as before - if you supply a name, it describes
just that resource, and if you do not, it describes all resources.

Thanks,
Andrew

> On 17 Nov 2023, at 22:13, Jun Rao  wrote:
>
> Hi, Andrew,
>
> Thanks for the reply. KIP-714 proposes to use the following command for
> listing metric subscriptions.
>
> kafka-client-metrics.sh --bootstrap-server $BROKERS --describe
> kafka-configs.sh --bootstrap-server $BROKERS --describe --entity-type
> client-metrics
>
> Should we use --list instead to be more consistent with other tools like
> (kafka-topics)?
>
> Thanks,
>
> Jun
>
> On Thu, Nov 16, 2023 at 10:23 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
>> Hi Jun,
>> KIP-714 includes `kafka-client-metrics.sh` which provides an easier way to
>> work with client metrics config
>> than the general-purpose `kafka-configs.sh`. So, this new RPC will
>> actually be used in the
>> `kafka-client-metrics.sh` tool.
>>
>> Thanks,
>> Andrew
>>
>>> On 16 Nov 2023, at 18:00, Jun Rao  wrote:
>>>
>>> Hi, Andrew,
>>>
>>> Thanks for the KIP. Just one comment.
>>>
>>> Should we extend ConfigCommand or add a new tool to list client metrics?
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Tue, Nov 7, 2023 at 9:42 AM Andrew Schofield <
>>> andrew_schofield_j...@outlook.com> wrote:
>>>
>>>> Hi,
>>>> I would like to start discussion of a small KIP which fills a gap in the
>>>> administration of client metrics configuration.
>>>>
>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1000%3A+List+Client+Metrics+Configuration+Resources
>>>>
>>>> Thanks,
>>>> Andrew
>>
>>



Re: [VOTE] KIP-1000: List Client Metrics Configuration Resources

2023-11-20 Thread Andrew Schofield
The voting for this KIP is now complete.

+3 binding (Jun, Jason, Matthias)
+2 non-binding (Doğuşcan, Apoorv)

Thanks,
Andrew

> On 20 Nov 2023, at 21:42, Jason Gustafson  wrote:
>
> +1 Thanks for the KIP!
>
> On Mon, Nov 20, 2023 at 9:31 AM Jun Rao  wrote:
>
>> Hi, Andrew,
>>
>> Thanks for the KIP. +1
>>
>> Jun
>>
>> On Thu, Nov 16, 2023 at 9:12 AM Andrew Schofield <
>> andrew_schofield_j...@outlook.com> wrote:
>>
>>> Hi Apoorv,
>>> Thanks for your vote.
>>>
>>> Initially, I put support for zkBroker in order to be able to control the
>>> error response in this case.
>>> I have validated the error handling for this RPC on a ZK cluster in which
>>> the RPC is not supported,
>>> and the error is entirely understandable. Consequently, I have removed
>>> `zkBroker` for this new RPC.
>>>
>>> Thanks,
>>> Andrew
>>>
>>>> On 16 Nov 2023, at 13:51, Apoorv Mittal 
>>> wrote:
>>>>
>>>> Thanks a lot for writing the KIP Andrew. This is much required to list
>>> all
>>>> configured client metrics resources.
>>>>
>>>> I have one minor question related to the zkBroker listener in the new
>>> RPC.
>>>> As the client-metrics resource is not supported in ZooKeeper mode,
>>>> shouldn't we disallow ListClientMetricsResourcesRequest for
>>>> ZooKeeper in the APIVersion request itself?
>>>>
>>>> +1(non-binding)
>>>>
>>>> Regards,
>>>> Apoorv Mittal
>>>> +44 7721681581
>>>>
>>>>
>>>> On Wed, Nov 15, 2023 at 4:58 PM Andrew Schofield <
>>>> andrew_schofield_j...@outlook.com> wrote:
>>>>
>>>>> Hi,
>>>>> I’d like to start the voting for KIP-1000: List Client Metrics
>>>>> Configuration Resources.
>>>>>
>>>>>
>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1000%3A+List+Client+Metrics+Configuration+Resources
>>>>>
>>>>> Thanks,
>>>>> Andrew




Re: [DISCUSS] KIP-932: Queues for Kafka

2024-04-29 Thread Andrew Schofield
Hi Jun,
Thanks for the reply and sorry for the delay in responding.

123. Yes, I didn’t quite get your point earlier. The member
epoch is bumped by the GC when it sends a new assignment.
When the member sends its next heartbeat, it echoes back
the member epoch, which will confirm the receipt of the
assignment. It would send the same member epoch even
after recovery of a network disconnection, so that should
be sufficient to cope with this eventuality.
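To make the epoch-echo confirmation concrete, here is a rough sketch of the
idea (illustrative only; class and method names are hypothetical, not the
actual group coordinator or client code):

```java
// Sketch of the epoch-echo confirmation described above. The GC bumps the
// member epoch when it sends a new assignment; the member echoes that epoch
// in its next heartbeat, which tells the GC the assignment was received so
// it can stop repeating the assignment in heartbeat responses.
public class EpochEcho {
    private int memberEpoch = 0;        // member's view of its epoch
    private int coordinatorEpoch = 0;   // GC's view of the member epoch

    // GC sends a new assignment and bumps the epoch it expects echoed back.
    public int sendAssignment() {
        return ++coordinatorEpoch;
    }

    // Member records the epoch carried with the assignment it received.
    public void receiveAssignment(int epoch) {
        memberEpoch = epoch;
    }

    // Epoch the member puts in its next heartbeat; it sends the same value
    // even after recovery from a network disconnection.
    public int heartbeatEpoch() {
        return memberEpoch;
    }

    // GC checks the echoed epoch: a match confirms receipt of the assignment.
    public boolean assignmentConfirmed(int echoedEpoch) {
        return echoedEpoch == coordinatorEpoch;
    }
}
```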

125. Yes, I have added it to the table which now matches
the text earlier in the KIP. Thanks.

140. Yes, I have added it to the table which now matches
the text earlier in the KIP. I’ve also added more detail for
the case where the entire share group is being deleted.

141. Yes! Sorry for confusing things.

Back to the original question for this point. To delete a share
group, should the GC write a tombstone for each
ShareGroupMemberMetadata record?

Tombstones are necessary to delete ShareGroupMemberMetadata
records. But, deletion of a share group is only possible when
the group is already empty, so the tombstones will have
been written as a result of the members leaving the group.

143. Yes, that’s right.

147. The measurement is certainly from the point of view
of the client, but it’s driven by sending and receiving heartbeats
rather than whether the client triggered the rebalance itself.
The client decides when it enters and leaves reconciliation
of the assignment, and measures this period.


Thanks,
Andrew


> On 26 Apr 2024, at 09:43, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the reply.
> 
> 123. "Rather than add the group epoch to the ShareGroupHeartbeat, I have
> decided to go for TopicPartitions in ShareGroupHeartbeatRequest which
> mirrors ConsumerGroupHeartbeatRequest."
> ShareGroupHeartbeat.MemberEpoch is the group epoch, right? Is that enough
> for confirming the receipt of the new assignment?
> 
> 125. This also means that "Alter share group offsets" needs to write a
> ShareGroupPartitionMetadata record, if the partition is not already
> initialized.
> 
> 140. In the table for "Delete share group offsets", we need to add a step
> to write a ShareGroupPartitionMetadata record with DeletingTopics.
> 
> 141. Hmm, ShareGroupMemberMetadata is stored in the __consumer_offsets
> topic, which is a compacted topic, right?
> 
> 143. So, the client sends DescribeShareGroupOffsets requests to GC, which
> then forwards it to SC?
> 
> 147. I guess a client only knows the rebalance triggered by itself, but not
> the ones triggered by other members or topic/partition changes?
> 
> Jun
> 
> On Thu, Apr 25, 2024 at 4:19 AM Andrew Schofield 
> wrote:
> 
>> Hi Jun,
>> Thanks for the response.
>> 
>> 123. Of course, ShareGroupHearbeat started off as ConsumerGroupHeartbeat
>> and then unnecessary fields were removed. In the network issue case,
>> there is not currently enough state being exchanged to be sure an
>> assignment
>> was received.
>> 
>> Rather than add the group epoch to the ShareGroupHeartbeat, I have decided
>> to go for TopicPartitions in ShareGroupHeartbeatRequest which mirrors
>> ConsumerGroupHeartbeatRequest. It means the share group member does
>> confirm the assignment it is using, and that can be used by the GC to
>> safely
>> stop repeating the assignment in heartbeat responses.
>> 
>> 125. Ah, yes. This is indeed something possible with a consumer group
>> and share groups should support it too. This does of course imply that
>> ShareGroupPartitionMetadataValue needs an array of partitions, not
>> just the number.
>> 
>> 140. Yes, good spot. There is an inconsistency here in consumer groups
>> where you can use AdminClient.deleteConsumerGroupOffsets at the
>> partition level, but kafka-consumer-groups.sh --delete only operates
>> at the topic level.
>> 
>> Personally, I don’t think it’s sensible to delete offsets at the partition
>> level only. You can reset them, but if you’re actively using a topic with
>> a share group, I don’t see why you’d want to delete offsets rather than
>> reset. If you’ve finished using a topic with a share group and want to
>> clean
>> up, use delete.
>> 
>> So, I’ve changed the AdminClient.deleteConsumerGroupOffsets to be
>> topic-based and the RPCs behind it.
>> 
>> The GC reconciles the cluster state with the ShareGroupPartitionMetadata
>> to spot deletion of topics and the like. However, when the offsets for
>> a topic were deleted manually, the topic very likely still exists, so
>> reconciliation
>> alone is not going to be able to continue an interrupted operation that
>> has started. So, I’ve added DeletingTopics back into

Re: [DISCUSS] KIP-932: Queues for Kafka

2024-05-01 Thread Andrew Schofield
Hi Jun,
Thanks for your reply.

147. Perhaps the easiest is to take a look at the code in
o.a.k.clients.consumer.internal.MembershipManagerImpl.
This class is part of the new consumer group protocol
code in the client. It makes state transitions based on
the heartbeat requests and responses, and it makes a
judgement about whether an assignment received is
equal to what it already is using. When a state transition
is deemed to be the beginning or end of a rebalance
from the point of view of this client, it counts towards the
rebalance metrics.

Share groups will follow the same path.

150. I do not consider it a concern. Rebalancing a share group
is less disruptive than rebalancing a consumer group. If the assignor
has information about existing assignments, it can use it. It is
true that this information cannot be replayed from a topic and will
sometimes be unknown as a result.

151. I don’t want to rename TopicPartitionsMetadata to
simply TopicPartitions (it’s information about the partitions of
a topic) because we then have an array of plurals.
I’ve renamed Metadata to Info. That’s a bit less cumbersome.

152. Fixed.

153. It’s the GC. Fixed.

154. The UNKNOWN “state” is essentially a default for situations where
the code cannot understand data it received. For example, let’s say that
Kafka 4.0 has groups with states EMPTY, STABLE, DEAD. If Kafka 4.1
introduced another state THINKING, a tool built with Kafka 4.0 would not
know what THINKING meant. It will use “UNKNOWN” to indicate that the
state was something that it could not understand.
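As a small illustration of that fallback (the enum and parse method here are
hypothetical, not the actual tool code), the idea is simply that an
unrecognised state string maps to UNKNOWN:

```java
// Illustrative sketch: a tool built against an older set of group states
// maps any state string it does not recognise to UNKNOWN.
public enum GroupState {
    EMPTY, STABLE, DEAD, UNKNOWN;

    public static GroupState parse(String name) {
        for (GroupState s : values()) {
            if (s.name().equalsIgnoreCase(name)) {
                return s;
            }
        }
        // A state added by a newer broker (e.g. "THINKING") is not understood,
        // so the tool reports UNKNOWN rather than failing.
        return UNKNOWN;
    }
}
```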

155. No, it’s at the level of the share-partition. If the offsets for just
one share-partition are reset, only the state epoch for that partition is
updated.

156. Strictly speaking, it’s redundant. I think having the StartOffset
separate gives helpful clarity and I prefer to retain it.

157. Yes, you are right. There’s no reason why a leader change needs
to force a ShareSnapshot. I’ve added leaderEpoch to the ShareUpdate.

158. Although ReadShareGroupOffsetsState is a bit of a mouthful,
having “State” in the name makes it clear that this one the family of
inter-broker RPCs served by the share coordinator. The admin RPCs
such as DescribeShareGroupOffsets do not include “State”.

159. Fixed.

160. Fixed.

Thanks,
Andrew

> On 2 May 2024, at 00:29, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the reply.
> 
> 147. "The measurement is certainly from the point of view of the client,
> but it’s driven by sending and receiving heartbeats rather than whether the
> client triggered the rebalance itself."
> Hmm, how does a client know which heartbeat response starts a rebalance?
> 
> 150. PartitionAssignor takes existing assignments into consideration. Since
> GC doesn't persist the assignment for share groups, it means that
> ShareGroupPartitionAssignor can't reliably depend on existing assignments.
> Is that a concern?
> 
> 151. ShareGroupPartitionMetadataValue: Should we rename
> TopicPartitionsMetadata and TopicMetadata since there is no metadata?
> 
> 152. ShareGroupMetadataKey: "versions": "3"
> The versions should be 11.
> 
> 153. ShareGroupDescription.coordinator(): The description says "The share
> group coordinator". Is that the GC or SC?
> 
> 154. "A share group has only three states - EMPTY , STABLE and DEAD".
>  What about UNKNOWN?
> 
> 155. WriteShareGroupState: StateEpoch is at the group level, not partition
> level, right?
> 
> 156. ShareSnapshotValue: Is StartOffset redundant since it's the same as
> the smallest FirstOffset in StateBatches?
> 
> 157. Every leader change forces a ShareSnapshotValue write to persist the
> new leader epoch. Is that a concern? An alternative is to include
> leaderEpoch in ShareUpdateValue.
> 
> 158. ReadShareGroupOffsetsState: The state is the offsets. Should we rename
> it to something like ReadShareGroupStartOffset?
> 
> 159. members are assigned members round-robin => members are assigned
> round-robin
> 
> 160. "may called": typo
> 
> Jun
> 
> On Mon, Apr 29, 2024 at 10:11 AM Andrew Schofield 
> wrote:
> 
>> Hi Jun,
>> Thanks for the reply and sorry for the delay in responding.
>> 
>> 123. Yes, I didn’t quite get your point earlier. The member
>> epoch is bumped by the GC when it sends a new assignment.
>> When the member sends its next heartbeat, it echoes back
>> the member epoch, which will confirm the receipt of the
>> assignment. It would send the same member epoch even
>> after recovery of a network disconnection, so that should
>> be sufficient to cope with this eventuality.
>> 
>> 125. Yes, I have added it to the table which now matches
>> the text earlier in the KIP. Thanks.
>> 
>> 140. Yes, I have

Re: [DISCUSS] KIP-1041: Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-02 Thread Andrew Schofield
Hi David,
I think this KIP is a very good idea. It would be good to get rid of this cruft.

Thanks,
Andrew

> On 2 May 2024, at 18:54, David Jacot  wrote:
>
> Hi folks,
>
> I have put together a very small KIP to
> deprecate offsets.commit.required.acks in 3.8 and remove it in 4.0. See the
> motivation for the reason.
>
> KIP: https://cwiki.apache.org/confluence/x/9YobEg
>
> Please let me know what you think.
>
> Best,
> David



Re: [DISCUSS] KIP-1033: Add Kafka Streams exception handler for exceptions occuring during processing

2024-05-02 Thread Andrew Schofield
Hi,
I’ve changed my mind on this one having read through the comments.

I don’t think the exception handler should be able to mess with the headers
to the detriment of the code that called the handler.

While I like the hygiene of having an ImmutableHeaders interface,
I feel we can use the existing interface to get the effect we desire.

Thanks,
Andrew

> On 3 May 2024, at 03:40, Sophie Blee-Goldman  wrote:
> 
> I tend to agree that we should just return a pure Headers instead of
> introducing a new class/interface to protect overwriting them. I think a
> pretty good case has been made already so I won't add onto it, just wanted
> to voice my support.
> 
> Is that the only remaining question on this KIP? Might be ok to move to a
> vote now?
> 
> On Wed, May 1, 2024 at 8:05 AM Lianet M.  wrote:
> 
>> Hi all, thanks Damien for the KIP!
>> 
>> After looking into the KIP and comments, my only concern is aligned with
>> one of Matthias comments, around the ImmutableHeaders introduction, with
>> the motivation not being clear enough. The existing handlers already expose
>> the headers (indirectly). Ex.
>> ProductionExceptionHandler.handleSerializationException provides the
>> ProducerRecord as an argument, so they are already exposed in those
>> callbacks through record.headers(). Is there a reason to think that it
>> would be a problem to expose the headers in the
>> new ProcessingExceptionHandler, but that it's not a problem for the
>> existing handler?
>> 
>> If there is no real concern about the KS engine requiring those headers, it
>> feels hard to mentally justify the complexity we transfer to the user by
>> exposing a new concept into the callbacks to represent the headers. In the
>> end, it strays away from the simple/consistent representation of Headers
>> used all over. Even if eventually the KS engine needs to use the headers
>> after the callbacks with certainty that they were not altered, still feels
>> like it's something we could attempt to solve internally, without having to
>> transfer "new concepts" into the user (ex. the deep-copy as it was
>> suggested, seems like the kind of trade-off that would maybe be acceptable
>> here to gain simplicity and consistency among the handlers with a single
>> existing representation of Headers).
>> 
>> Best!
>> 
>> Lianet
>> 
>> 
>> 
>> On Tue, Apr 30, 2024 at 9:36 PM Matthias J. Sax  wrote:
>> 
>>> Thanks for the update.
>>> 
>>> I am wondering if we should use `ReadOnlyHeaders` instead of
>>> `ImmutableHeaders` as interface name?
>>> 
>>> Also, the returned `Header` interface is technically not immutable
>>> either, because `Header#value()` returns a mutable byte-array... Would we
>>> need a `ReadOnlyHeader` interface?
>>> 
>>> If yes, it seems that `ReadOnlyHeaders` should not be a super-interface
>>> of `Headers` but it would rather be a standalone interface, and a
>>> wrapper for a `Headers` instance? And `ReadOnlyHeader` would return some
>>> immutable type instead of `byte[]` for the value()?
>>> 
>>> An alternative would be to deep-copy the value byte-array what would not
>>> be free, but given that we are talking about exception handling, it
>>> would not be on the hot code path, and thus might be acceptable?
>>> 
>>> 
>>> The above seems to increase the complexity significantly though. Hence,
>>> I have second thoughts on the immutability question:
>>> 
>>> Do we really need to worry about mutability after all, because in the
>>> end, KS runtime won't read the Headers instance after the handler was
>>> called, and if a user modifies the passed in headers, there won't be any
>>> actual damage (ie, no side effects)? For this case, it might even be ok
>>> to also not add `ImmutableHeaders` to begin with?
>>> 
>>> 
>>> 
>>> Sorry for the forth and back (yes, forth and back, because back and
>>> forth does not make sense -- it's not logical -- just trying to fix
>>> English :D) as I did bring up the immutability question in the first
>>> place...
>>> 
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 4/25/24 5:56 AM, Loic Greffier wrote:
 Hi Matthias,
 
 I have updated the KIP regarding points 103 and 108.
 
 103.
 I have suggested a new `ImmutableHeaders` interface to deal with the
 immutability concern of the headers, which is basically the `Headers`
 interface without the write accesses.
 
 public interface ImmutableHeaders {
 Header lastHeader(String key);
 Iterable<Header> headers(String key);
 Header[] toArray();
 }
 
 The `Headers` interface can be updated accordingly:
 
 public interface Headers extends ImmutableHeaders, Iterable<Header> {
 //…
 }
 
 Loïc
>>> 
>> 



Re: [DISCUSS] KIP-932: Queues for Kafka

2024-05-02 Thread Andrew Schofield
Hi Jun,
Thanks for the response.

147. I am trying to get a correspondence between the concepts and
metrics of consumer groups and share groups. In both cases,
the client doesn’t strictly know when the rebalance starts. All it knows
is when it has work to do in order to perform its part of a rebalance.
I am proposing that share groups and consumer groups use
equivalent logic.

I could remove the rebalance metrics from the client because I do
understand that they are making a judgement about when a rebalance
starts, but it’s their own part of the rebalance they are measuring.

I tend to think these metrics are better than no metrics and
will at least enable administrators to see how much rebalance
activity the members of share groups are experiencing.

150. The simple assignor does not take existing assignments into
consideration. The ShareGroupPartitionAssignor interface would
permit this, but the simple assignor does not currently use it.

The simple assignor assigns partitions in two ways:
a) Distribute the members across the partitions by hashed member ID.
b) If any partitions have no members assigned, distribute the members
across these partitions round-robin.

The (a) partitions will be quite stable. The (b) partitions will be less
stable. By using existing assignment information, it could make (b)
partition assignment more stable, whether the assignments are
persisted or not. Perhaps it would be worth changing the simple
assignor in order to make (b) more stable.
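To make the two steps above concrete, here is a rough sketch of the simple
assignor's logic (illustrative only; this is not the actual broker code, and
the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.HashMap;

// Sketch of the simple assignor's two-step logic described above.
// (a) Distribute the members across the partitions by hashed member ID.
// (b) Assign members round-robin to any partitions left without a member.
public class SimpleAssignorSketch {

    public static Map<Integer, List<String>> assign(List<String> memberIds, int numPartitions) {
        Map<Integer, List<String>> assignment = new HashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            assignment.put(p, new ArrayList<>());
        }
        // (a) hash each member ID onto a partition
        for (String member : memberIds) {
            int partition = Math.floorMod(member.hashCode(), numPartitions);
            assignment.get(partition).add(member);
        }
        // (b) fill any partitions with no members, round-robin across members
        int next = 0;
        for (int p = 0; p < numPartitions; p++) {
            if (assignment.get(p).isEmpty() && !memberIds.isEmpty()) {
                assignment.get(p).add(memberIds.get(next++ % memberIds.size()));
            }
        }
        return assignment;
    }
}
```

Because (a) depends only on the member IDs, those assignments stay stable as
the group changes; the round-robin fill in (b) is where knowledge of existing
assignments could reduce churn.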

I envisage more sophisticated assignors in the future which could use
existing assignments and also other dynamic factors such as lag.

If it transpires that there is significant benefit in persisting assignments
specifically to help smooth assignment in the event of GC change,
it would be quite an easy enhancement. I am not inclined to persist
the assignments in this KIP. 

158. Ah, yes. I see. Of course, I want the names as consistent and
understandable too. I suggest renaming
ReadShareGroupOffsetsState to ReadShareGroupStateSummary.
I haven’t changed the KIP yet, so let me know if that’s OK.

Thanks,
Andrew

> On 2 May 2024, at 22:18, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the reply.
> 
> 147. " it makes a judgement about whether an assignment received is equal
> to what it already is using."
> If a client receives an assignment different from what it has, it indicates
> the end of the rebalance. But how does the client know when the rebalance
> starts? In the shareHeartbeat design, the new group epoch is propagated
> together with the new assignment in the response.
> 
> 150. It could be a potential concern if each GC change forces significant
> assignment changes. Does the default assignor take existing assignments
> into consideration?
> 
> 155. Good point. Sounds good.
> 
> 158. My concern with the current naming is that it's not clear what the
> difference is between ReadShareGroupOffsetsState and ReadShareGroupState.
> The state in the latter is also offsets.
> 
> Jun
> 
> On Wed, May 1, 2024 at 9:51 PM Andrew Schofield 
> wrote:
> 
>> Hi Jun,
>> Thanks for your reply.
>> 
>> 147. Perhaps the easiest is to take a look at the code in
>> o.a.k.clients.consumer.internal.MembershipManagerImpl.
>> This class is part of the new consumer group protocol
>> code in the client. It makes state transitions based on
>> the heartbeat requests and responses, and it makes a
>> judgement about whether an assignment received is
>> equal to what it already is using. When a state transition
>> is deemed to be the beginning or end of a rebalance
>> from the point of view of this client, it counts towards the
>> rebalance metrics.
>> 
>> Share groups will follow the same path.
>> 
>> 150. I do not consider it a concern. Rebalancing a share group
>> is less disruptive than rebalancing a consumer group. If the assignor
>> has information about existing assignments, it can use it. It is
>> true that this information cannot be replayed from a topic and will
>> sometimes be unknown as a result.
>> 
>> 151. I don’t want to rename TopicPartitionsMetadata to
>> simply TopicPartitions (it’s information about the partitions of
>> a topic) because we then have an array of plurals.
>> I’ve renamed Metadata to Info. That’s a bit less cumbersome.
>> 
>> 152. Fixed.
>> 
>> 153. It’s the GC. Fixed.
>> 
>> 154. The UNKNOWN “state” is essentially a default for situations where
>> the code cannot understand data it received. For example, let’s say that
>> Kafka 4.0 has groups with states EMPTY, STABLE, DEAD. If Kafka 4.1
>> introduced another state THINKING, a tool built with Kafka 4.0 would not
>> know what THINKING meant. It will use “UNKNOWN” to in

Re: [VOTE] KIP-1036: Extend RecordDeserializationException exception

2024-05-03 Thread Andrew Schofield
Hi Fred,
Thanks for the KIP. It’s turned out nice and elegant I think. Definitely a 
worthwhile improvement.

+1 (non-binding)

Thanks,
Andrew

> On 30 Apr 2024, at 14:02, Frédérik Rouleau  
> wrote:
>
> Hi all,
>
> As there is no more activity for a while on the discuss thread, I think we
> can start a vote.
> The KIP is available on
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1036%3A+Extend+RecordDeserializationException+exception
>
>
> If you have some feedback or suggestions, please participate to the
> discussion thread:
> https://lists.apache.org/thread/or85okygtfywvnsfd37kwykkq5jq7fy5
>
> Best regards,
> Fred



Re: [DISCUSS] KIP-932: Queues for Kafka

2024-05-03 Thread Andrew Schofield
Hi Jun,
Thanks for your reply.

147. Yes, I see what you mean. The rebalance latency will indeed
be very short by comparison. I have removed the rebalance latency
metrics from the client and retained the rebalance count and rate.

150. Yes, I think so. I have tweaked the text so that the simple
assignor will take into account existing assignment information when
it has it, which would just minimise unnecessary churn of (b).

158. I’ve changed it to ReadShareGroupStateSummary.

Thanks,
Andrew


> On 3 May 2024, at 22:17, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the reply.
> 
> 147. There seems to be some difference between consumer groups and share
> groups. In the consumer groups, if a client receives a heartbeat response
> to revoke some partitions, it may have to commit offsets before revoking
> partitions or it may have to call the rebalance callbacks provided by the
> user. This may take some time and can be reflected in the rebalance time
> metric. In the share groups, none of that exists. If a client receives some
> added/revoked partitions, it accepts them immediately, right? So, does that
> practically make the rebalance time always 0?
> 
> 150. I guess in the common case, there will be many more members than
> partitions. So the need for (b) will be less common. We can probably leave
> the persisting of the assignment out for now.
> 
> 158. The new name sounds good to me.
> 
> Jun
> 
> On Thu, May 2, 2024 at 10:21 PM Andrew Schofield 
> wrote:
> 
>> Hi Jun,
>> Thanks for the response.
>> 
>> 147. I am trying to get a correspondence between the concepts and
>> metrics of consumer groups and share groups. In both cases,
>> the client doesn’t strictly know when the rebalance starts. All it knows
>> is when it has work to do in order to perform its part of a rebalance.
>> I am proposing that share groups and consumer groups use
>> equivalent logic.
>> 
>> I could remove the rebalance metrics from the client because I do
>> understand that they are making a judgement about when a rebalance
>> starts, but it’s their own part of the rebalance they are measuring.
>> 
>> I tend to think these metrics are better than no metrics and
>> will at least enable administrators to see how much rebalance
>> activity the members of share groups are experiencing.
>> 
>> 150. The simple assignor does not take existing assignments into
>> consideration. The ShareGroupPartitionAssignor interface would
>> permit this, but the simple assignor does not currently use it.
>> 
>> The simple assignor assigns partitions in two ways:
>> a) Distribute the members across the partitions by hashed member ID.
>> b) If any partitions have no members assigned, distribute the members
>> across these partitions round-robin.
>> 
>> The (a) partitions will be quite stable. The (b) partitions will be less
>> stable. By using existing assignment information, it could make (b)
>> partition assignment more stable, whether the assignments are
>> persisted or not. Perhaps it would be worth changing the simple
>> assignor in order to make (b) more stable.
>> 
>> I envisage more sophisticated assignors in the future which could use
>> existing assignments and also other dynamic factors such as lag.
>> 
>> If it transpires that there is significant benefit in persisting
>> assignments
>> specifically to help smooth assignment in the event of GC change,
>> it would be quite an easy enhancement. I am not inclined to persist
>> the assignments in this KIP.
>> 
>> 158. Ah, yes. I see. Of course, I want the names as consistent and
>> understandable too. I suggest renaming
>> ReadShareGroupOffsetsState to ReadShareGroupStateSummary.
>> I haven’t changed the KIP yet, so let me know if that’s OK.
>> 
>> Thanks,
>> Andrew
>> 
>>> On 2 May 2024, at 22:18, Jun Rao  wrote:
>>> 
>>> Hi, Andrew,
>>> 
>>> Thanks for the reply.
>>> 
>>> 147. " it makes a judgement about whether an assignment received is equal
>>> to what it already is using."
>>> If a client receives an assignment different from what it has, it
>> indicates
>>> the end of the rebalance. But how does the client know when the rebalance
>>> starts? In the shareHeartbeat design, the new group epoch is propagated
>>> together with the new assignment in the response.
>>> 
>>> 150. It could be a potential concern if each GC change forces significant
>>> assignment changes. Does the default assignor take existing assignments
>>> into consideration?
>>> 
>&

Re: [DISCUSS] KIP-932: Queues for Kafka

2024-05-03 Thread Andrew Schofield
Hi Jun,
Thanks for your reply.

161. ShareGroupListing and ShareGroupDescription are using
the same pattern as ConsumerGroupListing and
ConsumerGroupDescription. I have gone for consistency which
I think is probably best here. It’s what I would expect if I had previously
used the admin API for consumer groups and was looking to use it for
share groups. I agree it’s a bit weird.

162. GroupListing contains the only information which is properly
in common between a ConsumerGroupListing and a ShareGroupListing.
ListGroupsResponse.ProtocolType is interpreted to provide the
group type. I know that the ListGroups RPC also includes the group
state, but that’s as a string and there’s no common enum for the states
of all types of group. As a result, I have exposed group type but not
state on this API.

Previously in the discussion for this KIP, I mentioned that I would
create another KIP for the administration of groups, in particular
how the administrator can ensure that particular group IDs
are used for the group type they desire. At the moment, I think
keeping ListGroups in this KIP is a good idea. If we actually want
to make it more sophisticated, perhaps that would be better with
the group administration KIP.

163. It will be one higher than the latest version at the time we are
ready to deliver this feature for real. When we are on the cusp of
delivery, I’ll update the KIP with the final value.

164. KRaft only. All the RPCs are “broker” only. None of the code will
be merged until after 3.8 has branched.

Thanks,
Andrew

> On 4 May 2024, at 00:12, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the reply. A few more comments.
> 
> 161. ShareGroupListing.state() returns an optional, but
> ShareGroupDescription.state() does not. Should we make them consistent?
> Also, it seems a bit weird to return optional with an UNKNOWN state.
> 
> 162. Should GroupListing include ProtocolType and GroupState too?
> 
> 163. What is the value of group.version to gate the queuing feature?
> 
> 164. Is the queueing feature only supported on KRaft clusters? For example,
> the feature tool seems to be built only for the KRaft cluster.
> 
> Jun
> 
> On Fri, May 3, 2024 at 10:32 AM Andrew Schofield 
> wrote:
> 
>> Hi Jun,
>> Thanks for your reply.
>> 
>> 147. Yes, I see what you mean. The rebalance latency will indeed
>> be very short by comparison. I have removed the rebalance latency
>> metrics from the client and retained the rebalance count and rate.
>> 
>> 150. Yes, I think so. I have tweaked the text so that the simple
>> assignor will take into account existing assignment information when
>> it has it, which would just minimise unnecessary churn of (b).
>> 
>> 158. I’ve changed it to ReadShareGroupStateSummary.
>> 
>> Thanks,
>> Andrew
>> 
>> 
>>> On 3 May 2024, at 22:17, Jun Rao  wrote:
>>> 
>>> Hi, Andrew,
>>> 
>>> Thanks for the reply.
>>> 
>>> 147. There seems to be some difference between consumer groups and share
>>> groups. In the consumer groups, if a client receives a heartbeat response
>>> to revoke some partitions, it may have to commit offsets before revoking
>>> partitions or it may have to call the rebalance callbacks provided by the
>>> user. This may take some time and can be reflected in the rebalance time
>>> metric. In the share groups, none of that exists. If a client receives
>> some
>>> added/revoked partitions, it accepts them immediately, right? So, does
>> that
>>> practically make the rebalance time always 0?
>>> 
>>> 150. I guess in the common case, there will be many more members than
>>> partitions. So the need for (b) will be less common. We can probably
>> leave
>>> the persisting of the assignment out for now.
>>> 
>>> 158. The new name sounds good to me.
>>> 
>>> Jun
>>> 
>>> On Thu, May 2, 2024 at 10:21 PM Andrew Schofield <
>> andrew_schofi...@live.com>
>>> wrote:
>>> 
>>>> Hi Jun,
>>>> Thanks for the response.
>>>> 
>>>> 147. I am trying to get a correspondence between the concepts and
>>>> metrics of consumer groups and share groups. In both cases,
>>>> the client doesn’t strictly know when the rebalance starts. All it knows
>>>> is when it has work to do in order to perform its part of a rebalance.
>>>> I am proposing that share groups and consumer groups use
>>>> equivalent logic.
>>>> 
>>>> I could remove the rebalance metrics from the client because I do
>>>> understand that they are making a judgement about when a r

Re: [VOTE] KIP-1041: Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-08 Thread Andrew Schofield
Hi,
Thanks for the KIP.

+1 (non-binding)

Thanks,
Andrew

> On 8 May 2024, at 15:48, David Jacot  wrote:
>
> Hi folks,
>
> I'd like to start a voting thread for KIP-1041: Drop
> `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8).
>
> KIP: https://cwiki.apache.org/confluence/x/9YobEg
>
> Best,
> David



Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

2024-05-13 Thread Andrew Schofield
Hi Alieh,
Just a few more comments on the KIP. It is looking much less risky now the scope
is tighter.

[AJS1] It would be nice to have default implementations of the handle methods
so an implementor would not need to implement both themselves.
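A minimal sketch of what [AJS1] suggests: default method bodies on the handler interface so an implementor overrides only the case they care about. The exception and record types below are stand-ins for the Kafka classes, and the default responses (fail on an oversized record, retry on an unknown topic or partition) are assumptions based on this thread, not final KIP semantics:

```java
// Sketch only: all types here are hypothetical stand-ins for the Kafka
// classes discussed in the KIP, not the proposed public API.
class HandlerSketch {
    static class RecordTooLargeException extends RuntimeException { }
    static class UnknownTopicOrPartitionException extends RuntimeException { }
    static class ProducerRecord { }

    enum Response { FAIL, SWALLOW, RETRY }

    interface ProducerExceptionHandler extends AutoCloseable {
        // Default bodies mean an implementor need only override the one
        // method they care about, rather than implementing both.
        default Response handle(RecordTooLargeException e, ProducerRecord record) {
            return Response.FAIL;
        }
        default Response handle(UnknownTopicOrPartitionException e, ProducerRecord record) {
            return Response.RETRY;
        }
        @Override
        default void close() { }
    }
}
```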

[AJS2] Producer configurations which are class names usually end in “.class”.
I suggest “custom.exception.handler.class”.

[AJS3] If I implemented a handler, and I set a non-default value for one of the
new configurations, what happens? I would expect that the handler takes
precedence. I wasn’t quite clear what “the control will follow the handler
instructions” meant.

[AJS4] Because you now have an enum for the RecordTooLargeExceptionResponse,
I don’t think you need to state in the comment for ProducerExceptionHandler that
RETRY will be interpreted as FAIL.

Thanks,
Andrew

> On 13 May 2024, at 14:53, Alieh Saeedi  wrote:
>
> Hi all,
>
>
> Thanks for the very interesting discussion during my PTO.
>
>
> KIP updates and addressing concerns:
>
>
> 1) Two handle() methods are defined in ProducerExceptionHandler for the two
> exceptions with different input parameters so that we have
> handle(RecordTooLargeException e, ProducerRecord record) and
> handle(UnknownTopicOrPartitionException e, ProducerRecord record)
>
>
> 2) The ProducerExceptionHandler extends `Closable` as well.
>
>
> 3) The KIP suggests having two more configuration parameters with boolean
> values:
>
> - `drop.invalid.large.records` with a default value of `false` for
> swallowing too large records.
>
> - `retry.unknown.topic.partition` with a default value of `true` that
> performs RETRY for `max.block.ms` ms, encountering the
> UnknownTopicOrPartitionException.
>
>
> Hope the main concerns are addressed so that we can go forward with voting.
>
>
> Cheers,
>
> Alieh
>
> On Thu, May 9, 2024 at 11:25 PM Artem Livshits
>  wrote:
>
>> Hi Mathias,
>>
>>> [AL1] While I see the point, I would think having a different callback
>> for every exception might not really be elegant?
>>
>> I'm not sure how to assess the level of elegance of the proposal, but I can
>> comment on the technical characteristics:
>>
>> 1. Having specific interfaces that codify the logic that is currently
>> prescribed in the comments reduce the chance of making a mistake.
>> Commments may get ignored, misuderstood or etc. but if the contract is
>> codified, the compilier will help to enforce the contract.
>> 2. Given that the logic is trickier than it seems (the record-too-large is
>> an example that can easily confuse someone who's not intimately familiar
>> with the nuances of the batching logic), having a little more hoops to jump
>> would give a greater chance that whoever tries to add a new cases pauses
>> and thinks a bit more.
>> 3. As Justine pointed out, having different methods will be a forcing
>> function to go through a KIP rather than smuggle new cases through
>> implementation.
>> 4. Sort of a consequence of the previous 3 -- all those things reduce the
>> chance of someone writing the code that works with 2 errors and then when
>> more errors are added in the future will suddenly incorrectly ignore new
>> errors (the example I gave in the previous email).
>>
>>> [AL2 cont.] Similar to AL1, I see such a handler to some extent as
>> business logic. If a user puts a bad filter condition in their KS app, and
>> drops messages
>>
>> I agree that there is always a chance to get a bug and lose messages, but
>> there are generally separation of concerns that has different risk profile:
>> the filtering logic may be more rigorously tested and rarely changed (say
>> an application developer does it), but setting the topics to produce may be
>> done via configuration (e.g. a user of the application does it) and it's
>> generally an expectation that users would get an error when configuration
>> is incorrect.
>>
>> What could be worse is that UnknownTopicOrPartitionException can be an
>> intermittent error, i.e. with a generally correct configuration, there
>> could be metadata propagation problem on the cluster and then a random set
>> of records could get lost.
>>
>>> [AL3] Maybe I misunderstand what you are saying, but to me, checking the
>> size of the record upfront is exactly what the KIP proposes? No?
>>
>> It achieves the same result but solves it differently, my proposal:
>>
>> 1. Application checks the validity of a record (maybe via a new
>> validateRecord method) before producing it, and can just exclude it or
>> return an error to the user.
>> 2. Application produces the record -- at this point there are no records
>> that could return record too large, they were either skipped at step 1 or
>> we didn't get here because step 1 failed.
>>
>> Vs. KIP's proposal
>>
>> 1. Application produces the record.
>> 2. Application gets a callback.
>> 3. Application returns the action on how to proceed.
>>
>> The advantage of the former is the clarity of semantics -- the record is
>> invalid (property of the record, not a function of s

Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

2024-05-14 Thread Andrew Schofield
Hi Alieh,
Just one final comment.

[AJS5] Existing classes use Retriable, not Retryable. For example:
https://kafka.apache.org/21/javadoc/org/apache/kafka/common/errors/RetriableException.html

I suggest RetriableResponse and NonRetriableResponse.

Thanks,
Andrew

> On 13 May 2024, at 23:17, Alieh Saeedi  wrote:
>
> Hi all,
>
>
> Thanks for all the valid points you listed.
>
>
> KIP updates and addressing concerns:
>
>
> 1) The KIP now suggests two Response types: `RetryableResponse` and
> `NonRetryableResponse`
>
>
> 2) `custom.exception.handler` is changed to `custom.exception.handler.class`
>
>
> 3) The KIP clarifies that `In the case of an implemented handler for the
> specified exception, the handler takes precedence.`
>
>
> 4)  There is now a `default` implementation for both handle() methods.
>
>
> 5)  @Chris: for `UnknownTopicOrPartition`, the default is already retrying
> for 60s. (In fact, the default value of `max.block.ms`). If the handler
> instructs to FAIL or SWALLOW, there will be no retry, and if the handler
> instructs to RETRY, that will be the default behavior, which follows the
> values in already existing config parameters such as `max.block.ms`. Does
> that make sense?
>
>
> Hope the changes and explanations are convincing :)
>
>
> Cheers,
>
> Alieh
>
> On Mon, May 13, 2024 at 6:40 PM Justine Olshan 
> wrote:
>
>> Oh I see. The type isn't the error type but a newly defined type for the
>> response. Makes sense and works for me.
>>
>> Justine
>>
>> On Mon, May 13, 2024 at 9:13 AM Chris Egerton 
>> wrote:
>>
>>> If we have dedicated methods for each kind of exception
>>> (handleRecordTooLarge, handleUnknownTopicOrPartition, etc.), doesn't that
>>> provide sufficient constraint? I'm not suggesting we eliminate these
>>> methods, just that we change their return types to something more
>> flexible.
>>>
>>> On Mon, May 13, 2024, 12:07 Justine Olshan >>
>>> wrote:
>>>
>>>> I'm not sure I agree with the Retriable and NonRetriableResponse
>> comment.
>>>> This doesn't limit the blast radius or enforce certain errors are used.
>>>> I think we might disagree on how controlled these interfaces can be...
>>>>
>>>> Justine
>>>>
>>>> On Mon, May 13, 2024 at 8:40 AM Chris Egerton >>
>>>> wrote:
>>>>
>>>>> Hi Alieh,
>>>>>
>>>>> Thanks for the updates! I just have a few more thoughts:
>>>>>
>>>>> - I don't think a boolean property is sufficient to dictate retries
>> for
>>>>> unknown topic partitions, though. These errors can occur if a topic
>> has
>>>>> just been created, which can occur if, for example, automatic topic
>>>>> creation is enabled for a multi-task connector. This is why I
>> proposed
>>> a
>>>>> timeout instead of a boolean (and see my previous email for why
>>> reducing
>>>>> max.block.ms for a producer is not a viable alternative). If it
>> helps,
>>>> one
>>>>> way to reproduce this yourself is to add the line
>>>>> `fooProps.put(TASKS_MAX_CONFIG, "10");` to the integration test here:
>>>>>
>>>>>
>>>>
>>>
>> https://github.com/apache/kafka/blob/5439914c32fa00d634efa7219699f1bc21add839/connect/runtime/src/test/java/org/apache/kafka/connect/integration/SourceConnectorsIntegrationTest.java#L134
>>>>> and then check the logs afterward for messages like "Error while
>>> fetching
>>>>> metadata with correlation id  :
>>>> {foo-topic=UNKNOWN_TOPIC_OR_PARTITION}".
>>>>>
>>>>> - I also don't think we need custom XxxResponse enums for every
>>> possible
>>>>> method; it seems like this will lead to a lot of duplication and
>>>> cognitive
>>>>> overhead if we want to expand the error handler in the future.
>>> Something
>>>>> more flexible like RetriableResponse and NonRetriableResponse could
>>>>> suffice.
>>>>>
>>>>> - Finally, the KIP still doesn't state how the handler will or won't
>>> take
>>>>> precedence over existing retry properties. If I set `retries` or `
>>>>> delivery.timeout.ms` or `max.block.ms` to low values, will that
>> cause
>>>>> retries to cease even if my custom handler would otherwise kee

Re: [VOTE] KIP-1038: Add Custom Error Handler to Producer

2024-05-14 Thread Andrew Schofield
Hi Alieh,
Thanks for the KIP.

+1 (non-binding)

Andrew

> On 7 May 2024, at 16:56, Alieh Saeedi  wrote:
>
> Hi all,
>
> It seems that we have no more comments, discussions, or feedback on
> KIP-1038; therefore, I’d like to open voting for the KIP: Add Custom Error
> Handler to Producer
> 
>
>
> Cheers,
> Alieh



Re: [VOTE] KIP-932: Queues for Kafka

2024-05-14 Thread Andrew Schofield
Hi,
I have made a small update to the KIP as a result of testing the new
share consumer with client telemetry (KIP-714).

I’ve added telemetry metric names to the table of client metrics and
also updated the metric group names so that the resulting client metrics
sent to the broker have consistent names.

Thanks,
Andrew

> On 8 May 2024, at 12:51, Manikumar  wrote:
>
> Hi Andrew,
>
> Thanks for the KIP.  Great write-up!
>
> +1 (binding)
>
> Thanks,
>
> On Wed, May 8, 2024 at 12:17 PM Satish Duggana  
> wrote:
>>
>> Hi Andrew,
>> Thanks for the nice KIP, it will allow other messaging use cases to be
>> onboarded to Kafka.
>>
>> +1 from me.
>>
>> Satish.
>>
>> On Tue, 7 May 2024 at 03:41, Jun Rao  wrote:
>>>
>>> Hi, Andrew,
>>>
>>> Thanks for the KIP. +1
>>>
>>> Jun
>>>
>>> On Mon, Mar 18, 2024 at 11:00 AM Edoardo Comar 
>>> wrote:
>>>
>>>> Thanks Andrew,
>>>>
>>>> +1 (binding)
>>>>
>>>> Edo
>>>>
>>>> On Mon, 18 Mar 2024 at 16:32, Kenneth Eversole
>>>>  wrote:
>>>>>
>>>>> Hi Andrew
>>>>>
>>>>> + 1 (Non-Binding)
>>>>>
>>>>> This will be great addition to Kafka
>>>>>
>>>>> On Mon, Mar 18, 2024 at 8:27 AM Apoorv Mittal 
>>>>> wrote:
>>>>>
>>>>>> Hi Andrew,
>>>>>> Thanks for writing the KIP. This is indeed going to be a valuable
>>>> addition
>>>>>> to the Kafka, excited to see the KIP.
>>>>>>
>>>>>> + 1 (Non-Binding)
>>>>>>
>>>>>> Regards,
>>>>>> Apoorv Mittal
>>>>>> +44 7721681581
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 17, 2024 at 11:16 PM Andrew Schofield <
>>>>>> andrew_schofield_j...@outlook.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I’ve been working to complete KIP-932 over the past few months and
>>>>>>> discussions have quietened down.
>>>>>>>
>>>>>>> I’d like to open the voting for KIP-932:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>
>>>>



Re: [VOTE] KIP-932: Queues for Kafka

2024-05-19 Thread Andrew Schofield
Hi,
KIP-932 has been approved.

+5 (binding) - Edoardo Comar, Jun Rao, Satish Duggana, Manikumar Reddy,
   David Jacot
+4 (non-binding) - Lianet Magrans, Apoorv Mittal, Kenneth Eversole,
   Omnia Ibrahim

Thanks very much to everyone who put so much effort into reviewing this
complicated KIP. The aim is to have an early access release in Apache
Kafka 4.0.

Thanks,
Andrew

> On 16 May 2024, at 15:45, Omnia Ibrahim  wrote:
>
> Thanks for the KIP Andrew +1 none binding from me
>
>> On 16 May 2024, at 14:23, David Jacot  wrote:
>>
>> Hi Andrew,
>>
>> Thanks for the KIP! This is really exciting! +1 (binding) from me.
>>
>> One note regarding the partition assignor interface changes that you
>> proposed, it would be great to get the changes in 3.8 in order to not break
>> the API of KIP-848 after the preview.
>>
>> Best,
>> David
>>
>> On Wed, May 15, 2024 at 10:37 PM Jun Rao  wrote:
>>
>>> Hi, Andrew,
>>>
>>> Thanks for the update. Should we mark whether those metrics are
>>> standard/required for KIP-714?
>>>
>>> Jun
>>>
>>> On Tue, May 14, 2024 at 7:31 AM Andrew Schofield <
>>> andrew_schofi...@live.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I have made a small update to the KIP as a result of testing the new
>>>> share consumer with client telemetry (KIP-714).
>>>>
>>>> I’ve added telemetry metric names to the table of client metrics and
>>>> also updated the metric group names so that the resulting client metrics
>>>> sent to the broker have consistent names.
>>>>
>>>> Thanks,
>>>> Andrew
>>>>
>>>>> On 8 May 2024, at 12:51, Manikumar  wrote:
>>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> Thanks for the KIP.  Great write-up!
>>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On Wed, May 8, 2024 at 12:17 PM Satish Duggana <
>>> satish.dugg...@gmail.com>
>>>> wrote:
>>>>>>
>>>>>> Hi Andrew,
>>>>>> Thanks for the nice KIP, it will allow other messaging use cases to be
>>>>>> onboarded to Kafka.
>>>>>>
>>>>>> +1 from me.
>>>>>>
>>>>>> Satish.
>>>>>>
>>>>>> On Tue, 7 May 2024 at 03:41, Jun Rao 
>>> wrote:
>>>>>>>
>>>>>>> Hi, Andrew,
>>>>>>>
>>>>>>> Thanks for the KIP. +1
>>>>>>>
>>>>>>> Jun
>>>>>>>
>>>>>>> On Mon, Mar 18, 2024 at 11:00 AM Edoardo Comar <
>>> edoardli...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Andrew,
>>>>>>>>
>>>>>>>> +1 (binding)
>>>>>>>>
>>>>>>>> Edo
>>>>>>>>
>>>>>>>> On Mon, 18 Mar 2024 at 16:32, Kenneth Eversole
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> Hi Andrew
>>>>>>>>>
>>>>>>>>> + 1 (Non-Binding)
>>>>>>>>>
>>>>>>>>> This will be great addition to Kafka
>>>>>>>>>
>>>>>>>>> On Mon, Mar 18, 2024 at 8:27 AM Apoorv Mittal <
>>>> apoorvmitta...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Andrew,
>>>>>>>>>> Thanks for writing the KIP. This is indeed going to be a valuable
>>>>>>>> addition
>>>>>>>>>> to the Kafka, excited to see the KIP.
>>>>>>>>>>
>>>>>>>>>> + 1 (Non-Binding)
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Apoorv Mittal
>>>>>>>>>> +44 7721681581
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Mar 17, 2024 at 11:16 PM Andrew Schofield <
>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I’ve been working to complete KIP-932 over the past few months
>>> and
>>>>>>>>>>> discussions have quietened down.
>>>>>>>>>>>
>>>>>>>>>>> I’d like to open the voting for KIP-932:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Andrew
>>>>>>>>>>
>>>>>>>>
>>>>
>>>>
>>>
>



Re: [VOTE] KIP-932: Queues for Kafka

2024-05-21 Thread Andrew Schofield
Hi Jun,
All the client metrics are standard. None are required.

I’ve updated the KIP accordingly.

Thanks,
Andrew

> On 15 May 2024, at 21:36, Jun Rao  wrote:
> 
> Hi, Andrew,
> 
> Thanks for the update. Should we mark whether those metrics are
> standard/required for KIP-714?
> 
> Jun
> 
> On Tue, May 14, 2024 at 7:31 AM Andrew Schofield 
> wrote:
> 
>> Hi,
>> I have made a small update to the KIP as a result of testing the new
>> share consumer with client telemetry (KIP-714).
>> 
>> I’ve added telemetry metric names to the table of client metrics and
>> also updated the metric group names so that the resulting client metrics
>> sent to the broker have consistent names.
>> 
>> Thanks,
>> Andrew
>> 
>>> On 8 May 2024, at 12:51, Manikumar  wrote:
>>> 
>>> Hi Andrew,
>>> 
>>> Thanks for the KIP.  Great write-up!
>>> 
>>> +1 (binding)
>>> 
>>> Thanks,
>>> 
>>> On Wed, May 8, 2024 at 12:17 PM Satish Duggana 
>> wrote:
>>>> 
>>>> Hi Andrew,
>>>> Thanks for the nice KIP, it will allow other messaging use cases to be
>>>> onboarded to Kafka.
>>>> 
>>>> +1 from me.
>>>> 
>>>> Satish.
>>>> 
>>>> On Tue, 7 May 2024 at 03:41, Jun Rao  wrote:
>>>>> 
>>>>> Hi, Andrew,
>>>>> 
>>>>> Thanks for the KIP. +1
>>>>> 
>>>>> Jun
>>>>> 
>>>>> On Mon, Mar 18, 2024 at 11:00 AM Edoardo Comar 
>>>>> wrote:
>>>>> 
>>>>>> Thanks Andrew,
>>>>>> 
>>>>>> +1 (binding)
>>>>>> 
>>>>>> Edo
>>>>>> 
>>>>>> On Mon, 18 Mar 2024 at 16:32, Kenneth Eversole
>>>>>>  wrote:
>>>>>>> 
>>>>>>> Hi Andrew
>>>>>>> 
>>>>>>> + 1 (Non-Binding)
>>>>>>> 
>>>>>>> This will be great addition to Kafka
>>>>>>> 
>>>>>>> On Mon, Mar 18, 2024 at 8:27 AM Apoorv Mittal <
>> apoorvmitta...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Andrew,
>>>>>>>> Thanks for writing the KIP. This is indeed going to be a valuable
>>>>>> addition
>>>>>>>> to the Kafka, excited to see the KIP.
>>>>>>>> 
>>>>>>>> + 1 (Non-Binding)
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Apoorv Mittal
>>>>>>>> +44 7721681581
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Mar 17, 2024 at 11:16 PM Andrew Schofield <
>>>>>>>> andrew_schofield_j...@outlook.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> I’ve been working to complete KIP-932 over the past few months and
>>>>>>>>> discussions have quietened down.
>>>>>>>>> 
>>>>>>>>> I’d like to open the voting for KIP-932:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Andrew




Re: [VOTE] KIP-899: Allow producer and consumer clients to rebootstrap

2024-05-22 Thread Andrew Schofield
Hi,
I’ll add an item to the plan for KIP-932 to do this for the KafkaShareConsumer.
My guess is that there will actually be nothing to do in practice because of
the code in common with the KafkaConsumer, but definitely worth me checking it out.

Thanks,
Andrew

> On 22 May 2024, at 18:28, Jun Rao  wrote:
>
> Hi, Ivan,
>
> Thanks for the reply. KafkaShareConsumer doesn't take all configuration
> values from KafkaConsumer. So, we need to make a note that this new config
> will be part of KafkaShareConsumer too.
>
> Jun
>
> On Wed, May 22, 2024 at 9:45 AM Ivan Yurchenko  wrote:
>
>> Hi!
>>
>> I had a look at the KIP-932, and it seems KafkaShareConsumer is to be
>> configured the same way as the normal consumer using key-value props. As I
>> understand correctly, no adaptation is needed for it to benefit from
>> KIP-899?
>>
>> Meanwhile, the PR [1] is open for review. If there are comments that
>> require changes, we can address them in the PR or in case it's already
>> merged, afterwards.
>>
>> Best,
>> Ivan
>>
>> [1] https://github.com/apache/kafka/pull/13277
>>
>> On Thu, May 16, 2024, at 01:52, Jun Rao wrote:
>>> Hi, Ivan,
>>>
>>> You are right. StreamsConfigs can take all existing consumer configs,
>> with
>>> or without prefixes. So, we don't need to add the new config to
>>> StreamsConfig explicitly.
>>>
>>> For KIP-932, it says for each new consumer config, we need to determine
>>> whether it should be added to ShareConsumer config too.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Wed, May 15, 2024 at 12:16 PM Ivan Yurchenko  wrote:
>>>
>>>> Hi Jun,
>>>>
>>>> Thank you for your comment. I was thinking that this
>>>> `metadata.recovery.strategy` could be passed to the relevant consumer
>> in
>>>> streams using the `restore.consumer.` prefix. Is that what you meant, or
>>>> did I misunderstand?
>>>> As for the KIP-932, I'll have a closer look.
>>>>
>>>> Ivan
>>>>
>>>>
>>>> On Wed, May 15, 2024, at 20:14, Jun Rao wrote:
>>>>> Hi, Ivan,
>>>>>
>>>>> Thanks for the KIP. +1
>>>>>
>>>>> Just a minor comment. Should we add metadata.recovery.strategy to the
>>>>> Streams and the newly introduced ShareConsumer (KIP-932) too?
>>>>>
>>>>> Jun
>>>>>
>>>>> On Wed, May 8, 2024 at 11:35 AM Manikumar >>
>>>> wrote:
>>>>>
>>>>>> Thanks for the KIP.
>>>>>>
>>>>>> +1 (binding).
>>>>>>
>>>>>> On Wed, Apr 17, 2024 at 7:50 PM Omnia Ibrahim <
>> o.g.h.ibra...@gmail.com
>>>>>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Ivan,
>>>>>>> Thanks for the KIP this is a very nice feature to have.
>>>>>>> +1(non-binding)
>>>>>>> Omnia
>>>>>>>> On 15 Apr 2024, at 14:33, Andrew Schofield <
>>>> andrew_schofi...@live.com>
>>>>>> wrote:
>>>>>>>>
>>>>>>>> Thanks for the KIP
>>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>>> On 15 Apr 2024, at 14:16, Chris Egerton
>> 
>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Ivan,
>>>>>>>>>
>>>>>>>>> Thanks for the KIP. After the recent changes, this LGTM. +1
>>>> (binding)
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On Wed, Aug 2, 2023 at 12:15 AM Ivan Yurchenko <
>>>>>> ivan0yurche...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> The discussion [1] for KIP-899 [2] has been open for quite
>> some
>>>>>> time. I'd
>>>>>>>>>> like to put the KIP up for a vote.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Ivan
>>>>>>>>>>
>>>>>>>>>> [1]
>>>> https://lists.apache.org/thread/m0ncbmfxs5m87sszby2jbmtjx2bdpcdl
>>>>>>>>>> [2]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-899%3A+Allow+producer+and+consumer+clients+to+rebootstrap
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



Re: [VOTE] KIP 1047 - Introduce new org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder

2024-05-24 Thread Andrew Schofield
Thanks for the KIP.

+1 (non-binding)

Thanks,
Andrew

> On 23 May 2024, at 18:48, Chia-Ping Tsai  wrote:
>
>
> +1
>
> Thanks to Yang for taking this over!
>
>> On 24 May 2024, at 00:27, Frank Yang  wrote:
>>
>> Hi all,
>>
>> I would like to start a vote on KIP-1047: Introduce new
>> org.apache.kafka.tools.api.Decoder to replace kafka.serializer.Decoder.
>>
>> KIP: 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1047+Introduce+new+org.apache.kafka.tools.api.Decoder+to+replace+kafka.serializer.Decoder
>>
>> Discussion thread: 
>> https://lists.apache.org/thread/n3k6vb4vddl1s5nopcyglnddtvzp4j63
>>
>> Thanks and regards,
>> PoAn



[DISCUSS] KIP-1043: Administration of groups

2024-06-04 Thread Andrew Schofield
Hi,
I would like to start a discussion thread on KIP-1043: Administration of 
groups. This KIP enhances the command-line tools to make it easier to 
administer groups on clusters with a variety of types of groups.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1043%3A+Administration+of+groups

Thanks.
Andrew

Re: [DISCUSS] KIP-1043: Administration of groups

2024-06-06 Thread Andrew Schofield
Hi Kirk,
Thanks for your comments.

1. I’m a big fan of consistency in these things and the method signatures match
ListConsumerGroupsResult and ListShareGroupsResult.

2. Yes, client-side filtering.

3. I didn’t offer “classic” as an option for --group-type. I’ve kicked the 
options
around in my mind for a while and I decided that using --group-type as a way of
filtering types in a way that a normal user would understand them was a good
place to start. For example, I didn’t have `--protocol consumer` for consumer 
groups
and `--group-type share` for share groups, even though that’s technically more
correct.

Since KIP-848, the set of consumer groups is actually formed from those which
use the classic protocol and those which use the modern protocol. This tool
gives you both together when you use `--group-type consumer`, which is exactly
what kafka-consumer-groups.sh does.

Do you think --group-type classic is helpful? It would give a list of all 
groups using
any variant of the classic group protocol. I can easily add it.
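To make the filtering behaviour concrete, here is a small self-contained sketch. The `GroupListing`/`GroupType` names and shapes below are my own illustrative stand-ins, not the KIP's final API; the point is only that `--group-type consumer` selects on the group's type, so it matches consumer groups on both the classic and the modern (KIP-848) protocol:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: a sketch of the client-side filtering described above.
// GroupListing and GroupType are assumed names, not the KIP's actual classes.
public class GroupFilterSketch {

    enum GroupType { CONSUMER, SHARE }

    // A consumer group may use either the classic or the modern (KIP-848) protocol.
    record GroupListing(String groupId, GroupType type, String protocol) { }

    // Filtering happens on the client after LIST_GROUPS returns all groups.
    static List<GroupListing> filterByType(List<GroupListing> all, GroupType wanted) {
        return all.stream()
                  .filter(g -> g.type() == wanted)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<GroupListing> all = List.of(
            new GroupListing("CG1", GroupType.CONSUMER, "classic"),
            new GroupListing("CG2", GroupType.CONSUMER, "consumer"),
            new GroupListing("SG1", GroupType.SHARE, "share"));

        // --group-type consumer includes both classic- and modern-protocol groups
        List<GroupListing> consumers = filterByType(all, GroupType.CONSUMER);
        System.out.println(consumers.size()); // prints 2
    }
}
```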

4, 5. Yes, maybe the wording of the message could improve. These things are 
always
tricky. I went with “Group CG1 is not a share group.” because it doesn’t 
require the tool
to interpret the group type in order to generate the message.

Imagine this scenario. You are using kafka-share-groups.sh --describe and you’ve
used the group ID of a consumer group. Here are some options:

a) “Group CG1 is not a share group.”
b) “Incorrect group type (Consumer). Group CG1 is not a share group.”
c) “Group CG1 has the wrong type for this operation. It is not a share group."

I don’t think “There is already a (consumer) group named ‘CG1’” is quite right.

Any preference?

6. Yes, it is a change in behaviour which is why I mention it in the KIP.
Personally, I think that’s OK because the existing message is misleading
and could definitely cause frustration. Let’s see what other reviewers think.

Thanks,
Andrew

> On 6 Jun 2024, at 00:44, Kirk True  wrote:
>
> Hi Andrew,
>
> Thanks for the KIP! I don’t have much experience as a Kafka operator, but 
> this seems like a very sane proposal.
>
> Questions & comments:
>
> 1. Do you think the ListGroupsResult.all() method is a bit of a potential 
> ‘foot gun’? I can imagine cases where developers reach for that without 
> understanding its potential of throwing errors. It could lead to cases where 
> all() works in development but not in production.
>
> 2. Based on the LIST_GROUPS RPC, it appears that filtering is all performed 
> client side, correct? (I know that’s not specific to this KIP, but just want 
> to make sure I understand.)
>
> 3. For kafka-groups.sh --list, is ‘classic’ valid for --group-type? If so, 
> should we allow users of kafka-groups.sh --list to provide multiple 
> --group-type arguments?
>
> 4. In the last kafka-share-groups.sh --create example (“ConsumerGroup”), the 
> error simply states that “Group 'ConsumerGroup’ is not a share group.” I’m 
> assuming that’s the case where the user gets a failure when there’s already a 
> group named “ConsumerGroup”, right? If so, the error should be something like 
> “There is already a (consumer) group named ’ConsumerGroup’”.
>
> 5. In the last kafka-share-groups.sh --describe example, how hard is it to 
> add the type of group that CG1 is, just for a bit of clarity for the user?
>
> 6. In the kafka-consumer-groups.sh section, it states "if that group exists 
> but is not a consumer group, the command fails with a message indicating that 
> the group type is incorrect, rather than the existing message that the group 
> does not exist.” That sounds like a change that could trip up some brittle 
> scripts somewhere, but I don’t know if that’s a serious problem.
>
> Thanks!
>
>> On Jun 4, 2024, at 10:08 AM, Andrew Schofield  
>> wrote:
>>
>> Hi,
>> I would like to start a discussion thread on KIP-1043: Administration of 
>> groups. This KIP enhances the command-line tools to make it easier to 
>> administer groups on clusters with a variety of types of groups.
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1043%3A+Administration+of+groups
>>
>> Thanks.
>> Andrew
>



Re: [DISCUSS] KIP-1050: Consistent error handling for Transactions

2024-06-06 Thread Andrew Schofield
Hi Kaushik,
Thanks for the KIP. This is definitely an area that needs clearing up so it’s 
good to see it.

A few initial questions.

1. If I understand correctly, you are proposing to change the superclass of all
subclasses of o.a.k.common.errors.ApiException which can be thrown by the 
producer
or transaction APIs. Is this accurate?

2. You have 4 exception types (and 2 subtypes) in the list in Proposed Changes. 
So, I
would expect 5 new exception classes with names with a direct correspondence to
the list of types, or at least an explicit statement of which of the new 
exceptions
maps to each of the types. For example, “Producer-Recoverable” doesn’t seem to
have a new exception. I’m a bit confused.

3. Some of the error types, the “producer retriable” ones, can be handled
directly within the producer code and do not need to be surfaced to the 
applications.
I suppose this means there is no need to make any change at all for these 
exceptions.
By changing these exceptions too, I suppose it makes it simple in the producer 
to
figure out which errors can be retried internally.

Could you be explicit about which exceptions are going to be changed and which
class is the new superclass? For one thing, having a table would make it obvious
to the code reviewers whether the intended change was being made.

4. A few nits about the Java code snippet for the new exception types.

a. ApiException has 4 constructors: (), (String), (Throwable) and (String, 
Throwable).
I think you’ll need 4 constructors for each of your new exceptions.

b. In some cases, the class names and the constructor names are inconsistent.
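To illustrate (a), here is a minimal sketch of the constructor set I'd expect, mirroring ApiException's four constructors. The class name is a made-up example and I've used RuntimeException as the superclass purely so the sketch is self-contained; in the KIP the superclass would be the relevant o.a.k.common.errors class:

```java
// Illustrative sketch only: mirrors the four constructors of
// o.a.k.common.errors.ApiException -- (), (String), (Throwable), (String, Throwable).
// The class name is hypothetical and RuntimeException is a stand-in superclass.
public class ExampleRetriableException extends RuntimeException {

    public ExampleRetriableException() {
        super();
    }

    public ExampleRetriableException(String message) {
        super(message);
    }

    public ExampleRetriableException(Throwable cause) {
        super(cause);
    }

    public ExampleRetriableException(String message, Throwable cause) {
        super(message, cause);
    }
}
```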

5. I believe that a producer application which received 
TopicAuthorizationException
could continue to use the same Producer attempting to use the topic while the
administrator fixed the ACLs to grant access, and then the operations would 
start
working as soon as the access was in place. Is this true for the authorization
exceptions in this KIP? Does it vary by resource type, such as transactional ID?

Thanks,
Andrew

> On 6 Jun 2024, at 12:57, Kaushik Raina  wrote:
>
> Hi everyone, I would like to start a discussion thread for KIP-1050:
> Consistent error handling for Transactions
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions
>
>
> Thanks
> Kaushik Raina



Re: [DISCUSS] KIP-1017: A health check endpoint for Kafka Connect

2024-06-07 Thread Andrew Schofield
Hi Chris,
This KIP looks good to me. I particularly like the explanation of how the 
result will specifically
check the worker health in ways that have previously caused trouble.

Thanks,
Andrew

> On 7 Jun 2024, at 16:18, Mickael Maison  wrote:
>
> Hi Chris,
>
> Happy Friday! The KIP looks good to me. +1
>
> Thanks,
> Mickael
>
> On Fri, Jan 26, 2024 at 8:41 PM Chris Egerton  wrote:
>>
>> Hi all,
>>
>> Happy Friday! I'd like to kick off discussion for KIP-1017, which (as the
>> title suggests) proposes adding a health check endpoint for Kafka Connect:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1017%3A+Health+check+endpoint+for+Kafka+Connect
>>
>> This is one of the longest-standing issues with Kafka Connect and I'm
>> hoping we can finally put it in the ground soon. Looking forward to hearing
>> people's thoughts!
>>
>> Cheers,
>>
>> Chris



Re: [VOTE] KIP-1017: A health check endpoint for Kafka Connect

2024-06-14 Thread Andrew Schofield
Hi Chris,
Thanks for the KIP.

+1 (non-binding)

Thanks,
Andrew

> On 14 Jun 2024, at 15:48, Chris Egerton  wrote:
>
> Hi all,
>
> Happy Friday! I'd like to kick off a vote on KIP-1017.
>
> Design doc:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1017%3A+Health+check+endpoint+for+Kafka+Connect
>
> Discussion thread:
> https://lists.apache.org/thread/95nto84wtogdgk3v97rxcwxr258wnjnt
>
> Cheers,
>
> Chris



Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-06-20 Thread Andrew Schofield
Hi Alieh,
Thanks for the KIP.

I *really* don’t like adding a config which changes the behaviour of the
flush() method. We already have too many configs. But I totally understand
the problem that you’re trying to solve and some of the other suggestions
in this thread seem neater.

Personally, I would add another method to KafkaProducer. Not an overload
on flush() because this is not flush() at all. Using Matthias’s options,
I prefer (3).

Thanks,
Andrew

> On 20 Jun 2024, at 15:08, Lianet M.  wrote:
>
> Hi all, thanks for the KIP Alieh!
>
> LM1. Totally agree with Artem's point about the config not being the most
> explicit/flexible way to express this capability. Getting then to Matthias
> 4 options, what I don't like about 3 and 4 is that it seems they might not
> age very well? Aren't we going to be wanting some other twist to the flush
> semantics that will have us adding yet another param to it, or another
> overloaded method? I truly don't have the context to answer that, but if it
> feels like a realistic future maybe adding some kind FlushOptions params to
> the flush would be better from an extensibility point of view. It would
> only have the clearErrors option available for now but could accept any
> other we may need. I find that this would remove the "ugliness" Matthias
> pointed out for 3. and 4.
>
> LM2. No matter how we end up expressing the different semantics for flush,
> let's make sure we update the KIP on the flush and commitTransaction java
> docs. It currently states that  flush "clears the last exception" and
> commitTransaction "will NOT throw" if called after flush, but it really all
> depends on the config/options/method used.
>
> LM3. I find it would be helpful to include an example to show the new flow
> that we're unblocking (I see this as the great gain here): flush with clear
> error option enabled -> catch and do whatever error handling we want ->
> commitTransaction successfully
>
> Thanks!
>
> Lianet
>
> On Wed, Jun 19, 2024 at 11:26 PM Chris Egerton 
> wrote:
>
>> Hi Matthias,
>>
>> I like the alternatives you've listed. One more that might help is if,
>> instead of overloading flush(), we overloaded commitTransaction() to
>> something like commitTransaction(boolean tolerateRecordErrors). This seems
>> slightly cleaner in that it takes the behavioral change we want, which only
>> applies to transactional producers, to an API method that is only used for
>> transactional producers. It would also avoid the issue of whether or not
>> flush() (or a new variant of it with altered semantics) should throw or
>> not. Thoughts?
>>
>> Hi Alieh,
>>
>> Thanks for the KIP, I like this direction a lot more than the pluggable
>> handler!
>>
>> I share Artem's concerns that enabling this behavior via configuration
>> doesn't seem like a great fit. It's likely that application code will be
>> written in a style that only works with one type of behavior from
>> transactional producers, so requiring that application code to declare its
>> expectations for the behavior of its producer seems more appropriate than,
>> e.g., allowing users deploying that application to tweak a configuration
>> file that gets fed to producers spun up inside it.
>>
>> Cheers,
>>
>> Chris
>>
>> On Wed, Jun 19, 2024 at 10:32 PM Matthias J. Sax  wrote:
>>
>>> Thanks for the KIP Alieh. I actually like the KIP as-is, but think
>>> Arthem raises very good points...
>>>
>>> Seems we have four options on how to move forward?
>>>
>>>  1. add config to allow "silent error clearance" as the KIP proposes
>>>  2. change flush() to clear error and let it throw
>>>  3. add new flushAndThrow()` (or better name) which clears error and
>>> throws
>>>  4. add `flush(boolean clearAndThrow)` and let user pick (and deprecate
>>> existing `flush()`)
>>>
>>> For (2), given that it would be a behavior change, we might also need a
>>> public "feature flag" config.
>>>
>>> It seems, both (1) and (2) have the issue Artem mentioned. (3) and (4)
>>> would be safer to this end, however, for both we kinda get an ugly API?
>>>
>>> Not sure right now if I have any preference. Seems we need to pick some
>>> evil and that there is no clear best solution? Would be good to her from
>>> others what they think
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 6/18/24 8:39 PM, Artem Livshits wrote:
 Hi Alieh,

 Thank you for the KIP.  I have a couple of suggestions:

 AL1.  We should throw an error from flush after we clear it.  This
>> would
 make it so that both "send + commit" and "send + flush + commit" (the
 latter looks like just a more verbose way to express the former, and it
 would be intuitive if it behaves the same) would throw if the
>> transaction
 has an error (so if the code is written either way it's going be
>>> correct).
 At the same time, the latter could be extended by the caller to
>> intercept
 exceptions from flush, ignore as needed, and commit the transaction.
>>> This
 solution would keep basi

Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-06-20 Thread Andrew Schofield
Hi Artem,
I think you make a good point which is worth further consideration. If
any of the existing methods is really ripe for a change here, it’s the
send() that actually caused the problem. If that can be fixed so there are
no situations in which a lurking error breaks a transaction, that might be
the best.

Thanks,
Andrew

> On 21 Jun 2024, at 01:51, Artem Livshits  
> wrote:
>
>> I thought we still wait for requests (and their errors) to come in and
> could handle fatal errors appropriately.
>
> We do wait for requests, but my understanding is that when
> commitTransaction("ignore send errors") we want to ignore errors.  So if we
> do
>
> 1. send
> 2. commitTransaction("ignore send errors")
>
> the commit will succeed.  You can look at the example in
> https://issues.apache.org/jira/browse/KAFKA-9279 and just substitute
> commitTransaction with commitTransaction("ignore send errors") and we get
> the buggy behavior back :-).  Actually, this would potentially be even
> worse than the original buggy behavior because the bug was that we ignored
> errors that happened in the "send()" method itself, not necessarily the
> ones that we got from the broker.
>
> Actually, looking at https://github.com/apache/kafka/pull/11508/files,
> wouldn't a better solution be to just throw the error from the "send"
> method itself, rather than trying to set it to be thrown during commit?
> This way the example in https://issues.apache.org/jira/browse/KAFKA-9279
> would be fixed, and at the same time it would give an opportunity for KS to
> catch the error and ignore it if needed.  Not sure if we need a KIP for
> that, just do a better fix of the old bug.
>
> -Artem
>
> On Thu, Jun 20, 2024 at 4:58 PM Justine Olshan 
> wrote:
>
>> I'm a bit late to the party, but the discussion here looks reasonable.
>> Moving the logic to a transactional method makes sense to me and makes me
>> feel a bit better about keeping the complexity in the methods relevant to
>> the issue.
>>
>>> One minor concern is that if we set "ignore send
>> errors" (or whatever we decide to name it) option without explicit flush,
>> it'll actually lead to broken behavior as the application won't be able to
>> stop a commit from proceeding even on fatal errors.
>>
>> Is this with respect to the case a request is still inflight when we call
>> commitTransaction? I thought we still wait for requests (and their errors)
>> to come in and could handle fatal errors appropriately.
>>
>> Justine
>>
>> On Thu, Jun 20, 2024 at 4:32 PM Artem Livshits
>>  wrote:
>>
>>> Hi Matthias (and other folks who suggested ideas),
>>>
>>>> maybe `commitTransaction(CommitOptions)` or similar could be a good
>> way
>>> forward?
>>>
>>> I like this approach.  One minor concern is that if we set "ignore send
>>> errors" (or whatever we decide to name it) option without explicit flush,
>>> it'll actually lead to broken behavior as the application won't be able
>> to
>>> stop a commit from proceeding even on fatal errors.  But I guess we'll
>> just
>>> have to clearly document it.
>>>
>>> In some way we are basically adding a flag to optionally restore the
>>> https://issues.apache.org/jira/browse/KAFKA-9279 bug, which is the
>>> motivation for all these changes, anyway :-).
>>>
>>> -Artem
>>>
>>>
>>> On Thu, Jun 20, 2024 at 2:18 PM Matthias J. Sax 
>> wrote:
>>>
>>>> Seems the option to use a config does not get a lot of support.
>>>>
>>>> So we need to go with some form or "overload / new method". I think
>>>> Chris' point about not coupling it to `flush()` but rather
>>>> `commitTransaction()` is actually a very good one; for non-tx case, the
>>>> different flush variants would not make sense.
>>>>
>>>> I also like Lianet's idea to pass in some "options" object, so maybe
>>>> `commitTransaction(CommitOptions)` or similar could be a good way
>>>> forward? It's much better than a `boolean` parameter, aesthetically, as
>>>> well as extendable in the future if necessary.
>>>>
>>>> Given that we would pass in an optional parameter, we might not even
>>>> need to deprecate the existing `commitTransaction()` method?
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 6/20/24 9:12 AM, Andrew Schofield 

Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-06-24 Thread Andrew Schofield
n), the error is
>> actually
>>>>> noticed by the producer before send() returns, so it should be
>> possible to
>>>>> throw directly.
>>>>> Cheers,
>>>>> Chris
>>>>> On Fri, Jun 21, 2024, 14:25 Matthias J. Sax  wrote:
>>>>>> Not sure if we can change send and make it throw, given that send() is
>>>>>> async? That is why users can register a `Callback` to begin with,
>> right?
>>>>>>
>>>>>> And Alieh's point about backward compatibility is also a fair concern.
>>>>>>
>>>>>>
>>>>>>> Actually, this would potentially be even
>>>>>>> worse than the original buggy behavior because the bug was that we
>>>>>> ignored
>>>>>>> errors that happened in the "send()" method itself, not necessarily
>> the
>>>>>>> ones that we got from the broker.
>>>>>>
>>>>>> My understanding was that `commitTx(swallowError)` would only swallow
>>>>>> `send()` errors, not errors about the actually commit. I agree that it
>>>>>> would be very bad to swallow errors about the actual tx commit...
>>>>>>
>>>>>> It's a fair question if this might be too subtle; to make it explicit,
>>>>>> we could use `CommitOptions#ignorePendingSendErrors()` [working name] to
>>>>>> make it clear.
>>>>>>
>>>>>>
>>>>>> If we think it's too subtle to change commit to swallow send() errors,
>>>>>> maybe going with `flush(FlushOptions)` would be clearer (and we can
>> use
>>>>>> `FlushOption#swallowSendErrorsForTransactions()` [working name] to be
>>>>>> explicitly that the `FlushOption` for now has only an effect for TX).
>>>>>>
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/21/24 4:10 AM, Alieh Saeedi wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>>
>>>>>>> It is very exciting to see all the experts here raising very good
>> points.
>>>>>>>
>>>>>>> As we go further, we see more and more options to improve our
>> solution,
>>>>>>> which makes concluding and updating the KIP impossible.
>>>>>>>
>>>>>>>
>>>>>>> The main suggestions so far are:
>>>>>>>
>>>>>>> 1. `flush` with `flushOptions` as input parameter
>>>>>>>
>>>>>>> 2. `commitTx` with `commitOptions` as input parameter
>>>>>>>
>>>>>>> 3. `send` must throw the exception
>>>>>>>
>>>>>>>
>>>>>>> My concern about the 3rd suggestion:
>>>>>>>
>>>>>>> 1. Does the change cause any issue with backward compatibility?
>>>>>>>
>>>>>>> 2. The `send (bad record)` already transits the transaction to the
>> error
>>>>>>> state. No user, including Streams is able to transit the transaction
>> back
>>>>>>> from the error state. Do you mean we remove the
>>>>>>> `maybeTransitionToErrorState(e)` from here
>>>>>>> <
>>>>>>
>> https://github.com/apache/kafka/blob/9b5b434e2a6b2d5290ea403fc02859b1c523d8aa/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1112
>>>>>>>
>>>>>>> as well?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Alieh
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 21, 2024 at 8:45 AM Andrew Schofield <
>>>>>> andrew_schofi...@live.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Artem,
>>>>>>>> I think you make a good point which is worth further consideration.
>> If
>>>>>>>> any of the existing methods is really ripe for a change here, it’s
>> the
>>>>>>>> send() that actually caused the problem. If that can be fixed so
>> there
>>>>>> are
>>>>>>&g

Re: [DISCUSS] KIP-1059: Enable the Producer flush() method to clear the latest send() error

2024-06-24 Thread Andrew Schofield
Hi Chris,
That works for me too. I slightly prefer an option on flush(), but what you 
suggested
works too.

Thanks,
Andrew
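To make the flow I have in mind concrete, here is a self-contained mock of the control flow (no kafka-clients dependency). Everything in it is a stub of my own: `SketchProducer`, its methods, and the `FlushOption` name are illustrative stand-ins for the real KafkaProducer API, showing an application that inspects its send() futures and then explicitly accepts the transaction before committing:

```java
import java.util.concurrent.CompletableFuture;

// Self-contained sketch of the proposed flow. All names here are stand-ins,
// not the real producer API.
public class TxFlowSketch {

    enum FlushOption { DEFAULT, TRANSACTION_READY }

    static class SketchProducer {
        boolean txFailed = false;

        CompletableFuture<Void> send(String record) {
            CompletableFuture<Void> f = new CompletableFuture<>();
            if (record.startsWith("bad")) {
                txFailed = true;            // a failed record poisons the transaction
                f.completeExceptionally(new IllegalArgumentException(record));
            } else {
                f.complete(null);           // pretend the broker acked it
            }
            return f;
        }

        // TRANSACTION_READY means: the application has seen all send() outcomes
        // and accepts the transaction as it stands.
        void flush(FlushOption option) {
            if (option == FlushOption.TRANSACTION_READY) {
                txFailed = false;
            }
        }

        void commitTransaction() {
            if (txFailed) {
                throw new IllegalStateException("transaction is in error state");
            }
        }
    }

    public static void main(String[] args) {
        SketchProducer producer = new SketchProducer();
        producer.send("good-1");
        CompletableFuture<Void> f2 = producer.send("bad-record");
        producer.send("good-2");

        // The application knows exactly which sends failed...
        System.out.println(f2.isCompletedExceptionally()); // prints true

        // ...and states explicitly that it wants to proceed with the rest.
        producer.flush(FlushOption.TRANSACTION_READY);
        producer.commitTransaction(); // succeeds: the two good records are committed
        System.out.println("committed");
    }
}
```

Without the `TRANSACTION_READY` flush, `commitTransaction()` in this sketch still throws, which is the safety property we want to preserve by default.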

> On 24 Jun 2024, at 15:14, Chris Egerton  wrote:
>
> Hi Andrew,
>
> I like a lot of what you said, but I still believe it's better to override
> commitTransaction than flush. Users will already have to manually opt in to
> ignoring errors encountered during transactions, and we can document
> recommended usage (i.e., explicitly invoking flush() before invoking
> commitTransaction(ignoreRecordErrors)) in the newly-introduced method. I
> don't believe it's worth the increased cognitive load on users with
> non-transactional producers to introduce an overloaded flush() variant.
>
> Cheers,
>
> Chris
>
> On Mon, Jun 24, 2024 at 9:39 AM Andrew Schofield 
> wrote:
>
>> Hi Alieh,
>> Thanks for driving this. Unfortunately, there are many parts of the API
>> which
>> are a bit unfortunate and it’s tricky to make small improvements that
>> don’t have
>> downsides.
>>
>> I don’t like the idea of using a configuration because configuration is
>> often
>> outside the application and changing the behaviour of someone else’s
>> application
>> without understanding it is risky. Anything which embeds a transactional
>> producer
>> could have its behaviour changed unexpectedly.
>>
>> It would be been much nicer if send() didn’t fail silently and change the
>> transaction
>> state. But, because it’s an asynchronous operation, I don’t really think
>> we can
>> just make it throw all exceptions, even though I really think that
>> `send()` is the
>> method with the problem here.
>>
>> The contract of `flush()` is that it makes sure that all preceding sends
>> will have
>> completed, so it should be true that a well written application would be
>> able to
>> know which records were OK because of the Future returned
>> by the `send()` method. It should be able to determine whether it wants to
>> commit
>> the transaction even if some of the intended operations didn’t succeed.
>>
>> What we don’t currently have is a way for the application to say to the
>> KafkaProducer
>> that it knows the outcome of sending the records and to confirm that it
>> wants to proceed.
>> Then it would not be necessary for `commitTransaction()` to throw an
>> exception to
>> report a historical error which the application might choose to ignore.
>>
>> Having read the comments, I think the KIP is on the right lines focusing
>> on the `flush()`
>> method. My suggestion is that we introduce an option on `flush()` to be
>> used before
>> `commitTransaction()` for applications that want to be able to commit
>> transactions which
>> had known failed operations.
>>
>> The code would be:
>>
>>   producer.beginTransaction();
>>
>>   future1 = producer.send(goodRecord1);
>>   future2 = producer.send(badRecord); // The future from this call will
>> complete exceptionally
>>   future3 = producer.send(goodRecord2);
>>
>>   producer.flush(FlushOption.TRANSACTION_READY);
>>
>>   // At this point, we know that all 3 futures are complete and the
>> transaction contains 2 records
>>   producer.commitTransaction();
>>
>> I wouldn’t deprecate `flush()` with no option. It just uses the default
>> option which behaves
>> like today.
>>
>> Why did I suggest an option on `flush()` rather than
>> `commitTransaction()`? Because with
>> `flush()`, it’s clear when the application is stating that it’s seen all
>> of the results from its
>> `send()` calls and it’s ready to proceed. If it has to rely on flushing
>> that occurs inside
>> `commitTransaction()`, I don’t see it’s as clear-cut.
>>
>> Thanks,
>> Andrew
>>
>>
>>
>>> On 24 Jun 2024, at 13:44, Alieh Saeedi 
>> wrote:
>>>
>>> Hi all,
>>> Thanks for the interesting discussion.
>>>
>>> I assume that now the main questions are as follows:
>>>
>>> 1. Do we need to transit the transaction to the error state for API
>>> exceptions?
>>> 2. Should we throw the API exception in `send()` instead of returning a
>>> future error?
>>> 3. If the answer to question (1) is NO and to question (2) is YES, do we
>>> need to change the current `flush` or `commitTnx` at all?
>>>
>>> Cheers,
>>> Alieh
>>>
>>> On Sat, Jun 22, 2024 at 3:21 AM Matthias J. Sax 
>> wrote:
>>>
>>>> Hey K
