[GitHub] [kafka-site] orchome opened a new pull request #357: Update powered-by.html

2021-06-01 Thread GitBox


orchome opened a new pull request #357:
URL: https://github.com/apache/kafka-site/pull/357


   Added information about OrcHome Inc. to powered-by.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-12870) RecordAccumulator stuck in a flushing state

2021-06-01 Thread Niclas Lockner (Jira)
Niclas Lockner created KAFKA-12870:
--

 Summary: RecordAccumulator stuck in a flushing state
 Key: KAFKA-12870
 URL: https://issues.apache.org/jira/browse/KAFKA-12870
 Project: Kafka
  Issue Type: Bug
  Components: producer , streams
Affects Versions: 2.8.0, 2.6.1
Reporter: Niclas Lockner


After a Kafka Stream with exactly once enabled has performed its first commit, 
the RecordAccumulator within the stream's internal producer gets stuck in a 
state where all subsequent ProducerBatches that get allocated are immediately 
flushed instead of being held in memory until they expire, regardless of the 
stream's linger or batch size config.

This is reproduced in the example code found at , 
which can be run with ./gradlew run --args=

The example has a producer that sends 1 record/sec to one topic, and a Kafka 
stream with EOS enabled that forwards the records from that topic to another 
topic with the configuration linger = 5 sec, commit interval = 10 sec.

 

The expected behavior when running the example is that the stream's 
ProducerBatches will expire (or get flushed because of the commit) every 5th 
second, and that the stream's producer will send a ProduceRequest every 5th 
second with an expired ProducerBatch that contains 5 records.

The actual behavior is that the ProducerBatch is made immediately available for 
the Sender, and the Sender sends one ProduceRequest for each record.

 

The example code contains a copy of the RecordAccumulator class (copied from 
kafka-clients 2.8.0) with some additional logging added to
 * RecordAccumulator#ready(Cluster, long)
 * RecordAccumulator#beginFlush()
 * RecordAccumulator#awaitFlushCompletion()

These log entries show
 * that the batches are considered sendable because a flush is in progress
 * that Sender.maybeSendAndPollTransactionalRequest() calls RecordAccumulator's 
beginFlush() without also calling awaitFlushCompletion(), which makes 
RecordAccumulator's flushesInProgress oscillate between 1 and 2 instead of the 
expected 0 and 1.
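
For illustration only, a minimal sketch (not the real RecordAccumulator code; 
names and conditions are simplified) of why an unmatched beginFlush() keeps 
flushInProgress() true and therefore makes every batch immediately sendable:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the flush bookkeeping described above.
public class FlushCounterSketch {
    private final AtomicInteger flushesInProgress = new AtomicInteger(0);

    public void beginFlush() {
        flushesInProgress.incrementAndGet();
    }

    public void awaitFlushCompletion() {
        // the real method waits for outstanding batches before decrementing
        flushesInProgress.decrementAndGet();
    }

    public boolean flushInProgress() {
        return flushesInProgress.get() > 0;
    }

    // roughly: ready() treats a batch as sendable when it is full, expired,
    // ... or a flush is in progress
    public boolean sendable(boolean full, boolean expired) {
        return full || expired || flushInProgress();
    }

    public static void main(String[] args) {
        FlushCounterSketch acc = new FlushCounterSketch();
        acc.beginFlush();                 // called on commit ...
        // ... but awaitFlushCompletion() is never called afterwards
        // prints "true": linger.ms and batch.size are effectively ignored
        System.out.println(acc.sendable(false, false));
    }
}
{code}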

 

This issue is not reproducible in version 2.3.1.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12871) Kafka-connect : rest api connect config, fields are not ordered

2021-06-01 Thread raphael auv (Jira)
raphael auv created KAFKA-12871:
---

 Summary: Kafka-connect : rest api connect config, fields are not 
ordered
 Key: KAFKA-12871
 URL: https://issues.apache.org/jira/browse/KAFKA-12871
 Project: Kafka
  Issue Type: Improvement
Reporter: raphael auv


When you query the REST API of kafka-connect to get the config of a connector, 
the fields are not ordered alphabetically.

[http://my_kafka_connect_cluster:8083/connectors/my_connector_1/]

Answer:


{code:java}
{
   "name":"my_connector_1",
   "config":{
      "connector.class":"Something",
      "errors.log.include.messages":"true",
      "tasks.max":"2",
      "buffer.flush.time":"300",
      "topics.regex":"^(?:.*",
      "errors.deadletterqueue.context.headers.enable":"true",
      "buffer.count.records":"100",
      "name":"my_connector_1",
      "errors.log.enable":"true",
      "key.converter":"org.apache.kafka.connect.storage.StringConverter",
      "buffer.size.bytes":"2000"
   },
   "tasks":[
      {
         "connector":"my_connector_1",
         "task":0
      },
      {
         "connector":"my_connector_1",
         "task":1
      }
   ],
   "type":"sink"
}
{code}
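
Until the REST layer sorts keys itself, a hedged client-side workaround sketch 
(assumes Jackson on the classpath; the class name and inline JSON are 
illustrative):

{code:java}
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;

// Illustrative helper: re-serialize the REST response with keys sorted.
public class SortedConfigPrinter {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper()
                .enable(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS);
        // stand-in for the body returned by GET /connectors/my_connector_1
        String body = "{\"name\":\"my_connector_1\",\"config\":"
                + "{\"tasks.max\":\"2\",\"connector.class\":\"Something\"}}";
        Map<?, ?> parsed = mapper.readValue(body, Map.class);
        // nested maps such as "config" are also emitted with sorted keys
        System.out.println(
                mapper.writerWithDefaultPrettyPrinter().writeValueAsString(parsed));
    }
}
{code}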



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-744: Migrate TaskMetadata to interface with internal implementation

2021-06-01 Thread Bruno Cadonna

Hi Josep,

Thank you for the KIP!

Specifying TaskMetadata as an interface lifts the guarantee that 
equals() and hashCode() on the TaskMetadata return consistent values for 
the same content. Note that the current TaskMetadata implements both 
methods, so theoretically users might already rely on them.


The List interface in Java defines its equals()[1] and hashCode()[2] 
methods specifically and all correct implementations must adhere to that 
definition.


Maybe it makes sense to add such a guarantee to the TaskMetadata 
interface, on which users could then rely similarly to Java's List interface.
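
For illustration, a rough sketch (hypothetical wording, not taken from the KIP) 
of how such a contract could be spelled out on the interface, much like List 
does:

{code:java}
// Hypothetical sketch: the interface re-declares equals()/hashCode() purely to
// document the contract; the taskId() accessor is only an example member.
public interface TaskMetadata {

    String taskId();

    /**
     * Returns true if and only if the given object is also a TaskMetadata
     * and both expose the same metadata content.
     */
    boolean equals(Object other);

    /**
     * Returns a hash code consistent with {@link #equals(Object)}: equal
     * TaskMetadata instances must have equal hash codes.
     */
    int hashCode();
}
{code}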


Regarding your question about ThreadMetadata, I had the same thought. I 
would add it to the KIP, unless the extended KIP risks not being accepted 
before the KIP freeze for 3.0.


Best,
Bruno


[1] 
https://docs.oracle.com/javase/8/docs/api/java/util/List.html#equals-java.lang.Object-

[2] https://docs.oracle.com/javase/8/docs/api/java/util/List.html#hashCode--




On 31.05.21 12:36, Josep Prat wrote:

I'm starting to wonder if we should make ThreadMetadata follow the same
path as TaskMetadata and be converted to an interface + internal
implementation. What are your thoughts?


Thanks in advance,
Josep



On Mon, May 31, 2021 at 12:10 PM Josep Prat  wrote:


Hi there,
While looking closely at the classes affected by this change, I realized
that ThreadMetadata has a reference in its public API to the soon to be
deprecated TaskMetadata. For this reason, I updated the KIP to reflect that
2 more changes are needed for this KIP:
ThreadMetadata:
- Deprecate activeTasks() and standbyTasks()
- Add getActiveTasks() and getStandbyTasks() that return the new Interface
instead.

I also wrote it on the KIP, but this thing crossed my mind, do we need to
keep source compatibility for this particular change?


I would be really grateful if you could provide any feedback on the KIP (
https://cwiki.apache.org/confluence/x/XIrOCg)

Thanks in advance,

On Fri, May 28, 2021 at 10:24 AM Josep Prat  wrote:


Hi there,
I updated the KIP page with Sophie's feedback. As she already mentioned,
the intention would be to include this KIP in release 3.0.0 so we can avoid
a deprecation cycle for the getTaskID method introduced in KIP-740, I hope
I managed to capture this in the KIP description.

Just adding the link again for convenience:
https://cwiki.apache.org/confluence/x/XIrOCg

Thanks in advance,

On Thu, May 27, 2021 at 10:08 PM Josep Prat  wrote:


Hi Sophie,

Thanks for the feedback, I'll update the KIP tomorrow with your
feedback. They are all good points, and you are right, my phrasing could be
misleading.


Best,

On Thu, May 27, 2021 at 10:02 PM Sophie Blee-Goldman
 wrote:


Thanks for the KIP! I'm on board with the overall proposal, just a few
comments:

1) The motivation section says

TaskMetadata should have never been a class available for the general

public, but more of an internal class



which is a bit misleading as it seems to imply that TaskMetadata itself
was
never meant to be part of the public API
at all. It might be better to phrase this as "TaskMetadata was never
intended to be a public class that a user might
need to instantiate, but rather an API for exposing metadata which is
better served as an interface" --- or something
to that effect.

2) You touch on this in a later section, but it would be good to call
out
directly in the *Public Interfaces* section that
you are proposing to remove the `public TaskId getTaskId()` method that
we
added in KIP-740. Also I just want to
note that to do so will require getting this KIP into 3.0, otherwise
we'll
need to go through a deprecation cycle for
that API. I don't anticipate this being a problem as KIP freeze is still
two weeks away, but it would be good to clarify.

3) nit: we should put the new internal implementation class under
the org.apache.kafka.streams.processor.internals
package instead of under org.apache.kafka.streams.internals. But this
is an
implementation detail and as such
doesn't need to be covered by the KIP in the first place.

- Sophie

On Thu, May 27, 2021 at 1:55 AM Josep Prat 


wrote:


I deliberately picked the most conservative approach of creating a new
Interface, instead of transforming the current class into an

interface.

Feedback is most welcome!

Best,

On Thu, May 27, 2021 at 10:26 AM Josep Prat 

wrote:



Hi there,
I would like to propose KIP-744, to introduce TaskMetadata as an
interface and keep its implementation internal.
This KIP can be seen as a spin-off of KIP-740.

https://cwiki.apache.org/confluence/x/XIrOCg

Best,
--

Josep Prat

*Aiven Deutschland GmbH*

Immanuelkirchstraße 26, 10405 Berlin

Amtsgericht Charlottenburg, HRB 209739 B

Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen

*m:* +491715557497

*w:* aiven.io

*e:* josep.p...@aiven.io




--

Josep Prat

*Aiven Deutschland GmbH*

Immanuelkirchstraße 26, 10405 Berlin

Amtsgericht Charlottenburg, HRB 209739 B

Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen

[jira] [Resolved] (KAFKA-12866) Kafka requires ZK root access even when using a chroot

2021-06-01 Thread Manikumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-12866.
---
Resolution: Fixed

> Kafka requires ZK root access even when using a chroot
> --
>
> Key: KAFKA-12866
> URL: https://issues.apache.org/jira/browse/KAFKA-12866
> Project: Kafka
>  Issue Type: Bug
>  Components: core, zkclient
>Affects Versions: 2.6.1, 2.8.0, 2.7.1, 2.6.2
>Reporter: Igor Soarez
>Assignee: Igor Soarez
>Priority: Major
> Fix For: 3.0.0
>
>
> When a Zookeeper chroot is configured, users do not expect Kafka to need 
> Zookeeper access outside of that chroot.
> h1. Why is this important?
> A Zookeeper cluster may be shared with other Kafka clusters or even other 
> applications. It is an expected security practice to restrict each 
> cluster/application's access to its own Zookeeper chroot.
> h1. Steps to reproduce
> h2. Zookeeper setup
> Using the zkCli, create a chroot for Kafka, make it available to Kafka but 
> lock the root znode.
>  
> {code:java}
> [zk: localhost:2181(CONNECTED) 1] create /somechroot
> Created /somechroot
> [zk: localhost:2181(CONNECTED) 2] setAcl /somechroot world:anyone:cdrwa
> [zk: localhost:2181(CONNECTED) 3] addauth digest test:12345
> [zk: localhost:2181(CONNECTED) 4] setAcl / 
> digest:test:Mx1uO9GLtm1qaVAQ20Vh9ODgACg=:cdrwa{code}
>  
> h2. Kafka setup
> Configure the chroot in broker.properties:
>  
> {code:java}
> zookeeper.connect=localhost:2181/somechroot{code}
>  
>  
> h2. Expected behavior
> The expected behavior here is that Kafka will use the chroot without issues.
> h2. Actual result
> Kafka fails to start with a fatal exception:
> {code:java}
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = 
> NoAuth for /chroot
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:120)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:583)
> at kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1729)
> at 
> kafka.zk.KafkaZkClient.makeSurePersistentPathExists(KafkaZkClient.scala:1627)
> at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1957)
> at 
> kafka.zk.ZkClientAclTest.testChrootExistsAndRootIsLocked(ZkClientAclTest.scala:60)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-744: Migrate TaskMetadata to interface with internal implementation

2021-06-01 Thread Josep Prat
Hi Bruno,

Thanks for your feedback. I think it makes sense to add the hashCode() and
equals() definitions, with their guarantees, at the interface level.

I'm also inclined to add the ThreadMetadata conversion to an interface to
this KIP. Implementation-wise it is not that different, and at the API level
the modification is of similar complexity to the currently proposed change.
I'll update the KIP shortly.

Best,



On Tue, Jun 1, 2021 at 3:09 PM Bruno Cadonna  wrote:

> Hi Josep,
>
> Thank you for the KIP!
>
> Specifying TaskMetadata as an interface lifts the guarantee that
> equals() and hashCode() on the TaskMetadata return consistent values for
> the same content. Note that the current TaskMetadata implements both
> methods, so theoretically users might already rely on them.
>
> The List interface in Java defines its equals()[1] and hashCode()[2]
> methods specifically and all correct implementations must adhere to that
> definition.
>
> Maybe it makes sense to add such a guarantee to the TaskMetadata
> interface on which users can rely similarly to the Java's List interface.
>
> Regarding your question about ThreadMetadata, I had the same thought. I
> would add it to the KIP unless the extended KIP risks not to be accepted
> before KIP freeze for 3.0.
>
> Best,
> Bruno
>
>
> [1]
>
> https://docs.oracle.com/javase/8/docs/api/java/util/List.html#equals-java.lang.Object-
> [2]
> https://docs.oracle.com/javase/8/docs/api/java/util/List.html#hashCode--
>
>
>
>
> On 31.05.21 12:36, Josep Prat wrote:
> > I'm starting to wonder if we should make ThreadMetadata follow the same
> > path as TaskMetadata and be converted to an interface + internal
> > implementation. What are your thoughts?
> >
> >
> > Thanks in advance,
> > Josep
> >
> >
> >
> > On Mon, May 31, 2021 at 12:10 PM Josep Prat  wrote:
> >
> >> Hi there,
> >> While looking closely at the classes affected by this change, I realized
> >> that ThreadMetadata has a reference in its public API to the soon to be
> >> deprecated TaskMetadata. For this reason, I updated the KIP to reflect
> that
> >> 2 more changes are needed for this KIP:
> >> ThreadMetadata:
> >> - Deprecate activeTasks() and standbyTasks()
> >> - Add getActiveTasks() and getStandbyTasks() that return the new
> Interface
> >> instead.
> >>
> >> I also wrote it on the KIP, but this thing crossed my mind, do we need
> to
> >> keep source compatibility for this particular change?
> >>
> >>
> >> I would be really grateful if you could provide any feedback on the KIP
> (
> >> https://cwiki.apache.org/confluence/x/XIrOCg)
> >>
> >> Thanks in advance,
> >>
> >> On Fri, May 28, 2021 at 10:24 AM Josep Prat 
> wrote:
> >>
> >>> Hi there,
> >>> I updated the KIP page with Sophie's feedback. As she already
> mentioned,
> >>> the intention would be to include this KIP in release 3.0.0 so we can
> avoid
> >>> a deprecation cycle for the getTaskID method introduced in KIP-740, I
> hope
> >>> I managed to capture this in the KIP description.
> >>>
> >>> Just adding the link again for convenience:
> >>> https://cwiki.apache.org/confluence/x/XIrOCg
> >>>
> >>> Thanks in advance,
> >>>
> >>> On Thu, May 27, 2021 at 10:08 PM Josep Prat 
> wrote:
> >>>
>  Hi Sophie,
> 
>  Thanks for the feedback, I'll update the KIP tomorrow with your
>  feedback. They are all good points, and you are right, my phrasing
> could be
>  misleading.
> 
> 
>  Best,
> 
>  On Thu, May 27, 2021 at 10:02 PM Sophie Blee-Goldman
>   wrote:
> 
> > Thanks for the KIP! I'm on board with the overall proposal, just a
> few
> > comments:
> >
> > 1) The motivation section says
> >
> > TaskMetadata should have never been a class available for the general
> >> public, but more of an internal class
> >
> >
> > which is a bit misleading as it seems to imply that TaskMetadata
> itself
> > was
> > never meant to be part of the public API
> > at all. It might be better to phrase this as "TaskMetadata was never
> > intended to be a public class that a user might
> > need to instantiate, but rather an API for exposing metadata which is
> > better served as an interface" --- or something
> > to that effect.
> >
> > 2) You touch on this in a later section, but it would be good to call
> > out
> > directly in the *Public Interfaces* section that
> > you are proposing to remove the `public TaskId getTaskId()` method
> that
> > we
> > added in KIP-740. Also I just want to
> > note that to do so will require getting this KIP into 3.0, otherwise
> > we'll
> > need to go through a deprecation cycle for
> > that API. I don't anticipate this being a problem as KIP freeze is
> still
> > two weeks away, but it would be good to clarify.
> >
> > 3) nit: we should put the new internal implementation class under
> > the org.apache.kafka.streams.processor.internals
> > package instead of under org.apache.k

[jira] [Resolved] (KAFKA-12519) Consider Removing Streams Old Built-in Metrics Version

2021-06-01 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-12519.
---
Resolution: Fixed

> Consider Removing Streams Old Built-in Metrics Version 
> ---
>
> Key: KAFKA-12519
> URL: https://issues.apache.org/jira/browse/KAFKA-12519
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> We refactored Streams' built-in metrics in KIP-444, and the new structure 
> was released in 2.5. We should consider removing the old structure in the 
> upcoming 3.0 release. This would give us the opportunity to simplify the code 
> around the built-in metrics, since we would no longer need to consider 
> different versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-739: Block Less on KafkaProducer#send

2021-06-01 Thread Nakamura
Hi Colin,

Sorry, I still don't follow.

Right now `KafkaProducer#send` seems to trigger a metadata fetch.  Today,
we block on that before returning.  Is your proposal that we move the
metadata fetch out of `KafkaProducer#send` entirely?

Even if the metadata fetch moves to be non-blocking, I think we still need
to deal with the problems we've discussed before if the fetch happens in
the `KafkaProducer#send` method.  How do we maintain the ordering semantics
of `KafkaProducer#send`?  How do we prevent our buffer from filling up?
Which thread is responsible for checking poll()?

The only approach I can see that would avoid this would be moving the
metadata fetch to happen at a different time.  But it's not clear to me
when would be a more appropriate time to do the metadata fetch than
`KafkaProducer#send`.

I think there's something I'm missing here.  Would you mind helping me
figure out what it is?

Best,
Moses

On Sun, May 30, 2021 at 5:35 PM Colin McCabe  wrote:

> On Tue, May 25, 2021, at 11:26, Nakamura wrote:
> > Hey Colin,
> >
> > For the metadata case, what would fixing the bug look like?  I agree that
> > we should fix it, but I don't have a clear picture in my mind of what
> > fixing it should look like.  Can you elaborate?
> >
>
> If the blocking metadata fetch bug were fixed, neither the producer nor
> the consumer would block while fetching metadata. A poll() call would
> initiate a metadata fetch if needed, and a subsequent call to poll() would
> handle the results if needed. Basically the same paradigm we use for other
> network communication in the producer and consumer.
>
> best,
> Colin
>
> > Best,
> > Moses
> >
> > On Mon, May 24, 2021 at 1:54 PM Colin McCabe  wrote:
> >
> > > Hi all,
> > >
> > > I agree that we should give users the option of having a fully async
> API,
> > > but I don't think external thread pools or queues are the right
> direction
> > > to go here. They add performance overheads and don't address the root
> > > causes of the problem.
> > >
> > > There are basically two scenarios where we block, currently. One is
> when
> > > we are doing a metadata fetch. I think this is clearly a bug, or at
> least
> > > an implementation limitation. From the user's point of view, the fact
> that
> > > we are doing a metadata fetch is an implementation detail that really
> > > shouldn't be exposed like this. We have talked about fixing this in the
> > > past. I think we just should spend the time to do it.
> > >
> > > The second scenario is where the client has produced too much data in
> too
> > > little time. This could happen if there is a network glitch, or the
> server
> > > is slower than expected. In this case, the behavior is intentional and
> not
> > > a bug. To understand this, think about what would happen if we didn't
> > > block. We would start buffering more and more data in memory, until
> finally
> > > the application died with an out of memory error. That would be
> frustrating
> > > for users and wouldn't add to the usability of Kafka.
> > >
> > > We could potentially have an option to handle the out-of-memory
> scenario
> > > differently by returning an error code immediately rather than
> blocking.
> > > Applications would have to be rewritten to handle this properly, but
> it is
> > > a possibility. I suspect that most of them wouldn't use this, but we
> could
> > > offer it as a possibility for async purists (which might include
> certain
> > > frameworks). The big problem the users would have to solve is what to
> do
> > > with the record that they were unable to produce due to the buffer full
> > > issue.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Thu, May 20, 2021, at 10:35, Nakamura wrote:
> > > > >
> > > > > My suggestion was just do this in multiple steps/phases, firstly
> let's
> > > fix
> > > > > the issue of send being misleadingly asynchronous (i.e. internally
> its
> > > > > blocking) and then later one we can make the various
> > > > > threadpools configurable with a sane default.
> > > >
> > > > I like that approach. I updated the "Which thread should be
> responsible
> > > for
> > > > waiting" part of KIP-739 to add your suggestion as my recommended
> > > approach,
> > > > thank you!  If no one else has major concerns about that approach,
> I'll
> > > > move the alternatives to "rejected alternatives".
> > > >
> > > > On Thu, May 20, 2021 at 7:26 AM Matthew de Detrich
> > > >  wrote:
> > > >
> > > > > @
> > > > >
> > > > > Nakamura
> > > > > On Wed, May 19, 2021 at 7:35 PM Nakamura  wrote:
> > > > >
> > > > > > @Ryanne:
> > > > > > In my mind's eye I slightly prefer the throwing the "cannot
> enqueue"
> > > > > > exception to satisfying the future immediately with the "cannot
> > > enqueue"
> > > > > > exception?  But I agree, it would be worth doing more research.
> > > > > >
> > > > > > @Matthew:
> > > > > >
> > > > > > > 3. Using multiple thread pools is definitely recommended for
> > > different
> > > > > > > types of tasks, for seriali

[jira] [Reopened] (KAFKA-12598) Remove deprecated --zookeeper in ConfigCommand

2021-06-01 Thread Ismael Juma (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reopened KAFKA-12598:
-

> Remove deprecated --zookeeper in ConfigCommand
> --
>
> Key: KAFKA-12598
> URL: https://issues.apache.org/jira/browse/KAFKA-12598
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12872) KIP-724: Drop support for message formats v0 and v1

2021-06-01 Thread Ismael Juma (Jira)
Ismael Juma created KAFKA-12872:
---

 Summary: KIP-724: Drop support for message formats v0 and v1
 Key: KAFKA-12872
 URL: https://issues.apache.org/jira/browse/KAFKA-12872
 Project: Kafka
  Issue Type: Bug
Reporter: Ismael Juma
Assignee: Ismael Juma






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] KIP-724: Drop support for message formats v0 and v1

2021-06-01 Thread Ismael Juma
Hi all,

It's time to start the process of sunsetting message formats v0 and v1 in
order to establish a new baseline in terms of supported client/broker
behavior and to improve maintainability & supportability of Kafka. Please
take a look at the proposal:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-724%3A+Drop+support+for+message+formats+v0+and+v1

Ismael


Re: [DISCUSS] KIP-618: Atomic commit of source connector records and offsets

2021-06-01 Thread Chris Egerton
Hi Gunnar,

Thanks for taking a look! I've addressed the low-hanging fruit in the KIP;
responses to other comments inline here:

> * TransactionContext: What's the use case for the methods accepting a
source record (commitTransaction(SourceRecord
record), abortTransaction(SourceRecord record))?

This allows developers to decouple transaction boundaries from record
batches. If a connector has a configuration that dictates how often it
returns from "SourceTask::poll", for example, it may be easier to define
multiple transactions within a single batch or a single transaction across
several batches than to retrofit the connector's poll logic to work with
transaction boundaries.
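
For illustration, a hedged sketch of what that might look like in a task's poll
loop (the helper methods and the way the TransactionContext is obtained are
assumptions, not the final API):

{code:java}
// Hedged sketch only: commitTransaction(SourceRecord) is the proposed API named
// above; readNextChunkFromSource() and isEndOfLogicalUnit() are made-up helpers,
// and how the TransactionContext is obtained may differ in the final KIP.
@Override
public List<SourceRecord> poll() {
    List<SourceRecord> batch = readNextChunkFromSource();
    for (SourceRecord record : batch) {
        if (isEndOfLogicalUnit(record)) {
            // The transaction boundary is tied to this record, not to the
            // batch returned from poll(), so one batch can contain several
            // transactions (or one transaction can span several batches).
            transactionContext.commitTransaction(record);
        }
    }
    return batch;
}
{code}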

> * SourceTaskContext: Instead of guarding against NSME, is there a way for
a
connector to query the KC version and thus derive its capabilities? Going
forward, a generic API for querying capabilities could be nice, so a
connector can query for capabilities of the runtime in a safe and
compatible way.

This would be a great quality-of-life improvement for connector and
framework developers alike, but I think it may be best left for a separate
KIP. The current approach, clunky though it may be, seems like a nuisance
at worst. It's definitely worth addressing but I'm not sure we have the
time to think through all the details thoroughly enough in time for the
upcoming KIP freeze.

> * SourceConnector: Would it make sense to merge the two methods perhaps
and
return one enum of { SUPPORTED, NOT_SUPPORTED, SUPPORTED_WITH_BOUNDARIES }?

Hmm... at first glance I like the idea of merging the two methods a lot.
The one thing that gives me pause is that there may be connectors that
would like to define their own transaction boundaries without providing
exactly-once guarantees. We could add UNSUPPORTED_WITH_BOUNDARIES to
accommodate that, but then, it might actually be simpler to keep the two
methods separate in case we add some third variable to the mix that would
also have to be reflected in the possible ExactlyOnceSupport enum values.

> Or, alternatively return an enum from canDefineTransactionBoundaries(),
too; even if it only has two values now, that'd allow for extension in the
future

This is fine by me; we just have to figure out exactly which enum values
would be suitable. It's a little clunky but right now I'm toying with
something like "ConnectorDefinedTransactionBoundaries" with values of
"SUPPORTED" and "NOT_SUPPORTED" and a default of "NOT_SUPPORTED". If we
need more granularity in the future then we can deprecate one or both of
them and add new values. Thoughts?
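
For concreteness, a quick sketch of that tentative shape (the names and default
are exactly the ones floated above, all still up for discussion):

{code:java}
// Tentative sketch only; the enum name and values may well change
// before the KIP is finalized.
public enum ConnectorDefinedTransactionBoundaries {
    SUPPORTED,
    NOT_SUPPORTED
}

// On SourceConnector, the corresponding method would then default to:
// public ConnectorDefinedTransactionBoundaries canDefineTransactionBoundaries() {
//     return ConnectorDefinedTransactionBoundaries.NOT_SUPPORTED;
// }
{code}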

> And one general question: in Debezium, we have some connectors that
produce
records "out-of-bands" to a schema history topic via their own custom
producer. Is there any way envisionable where such a producer would
participate in the transaction managed by the KC runtime environment?

To answer the question exactly as asked: no; transactions cannot be shared
across producers and until/unless that is changed (which seems unlikely)
this won't be possible. However, I'm curious why a source connector would
spin up its own producer instead of using "SourceTask::poll" to provide
records to Connect. Is it easier to consume from that topic when the
connector can define its own (de)serialization format? I'm optimistic that
if we understand the use case for the separate producer we may still be
able to help bridge the gap here, one way or another.

> One follow-up question after thinking some more about this; is there any
limit in terms of duration or size of in-flight, connector-controlled
transactions? In case of Debezium for instance, there may be cases where we
tail the TX log from an upstream source database, not knowing whether the
events we receive belong to a committed or aborted transaction. Would it be
valid to emit all these events via a transactional task, and in case we
receive a ROLLBACK event eventually, to abort the pending Kafka
transaction? Such source transactions could be running for a long time
potentially, e.g. hours or days (at least in theory). Or would this sort of
usage not be considered a reasonable one?

I think the distinction between reasonable and unreasonable usage here is
likely dependent on use cases that people are trying to satisfy with their
connector, but if I had to guess, I'd say that a different approach is
probably warranted in most cases if the transaction spans across entire
days at a time. If there's no concern about data not being visible to 
downstream consumers until its transaction is committed, and the number of 
records in the transaction isn't so large that the memory required to buffer 
them all locally on a consumer before delivering them to the downstream 
application becomes unreasonable, it would technically be possible, though. 
Connect users would have to be mindful of the following:

- A separate offsets topic for the connector would be highly recommended in
order to avoid crippling other connectors with hanging transactions

Re: [DISCUSS] KIP-744: Migrate TaskMetadata to interface with internal implementation

2021-06-01 Thread Josep Prat
Hi all,

I managed to update the KIP to include the changes Bruno proposed and also
extended it to include the migration of ThreadMetadata to an interface plus
internal implementation. Please share any feedback or problems you might
see with this approach.

As the deadline for KIPs approaches, I plan to submit this KIP for voting
on Friday, unless issues arise.

Let me know if this makes sense, as this is my first KIP.

Thanks in advance.

On Tue, Jun 1, 2021 at 3:43 PM Josep Prat  wrote:

> Hi Bruno,
>
> Thanks for your feedback. I think it makes sense to add the hashCode() and
> equals() definitions, with their guarantees, at the interface level.
>
> I'm also inclined to add the ThreadMetadata conversion to an interface to
> this KIP. Implementation-wise it is not that different, and at the API level
> the modification is of similar complexity to the currently proposed change.
> I'll update the KIP shortly.
>
> Best,
>
>
>
> On Tue, Jun 1, 2021 at 3:09 PM Bruno Cadonna  wrote:
>
>> Hi Josep,
>>
>> Thank you for the KIP!
>>
>> Specifying TaskMetadata as an interface lifts the guarantee that
>> equals() and hashCode() on the TaskMetadata return consistent values for
>> the same content. Note that the current TaskMetadata implements both
>> methods, so theoretically users might already rely on them.
>>
>> The List interface in Java defines its equals()[1] and hashCode()[2]
>> methods specifically and all correct implementations must adhere to that
>> definition.
>>
>> Maybe it makes sense to add such a guarantee to the TaskMetadata
>> interface on which users can rely similarly to the Java's List interface.
>>
>> Regarding your question about ThreadMetadata, I had the same thought. I
>> would add it to the KIP unless the extended KIP risks not to be accepted
>> before KIP freeze for 3.0.
>>
>> Best,
>> Bruno
>>
>>
>> [1]
>>
>> https://docs.oracle.com/javase/8/docs/api/java/util/List.html#equals-java.lang.Object-
>> [2]
>> https://docs.oracle.com/javase/8/docs/api/java/util/List.html#hashCode--
>>
>>
>>
>>
>> On 31.05.21 12:36, Josep Prat wrote:
>> > I'm starting to wonder if we should make ThreadMetadata follow the same
>> > path as TaskMetadata and be converted to an interface + internal
>> > implementation. What are your thoughts?
>> >
>> >
>> > Thanks in advance,
>> > Josep
>> >
>> >
>> >
>> > On Mon, May 31, 2021 at 12:10 PM Josep Prat 
>> wrote:
>> >
>> >> Hi there,
>> >> While looking closely at the classes affected by this change, I
>> realized
>> >> that ThreadMetadata has a reference in its public API to the soon to be
>> >> deprecated TaskMetadata. For this reason, I updated the KIP to reflect
>> that
>> >> 2 more changes are needed for this KIP:
>> >> ThreadMetadata:
>> >> - Deprecate activeTasks() and standbyTasks()
>> >> - Add getActiveTasks() and getStandbyTasks() that return the new
>> Interface
>> >> instead.
>> >>
>> >> I also wrote it on the KIP, but this thing crossed my mind, do we need
>> to
>> >> keep source compatibility for this particular change?
>> >>
>> >>
>> >> I would be really grateful if you could provide any feedback on the
>> KIP (
>> >> https://cwiki.apache.org/confluence/x/XIrOCg)
>> >>
>> >> Thanks in advance,
>> >>
>> >> On Fri, May 28, 2021 at 10:24 AM Josep Prat 
>> wrote:
>> >>
>> >>> Hi there,
>> >>> I updated the KIP page with Sophie's feedback. As she already
>> mentioned,
>> >>> the intention would be to include this KIP in release 3.0.0 so we can
>> avoid
>> >>> a deprecation cycle for the getTaskID method introduced in KIP-740, I
>> hope
>> >>> I managed to capture this in the KIP description.
>> >>>
>> >>> Just adding the link again for convenience:
>> >>> https://cwiki.apache.org/confluence/x/XIrOCg
>> >>>
>> >>> Thanks in advance,
>> >>>
>> >>> On Thu, May 27, 2021 at 10:08 PM Josep Prat 
>> wrote:
>> >>>
>>  Hi Sophie,
>> 
>>  Thanks for the feedback, I'll update the KIP tomorrow with your
>>  feedback. They are all good points, and you are right, my phrasing
>> could be
>>  misleading.
>> 
>> 
>>  Best,
>> 
>>  On Thu, May 27, 2021 at 10:02 PM Sophie Blee-Goldman
>>   wrote:
>> 
>> > Thanks for the KIP! I'm on board with the overall proposal, just a
>> few
>> > comments:
>> >
>> > 1) The motivation section says
>> >
>> > TaskMetadata should have never been a class available for the
>> general
>> >> public, but more of an internal class
>> >
>> >
>> > which is a bit misleading as it seems to imply that TaskMetadata
>> itself
>> > was
>> > never meant to be part of the public API
>> > at all. It might be better to phrase this as "TaskMetadata was never
>> > intended to be a public class that a user might
>> > need to instantiate, but rather an API for exposing metadata which
>> is
>> > better served as an interface" --- or something
>> > to that effect.
>> >
>> > 2) You touch on this in a later section, but it would be good to
>> call
>> > out
>> 

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #183

2021-06-01 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 472401 lines...]
[2021-06-01T16:36:26.926Z] [INFO] Parameter: package, Value: myapps
[2021-06-01T16:36:26.926Z] [INFO] Parameter: packageInPathFormat, Value: myapps
[2021-06-01T16:36:26.926Z] [INFO] Parameter: package, Value: myapps
[2021-06-01T16:36:26.926Z] [INFO] Parameter: version, Value: 0.1
[2021-06-01T16:36:26.926Z] [INFO] Parameter: groupId, Value: streams.examples
[2021-06-01T16:36:26.926Z] [INFO] Parameter: artifactId, Value: streams.examples
[2021-06-01T16:36:26.926Z] [INFO] Project created from Archetype in dir: 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/quickstart/test-streams-archetype/streams.examples
[2021-06-01T16:36:26.926Z] [INFO] 

[2021-06-01T16:36:26.926Z] [INFO] BUILD SUCCESS
[2021-06-01T16:36:26.926Z] [INFO] 

[2021-06-01T16:36:26.926Z] [INFO] Total time:  1.957 s
[2021-06-01T16:36:26.926Z] [INFO] Finished at: 2021-06-01T16:36:26Z
[2021-06-01T16:36:26.926Z] [INFO] 

[Pipeline] dir
[2021-06-01T16:36:28.527Z] Running in 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/quickstart/test-streams-archetype/streams.examples
[Pipeline] {
[Pipeline] sh
[2021-06-01T16:36:29.369Z] 
[2021-06-01T16:36:29.369Z] SaslSslAdminIntegrationTest > 
testCreateTopicsResponseMetadataAndConfig() PASSED
[2021-06-01T16:36:29.369Z] 
[2021-06-01T16:36:29.369Z] SaslSslAdminIntegrationTest > 
testAttemptToCreateInvalidAcls() STARTED
[2021-06-01T16:36:31.383Z] + mvn compile
[2021-06-01T16:36:33.525Z] [INFO] Scanning for projects...
[2021-06-01T16:36:33.525Z] [INFO] 
[2021-06-01T16:36:33.525Z] [INFO] -< 
streams.examples:streams.examples >--
[2021-06-01T16:36:33.525Z] [INFO] Building Kafka Streams Quickstart :: Java 0.1
[2021-06-01T16:36:33.525Z] [INFO] [ jar 
]-
[2021-06-01T16:36:33.525Z] [INFO] 
[2021-06-01T16:36:33.525Z] [INFO] --- maven-resources-plugin:2.6:resources 
(default-resources) @ streams.examples ---
[2021-06-01T16:36:33.525Z] [INFO] Using 'UTF-8' encoding to copy filtered 
resources.
[2021-06-01T16:36:33.525Z] [INFO] Copying 1 resource
[2021-06-01T16:36:33.525Z] [INFO] 
[2021-06-01T16:36:33.525Z] [INFO] --- maven-compiler-plugin:3.1:compile 
(default-compile) @ streams.examples ---
[2021-06-01T16:36:34.593Z] [INFO] Changes detected - recompiling the module!
[2021-06-01T16:36:34.593Z] [INFO] Compiling 3 source files to 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/quickstart/test-streams-archetype/streams.examples/target/classes
[2021-06-01T16:36:35.662Z] [INFO] 

[2021-06-01T16:36:35.662Z] [INFO] BUILD SUCCESS
[2021-06-01T16:36:35.662Z] [INFO] 

[2021-06-01T16:36:35.662Z] [INFO] Total time:  2.876 s
[2021-06-01T16:36:35.662Z] [INFO] Finished at: 2021-06-01T16:36:35Z
[2021-06-01T16:36:35.662Z] [INFO] 

[Pipeline] }
[Pipeline] // dir
[Pipeline] }
[Pipeline] // dir
[Pipeline] }
[Pipeline] // dir
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[2021-06-01T16:36:56.324Z] 
[2021-06-01T16:36:56.324Z] SaslSslAdminIntegrationTest > 
testAttemptToCreateInvalidAcls() PASSED
[2021-06-01T16:36:56.324Z] 
[2021-06-01T16:36:56.324Z] SaslSslAdminIntegrationTest > 
testAclAuthorizationDenied() STARTED
[2021-06-01T16:37:27.304Z] 
[2021-06-01T16:37:27.304Z] SaslSslAdminIntegrationTest > 
testAclAuthorizationDenied() PASSED
[2021-06-01T16:37:27.304Z] 
[2021-06-01T16:37:27.304Z] SaslSslAdminIntegrationTest > testAclOperations() 
STARTED
[2021-06-01T16:37:50.478Z] 
[2021-06-01T16:37:50.478Z] SaslSslAdminIntegrationTest > testAclOperations() 
PASSED
[2021-06-01T16:37:50.478Z] 
[2021-06-01T16:37:50.478Z] SaslSslAdminIntegrationTest > testAclOperations2() 
STARTED
[2021-06-01T16:38:17.064Z] 
[2021-06-01T16:38:17.064Z] SaslSslAdminIntegrationTest > testAclOperations2() 
PASSED
[2021-06-01T16:38:17.064Z] 
[2021-06-01T16:38:17.064Z] SaslSslAdminIntegrationTest > testAclDelete() STARTED
[2021-06-01T16:38:43.786Z] 
[2021-06-01T16:38:43.786Z] SaslSslAdminIntegrationTest > testAclDelete() PASSED
[2021-06-01T16:38:43.786Z] 
[2021-06-01T16:38:43.786Z] TransactionsTest > testBumpTransactionalEpoch() 
STARTED
[2021-06-01T16:39:00.353Z] 
[2021-06-01T16:39:00.353Z] TransactionsTest > testBumpTransactionalEpoc

Re: [DISCUSS] KIP-739: Block Less on KafkaProducer#send

2021-06-01 Thread Colin McCabe
On Tue, Jun 1, 2021, at 07:00, Nakamura wrote:
> Hi Colin,
> 
> Sorry, I still don't follow.
> 
> Right now `KafkaProducer#send` seems to trigger a metadata fetch.  Today,
> we block on that before returning.  Is your proposal that we move the
> metadata fetch out of `KafkaProducer#send` entirely?
> 

KafkaProducer#send is supposed to initiate non-blocking I/O, but not wait for 
it to complete.

There's more information about non-blocking I/O in Java here: 
https://en.wikipedia.org/wiki/Non-blocking_I/O_%28Java%29
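
For concreteness, a minimal sketch of that calling pattern (broker address,
topic, key, and value are placeholders):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AsyncSendSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() hands the record off; the callback (or the returned
            // Future) reports completion later on the producer I/O thread.
            producer.send(new ProducerRecord<>("my-topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        }
                    });
            // no .get() here: the caller does not wait for the broker round
            // trip, but today send() itself can still block on the metadata fetch.
        }
    }
}
{code}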

>
> Even if the metadata fetch moves to be non-blocking, I think we still need
> to deal with the problems we've discussed before if the fetch happens in
> the `KafkaProducer#send` method.  How do we maintain the ordering semantics
> of `KafkaProducer#send`?

How are the ordering semantics of `KafkaProducer#send` related to the metadata 
fetch?

>  How do we prevent our buffer from filling up?

That is not related to the metadata fetch. Also, I already proposed a solution 
(returning an error) if this is a concern.

> Which thread is responsible for checking poll()?

The same client thread that always has been responsible for checking poll.

> 
> The only approach I can see that would avoid this would be moving the
> metadata fetch to happen at a different time.  But it's not clear to me
> when would be a more appropriate time to do the metadata fetch than
> `KafkaProducer#send`.
> 

It's not about moving the metadata fetch to happen at a different time. It's 
about using non-blocking I/O, like we do for other network I/O. (And actually, 
if you want to get really technical, we do this for the metadata fetch too, 
it's just that we have a hack that loops to transform it back into blocking 
I/O.)

best,
Colin

> I think there's something I'm missing here.  Would you mind helping me
> figure out what it is?
> 
> Best,
> Moses
> 
> On Sun, May 30, 2021 at 5:35 PM Colin McCabe  wrote:
> 
> > On Tue, May 25, 2021, at 11:26, Nakamura wrote:
> > > Hey Colin,
> > >
> > > For the metadata case, what would fixing the bug look like?  I agree that
> > > we should fix it, but I don't have a clear picture in my mind of what
> > > fixing it should look like.  Can you elaborate?
> > >
> >
> > If the blocking metadata fetch bug were fixed, neither the producer nor
> > the consumer would block while fetching metadata. A poll() call would
> > initiate a metadata fetch if needed, and a subsequent call to poll() would
> > handle the results if needed. Basically the same paradigm we use for other
> > network communication in the producer and consumer.
> >
> > best,
> > Colin
> >
> > > Best,
> > > Moses
> > >
> > > On Mon, May 24, 2021 at 1:54 PM Colin McCabe  wrote:
> > >
> > > > Hi all,
> > > >
> > > > I agree that we should give users the option of having a fully async
> > API,
> > > > but I don't think external thread pools or queues are the right
> > direction
> > > > to go here. They add performance overheads and don't address the root
> > > > causes of the problem.
> > > >
> > > > There are basically two scenarios where we block, currently. One is
> > when
> > > > we are doing a metadata fetch. I think this is clearly a bug, or at
> > least
> > > > an implementation limitation. From the user's point of view, the fact
> > that
> > > > we are doing a metadata fetch is an implementation detail that really
> > > > shouldn't be exposed like this. We have talked about fixing this in the
> > > > past. I think we just should spend the time to do it.
> > > >
> > > > The second scenario is where the client has produced too much data in
> > too
> > > > little time. This could happen if there is a network glitch, or the
> > server
> > > > is slower than expected. In this case, the behavior is intentional and
> > not
> > > > a bug. To understand this, think about what would happen if we didn't
> > > > block. We would start buffering more and more data in memory, until
> > finally
> > > > the application died with an out of memory error. That would be
> > frustrating
> > > > for users and wouldn't add to the usability of Kafka.
> > > >
> > > > We could potentially have an option to handle the out-of-memory
> > scenario
> > > > differently by returning an error code immediately rather than
> > blocking.
> > > > Applications would have to be rewritten to handle this properly, but
> > it is
> > > > a possibility. I suspect that most of them wouldn't use this, but we
> > could
> > > > offer it as a possibility for async purists (which might include
> > certain
> > > > frameworks). The big problem the users would have to solve is what to
> > do
> > > > with the record that they were unable to produce due to the buffer full
> > > > issue.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Thu, May 20, 2021, at 10:35, Nakamura wrote:
> > > > > >
> > > > > > My suggestion was just do this in multiple steps/phases, firstly
> > let's
> > > > fix
> > > > > > the issue of send being misleadingly asynchronous (i.e. internally

[jira] [Created] (KAFKA-12873) Log truncation due to divergence should also remove snapshots

2021-06-01 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-12873:
--

 Summary: Log truncation due to divergence should also remove 
snapshots
 Key: KAFKA-12873
 URL: https://issues.apache.org/jira/browse/KAFKA-12873
 Project: Kafka
  Issue Type: Sub-task
  Components: log
Reporter: Jose Armando Garcia Sancio


It should not be possible for log truncation to truncate past the 
high-watermark and we know that snapshots are less than the high-watermark.

Having said that I think we should add code that removes any snapshot that is 
greater than the log end offset after a log truncation. Currently the code that 
does log truncation is in `KafkaMetadataLog::truncateTo`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7749) confluent does not provide option to set consumer properties at connector level

2021-06-01 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-7749.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

[KIP-458|https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy],
 introduced in AK 2.3.0, added support for connector-specific client overrides 
like the one described here.

Marking as resolved.
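
For reference, a hedged example of the KIP-458 mechanism (connector name, class, 
and value are illustrative): a connector-level consumer override submitted as 
the JSON body of PUT /connectors/my-connector/config, which takes effect only if 
the worker's connector.client.config.override.policy permits it:

{code:java}
{
  "connector.class": "MySinkConnector",
  "topics": "my-topic",
  "consumer.override.max.poll.records": "5000"
}
{code}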

> confluent does not provide option to set consumer properties at connector 
> level
> ---
>
> Key: KAFKA-7749
> URL: https://issues.apache.org/jira/browse/KAFKA-7749
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Manjeet Duhan
>Priority: Major
> Fix For: 2.3.0
>
>
> _We want to increase consumer.max.poll.record to improve performance, but 
> this value can only be set in the worker properties, which apply to all 
> connectors in a given cluster._
>
> _Operative situation: We have one project communicating with Elasticsearch, 
> and we set consumer.max.poll.record=500 after multiple performance tests; 
> this worked fine for a year._
>  _Then another project was onboarded in the same cluster and required 
> consumer.max.poll.record=5000 based on its performance tests. This 
> configuration was moved to production._
>  _Admetric started failing, as it was taking more than 5 minutes to process 
> 5000 polled records and started throwing CommitFailedException, which is a 
> vicious cycle because the same data is processed over and over again._
>
> _We can control the above when starting a consumer from plain Java, but this 
> control was not available at the individual connector level in the Confluent 
> connector._
> _I have overridden Kafka code to accept connector-level properties that are 
> applied to a single connector while the others keep using the default 
> properties. These changes have been running in production for more than 5 
> months._
> _Some of the properties which were useful for us._
> max.poll.records
> max.poll.interval.ms
> request.timeout.ms
> key.deserializer
> value.deserializer
> heartbeat.interval.ms
> session.timeout.ms
> auto.offset.reset
> connections.max.idle.ms
> enable.auto.commit
>  
> auto.commit.interval.ms
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12874) Increase default consumer session timeout to 40s (KIP-735)

2021-06-01 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-12874:
---

 Summary: Increase default consumer session timeout to 40s (KIP-735)
 Key: KAFKA-12874
 URL: https://issues.apache.org/jira/browse/KAFKA-12874
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson


As documented in KIP-735, we will increase the default session timeout to 40s: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-739: Block Less on KafkaProducer#send

2021-06-01 Thread Ryanne Dolan
Colin, the issue for me isn't so much whether non-blocking I/O is used or
not, but the fact that the caller observes a long time between calling
send() and receiving the returned future. This behavior can be considered
"blocking" whether or not I/O is involved.

> How are the ordering semantics of `KafkaProducer#send` related to the
metadata fetch?
> I already proposed a solution (returning an error)

There is a subtle difference between failing immediately vs blocking for
metadata, related to ordering in the face of retries. Say we set the send
timeout to max-long (or something high enough that we rarely encounter
timeouts in practice), and set max inflight requests to 1. Today, we can
reasonably assume that calling send() in sequence to a specific partition
will result in the corresponding sequence landing on that partition,
regardless of how the caller handles retries. The caller might not handle
retries at all. But if we can fail immediately (e.g. when the metadata
isn't yet ready), then the caller must handle retries carefully.
Specifically, the caller must retry each send() before proceeding to the
next. This basically means that the caller must block on each send() in
order to maintain the proper sequence -- how else would the caller know
whether it will need to retry or not?

In other words, failing immediately punts the problem to the caller to
handle, while the caller is less-equipped to deal with it. I don't think we
should do that, at least not in the default case.
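
To make the difference concrete, a hedged sketch (topic, partition, and error
handling are simplified; assumes a configured KafkaProducer<String, String> and
the usual producer imports):

{code:java}
// Today: callers can fire sends in sequence and rely on per-partition ordering.
static void sendRelyingOnOrdering(KafkaProducer<String, String> producer,
                                  List<String> values) {
    for (String value : values) {
        producer.send(new ProducerRecord<>("topic", 0, "key", value));
    }
}

// If send() may instead fail immediately (e.g. metadata not ready), a caller
// that must preserve order has to resolve each send before starting the next:
static void sendHandlingImmediateFailures(KafkaProducer<String, String> producer,
                                          List<String> values) throws InterruptedException {
    for (String value : values) {
        while (true) {
            try {
                producer.send(new ProducerRecord<>("topic", 0, "key", value)).get(); // blocks
                break;
            } catch (ExecutionException e) {
                // retry this record before moving on to the next one
            }
        }
    }
}
{code}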

I actually don't have any objections to this approach so long as it's
opt-in. It sounds like you are suggesting to fix the bug for everyone, but
I don't think we can do that without subtly breaking things.

Ryanne

On Tue, Jun 1, 2021 at 12:31 PM Colin McCabe  wrote:

> On Tue, Jun 1, 2021, at 07:00, Nakamura wrote:
> > Hi Colin,
> >
> > Sorry, I still don't follow.
> >
> > Right now `KafkaProducer#send` seems to trigger a metadata fetch.  Today,
> > we block on that before returning.  Is your proposal that we move the
> > metadata fetch out of `KafkaProducer#send` entirely?
> >
>
> KafkaProducer#send is supposed to initiate non-blocking I/O, but not wait
> for it to complete.
>
> There's more information about non-blocking I/O in Java here:
> https://en.wikipedia.org/wiki/Non-blocking_I/O_%28Java%29
>
> >
> > Even if the metadata fetch moves to be non-blocking, I think we still
> need
> > to deal with the problems we've discussed before if the fetch happens in
> > the `KafkaProducer#send` method.  How do we maintain the ordering
> semantics
> > of `KafkaProducer#send`?
>
> How are the ordering semantics of `KafkaProducer#send` related to the
> metadata fetch?
>
> >  How do we prevent our buffer from filling up?
>
> That is not related to the metadata fetch. Also, I already proposed a
> solution (returning an error) if this is a concern.
>
> > Which thread is responsible for checking poll()?
>
> The same client thread that always has been responsible for checking poll.
>
> >
> > The only approach I can see that would avoid this would be moving the
> > metadata fetch to happen at a different time.  But it's not clear to me
> > when would be a more appropriate time to do the metadata fetch than
> > `KafkaProducer#send`.
> >
>
> It's not about moving the metadata fetch to happen at a different time.
> It's about using non-blocking I/O, like we do for other network I/O. (And
> actually, if you want to get really technical, we do this for the metadata
> fetch too, it's just that we have a hack that loops to transform it back
> into blocking I/O.)
>
> best,
> Colin
>
> > I think there's something I'm missing here.  Would you mind helping me
> > figure out what it is?
> >
> > Best,
> > Moses
> >
> > On Sun, May 30, 2021 at 5:35 PM Colin McCabe  wrote:
> >
> > > On Tue, May 25, 2021, at 11:26, Nakamura wrote:
> > > > Hey Colin,
> > > >
> > > > For the metadata case, what would fixing the bug look like?  I agree
> that
> > > > we should fix it, but I don't have a clear picture in my mind of what
> > > > fixing it should look like.  Can you elaborate?
> > > >
> > >
> > > If the blocking metadata fetch bug were fixed, neither the producer nor
> > > the consumer would block while fetching metadata. A poll() call would
> > > initiate a metadata fetch if needed, and a subsequent call to poll()
> would
> > > handle the results if needed. Basically the same paradigm we use for
> other
> > > network communication in the producer and consumer.
> > >
> > > best,
> > > Colin
> > >
> > > > Best,
> > > > Moses
> > > >
> > > > On Mon, May 24, 2021 at 1:54 PM Colin McCabe 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I agree that we should give users the option of having a fully
> async
> > > API,
> > > > > but I don't think external thread pools or queues are the right
> > > direction
> > > > > to go here. They add performance overheads and don't address the
> root
> > > > > causes of the problem.
> > > > >
> > > > > There are basically two scen

Re: [DISCUSS] KIP-739: Block Less on KafkaProducer#send

2021-06-01 Thread Nakamura
Hi Colin,

> KafkaProducer#send is supposed to initiate non-blocking I/O, but not wait
for it to complete.
>
> There's more information about non-blocking I/O in Java here:
> https://en.wikipedia.org/wiki/Non-blocking_I/O_%28Java%29
I think we're talking past each other a bit.  I know about non-blocking
I/O.  The problem I'm facing is how to preserve the existing semantics
without blocking.  Right now callers assume their work is enqueued in-order
after `KafkaProducer#send` returns.  We can't simply return a future that
represents the metadata fetch, because of that assumption.  We need to
maintain order somehow.  That is what all of the different queues we're
proposing are intended to do.

> How are the ordering semantics of `KafkaProducer#send` related to the
metadata fetch?
KafkaProducer#send currently enqueues after it has the metadata, and it
passes the TopicPartition struct as part of the data when enqueueing.  We
can either update that data structure to be able to work with partial
metadata, or we can add a new queue on top.  I outline both potential
approaches in the current KIP.

> That is not related to the metadata fetch. Also, I already proposed a
solution (returning an error) if this is a concern.
Unfortunately it is, because `KafkaProducer#send` conflates the two of
them.  That seems to be the central difficulty of preserving the semantics
here.

> The same client thread that always has been responsible for checking poll.
Please pretend I've never contributed to Kafka before :). Which thread is
that?

Best,
Moses

On Tue, Jun 1, 2021 at 3:12 PM Ryanne Dolan  wrote:

> Colin, the issue for me isn't so much whether non-blocking I/O is used or
> not, but the fact that the caller observes a long time between calling
> send() and receiving the returned future. This behavior can be considered
> "blocking" whether or not I/O is involved.
>
> > How are the ordering semantics of `KafkaProducer#send` related to the
> metadata fetch?
> > I already proposed a solution (returning an error)
>
> There is a subtle difference between failing immediately vs blocking for
> metadata, related to ordering in the face of retries. Say we set the send
> timeout to max-long (or something high enough that we rarely encounter
> timeouts in practice), and set max inflight requests to 1. Today, we can
> reasonably assume that calling send() in sequence to a specific partition
> will result in the corresponding sequence landing on that partition,
> regardless of how the caller handles retries. The caller might not handle
> retries at all. But if we can fail immediately (e.g. when the metadata
> isn't yet ready), then the caller must handle retries carefully.
> Specifically, the caller must retry each send() before proceeding to the
> next. This basically means that the caller must block on each send() in
> order to maintain the proper sequence -- how else would the caller know
> whether it will need to retry or not?
>
> In other words, failing immediately punts the problem to the caller to
> handle, while the caller is less-equipped to deal with it. I don't think we
> should do that, at least not in the default case.
>
> I actually don't have any objections to this approach so long as it's
> opt-in. It sounds like you are suggesting to fix the bug for everyone, but
> I don't think we can do that without subtly breaking things.
>
> Ryanne
>
> On Tue, Jun 1, 2021 at 12:31 PM Colin McCabe  wrote:
>
> > On Tue, Jun 1, 2021, at 07:00, Nakamura wrote:
> > > Hi Colin,
> > >
> > > Sorry, I still don't follow.
> > >
> > > Right now `KafkaProducer#send` seems to trigger a metadata fetch.
> Today,
> > > we block on that before returning.  Is your proposal that we move the
> > > metadata fetch out of `KafkaProducer#send` entirely?
> > >
> >
> > KafkaProducer#send is supposed to initiate non-blocking I/O, but not wait
> > for it to complete.
> >
> > There's more information about non-blocking I/O in Java here:
> > https://en.wikipedia.org/wiki/Non-blocking_I/O_%28Java%29
> >
> > >
> > > Even if the metadata fetch moves to be non-blocking, I think we still
> > need
> > > to deal with the problems we've discussed before if the fetch happens
> in
> > > the `KafkaProducer#send` method.  How do we maintain the ordering
> > semantics
> > > of `KafkaProducer#send`?
> >
> > How are the ordering semantics of `KafkaProducer#send` related to the
> > metadata fetch?
> >
> > >  How do we prevent our buffer from filling up?
> >
> > That is not related to the metadata fetch. Also, I already proposed a
> > solution (returning an error) if this is a concern.
> >
> > > Which thread is responsible for checking poll()?
> >
> > The same client thread that always has been responsible for checking
> poll.
> >
> > >
> > > The only approach I can see that would avoid this would be moving the
> > > metadata fetch to happen at a different time.  But it's not clear to me
> > > when would be a more appropriate time to do the metadata fetch than
> > > `KafkaProducer#send`.
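For the retry-ordering point Ryanne raises above, here is a hedged illustration (not code from the KIP; the helper name is hypothetical) of why a caller that handles retries itself ends up blocking on each send() before issuing the next one:

    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class CallerSideRetry {
        // Hypothetical helper: retries one record until it succeeds. Waiting on
        // get() is the only way the caller can learn whether this record needs a
        // retry before it dares to send the next one, so sends are serialized.
        static RecordMetadata sendWithRetry(Producer<String, String> producer,
                                            ProducerRecord<String, String> record)
                throws InterruptedException {
            while (true) {
                try {
                    return producer.send(record).get(); // blocks per record
                } catch (ExecutionException e) {
                    // e.g. a fast failure because metadata was not ready yet;
                    // retrying here is what preserves the original ordering.
                }
            }
        }
    }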

[jira] [Created] (KAFKA-12875) Change Log layer segment map mutations to avoid absence of active segment

2021-06-01 Thread Kowshik Prakasam (Jira)
Kowshik Prakasam created KAFKA-12875:


 Summary: Change Log layer segment map mutations to avoid absence 
of active segment
 Key: KAFKA-12875
 URL: https://issues.apache.org/jira/browse/KAFKA-12875
 Project: Kafka
  Issue Type: Improvement
Reporter: Kowshik Prakasam


[https://github.com/apache/kafka/pull/10650] showed a case where the active segment 
was absent while the Log layer's segments were being iterated. We should 
investigate the Log layer code to see whether segment map mutations can be 
reordered so that an active segment is never absent at any point in time. For 
example, if we are clearing all segments and creating a new one, we could 
reverse the order: create the new segment first, and only then clear the old ones.
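A toy, purely hypothetical sketch of that reordering (the map, Segment type, and method names below are illustrative and are not the actual Log.scala code): keep the replacement segment visible before dropping the old ones.

    import java.util.Collection;
    import java.util.List;
    import java.util.concurrent.ConcurrentNavigableMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class SegmentSwapSketch {
        // Toy stand-in for a log segment keyed by its base offset.
        record Segment(long baseOffset) {}

        // Insert the replacement segment first, then remove the ones it replaces,
        // so a concurrent reader never observes a map with no active segment.
        static void replaceAll(ConcurrentNavigableMap<Long, Segment> segments,
                               Segment newSegment,
                               Collection<Segment> oldSegments) {
            segments.put(newSegment.baseOffset(), newSegment);
            for (Segment old : oldSegments) {
                if (old.baseOffset() != newSegment.baseOffset()) {
                    segments.remove(old.baseOffset());
                }
            }
        }

        public static void main(String[] args) {
            ConcurrentNavigableMap<Long, Segment> segments = new ConcurrentSkipListMap<>();
            Segment old = new Segment(0L);
            segments.put(old.baseOffset(), old);
            replaceAll(segments, new Segment(100L), List.of(old));
            System.out.println(segments); // {100=Segment[baseOffset=100]}
        }
    }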



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12876) Log.roll() could forever delete producer state snapshot of empty active segment

2021-06-01 Thread Kowshik Prakasam (Jira)
Kowshik Prakasam created KAFKA-12876:


 Summary: Log.roll() could forever delete producer state snapshot 
of empty active segment
 Key: KAFKA-12876
 URL: https://issues.apache.org/jira/browse/KAFKA-12876
 Project: Kafka
  Issue Type: Bug
Reporter: Kowshik Prakasam


In Log.scala, during roll, if there is an existing segment of size 0 at 
newOffsetToRoll, then we end up 
[deleting|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L1610]
 the active segment asynchronously, which also deletes its producer state 
snapshot. However, we also [take a producer 
snapshot|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L1639]
 at newOffsetToRoll before we add the new segment. This snapshot creation can 
race with the asynchronous snapshot deletion, and we can end up losing the 
snapshot forever. The fix in this case is to not delete the snapshot, since we 
recreate it anyway.
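A heavily hedged sketch of the shape of that fix (every name here is hypothetical; the real logic lives in the roll/deleteSegmentFiles path of Log.scala): skip deleting a snapshot that the roll is about to recreate at the same offset.

    import java.nio.file.Files;
    import java.nio.file.Path;

    public class SnapshotRollSketch {
        // Hypothetical: delete a replaced segment's files, but keep its producer
        // state snapshot when the roll will immediately recreate a snapshot at
        // the very same offset, so the async delete cannot race with that write.
        static void deleteSegmentFiles(Path snapshotFile,
                                       long segmentBaseOffset,
                                       long newOffsetToRoll) throws Exception {
            boolean snapshotWillBeRecreated = segmentBaseOffset == newOffsetToRoll;
            if (!snapshotWillBeRecreated) {
                Files.deleteIfExists(snapshotFile);
            }
            // ...delete the segment's log and index files here...
        }
    }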

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka-site] hachikuji merged pull request #355: Add Axios (https://www.axios.com/) to the list of the "Powered By ❤️"

2021-06-01 Thread GitBox


hachikuji merged pull request #355:
URL: https://github.com/apache/kafka-site/pull/355


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-12877) Fix KRPC files with missing flexibleVersions annotation

2021-06-01 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-12877:


 Summary: Fix KRPC files with missing flexibleVersions annotation
 Key: KAFKA-12877
 URL: https://issues.apache.org/jira/browse/KAFKA-12877
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


Some KRPC files do not specify their flexibleVersions. Unfortunately, in that 
case we default to not supporting any flexible versions. This is a poor 
default, since the flexible format is usually more efficient and is also more 
extensible.

Make flexibleVersions explicit, and disallow setting anything except "0+" on new 
RPC and metadata records.
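For illustration only (the apiKey, name, and field below are made up, not a real Kafka RPC), a KRPC message definition with the annotation made explicit looks roughly like this:

    {
      "apiKey": 9999,
      "type": "request",
      "name": "ExampleRequest",
      "validVersions": "0",
      "flexibleVersions": "0+",
      "fields": [
        { "name": "Name", "type": "string", "versions": "0+",
          "about": "An example field." }
      ]
    }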



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12878) Support --bootstrap-server kafka-streams-application-reset

2021-06-01 Thread Neil Buesing (Jira)
Neil Buesing created KAFKA-12878:


 Summary: Support --bootstrap-server kafka-streams-application-reset
 Key: KAFKA-12878
 URL: https://issues.apache.org/jira/browse/KAFKA-12878
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Reporter: Neil Buesing
Assignee: Mitchell
 Fix For: 2.5.0


This is an unambitious initial move toward standardizing the command line tools. 
We have favored the name {{\-\-bootstrap-server}} in all new tools since it 
matches the config {{bootstrap.servers}}, which is used by all clients. Some 
older commands use {{\-\-broker-list}} or {{\-\-bootstrap-servers}} and maybe 
other exotic variations. We should support {{\-\-bootstrap-server}} in all 
commands and deprecate the other options.
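As a hedged example of what this looks like for the reset tool (broker address, application id, and topic name are placeholders), the goal is to accept the standardized singular flag and eventually deprecate the plural one:

    # today: plural flag
    bin/kafka-streams-application-reset.sh \
      --bootstrap-servers localhost:9092 \
      --application-id my-streams-app \
      --input-topics my-input-topic

    # proposed: standardized singular flag
    bin/kafka-streams-application-reset.sh \
      --bootstrap-server localhost:9092 \
      --application-id my-streams-app \
      --input-topics my-input-topic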



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


request to be added to contribution list

2021-06-01 Thread Neil Buesing
I am looking to contribute to Apache Kafka.

To start I have realized that KIP-499 to standardize on command line
arguments has not updated kafka-streams-application-reset, so KAFKA-12878
is to deprecate --bootstrap-servers and use --bootstrap-server.

username: nbuesing
email: n...@buesing.dev

Thanks,

Neil Buesing


[jira] [Resolved] (KAFKA-12709) Admin API for ListTransactions

2021-06-01 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-12709.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Admin API for ListTransactions
> --
>
> Key: KAFKA-12709
> URL: https://issues.apache.org/jira/browse/KAFKA-12709
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.0.0
>
>
> Add the `listTransactions` API described in KIP-664: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-664%3A+Provide+tooling+to+detect+and+abort+hanging+transactions.
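A minimal, hedged usage sketch of the new call (the broker address is a placeholder, and the accessor names follow my reading of KIP-664, so they may differ slightly from the final API):

    import java.util.Collection;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.TransactionListing;

    public class ListTransactionsExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            try (Admin admin = Admin.create(props)) {
                // Ask the brokers' transaction coordinators for known transactions.
                Collection<TransactionListing> listings =
                        admin.listTransactions().all().get();
                for (TransactionListing listing : listings) {
                    System.out.printf("%s producerId=%d state=%s%n",
                            listing.transactionalId(), listing.producerId(), listing.state());
                }
            }
        }
    }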



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: request to be added to contribution list

2021-06-01 Thread Bill Bejeck
Hi Neil,

You're all set now.

-Bill

On Tue, Jun 1, 2021 at 8:45 PM Neil Buesing  wrote:

> I am looking to contribute to Apache Kafka.
>
> To start I have realized that KIP-499 to standardize on command line
> arguments has not updated kafka-streams-application-reset, so KAFKA-12878
> is to deprecate --bootstrap-servers and use --bootstrap-server.
>
> username: nbuesing
> email: n...@buesing.dev
>
> Thanks,
>
> Neil Buesing
>


[VOTING] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-06-01 Thread Guoqiang Shu
Dear All,
We would like to get a vote on this proposal. The implementation is linked to 
the KIP, and we have run it in our production setup for a while.
https://issues.apache.org/jira/browse/KAFKA-12793

Thanks in advance!

//George//


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #184

2021-06-01 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 472922 lines...]
[Console test output elided: the SaslPlainSslEndToEndAuthorizationTest and SaslSslAdminIntegrationTest cases shown here all reported PASSED before the message was cut off.]

[jira] [Resolved] (KAFKA-12845) Rollback change which requires join key to be non null on KStream->GlobalKTable

2021-06-01 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-12845.
-
Resolution: Duplicate

> Rollback change which requires join key to be non null on 
> KStream->GlobalKTable
> ---
>
> Key: KAFKA-12845
> URL: https://issues.apache.org/jira/browse/KAFKA-12845
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.7.0
>Reporter: Pedro Gontijo
>Priority: Major
>
> As part of [KAFKA-10277|https://issues.apache.org/jira/browse/KAFKA-10277] 
> the behavior for KStream->GlobalKtable joins was changed to require non null 
> join keys.
> But it seems reasonable that not every record will have an existing 
> relationship (and hence a key) with the join globalktable. Think about a 
> User>Car for instance, or PageView>Product. An empty/zero key could be 
> returned by the KeyMapper but that will make a totally unnecessary search 
> into the store.
> I do not think that makes sense for any GlobalKtable join (inner or left) but 
> for left join it sounds even more strange.
>  
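To ground the PageView>Product example, here is a hedged Kafka Streams sketch (topic names, value types, and the key extractor are placeholders) of a KStream->GlobalKTable left join whose key mapper can yield null when a record has no related product:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.KStream;

    public class PageViewProductJoin {
        // Builds the join topology; not every page view references a product,
        // so the key mapper below may return null for some records.
        static void build(StreamsBuilder builder) {
            KStream<String, String> pageViews = builder.stream("page-views");
            GlobalKTable<String, String> products = builder.globalTable("products");

            pageViews
                .leftJoin(products,
                          (viewKey, view) -> extractProductId(view), // may be null
                          (view, product) -> view + " -> " + product)
                .to("enriched-page-views");
        }

        // Hypothetical helper: returns null when the view carries no product id.
        static String extractProductId(String view) {
            return view.contains("product=") ? view.substring(view.indexOf('=') + 1) : null;
        }
    }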



--
This message was sent by Atlassian Jira
(v8.3.4#803005)