[jira] [Resolved] (KAFKA-7756) Leader: -1 after topic delete

2018-12-27 Thread zhws (JIRA)



zhws resolved KAFKA-7756.
-------------------------
Resolution: Fixed

> Leader: -1 after topic delete
> -----------------------------
>
> Key: KAFKA-7756
> URL: https://issues.apache.org/jira/browse/KAFKA-7756
> Project: Kafka
>  Issue Type: Bug
>Reporter: zhws
>Priority: Blocker
> Attachments: image-2018-12-19-17-03-42-912.png, 
> image-2018-12-19-17-07-27-850.png, image-2018-12-19-17-10-25-784.png
>
>
> 1. When I first deleted the topic "deleteTestTwo", it succeeded. I could see 
> the deletion log, and the ZooKeeper node was deleted as well.
> !image-2018-12-19-17-03-42-912.png!
>  
> 2. But when I created this topic again and deleted it a second time:
> !image-2018-12-19-17-07-27-850.png!
> I only saw the file-deletion log.
> ZooKeeper still had the node, and when I executed the describe shell command 
> I got the following:
> !image-2018-12-19-17-10-25-784.png!
>  
> If anyone knows the reason, please tell me. Thanks.
> Kafka version: 2.0
>  





[jira] [Created] (KAFKA-7770) Add method that gives 100% guarantee that topic has been created

2018-12-27 Thread Tomasz Szlek (JIRA)
Tomasz Szlek created KAFKA-7770:
--------------------------------

 Summary: Add method that gives 100% guarantee that topic has been 
created
 Key: KAFKA-7770
 URL: https://issues.apache.org/jira/browse/KAFKA-7770
 Project: Kafka
  Issue Type: Improvement
Reporter: Tomasz Szlek


The current Kafka client API provides a method to create topics, but its 
documentation notes that "It may take several seconds after 
{{CreateTopicsResult}} returns success for all the brokers to become aware 
that the topics have been created".

If possible, it would be good to have another method that gives a 100% 
guarantee that the topic has been created and that all brokers are aware of 
it. Just for reference, I raised the same issue 
[here|https://stackoverflow.com/questions/53910783/kafka-producer-throws-received-unknown-topic-or-partition-error-when-sending-t].
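
In the meantime, the usual workaround is to poll {{describeTopics}} after 
creation until the metadata is visible. A minimal sketch with the Java 
AdminClient (topic name, sizes, and timeout are illustrative; note this only 
confirms that the broker answering the request has caught up, which is 
exactly the gap this request asks to close):

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.errors.UnknownTopicOrPartitionException;

public class CreateTopicAndWait {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(
                Collections.singleton(new NewTopic("my-topic", 3, (short) 1)))
                .all().get();
            // Success above does not mean every broker has the metadata yet,
            // so poll until describeTopics stops failing (or we time out).
            long deadline = System.currentTimeMillis() + 30_000;
            while (true) {
                try {
                    admin.describeTopics(Collections.singleton("my-topic"))
                         .all().get();
                    break; // visible to the broker that served the request
                } catch (ExecutionException e) {
                    boolean unknown =
                        e.getCause() instanceof UnknownTopicOrPartitionException;
                    if (!unknown || System.currentTimeMillis() > deadline)
                        throw e;
                    Thread.sleep(100);
                }
            }
        }
    }
}
{code}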





[jira] [Resolved] (KAFKA-7625) Kafka Broker node JVM crash - kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleEndTransaction

2018-12-27 Thread Ismael Juma (JIRA)



Ismael Juma resolved KAFKA-7625.

Resolution: Not A Bug

Closing since it's not a Kafka issue. Someone else reported that switching the 
GC away from G1 also made the issue go away. In any case, upgrading to the 
latest version of JDK 8 is the recommended path.

> Kafka Broker node JVM crash - 
> kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleEndTransaction
> 
>
> Key: KAFKA-7625
> URL: https://issues.apache.org/jira/browse/KAFKA-7625
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
> Environment:  environment:os.version=2.6.32-754.2.1.el6.x86_64 
> java.version=1.8.0_92 
> environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03,
>  built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
> Kafka commitId : 3402a8361b734732 
>Reporter: Sebastian Puzoń
>Priority: Critical
> Attachments: hs_err_pid10238.log, hs_err_pid15119.log, 
> hs_err_pid19131.log, hs_err_pid19405.log, hs_err_pid20124.log, 
> hs_err_pid22373.log, hs_err_pid22386.log, hs_err_pid22633.log, 
> hs_err_pid24681.log, hs_err_pid25513.log, hs_err_pid25701.log, 
> hs_err_pid26844.log, hs_err_pid27156.log, hs_err_pid27290.log, 
> hs_err_pid4194.log, hs_err_pid4299.log
>
>
> I observe broker node JVM crashes with the same problematic frame:
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7ff4a2588261, pid=24681, tid=0x7ff3b9bb1700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 
> 1.8.0_92-b14)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # J 9736 C1 
> kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleEndTransaction$7(Lkafka/coordinator/transaction/TransactionCoordinator;Ljava/lang/String;JSLorg/apache/kafka/common/requests/TransactionResult;Lkafka/coordinator/transaction/TransactionMetadata;)Lscala/util/Either;
>  (518 bytes) @ 0x7ff4a2588261 [0x7ff4a25871a0+0x10c1]
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> ---  T H R E A D  ---
> Current thread (0x7ff4b356f800):  JavaThread "kafka-request-handler-3" 
> daemon [_thread_in_Java, id=24781, 
> stack(0x7ff3b9ab1000,0x7ff3b9bb2000)]
> {code}
> {code:java}
> Stack: [0x7ff3b9ab1000,0x7ff3b9bb2000],  sp=0x7ff3b9bafca0,  free 
> space=1019k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 9736 C1 
> kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleEndTransaction$7(Lkafka/coordinator/transaction/TransactionCoordinator;Ljava/lang/String;JSLorg/apache/kafka/common/requests/TransactionResult;Lkafka/coordinator/transaction/TransactionMetadata;)Lscala/util/Either;
>  (518 bytes) @ 0x7ff4a2588261 [0x7ff4a25871a0+0x10c1]
> J 10456 C2 
> kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleEndTransaction$5(Lkafka/coordinator/transaction/TransactionCoordinator;Ljava/lang/String;JSLorg/apache/kafka/common/requests/TransactionResult;ILscala/Option;)Lscala/util/Either;
>  (192 bytes) @ 0x7ff4a1d413f0 [0x7ff4a1d41240+0x1b0]
> J 9303 C1 
> kafka.coordinator.transaction.TransactionCoordinator$$Lambda$1107.apply(Ljava/lang/Object;)Ljava/lang/Object;
>  (32 bytes) @ 0x7ff4a245f55c [0x7ff4a245f3c0+0x19c]
> J 10018 C2 
> scala.util.Either$RightProjection.flatMap(Lscala/Function1;)Lscala/util/Either;
>  (43 bytes) @ 0x7ff4a1f242c4 [0x7ff4a1f24260+0x64]
> J 9644 C1 
> kafka.coordinator.transaction.TransactionCoordinator.sendTxnMarkersCallback$1(Lorg/apache/kafka/common/protocol/Errors;Ljava/lang/String;JSLorg/apache/kafka/common/requests/TransactionResult;Lscala/Function1;ILkafka/coordinator/transaction/TxnTransitMetadata;)V
>  (251 bytes) @ 0x7ff4a1ef6254 [0x7ff4a1ef5120+0x1134]
> J 9302 C1 
> kafka.coordinator.transaction.TransactionCoordinator$$Lambda$1106.apply(Ljava/lang/Object;)Ljava/lang/Object;
>  (40 bytes) @ 0x7ff4a24747ec [0x7ff4a24745a0+0x24c]
> J 10125 C2 
> kafka.coordinator.transaction.TransactionStateManager.updateCacheCallback$1(Lscala/collection/Map;Ljava/lang/String;ILkafka/coordinator/transaction/TxnTransitMetadata;Lscala/Function1;Lscala/Function1;Lorg/apache/kafka/common/TopicPartition;)V
>  (892 bytes) @ 0x7ff4a27045ec [0x7ff4a2703c60+0x98c]
> J 10051 C2 
> k

Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2018-12-27 Thread Adam Bellemare
Hi All

Sorry for the delay - holidays and all. I have since updated the KIP with
John's original suggestion and have pruned a number of the no longer
relevant diagrams. Any more comments would be welcomed, otherwise I will
look to kick off the vote again shortly.

Thanks
Adam

On Mon, Dec 17, 2018 at 7:06 PM Adam Bellemare 
wrote:

> Hi John and Guozhang
>
> Ah yes, I lost that in the mix! Thanks for the convergent solutions - I do
> think that the attachment that John included makes for a better design. It
> should also help with overall performance as very high-cardinality foreign
> keyed data (say millions of events with the same entity) will be able to
> leverage the multiple nodes for join functionality instead of having it all
> performed in one node. There is still a bottleneck in the right table
> having to propagate all those events, but with slimmer structures, less IO
> and no need to perform the join I think the throughput will be much higher
> in those scenarios.
>
> Okay, I am convinced. I will update the KIP accordingly to a Gliffy
> version of John's diagram and ensure that the example flow matches
> correctly. Then I can go back to working on the PR to match the diagram.
>
> Thanks both of you for all the help - very much appreciated.
>
> Adam
>
>
>
>
>
>
>
> On Mon, Dec 17, 2018 at 6:39 PM Guozhang Wang  wrote:
>
>> Hi John,
>>
>> Just made a pass on your diagram (nice hand-drawing btw!), and obviously
>> we are thinking about the same thing :) A neat difference that I like is
>> that in the pre-join repartition topic we can still send messages in the
>> format of `K=k, V=(i=2)` while using "i" as the partition key in
>> StreamsPartition. This way we do not need to augment the key for the
>> repartition topic at all, but just do a projection on the foreign-key part
>> and trim all other fields: as long as we still materialize the store as
>> `A-2` co-located with the right KTable, that is fine.
>>
>> As I mentioned in my previous email, I also think this has a few
>> advantages in saving over-the-wire bytes as well as disk bytes.
>>
>> Guozhang
>>
>>
>> On Mon, Dec 17, 2018 at 3:17 PM John Roesler  wrote:
>>
>> > Hi Guozhang,
>> >
>> > Thanks for taking a look! I think Adam's already addressed your
>> > questions as well as I could have.
>> >
>> > Hi Adam,
>> >
>> > Thanks for updating the KIP. It looks great, especially how all the
>> > need-to-know information is right at the top, followed by the details.
>> >
>> > Also, thanks for that high-level diagram. Actually, now that I'm looking
>> > at it, I think part of my proposal got lost in translation, although I
>> > do think that what you have there is also correct.
>> >
>> > I sketched up a crude diagram based on yours and attached it to the KIP
>> > (I'm not sure if attached or inline images work on the mailing list):
>> >
>> > https://cwiki.apache.org/confluence/download/attachments/74684836/suggestion.png
>> > It's also attached to this email for convenience.
>> >
>> > Hopefully, you can see how it's intended to line up, and which parts are
>> > modified.
>> > At a high level, instead of performing the join on the right-hand side,
>> > we're essentially just registering interest, like "LHS key A wishes to
>> > receive updates for RHS key 2". Then, when there is a new "interest" or
>> > any updates to the RHS records, it "broadcasts" its state back to the
>> > LHS records that are interested in it.
>> >
>> > Thus, instead of sending the LHS values to the RHS joiner workers and
>> > then sending the join results back to the LHS workers to be
>> > co-partitioned and validated, we instead only send the LHS *keys* to the
>> > RHS workers and then only the RHS k/v back to be joined by the LHS
>> > worker.
>> >
>> > I've been considering both your diagram and mine, and I *think* what I'm
>> > proposing has a few advantages.
>> >
>> > Here are some points of interest as you look at the diagram:
>> > * When we extract the foreign key and send it to the Pre-Join
>> > Repartition Topic, we can send only the FK/PK pair. There's no need to
>> > worry about custom partitioner logic, since we can just use the foreign
>> > key plainly as the repartition record key. Also, we save on transmitting
>> > the LHS value, since we only send its key in this step.
>> > * We also only need to store the RHSKey:LHSKey mapping in the
>> > MaterializedSubscriptionStore, saving on disk. We can use the same rocks
>> > key format you proposed and the same algorithm involving range scans
>> > when the RHS records get updated.
>> > * Instead of joining on the right side, all we do is compose a
>> > re-repartition record so we can broadcast the RHS k/v pair back to the
>> > original LHS partition. (this is what the "rekey" node is doing)
>> > * Then, there is a special kind of Joiner that's co-resident in the same
>> > StreamTask as the LHS table, subscribed to the Post-Join Repartition
>> > Topic.
>> > ** This Joiner is *not* triggered directl
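
For concreteness, here is a rough sketch of the FK-prefixed subscription key 
layout discussed above (a hypothetical helper, not the KIP's actual classes):

{code:java}
import java.nio.ByteBuffer;

// Serializing (foreignKey, primaryKey) with the foreign key first keeps
// every subscription for one RHS key contiguous in the store, so an RHS
// update can find all interested LHS keys with a single prefix/range scan.
public final class SubscriptionKeys {
    public static byte[] encode(byte[] foreignKey, byte[] primaryKey) {
        return ByteBuffer.allocate(4 + foreignKey.length + primaryKey.length)
                .putInt(foreignKey.length)  // fixed-width FK length header
                .put(foreignKey)            // scan prefix: [len][fk]
                .put(primaryKey)            // LHS primary key as suffix
                .array();
    }
}
{code}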

[jira] [Resolved] (KAFKA-7707) Some code is not necessary

2018-12-27 Thread Jason Gustafson (JIRA)



Jason Gustafson resolved KAFKA-7707.

Resolution: Not A Problem

> Some code is not necessary
> --------------------------
>
> Key: KAFKA-7707
> URL: https://issues.apache.org/jira/browse/KAFKA-7707
> Project: Kafka
>  Issue Type: Improvement
>Reporter: huangyiming
>Priority: Minor
> Attachments: image-2018-12-05-18-01-46-886.png
>
>
> In the trunk branch, in 
> [BufferPool.java|https://github.com/apache/kafka/blob/578205cadd0bf64d671c6c162229c4975081a9d6/clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java#L174],
>  I think this code can be cleaned up; it is not necessary and will never 
> execute:
> {code:java}
> if (!(this.nonPooledAvailableMemory == 0 && this.free.isEmpty()) && 
> !this.waiters.isEmpty())
> this.waiters.peekFirst().signal();
> {code}
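
For reference, my reading of why this branch is in fact reachable (which would 
match the "Not A Problem" resolution), as comments on the same snippet:

{code:java}
// When an allocating thread exits its wait loop, memory may remain
// available (non-pooled bytes or pooled buffers) while other threads are
// still queued. Signalling the head waiter hands availability down the
// chain; without this, a queued thread could block until its timeout even
// though memory had been freed.
if (!(this.nonPooledAvailableMemory == 0 && this.free.isEmpty())
        && !this.waiters.isEmpty())
    this.waiters.peekFirst().signal();
{code}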





Re: Kafka tests on a remote cluster

2018-12-27 Thread Parviz deyhim
+dev@kafka.apache.org

On Wed, Dec 26, 2018 at 8:53 PM Parviz deyhim  wrote:

> Thanks, fair points. Probably best if I simplify the question: how does
> the Kafka community run tests besides using mocked local Kafka components?
> Surely there are tests that confirm different failure scenarios, such as
> losing a broker in a real clustered environment (a multi-node cluster with
> IPs, ports, hostnames, etc). The answer would be a good starting point for
> me.
>
> On Wed, Dec 26, 2018 at 6:11 PM Stephen Powis 
> wrote:
>
>> Without looking into how the integration tests work, my best guess is
>> that, within the context they were written to run in, it doesn't make
>> sense to run them against a remote cluster. The "internal" cluster is
>> running the same code, so why require coordinating with an external
>> dependency?
>>
>> For the use case you gave (and I'm not sure whether tests exist that
>> cover this behavior or not), running the brokers locally in the context
>> of the tests means that those tests have control over the brokers (IE
>> shut them off, restart them, etc.. programmatically) and can validate
>> behavior. Coordinating these operations on a remote broker would be
>> significantly more difficult.
>>
>> Not sure this helps...but perhaps you're either asking the wrong questions
>> or trying to go about solving your problem using the wrong set of tools?
>> My gut feeling says if you want to do a full scale multi-server load / HA
>> test, Kafka's test suite is not the best place to start.
>>
>> Stephen
>>
>>
>>
>> On Thu, Dec 27, 2018 at 10:53 AM Parviz deyhim  wrote:
>>
>> > Hi,
>> >
>> > I'm looking to see who has done this before and get some guidance. On a
>> > frequent basis I like to run basic tests on a remote Kafka cluster while
>> > some random chaos/faults are being performed. In other words, I like to
>> > run chaos engineering tasks (network outage, disk outage, etc) and see
>> > how Kafka behaves. For example:
>> >
>> > 1) bring some random Broker node down
>> > 2) send 2000 messages
>> > 3) consume messages
>> > 4) confirm there's no data loss
>> >
>> > My questions: I'm pretty sure most of the scenarios I'm looking to test
>> > have been covered under Kafka's integration, unit, and other existing
>> > tests. What I cannot figure out is how to run those tests on a remote
>> > cluster vs. the local one the tests seem to run on. For example, I'd
>> > like to run the following command but have the tests executed on a
>> > remote cluster of my choice:
>> >
>> > ./gradlew cleanTest integrationTest
>> >
>> > Any guidance/help would be appreciated.
>> >
>> > Thanks
>> >
>>
>
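
For what it's worth, steps 2-4 of the scenario above can be driven against any 
remote cluster with the plain Java clients rather than the integration-test 
harness. A hedged sketch (topic, group, bootstrap address, and counts are 
illustrative):

{code:java}
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.*;

public class ProduceConsumeValidate {
    public static void main(String[] args) throws Exception {
        String topic = "chaos-test";
        int n = 2000;

        Properties p = new Properties();
        p.put("bootstrap.servers", "broker1:9092");
        p.put("acks", "all"); // tolerate a broker going down mid-test
        p.put("key.serializer", StringSerializer.class.getName());
        p.put("value.serializer", StringSerializer.class.getName());
        try (Producer<String, String> producer = new KafkaProducer<>(p)) {
            for (int i = 0; i < n; i++)
                producer.send(new ProducerRecord<>(topic, "msg-" + i)).get();
        }

        Properties c = new Properties();
        c.put("bootstrap.servers", "broker1:9092");
        c.put("group.id", "chaos-validator");
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", StringDeserializer.class.getName());
        c.put("value.deserializer", StringDeserializer.class.getName());
        Set<String> seen = new HashSet<>();
        try (Consumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singleton(topic));
            long deadline = System.currentTimeMillis() + 60_000;
            while (seen.size() < n && System.currentTimeMillis() < deadline)
                for (ConsumerRecord<String, String> r :
                        consumer.poll(Duration.ofSeconds(1)))
                    seen.add(r.value());
        }
        System.out.println(seen.size() == n
            ? "no data loss" : "lost " + (n - seen.size()));
    }
}
{code}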


[jira] [Created] (KAFKA-7771) Group/Transaction coordinators should update assignment based on current partition count

2018-12-27 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7771:
----------------------------------

 Summary: Group/Transaction coordinators should update assignment 
based on current partition count
 Key: KAFKA-7771
 URL: https://issues.apache.org/jira/browse/KAFKA-7771
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In GroupMetadataManager and TransactionStateManager, we cache the number of 
partitions assigned to __consumer_offsets and __transaction_state respectively. 
This is used to compute the expected coordinator partition for a given group.id 
or transactional.id. The value is computed only once, when the broker starts 
up: it is based on the topic's partition count if the topic exists, and 
otherwise on the value configured by `offsets.topic.num.partitions` in the case 
of the group coordinator (or `transaction.state.log.num.partitions` for the 
transaction coordinator).
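
For context, the lookup in question is essentially a hash of the id modulo the 
cached count. A simplified sketch (not the exact broker code; Kafka uses 
Utils.abs on the hash, and masking the sign bit here is just one way to keep 
the result non-negative):

{code:java}
// The cached count is read once at broker startup, which is the bug
// described here: if the topic was created manually with a different
// partition count, this computation disagrees with the actual topic.
static int partitionFor(String id, int cachedPartitionCount) {
    return (id.hashCode() & 0x7fffffff) % cachedPartitionCount;
}
{code}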

The problem is that Kafka also supports manual creation of these topics, and 
the number of partitions doesn't have to match the configured value. If the 
topic is created with a mismatching number of partitions, then the cached 
values will not be correct and coordinator lookup may fail.

To fix this, when brokers receive metadata updates from the controller, they 
should pass the current number of partitions for these internal topics to the 
respective coordinators so that the count can be updated. 





[jira] [Created] (KAFKA-7772) Dynamically adjust log level in Connect workers

2018-12-27 Thread Arjun Satish (JIRA)
Arjun Satish created KAFKA-7772:
---

 Summary: Dynamically adjust log level in Connect workers
 Key: KAFKA-7772
 URL: https://issues.apache.org/jira/browse/KAFKA-7772
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Arjun Satish
Assignee: Arjun Satish


Currently, Kafka provides a JMX interface to dynamically modify log levels of 
different active loggers. It would be good to have a similar interface for 
Connect as well. 
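
For reference, the broker side of this is reachable today through the 
Log4jController MBean; a minimal sketch of driving it over JMX (the MBean name 
is taken from the broker's Log4jController and is worth verifying per version; 
the service URL, port, and logger name are illustrative, and remote JMX must 
be enabled on the target JVM):

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SetLogLevel {
    public static void main(String[] args) throws Exception {
        // Illustrative endpoint; assumes the JVM was started with
        // com.sun.management.jmxremote.port=9999 (auth/ssl disabled).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName logging =
                new ObjectName("kafka:type=kafka.Log4jController");
            // Flip one logger to DEBUG at runtime, no restart required.
            conn.invoke(logging, "setLogLevel",
                new Object[] {"kafka.request.logger", "DEBUG"},
                new String[] {"java.lang.String", "java.lang.String"});
        }
    }
}
{code}

A REST-based equivalent on the Connect worker would presumably look similar, 
addressed by logger name and level.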





[jira] [Created] (KAFKA-7773) Use verifiable consumer in system tests to avoid reliance on console consumer idle timeout

2018-12-27 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7773:
----------------------------------

 Summary: Use verifiable consumer in system tests to avoid reliance 
on console consumer idle timeout
 Key: KAFKA-7773
 URL: https://issues.apache.org/jira/browse/KAFKA-7773
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Jason Gustafson
Assignee: Jason Gustafson


We have a few system tests which use `ProduceConsumeValidateTest`, which relies 
on the console consumer's idle timeout for shutdown. We should convert these 
system tests to use the verifiable consumer instead, so that we can control 
shutdown more finely and have more validation options. 


