[ https://issues.apache.org/jira/browse/KAFKA-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560099#comment-16560099 ]
Fernando Vega edited comment on KAFKA-5407 at 7/27/18 6:31 PM:
---------------------------------------------------------------

[~omkreddy] [~hachikuji] [~huxi_2b]

Just double-checking. I tried this again and found a few things:

a- Once I upgraded the cluster, I attempted to use the new consumer file again for the MirrorMakers, whitelisting the same topics, and I got the same exception.

b- I ran another test using exactly the same configs as the production topics; the only difference was that I created a single topic, to check whether the issue was related to Kafka itself or to the installed package. I was able to mirror my dummy messages using all the new files and configs we have for production, and it worked just fine. With the current production topics it doesn't.

c- We have also seen the MirrorMaker threads die for no apparent reason. The logs state that MirrorMaker was shut down successfully, even though we never stopped or restarted it.

d- Sometimes, when we use the consumer-groups script to check consumption lag, we see the list of topics and their consumers, but in some cases the topics show no consumers. Our workaround is to stop MirrorMaker, remove the consumer group, and start MirrorMaker again; that seems to fix it.

Any suggestions you can provide would be great, as would any tool you recommend for checking, monitoring, or troubleshooting this behavior.

The current configs are listed below:

{noformat}
###
### This file is managed by Puppet.
###
# See http://kafka.apache.org/documentation.html#brokerconfigs for default values.

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=31

# The port the socket server listens on
port=9092

# A comma separated list of directories under which to store log files
log.dirs=/kafka1/datalog,/kafka2/datalog,/kafka3/datalog,/kafka4/datalog,/kafka5/datalog,/kafka6/datalog,/kafka7/datalog,/kafka8/datalog,/kafka9/datalog,/kafka10/datalog

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated list of host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=zookeeper1-repl:2181,zookeeper2-repl:2181,zookeeper3-repl:2181,zookeeper4-repl:2181,zookeeper5-repl:2181/replication/kafka

# Additional configuration options may follow here
auto.leader.rebalance.enable=true
delete.topic.enable=true
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
default.replication.factor=2
auto.create.topics.enable=true
num.partitions=1
num.network.threads=8
num.io.threads=40
log.retention.hours=1
log.roll.hours=1
num.replica.fetchers=8
zookeeper.connection.timeout.ms=30000
zookeeper.session.timeout.ms=30000
inter.broker.protocol.version=0.10.2
log.message.format.version=0.8.2
{noformat}

Producer
{noformat}
# Producer
# sjc2
bootstrap.servers=app454.sjc2.com:9092,app455.sjc2.com:9092,app456.sjc2.com:9092,app457.sjc2.com:9092,app458.sjc2.com:9092,app459.sjc2.com:9092

# Producer Configurations
acks=0
buffer.memory=67108864
compression.type=gzip
linger.ms=10
reconnect.backoff.ms=100
request.timeout.ms=120000
retry.backoff.ms=1000
{noformat}

Consumer
{noformat}
bootstrap.servers=app043.atl2.com:9092,app044.atl2.com:9092,app045.atl2.com:9092,app046.atl2.com:9092,app047.atl2.com:9092,app048.atl2.com:9092
group.id=MirrorMaker_atl1
partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
receive.buffer.bytes=1048576
send.buffer.bytes=1048576
session.timeout.ms=250000
key.deserializer=org.apache.kafka.common.serialization.Deserializer
value.deserializer=org.apache.kafka.common.serialization.Deserializer
{noformat}
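One detail worth flagging in the consumer config above: `key.deserializer` and `value.deserializer` are set to `org.apache.kafka.common.serialization.Deserializer`, which is the interface itself rather than an instantiable class; the working consumer in the log further down shows the concrete `ByteArrayDeserializer` instead. The sketch below is a hypothetical, standalone sanity check (not part of Kafka or MirrorMaker) illustrating how such a config could be linted before starting MirrorMaker:

```python
# Hypothetical lint for a consumer.properties dict: the deserializer keys
# must name a concrete class, not the Deserializer interface.
# Class list assumed from the serializers shipped with kafka-clients 0.10.x.
CONCRETE_DESERIALIZERS = {
    "org.apache.kafka.common.serialization.ByteArrayDeserializer",
    "org.apache.kafka.common.serialization.StringDeserializer",
    "org.apache.kafka.common.serialization.LongDeserializer",
    "org.apache.kafka.common.serialization.IntegerDeserializer",
}

def check_deserializers(props: dict) -> list:
    """Return the config keys whose value is not a known concrete deserializer."""
    problems = []
    for key in ("key.deserializer", "value.deserializer"):
        value = props.get(key)
        if value is not None and value not in CONCRETE_DESERIALIZERS:
            problems.append(key)
    return problems
```

For instance, feeding it the config quoted above would flag both keys, while `ByteArrayDeserializer` (as in the working log) would pass.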
> Mirrormaker dont start after upgrade
> ------------------------------------
>
>                 Key: KAFKA-5407
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5407
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.2.1
>        Environment: Operating system: CentOS 6.8
>            HW
>            Board Mfg: HP
>            Board Product: ProLiant DL380p Gen8
>            CPUs x2
>            Product Manufacturer: Intel
>            Product Name: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
>            Memory Type: DDR3 SDRAM
>            SDRAM Capacity: 2048 MB
>            Total Memory: 64GB
>            Hard drive size and layout: 9 drives using JBOD, 3.6TB each
>           Reporter: Fernando Vega
>           Priority: Critical
>        Attachments: broker.hkg1.new, debug.hkg1.new, mirrormaker-repl-sjc2-to-hkg1.log.8
>
> Currently I'm upgrading the cluster from 0.8.2-beta to 0.10.2.1, so I followed the rolling upgrade procedure.
> Here are the config files:
> Consumer
> {noformat}
> #
> # Cluster: repl
> # Topic list (goes into the command line):
> REPL-ams1-global,REPL-atl1-global,REPL-sjc2-global,REPL-ams1-global-PN_HXIDMAP_.*,REPL-atl1-global-PN_HXIDMAP_.*,REPL-sjc2-global-PN_HXIDMAP_.*,REPL-ams1-global-PN_HXCONTEXTUALV2_.*,REPL-atl1-global-PN_HXCONTEXTUALV2_.*,REPL-sjc2-global-PN_HXCONTEXTUALV2_.*
> bootstrap.servers=app001:9092,app002:9092,app003:9092,app004:9092
> group.id=hkg1_cluster
> auto.commit.interval.ms=60000
> partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
> {noformat}
> Producer
> {noformat}
> # Producer
> # hkg1
> bootstrap.servers=app001:9092,app002:9092,app003:9092,app004:9092
> compression.type=gzip
> acks=0
> {noformat}
> Broker
> {noformat}
> auto.leader.rebalance.enable=true
> delete.topic.enable=true
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
> default.replication.factor=2
> auto.create.topics.enable=true
> num.partitions=1
> num.network.threads=8
> num.io.threads=40
> log.retention.hours=1
> log.roll.hours=1
> num.replica.fetchers=8
> zookeeper.connection.timeout.ms=30000
> zookeeper.session.timeout.ms=30000
> inter.broker.protocol.version=0.10.2
> log.message.format.version=0.8.2
> {noformat}
> I also tried the stock configuration, with no luck.
> The error that I get is this:
> {noformat}
> [2017-06-07 12:24:45,476] INFO ConsumerConfig values:
> 	auto.commit.interval.ms = 60000
> 	auto.offset.reset = latest
> 	bootstrap.servers = [app454.sjc2.mytest.com:9092, app455.sjc2.mytest.com:9092, app456.sjc2.mytest.com:9092, app457.sjc2.mytest.com:9092, app458.sjc2.mytest.com:9092, app459.sjc2.mytest.com:9092]
> 	check.crcs = true
> 	client.id = MirrorMaker_hkg1-1
> 	connections.max.idle.ms = 540000
> 	enable.auto.commit = false
> 	exclude.internal.topics = true
> 	fetch.max.bytes = 52428800
> 	fetch.max.wait.ms = 500
> 	fetch.min.bytes = 1
> 	group.id = MirrorMaker_hkg1
> 	heartbeat.interval.ms = 3000
> 	interceptor.classes = null
> 	key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
> 	max.partition.fetch.bytes = 1048576
> 	max.poll.interval.ms = 300000
> 	max.poll.records = 500
> 	metadata.max.age.ms = 300000
> 	metric.reporters = []
> 	metrics.num.samples = 2
> 	metrics.recording.level = INFO
> 	metrics.sample.window.ms = 30000
> 	partition.assignment.strategy = [org.apache.kafka.clients.consumer.RoundRobinAssignor]
> 	receive.buffer.bytes = 65536
> 	reconnect.backoff.ms = 50
> 	request.timeout.ms = 305000
> 	retry.backoff.ms = 100
> 	sasl.jaas.config = null
> 	sasl.kerberos.kinit.cmd = /usr/bin/kinit
> 	sasl.kerberos.min.time.before.relogin = 60000
> 	sasl.kerberos.service.name = null
> 	sasl.kerberos.ticket.renew.jitter = 0.05
> 	sasl.kerberos.ticket.renew.window.factor = 0.8
> 	sasl.mechanism = GSSAPI
> 	security.protocol = PLAINTEXT
> 	send.buffer.bytes = 131072
> 	session.timeout.ms = 10000
> 	ssl.cipher.suites = null
> 	ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> 	ssl.endpoint.identification.algorithm = null
> 	ssl.key.password = null
> 	ssl.keymanager.algorithm = SunX509
> 	ssl.keystore.location = null
> 	ssl.keystore.password = null
> 	ssl.keystore.type = JKS
> 	ssl.protocol = TLS
> 	ssl.provider = null
> 	ssl.secure.random.implementation = null
> 	ssl.trustmanager.algorithm = PKIX
> 	ssl.truststore.location = null
> 	ssl.truststore.password = null
> 	ssl.truststore.type = JKS
> 	value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
> INFO Kafka commitId : e89bffd6b2eff799 (org.apache.kafka.common.utils.AppInfoParser)
> [2017-06-07 12:24:45,497] INFO [mirrormaker-thread-0] Starting mirror maker thread mirrormaker-thread-0 (kafka.tools.MirrorMaker$MirrorMakerThread)
> [2017-06-07 12:24:45,497] INFO [mirrormaker-thread-1] Starting mirror maker thread mirrormaker-thread-1 (kafka.tools.MirrorMaker$MirrorMakerThread)
> [2017-06-07 12:24:48,619] INFO Discovered coordinator app458.sjc2.mytest.com:9092 (id: 2147483613 rack: null) for group MirrorMaker_hkg1. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-06-07 12:24:48,620] INFO Discovered coordinator app458.sjc2.mytest.com:9092 (id: 2147483613 rack: null) for group MirrorMaker_hkg1.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-06-07 12:24:48,625] INFO Revoking previously assigned partitions [] for group MirrorMaker_hkg1 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-06-07 12:24:48,625] INFO Revoking previously assigned partitions [] for group MirrorMaker_hkg1 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-06-07 12:24:48,648] INFO (Re-)joining group MirrorMaker_hkg1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-06-07 12:24:48,649] INFO (Re-)joining group MirrorMaker_hkg1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-06-07 12:24:53,560] FATAL [mirrormaker-thread-1] Mirror maker thread failure due to (kafka.tools.MirrorMaker$MirrorMakerThread)
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: The server experienced an unexpected error when processing the request
> 	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:548)
> 	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:521)
> 	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:784)
> 	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:765)
> 	at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:186)
> 	at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:149)
> 	at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:116)
> 	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:493)
> 	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:322)
> 	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:253)
> 	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
> 	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:347)
> 	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
> 	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
> 	at kafka.tools.MirrorMaker$MirrorMakerNewConsumer.receive(MirrorMaker.scala:625)
> 	at kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:431)
> {noformat}
> I'm using MirrorMaker.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
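Regarding point d above (topics showing no consumers in the consumer-groups script output): one way to spot this condition automatically is to scan the table printed by `kafka-consumer-groups.sh --describe` for partitions with no member attached. The helper below is a hypothetical sketch, assuming the 0.10.x-style column layout (TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, CONSUMER-ID, ...) and that unassigned partitions show `-` in the CONSUMER-ID column:

```python
# Hypothetical helper: flag topics whose partitions have lost their consumer,
# the symptom that currently requires deleting the group and restarting MM.
# Column layout and the "-" placeholder are assumptions about the tool's output.

def unassigned_topics(describe_output: str) -> set:
    """Return topic names with at least one partition whose CONSUMER-ID is '-'."""
    topics = set()
    for line in describe_output.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 6 and cols[5] == "-":
            topics.add(cols[0])
    return topics
```

Run periodically against the script's output, this could alert on dead MirrorMaker threads (point c) before lag builds up, rather than discovering them by inspection.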