[ https://issues.apache.org/jira/browse/KAFKA-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560099#comment-16560099 ]

Fernando Vega edited comment on KAFKA-5407 at 7/27/18 6:31 PM:
---------------------------------------------------------------

[~omkreddy] [~hachikuji] [~huxi_2b]
 Just double checking. I tried this again and found a few things:

a- Once I upgraded the cluster, I attempted to use the new consumer file again 
for the mirrormakers, whitelisting the same topics, and I got the same 
exception.

b- However, I ran another test using the exact same configs that the 
production topics use; the only difference was that I created a single topic, 
to check whether the issue was related to Kafka itself or to the installed 
package. I was able to mirror my dummy messages using all the new files and 
configs that we have for production, and it worked just fine. But with the 
current production topics it doesn't.

c- We have also seen that sometimes the mirrormaker threads die for no 
apparent reason. I see messages in the logs stating that the mirrormaker was 
shut down successfully, even though we haven't stopped or restarted them, so 
we shouldn't be seeing this message.

d- Sometimes when we use the consumer-groups script to check the consumption 
lag, we see the list of topics and their consumers, but in some cases the 
topics show no consumers at all. What we do then is stop MirrorMaker, remove 
the consumer group, and start MirrorMaker again, and that seems to fix it.
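For reference, this is roughly how we run the lag check in (d) with the 0.10.2 tooling (a sketch; the bootstrap host and group name are taken from our consumer config below):

```shell
# Describe the MirrorMaker group to see per-partition lag.
# In 0.10.2, new-consumer groups are queried via --new-consumer/--bootstrap-server.
bin/kafka-consumer-groups.sh --new-consumer \
  --bootstrap-server app043.atl2.com:9092 \
  --describe --group MirrorMaker_atl1
```

(In 0.10.2 the tool's --delete option only supports ZooKeeper-based, old-consumer groups; for the new-consumer group we stop MirrorMaker and let the group go empty before restarting.)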

If you can provide any suggestions, that would be great. Any tools you would 
recommend for checking, monitoring, or troubleshooting this behavior would be 
great as well.

Listed below are current configs:
{noformat}

###
### This file is managed by Puppet.
###

# See http://kafka.apache.org/documentation.html#brokerconfigs for default 
values.

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=31

# The port the socket server listens on
port=9092

# A comma separated list of directories under which to store log files
log.dirs=/kafka1/datalog,/kafka2/datalog,/kafka3/datalog,/kafka4/datalog,/kafka5/datalog,/kafka6/datalog,/kafka7/datalog,/kafka8/datalog,/kafka9/datalog,/kafka10/datalog

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.

zookeeper.connect=zookeeper1-repl:2181,zookeeper2-repl:2181,zookeeper3-repl:2181,zookeeper4-repl:2181,zookeeper5-repl:2181/replication/kafka
# Additional configuration options may follow here
auto.leader.rebalance.enable=true
delete.topic.enable=true
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
default.replication.factor=2
auto.create.topics.enable=true
num.partitions=1
num.network.threads=8
num.io.threads=40
log.retention.hours=1
log.roll.hours=1
num.replica.fetchers=8
zookeeper.connection.timeout.ms=30000
zookeeper.session.timeout.ms=30000
inter.broker.protocol.version=0.10.2
log.message.format.version=0.8.2

{noformat}

Producer
{noformat}
# Producer
# sjc2
bootstrap.servers=app454.sjc2.com:9092,app455.sjc2.com:9092,app456.sjc2.com:9092,app457.sjc2.com:9092,app458.sjc2.com:9092,app459.sjc2.com:9092

# Producer Configurations
acks=0
buffer.memory=67108864
compression.type=gzip
linger.ms=10
reconnect.backoff.ms=100
request.timeout.ms=120000
retry.backoff.ms=1000
{noformat}

Consumer
{noformat}
bootstrap.servers=app043.atl2.com:9092,app044.atl2.com:9092,app045.atl2.com:9092,app046.atl2.com:9092,app047.atl2.com:9092,app048.atl2.com:9092

group.id=MirrorMaker_atl1

partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor

receive.buffer.bytes=1048576
send.buffer.bytes=1048576

session.timeout.ms=250000

key.deserializer=org.apache.kafka.common.serialization.Deserializer
value.deserializer=org.apache.kafka.common.serialization.Deserializer

{noformat}
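For completeness, the two files above are passed to the mirrormaker roughly like this (a sketch of our invocation; the file paths, stream count, and whitelist pattern are abbreviated/illustrative, while the options are the stock 0.10.2 kafka-mirror-maker.sh ones):

```shell
# Start MirrorMaker with the consumer/producer configs above.
# --new.consumer selects the Java consumer (the code path where we
# hit the SyncGroup exception).
bin/kafka-mirror-maker.sh --new.consumer \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --num.streams 8 \
  --whitelist 'REPL-.*'
```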



> Mirrormaker doesn't start after upgrade
> ---------------------------------------
>
>                 Key: KAFKA-5407
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5407
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.2.1
>         Environment: Operating system
> CentOS 6.8
> HW
> Board Mfg             : HP
>  Board Product         : ProLiant DL380p Gen8
> CPU's x2
> Product Manufacturer  : Intel
>  Product Name          :  Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
>  Memory Type           : DDR3 SDRAM
>  SDRAM Capacity        : 2048 MB
>  Total Memory          : 64GB
> Hardrives size and layout:
> 9 drives using jbod
> drive size 3.6TB each
>            Reporter: Fernando Vega
>            Priority: Critical
>         Attachments: broker.hkg1.new, debug.hkg1.new, 
> mirrormaker-repl-sjc2-to-hkg1.log.8
>
>
> Currently I'm upgrading the cluster from 0.8.2-beta to 0.10.2.1
> So I followed the rolling procedure:
> Here the config files:
> Consumer
> {noformat}
> #
> # Cluster: repl
> # Topic list(goes into command line): 
> REPL-ams1-global,REPL-atl1-global,REPL-sjc2-global,REPL-ams1-global-PN_HXIDMAP_.*,REPL-atl1-global-PN_HXIDMAP_.*,REPL-sjc2-global-PN_HXIDMAP_.*,REPL-ams1-global-PN_HXCONTEXTUALV2_.*,REPL-atl1-global-PN_HXCONTEXTUALV2_.*,REPL-sjc2-global-PN_HXCONTEXTUALV2_.*
> bootstrap.servers=app001:9092,app002:9092,app003:9092,app004:9092
> group.id=hkg1_cluster
> auto.commit.interval.ms=60000
> partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
> {noformat}
> Producer
> {noformat}
> # Producer
> # hkg1
> bootstrap.servers=app001:9092,app002:9092,app003:9092,app004:9092
> compression.type=gzip
> acks=0
> {noformat}
> Broker
> {noformat}
> auto.leader.rebalance.enable=true
> delete.topic.enable=true
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
> default.replication.factor=2
> auto.create.topics.enable=true
> num.partitions=1
> num.network.threads=8
> num.io.threads=40
> log.retention.hours=1
> log.roll.hours=1
> num.replica.fetchers=8
> zookeeper.connection.timeout.ms=30000
> zookeeper.session.timeout.ms=30000
> inter.broker.protocol.version=0.10.2
> log.message.format.version=0.8.2
> {noformat}
> I tried also using the stock configuration with no luck.
> The error that I get is this:
> {noformat}
> 2017-06-07 12:24:45,476] INFO ConsumerConfig values:
>       auto.commit.interval.ms = 60000
>       auto.offset.reset = latest
>       bootstrap.servers = [app454.sjc2.mytest.com:9092, 
> app455.sjc2.mytest.com:9092, app456.sjc2.mytest.com:9092, 
> app457.sjc2.mytest.com:9092, app458.sjc2.mytest.com:9092, 
> app459.sjc2.mytest.com:9092]
>       check.crcs = true
>       client.id = MirrorMaker_hkg1-1
>       connections.max.idle.ms = 540000
>       enable.auto.commit = false
>       exclude.internal.topics = true
>       fetch.max.bytes = 52428800
>       fetch.max.wait.ms = 500
>       fetch.min.bytes = 1
>       group.id = MirrorMaker_hkg1
>       heartbeat.interval.ms = 3000
>       interceptor.classes = null
>       key.deserializer = class 
> org.apache.kafka.common.serialization.ByteArrayDeserializer
>       max.partition.fetch.bytes = 1048576
>       max.poll.interval.ms = 300000
>       max.poll.records = 500
>       metadata.max.age.ms = 300000
>       metric.reporters = []
>       metrics.num.samples = 2
>       metrics.recording.level = INFO
>       metrics.sample.window.ms = 30000
>       partition.assignment.strategy = 
> [org.apache.kafka.clients.consumer.RoundRobinAssignor]
>       receive.buffer.bytes = 65536
>       reconnect.backoff.ms = 50
>       request.timeout.ms = 305000
>       retry.backoff.ms = 100
>       sasl.jaas.config = null
>       sasl.kerberos.kinit.cmd = /usr/bin/kinit
>       sasl.kerberos.min.time.before.relogin = 60000
>       sasl.kerberos.service.name = null
>       sasl.kerberos.ticket.renew.jitter = 0.05
>       sasl.kerberos.ticket.renew.window.factor = 0.8
>       sasl.mechanism = GSSAPI
>       security.protocol = PLAINTEXT
>       send.buffer.bytes = 131072
>       session.timeout.ms = 10000
>       ssl.cipher.suites = null
>       ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>       ssl.endpoint.identification.algorithm = null
>       ssl.key.password = null
>       ssl.keymanager.algorithm = SunX509
>       ssl.keystore.location = null
>       ssl.keystore.password = null
>       ssl.keystore.type = JKS
>       ssl.protocol = TLS
>       ssl.provider = null
>       ssl.secure.random.implementation = null
>       ssl.trustmanager.algorithm = PKIX
>       ssl.truststore.location = null
>       ssl.truststore.password = null
>       ssl.truststore.type = JKS
>       value.deserializer = class 
> org.apache.kafka.common.serialization.ByteArrayDeserializer
> INFO Kafka commitId : e89bffd6b2eff799 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2017-06-07 12:24:45,497] INFO [mirrormaker-thread-0] Starting mirror maker 
> thread mirrormaker-thread-0 (kafka.tools.MirrorMaker$MirrorMakerThread)
> [2017-06-07 12:24:45,497] INFO [mirrormaker-thread-1] Starting mirror maker 
> thread mirrormaker-thread-1 (kafka.tools.MirrorMaker$MirrorMakerThread)
> [2017-06-07 12:24:48,619] INFO Discovered coordinator 
> app458.sjc2.mytest.com:9092 (id: 2147483613 rack: null) for group 
> MirrorMaker_hkg1. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-06-07 12:24:48,620] INFO Discovered coordinator 
> app458.sjc2.mytest.com:9092 (id: 2147483613 rack: null) for group 
> MirrorMaker_hkg1. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-06-07 12:24:48,625] INFO Revoking previously assigned partitions [] for 
> group MirrorMaker_hkg1 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-06-07 12:24:48,625] INFO Revoking previously assigned partitions [] for 
> group MirrorMaker_hkg1 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-06-07 12:24:48,648] INFO (Re-)joining group MirrorMaker_hkg1 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-06-07 12:24:48,649] INFO (Re-)joining group MirrorMaker_hkg1 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-06-07 12:24:53,560] FATAL [mirrormaker-thread-1] Mirror maker thread 
> failure due to  (kafka.tools.MirrorMaker$MirrorMakerThread)
> org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: The 
> server experienced an unexpected error when processing the request
>       at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:548)
>       at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:521)
>       at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:784)
>       at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:765)
>       at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:186)
>       at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:149)
>       at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:116)
>       at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:493)
>       at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:322)
>       at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:253)
>       at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:172)
>       at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:347)
>       at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>       at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
>       at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1029)
>       at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995)
>       at 
> kafka.tools.MirrorMaker$MirrorMakerNewConsumer.receive(MirrorMaker.scala:625)
>       at kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:431)
> {noformat}
> I'm using mirrormaker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
