[ https://issues.apache.org/jira/browse/KAFKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020446#comment-14020446 ]
Jun Rao commented on KAFKA-1298:
--------------------------------

Sriharsha,

Thanks for the patch. Do we know why the controlled shutdown takes that long in the case with just one broker? I was thinking this should only add a few ms overhead. So, instead of turning controlled shutdown off in the unit tests, perhaps we should just improve the performance of controlled shutdown? Looking at the code, it seems if replication factor is 1, the controller can just send ack back immediately w/o having to send any requests like StopReplica.

> Controlled shutdown tool doesn't seem to work out of the box
> ------------------------------------------------------------
>
>                 Key: KAFKA-1298
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1298
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>            Assignee: Sriharsha Chintalapani
>              Labels: usability
>         Attachments: KAFKA-1298.patch, KAFKA-1298.patch
>
>
> Download Kafka and try to use our shutdown tool. Got this:
> bin/kafka-run-class.sh kafka.admin.ShutdownBroker --zookeeper localhost:2181 --broker 0
> [2014-03-06 16:58:23,636] ERROR Operation failed due to controller failure (kafka.admin.ShutdownBroker$)
> java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is:
>       java.net.ConnectException: Connection refused]
>       at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340)
>       at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:249)
>       at kafka.admin.ShutdownBroker$.kafka$admin$ShutdownBroker$$invokeShutdown(ShutdownBroker.scala:56)
>       at kafka.admin.ShutdownBroker$.main(ShutdownBroker.scala:109)
>       at kafka.admin.ShutdownBroker.main(ShutdownBroker.scala)
> Caused by: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is:
>       java.net.ConnectException: Connection refused]
>       at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
>       at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)
>       at javax.naming.InitialContext.lookup(InitialContext.java:392)
>       at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888)
>       at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858)
>       at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)
>       ... 4 more
> Caused by: java.rmi.ConnectException: Connection refused to host: jkreps-mn.linkedin.biz; nested exception is:
>       java.net.ConnectException: Connection refused
>       at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
>       at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
>       at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
>       at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
>       at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
>       at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)
>       ... 9 more
> Caused by: java.net.ConnectException: Connection refused
>       at java.net.PlainSocketImpl.socketConnect(Native Method)
>       at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382)
>       at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241)
>       at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228)
>       at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)
>       at java.net.Socket.connect(Socket.java:527)
>       at java.net.Socket.connect(Socket.java:476)
>       at java.net.Socket.<init>(Socket.java:373)
>       at java.net.Socket.<init>(Socket.java:187)
>       at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
>       at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
>       at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
>       ... 14 more
> Oh god, RMI?????!!!???
> Presumably this is because we stopped setting the JMX port by default. This is good because setting the JMX port breaks the quickstart which requires running multiple nodes on a single machine. The root cause imo is just using RMI here instead of our regular RPC.
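
To illustrate the suggestion in the comment above (ack immediately when the replication factor is 1), here is a minimal Java sketch. It is not the Kafka controller code: the class name, the method names, and the partition-to-replica map are all hypothetical, and it only models the decision of which partitions would need any controller action before the shutdown can be acknowledged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the fast path proposed in the comment.
// None of these names come from the Kafka code base.
class ControlledShutdownSketch {

    // Returns the partitions hosted on the shutting-down broker that have at
    // least one other replica. Only these would need controller work such as
    // moving leadership or sending StopReplica. A partition whose only replica
    // is the shutting-down broker has nowhere else to go, so it is skipped.
    static List<String> partitionsNeedingWork(int brokerId,
                                              Map<String, List<Integer>> assignment) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : assignment.entrySet()) {
            List<Integer> replicas = entry.getValue();
            if (replicas.contains(brokerId) && replicas.size() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // True when no partition on the broker needs any action, which is the
    // case where the controller could send the ack back right away.
    static boolean canAckImmediately(int brokerId,
                                     Map<String, List<Integer>> assignment) {
        return partitionsNeedingWork(brokerId, assignment).isEmpty();
    }
}
```

In the single-broker unit-test case every partition has replication factor 1, so `canAckImmediately` would return true and no requests would be sent. Whether the real controller can safely skip all bookkeeping in that case is an assumption that would need checking against the actual code.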