You can add set -x in start-server.sh, and it will show you what the
script is trying to do.

-Eric

On Wed, Jul 18, 2012 at 4:12 PM, Cardon, Tejay E <[email protected]> wrote:
> Eric,
> Good to know about the tracers. I set up 4 tracers for an 8-node setup, but
> I'll back that down to just the 1. As for the .out or .err files on
> tservers, I've got nothing. There is no evidence that those servers were
> ever touched. I'm thinking the next step would be to execute the
> start-here.sh script on each tserver and look for errors. Is that the best
> approach, and if so, what arguments should I pass? I'm digging into the
> start-all.sh script for answers, but if someone already knows what my
> arguments are.... all the better.
>
> Thanks,
> Tejay
>
> -----Original Message-----
> From: Eric Newton [mailto:[email protected]]
> Sent: Wednesday, July 18, 2012 2:00 PM
> To: [email protected]
> Subject: EXTERNAL: Re: There are no tablet servers
>
> Don't start a tracer on every server. Just start one on a master server.
> You won't need more than 1 until you get several hundred servers.
>
> Do you have anything in the .out or .err files on the tserver hosts?
> If the files don't exist, something is failing in the ssh to those hosts.
>
> -Eric
>
> On Wed, Jul 18, 2012 at 2:15 PM, Cardon, Tejay E <[email protected]> wrote:
>> All,
>>
>> I'm running into a strange challenge in my latest Accumulo installation.
>> I've developed some chef recipes for deploying Accumulo, and have
>> tested them on three clusters now with no problems. Using the same
>> scripts, I recently did another deployment, but I'm having trouble on this one.
>>
>> After installing Accumulo, updating the config files, and setting up
>> passwordless ssh, I ran:
>>
>> ./accumulo init
>>
>> Everything went normally with me setting the instanceID and password
>>
>> Then I ran
>>
>> ./start-all.sh
>>
>> Again, everything went smoothly with the following output:
>>
>> bash-3.2$ ./start-all.sh
>> Starting tablet servers and loggers ....... done
>> Starting tablet server on de8-9a-8f-83-be-52
>> Starting logger on de8-9a-8f-83-be-52
>> Starting tablet server on d04-7d-7b-06-5e-48
>> Starting logger on de8-9a-8f-d3-3e-f8
>> Starting tablet server on d04-7d-7b-06-5d-f4
>> Starting logger on d04-7d-7b-06-5e-48
>> Starting logger on d04-7d-7b-06-5d-f4
>> Starting tablet server on de8-9a-8f-d3-3e-f8
>> 18 12:48:50,970 [server.Accumulo] INFO : Attempting to talk to zookeeper
>> 18 12:48:51,182 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
>> 18 12:48:51,568 [server.Accumulo] INFO : Connected to HDFS
>> Starting master on d04-7d-7b-06-5d-80
>> Starting garbage collector on d04-7d-7b-06-5e-ba
>> Starting monitor on d04-7d-7b-06-5e-ba
>> Starting tracer on d04-7d-7b-06-5d-80
>> Starting tracer on de8-9a-8f-d3-3e-f8
>> Starting tracer on d04-7d-7b-06-5e-48
>>
>> I can also run a stop-all.sh with no complaints from the script.
>>
>> However, if I try to start the Accumulo shell, I get
>>
>> bash-3.2$ ./accumulo shell
>> Enter current password for 'hdfs'@'test4': ******
>> 18 13:00:17,906 [impl.ServerClient] WARN : There are no tablet servers: check that zookeeper and accumulo are running.
>>
>> If I check the tablet server machines I find that they do not have any
>> Accumulo processes running, and the master does not have any tablet
>> server logs. (it does have the tracer logs, however).
>>
>> I've attached the log files here (without the empty ones).
>> There is an error trying to "clean up old log sort" and a thrift error.
>>
>> I'm at a loss for where to begin on the debugging for this. Any
>> thoughts would be greatly appreciated.
>>
>> 18 12:48:54,100 [master.CoordinateRecoveryTask] ERROR: Error cleaning up old Log Sort jobsjava.io.IOException: Call to /10.1.24.65:50030 failed on local exception: java.io.EOFException
>>
>> 18 12:48:57,016 [impl.ServerClient] DEBUG: ClientService request failed null, retrying ...
>> org.apache.thrift.transport.TTransportException: Failed to connect to a server
>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getAnyTransport(ThriftTransportPool.java:437)
>>     at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:145)
>>     at org.apache.accumulo.core.client.impl.ServerClient.getConnection(ServerClient.java:123)
>>     at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:105)
>>     at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:71)
>>     at org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:75)
>>     at org.apache.accumulo.server.client.HdfsZooInstance.getConnector(HdfsZooInstance.java:145)
>>     at org.apache.accumulo.server.trace.TraceServer.<init>(TraceServer.java:152)
>>     at org.apache.accumulo.server.trace.TraceServer.main(TraceServer.java:222)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.accumulo.start.Main$1.run(Main.java:89)
>>     at java.lang.Thread.run(Thread.java:662)
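
For anyone following the set -x suggestion at the top of this thread, here is a minimal sketch of what the trace looks like. It does not touch the real start-server.sh: the demo script, its contents, and the host name "example-tserver" are all hypothetical stand-ins, used only to show that bash echoes each command to stderr, after variable expansion, prefixed with "+".

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a start script; NOT the real start-server.sh.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
#!/usr/bin/env bash
set -x                      # trace: print each command to stderr before running it
host="example-tserver"      # hypothetical host name
echo "Starting tablet server on $host"
EOF

# The trace goes to stderr, so capture it separately from normal output.
trace_file="$(mktemp)"
output="$(bash "$demo_script" 2> "$trace_file")"
trace="$(cat "$trace_file")"

echo "--- stdout ---"
echo "$output"
echo "--- trace (stderr) ---"
echo "$trace"

rm -f "$demo_script" "$trace_file"
```

In the real scripts, the interesting trace lines would be the expanded ssh commands, which show exactly which host and arguments are being used. Running a script as "bash -x script-name" produces the same trace without editing the file.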
