Thanks Florin. It does indeed seem to be a memory problem. It also turns out that no acks were happening, because I was emitting from a while loop inside nextTuple() and never returning from the nextTuple() function.
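For anyone who finds this thread later: the spout's nextTuple(), ack() and fail() are all invoked from the same executor thread, so a nextTuple() that loops forever also starves the ack handling. Below is a minimal sketch of the emit-one-and-return pattern; the class name and the in-memory pending queue are illustrative placeholders, not the actual topology code.

import java.util.Map;
import java.util.Queue;
import java.util.UUID;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class BoundedEmitSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    // Hypothetical in-memory source of pending work items.
    private Queue<String> pending;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.pending = new ConcurrentLinkedQueue<>();
    }

    @Override
    public void nextTuple() {
        // Emit at most one tuple per call and return immediately, so the
        // executor thread can also deliver ack()/fail() callbacks.
        String item = pending.poll();
        if (item == null) {
            return; // nothing to do; Storm will call nextTuple() again shortly
        }
        // Emit with a message id so the tuple tree is tracked and acked.
        collector.emit(new Values(item), UUID.randomUUID().toString());
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("item"));
    }
}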
On Fri, May 6, 2016 at 11:59 AM, Spico Florin <[email protected]> wrote:

> Hello!
> If you have a look at the last line of your log you can see:
> java.lang.OutOfMemoryError: *GC overhead limit exceeded*
> So you don't have enough memory for your worker. This is the reason that
> the connection of the worker to ZooKeeper dies. The worker sends heartbeats
> to ZK; if the worker dies, there are no heartbeats to ZK, and therefore you
> get a connection timeout.
> You can increase the JVM memory by setting the Config property
> topology.worker.childopts, e.g.
> conf.setProperty("topology.worker.childopts", "-Xms1G -Xmx1G").
> This sets up your JVM heap memory.
>
> To answer your question: *Does Zookeeper store a queue of unprocessed
> tuples until the Bolts are ready to process them?*
> *No.* Storm has internal queues to buffer the tuples. It uses LMAX
> disruptor queues to send/receive tuples between collocated executors
> (spouts and bolts) within the same JVM, and separate incoming/outgoing
> queues for receiving/sending tuples to/from external workers (other JVMs).
> You can find a detailed description here:
> http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
>
> I hope that these help.
> Regards,
> Florin
>
>
> On Fri, May 6, 2016 at 8:22 AM, Navin Ipe <[email protected]> wrote:
>
>> Hi,
>>
>> I have a topology where if a spout emits 1 tuple, a Bolt-A takes that
>> tuple and emits 20 tuples. The next Bolt-B takes Bolt-A's tuples and emits
>> 50 more tuples for each of Bolt-A's tuples. Tuples are always anchored.
>>
>> *Question:*
>> When a light-weight spout emits a few tuples and Bolt-B has to process an
>> exponential number of tuples, Bolt-A and B will receive tuples faster than
>> they can process them. Does Zookeeper store a queue of unprocessed tuples
>> until the Bolts are ready to process them?
>>
>> *Reason I'm asking:*
>> Because I get a session timeout when I run a single instance of the
>> bolts. When I increase the parallelism and tasks to 5, it runs longer
>> before timing out. When I increase it to 15, it runs even longer before
>> timing out.
>>
>> *The various error messages:*
>> 587485 [main-SendThread(localhost:2001)] INFO o.a.s.s.o.a.z.ClientCnxn -
>> Client session timed out, have not heard from server in 13645ms for
>> sessionid 0x154846bbee00003, closing socket connection and attempting
>> reconnect
>>
>> 599655 [main-EventThread] INFO o.a.s.s.o.a.c.f.s.ConnectionStateManager
>> - State change: SUSPENDED
>>
>> 614868 [main-SendThread(localhost:2001)] INFO o.a.s.s.o.a.z.ClientCnxn -
>> Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2001. Will
>> not attempt to authenticate using SASL (unknown error)
>>
>> 614869 [main-SendThread(localhost:2001)] INFO o.a.s.s.o.a.z.ClientCnxn -
>> Socket connection established to localhost/0:0:0:0:0:0:0:1:2001, initiating
>> session
>>
>> 607952 [SessionTracker] INFO o.a.s.s.o.a.z.s.ZooKeeperServer - Expiring
>> session 0x154846bbee00003, timeout of 20000ms exceeded
>>
>> 621928 [main-EventThread] WARN o.a.s.c.zookeeper-state-factory -
>> Received event :disconnected::none: with disconnected Writer Zookeeper.
>>
>> 627967 [Curator-Framework-0] WARN o.a.s.s.o.a.c.ConnectionState -
>> Connection attempt unsuccessful after 25535 (greater than max timeout of
>> 20000). Resetting connection and trying again with a new connection.
>> 627967 [timer] INFO o.a.s.c.healthcheck - ()
>>
>> 631511 [ProcessThread(sid:0 cport:-1):] INFO
>> o.a.s.s.o.a.z.s.PrepRequestProcessor - Processed session termination for
>> sessionid: 0x154846bbee00008
>> 631511 [main-SendThread(localhost:2001)] INFO o.a.s.s.o.a.z.ClientCnxn -
>> Opening socket connection to server localhost/127.0.0.1:2001. Will not
>> attempt to authenticate using SASL (unknown error)
>> 590891 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] WARN
>> o.a.s.s.o.a.z.s.NIOServerCnxn - caught end of stream exception
>> org.apache.storm.shade.org.apache.zookeeper.server.ServerCnxn$EndOfStreamException:
>> Unable to read additional data from client sessionid 0x154846bbee00006,
>> likely client has closed socket
>>
>> 635140 [Curator-Framework-0] WARN o.a.s.s.o.a.c.ConnectionState -
>> Connection attempt unsuccessful after 32485 (greater than max timeout of
>> 20000). Resetting connection and trying again with a new connection.
>>
>> 625190 [Curator-Framework-0] ERROR o.a.s.s.o.a.c.ConnectionState -
>> Connection timed out for connection string (localhost:2001/storm) and
>> timeout (15000) / elapsed (18942)
>> org.apache.storm.shade.org.apache.curator.CuratorConnectionLossException:
>> KeeperErrorCode = ConnectionLoss
>>
>> 643572 [Curator-ConnectionStateManager-0] INFO o.a.s.zookeeper -
>> 192.168.0.101 lost leadership.
>>
>> 678900 [Thread-79-GenBolt-executor[29 29]] ERROR o.a.s.util - Async loop
>> died!
>> java.lang.OutOfMemoryError: *GC overhead limit exceeded*
>> at java.lang.Long.valueOf(Long.java:840) ~[?:1.8.0_73]
>>
>> --
>> Regards,
>> Navin
>

--
Regards,
Navin
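For completeness, here is roughly how the topology.worker.childopts setting Florin suggests above can be applied when submitting the topology. This is a sketch assuming Storm 1.x's org.apache.storm.Config (which is a map, so put() is used) and StormSubmitter; the 1 GB heap sizes and the topology name are example values only. Note that worker child opts only affect worker JVMs launched on a cluster; when running a local-mode test, the heap of the test JVM itself is what needs increasing.

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class SubmitWithWorkerHeap {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... set the spout and bolts on the builder as in the real topology ...

        Config conf = new Config();
        // Config is a java.util.Map, so the property is set with put().
        // Config.TOPOLOGY_WORKER_CHILDOPTS is the constant for "topology.worker.childopts";
        // -Xms1G -Xmx1G are example heap sizes, not a recommendation.
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xms1G -Xmx1G");

        // "fan-out-topology" is a placeholder name.
        StormSubmitter.submitTopology("fan-out-topology", conf, builder.createTopology());
    }
}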
