I see; I will try to debug it and see what is going on. Also, what is the difference between worker.childopts and topology.worker.childopts?

Thanks,
Nick
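For reference: worker.childopts is a cluster-wide default that each supervisor applies to every worker JVM it launches, and so belongs in each supervisor's storm.yaml, while topology.worker.childopts travels with a submitted topology (typically set in topology code via Config.TOPOLOGY_WORKER_CHILDOPTS). In Storm 0.9.x the supervisor puts both on the worker command line, with the per-topology options after the cluster default, so for flags both set (such as -Xmx) the per-topology value generally wins. A minimal sketch of the two keys; the -Xmx values are illustrative, not recommendations:

```yaml
# storm.yaml on each supervisor: default JVM options for every worker it launches
worker.childopts: "-Xmx4096m"

# Per-topology override, usually set at submit time via
# Config.TOPOLOGY_WORKER_CHILDOPTS rather than here; appended after
# worker.childopts, so overlapping flags like -Xmx take precedence.
topology.worker.childopts: "-Xmx2048m"
```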
2015-06-25 11:10 GMT-04:00, Nathan Leung <[email protected]>:

The nimbus log will tell you which port the worker was started on (look for the worker hash; it gives the supervisor node and port assignments, but requires some decoding). Then take a look at the worker log. Maybe your initialization is taking too long?

On Thu, Jun 25, 2015 at 11:06 AM, Nick R. Katsipoulakis <[email protected]> wrote:

Yes, I see the following messages, which I have not seen before:

    2015-06-24T19:05:28.745+0000 b.s.d.supervisor [INFO] fa3de772-cc61-4394-97e2-fcbd85190dd4 still hasn't started
    [the same "still hasn't started" line repeats roughly every 500 ms]
    2015-06-24T19:05:30.646+0000 b.s.d.supervisor [INFO] Removing code for storm id tpch-q5-top-5-1435172243
    2015-06-24T19:06:50.327+0000 b.s.d.supervisor [INFO] Worker fa3de772-cc61-4394-97e2-fcbd85190dd4 failed to start
    2015-06-24T19:06:50.329+0000 b.s.d.supervisor [INFO] Shutting down and clearing state for id fa3de772-cc61-4394-97e2-fcbd85190dd4. Current supervisor time: 1435172810. State: :not-started, Heartbeat: nil
    2015-06-24T19:06:50.329+0000 b.s.d.supervisor [INFO] Shutting down 58e551ba-f944-4aec-9c8f-5621053021dd:fa3de772-cc61-4394-97e2-fcbd85190dd4
    2015-06-24T19:06:50.330+0000 b.s.d.supervisor [INFO] Shut down 58e551ba-f944-4aec-9c8f-5621053021dd:fa3de772-cc61-4394-97e2-fcbd85190dd4
    2015-06-24T19:08:39.743+0000 b.s.d.supervisor [INFO] Shutting down supervisor 58e551ba-f944-4aec-9c8f-5621053021dd
    2015-06-24T19:08:39.745+0000 b.s.event [INFO] Event manager interrupted
    2015-06-24T19:08:39.748+0000 o.a.s.z.ZooKeeper [INFO] Session: 0x24e26a304b50025 closed
    2015-06-24T19:08:39.748+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down

But there is no indication of why this is happening.

Thanks,
Nick

2015-06-25 10:52 GMT-04:00, Nathan Leung <[email protected]>:

Any problems in the supervisor or nimbus logs?

On Thu, Jun 25, 2015 at 10:49 AM, Nick R. Katsipoulakis <[email protected]> wrote:

I am using m4.xlarge instances, each with 4 workers per supervisor. Yes, they are listed.

Nick

2015-06-25 10:47 GMT-04:00, Nathan Leung <[email protected]>:

How big are your EC2 instances? Are your supervisors listed in the Storm UI?

On Thu, Jun 25, 2015 at 10:43 AM, Nick R. Katsipoulakis <[email protected]> wrote:

Nathan,

I attempted to put the following line

    worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX: CMSInitiatingOccupancyFraction=70 -XX: -CMSConcurrentMTEnabled Djava.net.preferIPv4Stack=true"

in the supervisor config files, but for some reason workers were not spawned on those machines. To be more precise, I submitted my topology (with storm jar ...)
and I just waited for it to start executing, but nothing happened. Any idea what the reason might be?

Thanks,
Nick

2015-06-25 10:39 GMT-04:00, Nathan Leung <[email protected]>:

In general, worker options need to be set in the supervisor config files.

On Thu, Jun 25, 2015 at 10:07 AM, Nick R. Katsipoulakis <[email protected]> wrote:

Hello sy.pan,

Thank you for the link. I will try the suggestions.

Cheers,
Nick

2015-06-24 22:35 GMT-04:00, sy.pan <[email protected]>:

FYI:

https://mail-archives.apache.org/mod_mbox/storm-user/201504.mbox/%3ccafbccrcadux8sl8d99tomrbg9hkmo3gkg-qdv-qkmc-6zxs...@mail.gmail.com%3E

On Jun 25, 2015, at 02:14, Nick R. Katsipoulakis <[email protected]> wrote:

Hello all,

I am working on an EC2 Storm cluster, and I want the workers on the supervisor machines to use 4 GB of memory, so I added the following line on the machine that hosts the nimbus:

    worker.childopts: "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX: -CMSConcurrentMTEnabled Djava.net.preferIPv4Stack=true"

However, when I look at the workers' logs (on each machine that runs a supervisor), I do not find the options above in the line that launches the worker.
In fact, I find the following line:

    2015-06-24T17:52:45.349+0000 b.s.d.worker [INFO] Launching worker for tpch-q5-top-2-1435168361 on 5568726d-ad65-4a7c-ba52-32eed83276ad:6703 with id 829f36fc-eeb9-4eef-ae89-9fb6565e9108 and conf
    {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", [...]
     "worker.childopts" "-Xmx768m", "topology.worker.childopts" nil,
     "supervisor.childopts" "-Xmx256m", "nimbus.childopts" "-Xmx1024m",
     "ui.childopts" "-Xmx768m", "logviewer.childopts" "-Xmx128m",
     "supervisor.worker.start.timeout.secs" 120, "supervisor.worker.timeout.secs" 30,
     "nimbus.task.launch.secs" 120, "nimbus.task.timeout.secs" 30,
     "topology.workers" 1, "supervisor.slots.ports" [6700 6701 6702 6703],
     "storm.home" "/opt/apache-storm-0.9.4", "storm.local.dir" "/mnt/storm",
     "nimbus.host" "52.25.74.163",
     "storm.zookeeper.servers" ["172.31.28.73" "172.31.38.251" "172.31.38.252"],
     "storm.cluster.mode" "distributed", [...]}

As you can see, it runs with "topology.worker.childopts" nil and "worker.childopts" "-Xmx768m". My question is the following: do I need to add the line above to the storm.yaml files of my supervisor nodes in order to allow each worker JVM to use up to 4 GB of memory? Also, am I setting the right value for what I am trying to achieve?
Thanks,
Nick

--
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate
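Three things stand out in the worker.childopts value quoted in this thread: Djava.net.preferIPv4Stack=true is missing its leading "-" (a bare token on a java command line is taken as the main class name, so every worker JVM would exit immediately, which is consistent with the supervisor's "still hasn't started" / "failed to start" messages); there are stray spaces inside "-XX: CMSInitiatingOccupancyFraction=70" and "-XX: -CMSConcurrentMTEnabled" (a lone "-XX:" is rejected by the JVM); and -XX:+UseConcMarkSweepGC appears twice. Some of those spaces may be mail-client line wrapping rather than what is actually in storm.yaml, so treat the following as a sketch of the corrected line, not a guaranteed fix; the GC flag choices are the original poster's:

```yaml
# storm.yaml on each supervisor node (sketch: same flags, syntax corrected)
worker.childopts: "-Xmx4096m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true"
```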
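Separately, the dumped conf shows supervisor.worker.start.timeout.secs 120: a freshly launched worker that has not heartbeated within that window is declared "failed to start" and torn down, which matches the roughly two minutes between the launch attempts and the "Worker ... failed to start" line in the supervisor log quoted above. If a worker legitimately needs longer to initialize, the window can be widened in each supervisor's storm.yaml; the value below is illustrative:

```yaml
# storm.yaml on each supervisor: give slow-starting workers more time
# to heartbeat before the supervisor kills and relaunches them
supervisor.worker.start.timeout.secs: 240
```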
