Malcolm,

Did the AM process come up? If so, can you attach its entire log file?
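In case it helps with pulling that log: below is a minimal Python sketch that derives the YARN application ID from a container ID like the one in your output and builds the matching `yarn logs -applicationId` command. The helper names `application_id` and `yarn_logs_command` are my own and are not part of Samza or YARN. It also assumes log aggregation is enabled on your cluster; if it is not, the logs would have to be copied from the NodeManager's local log directory instead.

```python
import re

# YARN container IDs have the shape
#   container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
# and the owning application is application_<clusterTimestamp>_<appSeq>.
_CONTAINER_ID = re.compile(
    r"^container_(?:e\d+_)?(?P<cluster_ts>\d+)_(?P<app_seq>\d+)_\d+_\d+$"
)


def application_id(container_id: str) -> str:
    """Return the application ID that a YARN container ID belongs to."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a YARN container ID: {container_id!r}")
    return f"application_{match.group('cluster_ts')}_{match.group('app_seq')}"


def yarn_logs_command(container_id: str) -> list:
    """Build (but do not run) the command that fetches the aggregated logs."""
    return ["yarn", "logs", "-applicationId", application_id(container_id)]


if __name__ == "__main__":
    # Container ID taken from the log output quoted in this thread.
    cmd = yarn_logs_command("container_e39_1557265340810_0003_01_000002")
    print(" ".join(cmd))
    # prints: yarn logs -applicationId application_1557265340810_0003
```

Redirecting the output of that command to a file would give a single attachment covering the AM and every container of that run.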

> everything will launch fine one time, and then it will do this
> RUNNING-but-no-Samza thing the next.

IIUC, you believe your container is not making progress. If the issue recurs, can you attach a thread dump and the log file(s) of the "stuck" container?

> my logs are showing that Samza is not actually starting inside of the
> container

Can you confirm that logging is actually working? E.g., have you verified that there is only one log4j binding on your classpath?

Did anything change on your end? E.g., did you upgrade to a new Samza version, app version, or YARN version? Can you roll back to a known-good version to better isolate the issue?

Best,
Jagadish

On Tue, May 7, 2019 at 3:54 PM Malcolm McFarland <mmcfarl...@cavulus.com> wrote:

> As a followup to this, here's what I see when the Samza app tries to start;
> it actually seems to be getting to the run-container script, and then
> stops:
>
>
> Kafka version : 0.11.0.2
> Kafka commitId : 73be1e1168f91ee2
> Error registering AppInfo mbean
> Started coordinator stream writer.
> sent SetConfig message with key = samza.autoscaling.server.url and value =
> http://ba6ecb67825e:34205/
> Stopping the coordinator stream producer.
> Stopping coordinator stream producer.
> Stopping producer for system: kafka
> Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
> Webapp is started at (rpc http://ba6ecb67825e:35629/, tracking
> http://ba6ecb67825e:34151/, coordinator http://ba6ecb67825e:34205/)
> Starting YarnContainerManager.
> Upper bound of the thread pool size is 500
> yarn.client.max-cached-nodemanagers-proxies : 0
> Got AM register response.
> The YARN RM supports container requests with
> max-mem: 8192, max-cpu: 32
> Finished starting YarnContainerManager
> Starting the Samza task manager
> Resource Request created for 0 on ANY_HOST at 1557268807252
> Requesting resources on ANY_HOST for container 0
> Making a request for ANY_HOST
> Starting the container allocator thread
> Received new token for : ip-10-60-31-121.us-west-2.compute.internal:8032
> Container allocated from RM on ip-10-60-31-121.us-west-2.compute.internal
> Container allocated from RM on ip-10-60-31-121.us-west-2.compute.internal
> Host affinity not enabled. Saving the samzaResource
> container_e39_1557265340810_0003_01_000002 in the buffer for ANY_HOST
> Returning a buffered resource: container_e39_1557265340810_0003_01_000002
> for ANY_HOST from preferred-host buffer.
> Returning a buffered resource: container_e39_1557265340810_0003_01_000002
> for ANY_HOST from preferred-host buffer.
> Cancelling request SamzaResourceRequest{numCores=4, memoryMB=8192,
> preferredHost='ANY_HOST', requestID='1507e2c5-e437-409b-821c-ef505ee19b85',
> containerID=0, requestTimestampMs=1557268807252}
> Found available resources on ANY_HOST.
> Assigning request for container_id 0
> with timestamp 1557268807252 to resource
> container_e39_1557265340810_0003_01_000002
> Received launch request for 0 on hostname
> ip-10-60-31-121.us-west-2.compute.internal
> Got available container ID (0) for container: Container: [ContainerId:
> container_e39_1557265340810_0003_01_000002, NodeId:
> ip-10-60-31-121.us-west-2.compute.internal:8032, NodeHttpAddress:
> ip-10-60-31-121.us-west-2.compute.internal:8088, Resource: <memory:8192,
> vCores:4>, Priority: 1, Token: Token { kind: ContainerToken, service:
> 10.60.31.121:8032 }, ]
> In runContainer in util: fwkPath= ;cmdPath=./__package/;jobLib=
> Container ID 0 using command ./__package//bin/run-container.sh
>
> Cheers,
> Malcolm
>
>
> On Tue, May 7, 2019 at 3:22 PM Malcolm McFarland <mmcfarl...@cavulus.com>
> wrote:
>
> > Hey folks,
> >
> > We're having some trouble running Samza under YARN. The YARN
> > containers are launching fully into the RUNNING state, and I can see
> > in the node manager logs that the containers are running, but my logs
> > are showing that Samza is not actually starting inside of the
> > container. What's really curious is that this is intermittent;
> > everything will launch fine one time, and then it will do this
> > RUNNING-but-no-Samza thing the next.
> >
> > I've been trying to get into the AM UI to see what's going on, but I
> > see the following error when I try accessing it:
> >
> > Problem accessing /proxy/application_1557265340810_0002/. Reason:
> > Cannot assign requested address (Bind failed)
> > Caused by:
> > java.net.BindException: Cannot assign requested address (Bind failed)
> >
> > Has anybody seen this issue with the AM web interface? Also, are there
> > any other ways that I could introspect the YARN container to try and
> > deduce what's happening?
> >
> > Cheers,
> > Malcolm
> >
> >
> > --
> > Malcolm McFarland
> > Cavulus
> >
> >
> > This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus.
> > Any unauthorized or improper disclosure, copying, distribution, or use of
> > the contents of this message is prohibited. The information contained
> > in this message is intended only for the personal and confidential use
> > of the recipient(s) named above. If you have received this message in
> > error, please notify the sender immediately and delete the original
> > message.
> >
> >
> --
> Malcolm McFarland
> Cavulus
> 1-800-760-6915
> mmcfarl...@cavulus.com
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.
>

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University