Re: Samza closing and re-opening kafka connection rapidly, cannot consume or produce, no useful logs

Chris Riccomini Tue, 31 Mar 2015 15:22:07 -0700

Hey Andrew,

It looks like your attachment was stripped by Apache's mailing server.
Looking at the info you pasted, I can tell you that YARN is most likely
unable to provision your containers because the node is out of memory. Here's the
issue:


Memory Used: 1 GB
Memory Total: 1.76 GB

By default the YARN AM and the SamzaContainer both take 1G, and the
application-master log you quoted shows your AM requesting a 1700MB
container. YARN queues that request and waits for enough memory to become
available. Because the AM already holds 1GB and you only have about 760MB
left on the node, this will never happen. The AM (and YARN) will just sit
and wait for more resources.
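The arithmetic above can be checked in a few lines. This is a minimal sketch that treats the node as a single memory pool and ignores vcores. The helper names are hypothetical, and it assumes the RM UI's "1.76 GB" means 1.76 x 1024 MB. In those binary units the headroom comes out nearer 778MB than 760MB, but neither the 1G default nor the 1700MB request fits.

```python
def free_mb(total_mb: int, used_mb: int) -> int:
    """Memory still unallocated on the NodeManager."""
    return total_mb - used_mb


def fits_on_node(request_mb: int, total_mb: int, used_mb: int) -> bool:
    """True if a single container request could be placed right now."""
    return request_mb <= free_mb(total_mb, used_mb)


# Figures from the RM UI: 1.76 GB total (assumed binary, about 1802 MB),
# 1 GB already held by the AM container.
TOTAL_MB = int(1.76 * 1024)
AM_MB = 1024

if __name__ == "__main__":
    print("free:", free_mb(TOTAL_MB, AM_MB), "MB")
    # 1024 = Samza's default container size; 1700 = size seen in the AM log
    for request in (1024, 1700):
        print(request, "MB fits:", fits_on_node(request, TOTAL_MB, AM_MB))
```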

To test this theory, try setting:

yarn.am.container.memory.mb=512
yarn.container.memory.mb=512

The first config sets the AM container's memory to 512MB. The second config
sets the SamzaContainer's memory to 512MB. Together that is 1GB, which should
fit on a 1.76GB node.
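One way to sanity-check the suggested settings is to add the AM and container requests and compare the total with the node's memory. There is a caveat worth checking, stated here as an assumption because this cluster's yarn-site.xml isn't shown. In Hadoop 2.x the Capacity and FIFO schedulers round each memory request up to a multiple of `yarn.scheduler.minimum-allocation-mb`, which defaults to 1024. A 512MB request therefore only shrinks the footprint if that minimum is lowered as well. The helper names below are hypothetical and the sketch models only memory.

```python
def normalize_mb(request_mb: int, min_alloc_mb: int) -> int:
    """Round a memory request up to a multiple of the scheduler's minimum allocation."""
    rounded = ((request_mb + min_alloc_mb - 1) // min_alloc_mb) * min_alloc_mb
    return max(min_alloc_mb, rounded)


def job_fits(am_mb: int, container_mb: int, node_total_mb: int, min_alloc_mb: int) -> bool:
    """True if one AM plus one SamzaContainer fit on a single node after rounding."""
    needed = normalize_mb(am_mb, min_alloc_mb) + normalize_mb(container_mb, min_alloc_mb)
    return needed <= node_total_mb


NODE_TOTAL_MB = int(1.76 * 1024)  # about 1802 MB, as reported by the RM UI

if __name__ == "__main__":
    # yarn.am.container.memory.mb=512, yarn.container.memory.mb=512
    for min_alloc in (1024, 512, 256):
        print("min allocation", min_alloc, "->", job_fits(512, 512, NODE_TOTAL_MB, min_alloc))
```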

Thanks!
Chris

On Tue, Mar 31, 2015 at 2:59 PM, Andrew Sannier <asann...@helixeducation.com> wrote:

> Thanks so much for getting back to me, Chris.
>
> I’ve attached the AM log from my most recent attempt to run the
> hello-samza wikipedia-feed task. I’ve been using pretty small nodes to
> keep costs down while I test and so forth, so that makes a lot of sense
> (though I definitely hoped I’d configured appropriate memory ceilings).
> Here are the values from the YARN UI:
>
> Containers Running: 1
> Memory Used: 1 GB
> Memory Total: 1.76 GB
> Memory Reserved: 0 B
> VCores Used: 1
> VCores Total: 8
> VCores Reserved: 0
> Active Nodes: 1
> Decommissioned Nodes: 0
> Lost Nodes: 0
> Unhealthy Nodes: 0
> Rebooted Nodes: 0
>
>
> Again, much obliged for your response.
>
> Andrew Sannier
>
>
>
> On 3/31/15, 3:54 PM, "Chris Riccomini" <criccom...@apache.org> wrote:
>
> >Hey Andrew,
> >
> >I'm wondering if your YARN cluster doesn't have enough memory to fit both
> >the AM and its containers. The fact that the AM UI shows no running
> >containers is suspicious. Can you check these six settings in your YARN
> >RM's UI:
> >
> >  Memory Used
> >  Memory Total
> >  Memory Reserved
> >  VCores Used
> >  VCores Total
> >  VCores Reserved
> >
> >Can you also attach (or post to gist/pastebin/etc) the YARN AM's full log?
> >
> >Cheers,
> >Chris
> >
> >On Tue, Mar 31, 2015 at 2:32 PM, Andrew Sannier <asann...@helixeducation.com> wrote:
> >
> >> Something to add here: there are a couple of weird things in the Samza
> >> Application Master web UI. The Application Master task ID is -1, which
> >> seems odd, and the Running Containers table is completely empty. How could
> >> YARN call a task “Running” if there’s no container?
> >>
> >> Thanks,
> >> Andrew Sannier
> >>
> >>
> >>
> >>
> >>
> >> On 3/31/15, 2:19 PM, "Andrew Sannier" <asann...@helixeducation.com> wrote:
> >>
> >> >Hi all,
> >> >
> >> >Thanks in advance for your help; I have been totally stuck on this for a
> >> >couple of days.
> >> >
> >> >I have a small YARN cluster with one ResourceManager and one NodeManager,
> >> >as well as one Zookeeper node and one Kafka node, to keep the number of
> >> >moving parts to a minimum. I've been following the guide to running Samza
> >> >on YARN (https://samza.apache.org/learn/tutorials/0.8/run-in-multi-node-yarn.html),
> >> >and I get to the end of the tutorial with a Running job in the YARN web
> >> >UI, as expected. However, the job doesn't actually appear to do anything:
> >> >no messages are produced to the “wikipedia-raw” topic (nor is the topic
> >> >created), and nothing is logged at all.
> >> >
> >> >To that point, I am having a ton of trouble with Samza's logging. In
> >> >samza.log.dir on the ResourceManager node, there's only gc.log.0.current,
> >> >and in the YARN log directory I have only the ResourceManager log, which
> >> >of course contains no application information. On the NodeManager side,
> >> >samza.log.dir contains application-manager.log, which ends at “[INFO]
> >> >Requesting 1 container(s) with 1700mb of memory” right after the job
> >> >enters the Running state; its own copy of gc.log.0.current; and stderr
> >> >and stdout, which contain no useful information and stop growing after
> >> >the first second of the job running. In YARN's logs, there's only the
> >> >NodeManager log, which has no errors or warnings and just logs the
> >> >startup of the container and then its memory usage from then on, which
> >> >seems fine:
> >> >
> >> >2015-03-31 20:17:34,635 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 25767 for container-id container_1427823389325_0011_01_000001: 104.9 MB of 1 GB physical memory used; 2.4 GB of 3.1 GB virtual memory used
> >> >
> >> >
> >> >What am I missing here? WikipediaFeed.java contains a whole bunch of
> >> >logging statements, but nothing ever hits any file I can find. Even if you
> >> >can't help with the problem I'm having with hello-samza, I would greatly
> >> >appreciate any advice on how I can get useful logs from Samza jobs.
> >> >
> >> >I've checked that I can ping the Wikipedia IRC URL and consume
> >> >from/produce to the Kafka cluster with the console shell scripts from
> >> >both the ResourceManager and NodeManager nodes, and other applications
> >> >can work with my Kafka and Zookeeper with no issues. From the
> >> >application-master log on the worker node, all I can see is that Samza
> >> >configures the Wikipedia IRC system, starts the Webapp, and requests a
> >> >container. It enters the Running state with YARN, after which nothing
> >> >happens. There's no activity at all in the Kafka or Zookeeper logs.
> >> >
> >> >And that's it; the job will run for hours if I let it, but at no point is
> >> >anything produced to Kafka or logged at all. I wrote a simpler task that
> >> >just accepts a JSON message from a topic on Kafka, adds a timestamp, and
> >> >produces to another topic, but it behaves almost identically. From the
> >> >application-master log:
> >> >
> >> >2015-03-31 20:07:05 ClientUtils$ [INFO] Fetching metadata from broker id:0,host:172.31.2.19,port:9092 with correlation id 0 for 1 topic(s) Set(test)
> >> >2015-03-31 20:07:05 SyncProducer [INFO] Connected to 172.31.2.19:9092 for producing
> >> >2015-03-31 20:07:05 SyncProducer [INFO] Disconnecting from 172.31.2.19:9092
> >> >2015-03-31 20:07:06 KafkaSystemAdmin$ [INFO] Got metadata: Map(test -> SystemStreamMetadata [streamName=test, partitionMetadata={Partition [partition=0]=SystemStreamPartitionMetadata [oldestOffset=0, newestOffset=4, upcomingOffset=5], Partition [partition=1]=SystemStreamPartitionMetadata [oldestOffset=null, newestOffset=null, upcomingOffset=0]}])
> >> >
> >> >
> >> >which all looks correct. Then it connects to the ResourceManager, starts
> >> >the Webapp, requests a container, and starts running. All I see in
> >> >Kafka's log is
> >> >
> >> >[2015-03-31 20:07:05,999] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)
> >> >[2015-03-31 20:07:06,090] INFO Closing socket connection to /172.31.1.229. (kafka.network.Processor)
> >> >
> >> >
> >> >and Zookeeper has nothing to say at all. As before, no new topic is
> >> >created.
> >> >
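The KafkaSystemAdmin metadata quoted above can be read as follows. In Samza's SystemStreamPartitionMetadata, `upcomingOffset` is the offset the next message will receive. For a Kafka partition, the number of offsets currently retained is therefore `upcomingOffset - oldestOffset`, and null oldest/newest offsets mean the partition is empty. By that reading, partition 0 of `test` holds 5 messages and partition 1 holds none. The sketch below encodes that interpretation using a hypothetical Python dataclass. It does not use Samza's Java classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionMetadata:
    """Hypothetical Python mirror of the three fields KafkaSystemAdmin prints."""
    oldest_offset: Optional[int]
    newest_offset: Optional[int]
    upcoming_offset: int

    def retained_count(self) -> int:
        """Number of offsets currently held in the partition (0 if empty)."""
        if self.oldest_offset is None:
            return 0
        return self.upcoming_offset - self.oldest_offset

    def is_consistent(self) -> bool:
        """Empty partitions have no oldest/newest; otherwise newest = upcoming - 1."""
        if self.oldest_offset is None or self.newest_offset is None:
            return self.oldest_offset is None and self.newest_offset is None
        return self.oldest_offset <= self.newest_offset == self.upcoming_offset - 1


# Values copied from the "Got metadata" log line for topic "test".
TEST_TOPIC = {
    0: PartitionMetadata(oldest_offset=0, newest_offset=4, upcoming_offset=5),
    1: PartitionMetadata(oldest_offset=None, newest_offset=None, upcoming_offset=0),
}

if __name__ == "__main__":
    for partition, meta in TEST_TOPIC.items():
        print(partition, meta.retained_count(), meta.is_consistent())
```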
> >> >So a huge part of this question is just: what am I missing about logging?
> >> >Where are the actual job/task-level logs? Aside from that, I just have no
> >> >explanation for why nothing is happening in either of these simple tasks.
> >> >I would really appreciate any insight anyone can offer…
> >> >
> >> >Oh, one more thing: there was an error message in Zookeeper after
> >> >submitting my simple StreamTask that I haven't been able to reproduce:
> >> >
> >> >2015-03-31 19:48:28,145 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.31.2.19:41801
> >> >2015-03-31 19:48:28,147 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /172.31.2.19:41801; will be dropped if server is in r-o mode
> >> >2015-03-31 19:48:28,148 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.31.2.19:41801
> >> >2015-03-31 19:48:28,149 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - Established session 0x14c70bd0c3e0006 with negotiated timeout 30000 for client /172.31.2.19:41801
> >> >2015-03-31 19:48:28,202 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14c70bd0c3e0006
> >> >2015-03-31 19:48:28,206 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /172.31.2.19:41801 which had sessionid 0x14c70bd0c3e0006
> >> >
> >> >
> >> >172.31.2.19 is the Kafka broker. The job continued unfazed; Samza didn't
> >> >log anything about this socket being closed or any kind of error. Not
> >> >sure if that's related.
> >> >
> >> >
> >> >Again, thanks a ton for reading and whatever help you can offer.
> >> >
> >> >Andrew Sannier
> >> >Software Engineer, Big Data
> >> >C: 480-284-1048
> >> >www.helixeducation.com
> >> >
> >> >
> >>
> >>
>
>
