Hi,

I am running a simple Spark Streaming application on Hadoop 2.7.0/YARN
(master: yarn-client) with 2 executors on different machines. However,
while the app is running, the Executors tab of the app web UI shows that
only 1 executor keeps completing tasks over time; the other executor
works and completes tasks for only a few seconds. The logs show an
exception, but it is not clear what went wrong.

Here is the yarn-nodemanager log:
«
2015-06-17 00:29:50,967 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Starting resource-monitoring for container_1434391147618_0007_01_000003
2015-06-17 00:29:50,977 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 286.5 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:29:53,991 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 463.7 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:29:57,009 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 465.7 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:00,024 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 467.6 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:03,032 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 474.0 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:06,041 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 480.2 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:09,053 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 540.9 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:12,068 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 550.9 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:15,075 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 551.1 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:18,090 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 30553 for container-id
container_1434391147618_0007_01_000003: 558.7 MB of 3 GB physical memory
used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:20,157 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1434391147618_0007_01_000003 is : 1
2015-06-17 00:30:20,157 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Exception from container-launch with container ID:
container_1434391147618_0007_01_000003 and exit code: 1
ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
        at org.apache.hadoop.util.Shell.run(Shell.java:456)
        at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
        at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-06-17 00:30:20,157 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from
container-launch.
2015-06-17 00:30:20,157 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id:
container_1434391147618_0007_01_000003
2015-06-17 00:30:20,157 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
2015-06-17 00:30:20,157 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace:
ExitCodeException exitCode=1:
2015-06-17 00:30:20,157 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2015-06-17 00:30:20,157 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell.run(Shell.java:456)
2015-06-17 00:30:20,157 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.FutureTask.run(FutureTask.java:262)
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:       at
java.lang.Thread.run(Thread.java:745)
2015-06-17 00:30:20,158 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Container exited with a non-zero exit code 1
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1434391147618_0007_01_000003 transitioned from RUNNING
to EXITED_WITH_FAILURE
2015-06-17 00:30:20,158 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1434391147618_0007_01_000003
2015-06-17 00:30:20,178 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path :
/tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1434391147618_0007/container_1434391147618_0007_01_000003
2015-06-17 00:30:20,178 WARN
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=myuser
OPERATION=Container Finished - Failed   TARGET=ContainerImpl
RESULT=FAILURE  DESCRIPTION=Container failed with state:
EXITED_WITH_FAILURE    APPID=application_1434391147618_0007
CONTAINERID=container_1434391147618_0007_01_000003
2015-06-17 00:30:20,178 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1434391147618_0007_01_000003 transitioned from
EXITED_WITH_FAILURE to DONE
2015-06-17 00:30:20,179 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
Removing container_1434391147618_0007_01_000003 from application
application_1434391147618_0007
2015-06-17 00:30:20,179 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
event CONTAINER_STOP for appId application_1434391147618_0007
2015-06-17 00:30:20,500 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
Application application_1434391147618_0007 transitioned from RUNNING to
APPLICATION_RESOURCES_CLEANINGUP
2015-06-17 00:30:20,501 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path :
/tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1434391147618_0007
2015-06-17 00:30:20,501 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
event APPLICATION_STOP for appId application_1434391147618_0007
2015-06-17 00:30:20,501 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
Application application_1434391147618_0007 transitioned from
APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2015-06-17 00:30:20,501 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
Scheduling Log Deletion for application: application_1434391147618_0007,
with delay of 10800 seconds
»

Not sure if it is relevant, but in the output of the application I keep
getting this message:
«15/06/17 00:29:53 INFO ShuffledDStream: Time 1434497393000 ms is invalid
as zeroTime is 1434497391000 ms and slideDuration is 4000 ms and difference
is 2000 ms»
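
To make the numbers in that message concrete, here is a minimal sketch of
the arithmetic. It assumes the rule behind the message is that a batch time
is valid only if it is later than zeroTime and its offset from zeroTime is a
whole multiple of slideDuration. The function name is_time_valid is made up
for illustration and is not Spark's API.

```python
def is_time_valid(time_ms: int, zero_time_ms: int, slide_duration_ms: int) -> bool:
    """Assumed rule: valid only if time is after zeroTime and the offset
    from zeroTime is a whole multiple of slideDuration."""
    diff_ms = time_ms - zero_time_ms
    return diff_ms > 0 and diff_ms % slide_duration_ms == 0


# Values copied from the log message: the offset is 2000 ms, slide is 4000 ms.
print(is_time_valid(1434497393000, 1434497391000, 4000))

# Hypothetical next batch time, 4000 ms after zeroTime.
print(is_time_valid(1434497395000, 1434497391000, 4000))
```

Under that assumption, the message only says that a 2000 ms offset is not a
multiple of the 4000 ms slide duration, so it may be unrelated to the
executor failure.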

I'm using Spark 1.3.2.

Any ideas about what might be happening?

Thanks.
